Saturday, October 17, 2020

Posh Publishing and why Recognition and Reward must be without

File:Posh.jpg from WikiCommons.
Earlier this week I read an article in Nature about the high APC being a problem. Yes, the publishing system is very expensive, but we also know that some publisher increase(d) the APC based on demand. Yes, the publishers do not differentiate prices between regions. Yes, the waiver concept, just by the name, is a problem. Publishing in high-impact journals is posh publishing (there is no evidence your article actually becomes more scientifically sound).

The posh publishing is a direct result of human behavior. We like posh. We learn to dress posh, to act posh. This is strongly embedded on Western culture. It's gender independent. We all like posh. The posh-fetish goes deep. Very deep.

Why do we want to be posh. Well, that answer is given in the comment: Why does a high-impact publication matter so much for a career in research? As long as we keep seeing posh as better for your career, plenty of people will be more than happy to pay for it. We're all humans. We're suckers for pain.

Therefore, in the VSNU Recognition and Reward we must compensate for this human behavior. Is that weird? Not at all. Many academic habits are important to overcome human nature. For example, we have to force ourselves to overcome our flaws. What can we do?

  1. recognize and reward all research output (software, data, standards, policies, advises, grant deliverables, open standards)
  2. recognize and reward particularly activities that focus on removing the posh from the science
  3. learn to recognize your flaws, your biases, your poshness
Happy saturday!

Saturday, August 29, 2020

What is wrong with FAIR today.

Image of kevlar that has nothing to do with this blog post,
except that it is Openly licensed. Source: cacyle 2005, GFDL.

In the past year, we have been working in the NanoSafety Cluster on FAIR research output (for our group, via NanoCommons, RiskGONE, NanoSolveIT, collaborating with other projects, such as ACENano and particularly Gov4Nano), analyzing resources, deciding where the next steps are. Of course, in the context of GO FAIR (e.g. via the Chemistry Implementation Network), ELIXIR, RDA, EOSC, etc.

But there seems to be something going wrong. For example, adoption by some Open Science communities making FAIR the highest priority (but formally FAIR != Open; arguably it should be: O'FAIR), but also in the strong positioning of the data steward that will make research data FAIR. I never felt too comfortable, but we're about to submit an article discussing this. I like to stress, this is not about how to interpret the guidance. The original paper defines these pretty well, and the recent interpretations and implementation considerations give a lot of context.

What is wrong with FAIR today is that we are loosing focus. The real aim is reuse of data. Therefore, FAIR data without an Open license is silly. Therefore, data that cannot be found does not help. Therefore, we want clear access, allowing us to explain our Methods sections properly. Therefore, interoperability, because data that cannot be understood (in enough detail) is useless.

On the Data stewardship

So, when EOSC presents essential skills, I find it worrying that data stewardship is separated from research. I vehemently disagree with that separation. Data stewardship is a core activity of doing research and something is seriously wrong if that is left to others. It will be p-hacking discussions for the next 100 years.

On the Open license manager

A second major problem is one important missing skill: an open license manager. For this one, I'm perfectly fine leaving that to specialists. The license, after all, does not effect the research. But not having this explicitly in the diagram violates our Open Science ideas (e.g. inclusiveness, collaboration, etc).

Not having open license as core of Open Science is just development of more paywalled Open Science. Look, there is time and place for closed data, but that it totally irrelevant. Bringing up that argument is a fallacy. (If you are a scholar and disagree, you just created an argument that open license management should be a core task of a researcher.)

Computable figures. eLife's Executable Research Article

When I did my PhD and wrote my articles, Ron Wehrens introduced me to R as open source alternative to MatLab which was the standard in the research group otherwise. At some point, I got tired of making new plots, and I started saving the R code to make the plot. I still have this under version control (not public; will be release 70 years after my death; I mean, that's still the industry standard at this moment </sarcasm>).

Anyway, I'm delighted that the published behind eLife keeps on innovating and introduced their Executable Research Article. The idea live figures still excited me very much and you can find many examples of that in my blog, and we actively use it in our Scholia project (full proposal). In fact, I still teach Maastricht University students this too, in the Maastricht Science Programme PRA3006 course.

I really wish we had something like this at BMC too, because I'm sure a good number of Journal of Cheminformatics authors would be excited with such functionality. This is their workflow:

Workflow of publishing an ERA in eLife. Image license: CC-BY, source.

One of the tools they mention is Stencila which I really need to look at in detail. It is the kind of Open Science infrastructure that universities should embrace. I'm also excited to see that citation.js is mentioned in the source code, one of the projects Lars Willighagen has been working on, see this publication.

Monday, August 17, 2020

Research line: Interactions and effects of nanomaterials and living matter


Because it is hard to get funded
for interdisciplinary work, in a domain
that does not regular publish in glossy
journals, I found my funding not with
national, but with the European
As a multidisciplinary researcher I had to wait long to become an expert. Effectively, you have to be expert in more than one field. An, as I am experiencing right now, staying expert is a whole other dimension. Bit with a solid chemistry, computing science, cheminformatics, and chemometrics education, I found myself versatile that I at some point landed that grant proposal (the first few failed). Though, I have had my share of travel grants and Google Summer of Code projects (microfunding).

So, while I am trying to establish a research line in metabolomics (also with some microfunding) and one PhD candidate (own funding), my main research line is nanosafety. Because my background fits in well, and while data quality for predictive toxicology leaves to be desired, there is a lot of work we can do here, to make the most of what is being measured.

Indeed, there are many interesting biological, chemical, and chemo-/bioinformatics research questions here (just to name a few):

  • does the mechanism of cell entry differ for different engineered nanomaterials?
  • does it differ from how "natural" nanomaterials enter the cell?
  • does the chemical composition of the nanomaterial change when it comes into contact with living matter? (yes, but how? is it stable?)
  • how do we represent the life cycle of nanomaterials in a meaning full way?
  • does each cell type respond in the same way to the same material? is this predominantly defined by the cell's epigenetics or by the chemical nature of the material?
  • given the sparseness of physicochemical and biological characterization of nanomaterials, what is the most appropriate representation of a material: based on physicochemical description, ontological description, or chemical graph theory?
  • can ontologies help us group data from different studies to give an over view of the whole process from molecular initiating event to adverse outcome?
  • can these insights be used to reliably and transparently advice the European people about risk?
We try to define answers to these questions in a series of FP7/H2020 projects using an Open Science approach, allowing our analyses to be updated frequently when new data or new knowledge comes in. These are the funded projects for which I am (was) PI:
  • eNanoMapper (EC FP7, ended, but since this project developed Open solutions, it is reused a lot)
  • NanoCommons (EC H2020, our work focussing on continuing the common ontology)
  • RiskGONE (EC H2020, focusing on reach regulatory advising based on scientific facts)
  • NanoSolveIT (EC H2020, computational nanosafety)
  • Sbd4Nano (EC H2020, disseminating nanosafety research to the EU industry)
In these projects (two PhD candidates, one postdoc), open science has been important to what we do. And while not all partners in all projects use Open Science approaches, what our groups does tries to be as Open as possible. Some open science projects involved:
If we want to read more about these projects in the scientific literature, check the websites which often have a page with publications and deliverables. Or check my Google Scholar profile. And for an overview of our group, see this page.

Friday, July 31, 2020

New Editorial: "Adoption of the Citation Typing Ontology by the Journal of Cheminformatics"

My first blog post about the Citation Typing Ontology was already more than 10 years go. I have been fascinated with finally being able to add some semantics to why we cite a certain article. For years, I had been tracking why people were citing the Chemistry Development Kit articles. Some were citing the article because the Chemistry Development Kit was an important thing to mention, while other articles cited it because they actually used the Chemistry Development Kit. I also started using CiTO predicates in RDF models, and you might find them in various ongoing semantic web projects.

Unfortunately, scholarly publishers did not show much interest. One project that did, was CiteULike. I had posted it as a feature request and it was picked up by CiteULike, something I am still grateful for. CiteULike also no longer exists, but I had a lot of fun with it while it existed:
  1. CiteULike CiTO Use Case #1: Wordles
  2. CiTO / CiteULike: publishing innovation
But I like to also stress it has more serious roles in our scientific dissemination workflow:
  1. "What You're Doing Is Rather Desperate"
So, I am delighted that we are now starting a pilot with the Journal of Cheminformatics to use CiTO annotation at the journal side. You can read it in this new editorial.

It is a first step of a second attempt to get CiTO of the ground. Had CiteULike still existed, this would have been a wonderful mashup, but Wikidata might be a good alternative. In fact, I already trialed a data model and developed several SPARQL queries. Support in Scholia is a next step on this front.

Now, citation networks in general have received a lot of attention. And with projects like OpenCitations we increasingly have access to this information. That allows visualisation, for example, with Scholia, here for the 2010 paper:

More soon!

For now, if you like to see the CiTO community growing too, please tweet, blog, message your peers about our new editorial:

Willighagen, E. Adoption of the Citation Typing Ontology by the Journal of CheminformaticsJ Cheminform 12, 47 (2020).