Saturday, October 24, 2020

new paper: "A Semi-Automated Workflow for FAIR Maturity Indicators in the Life Sciences"

Figure 1 from the Nanomaterials article.
In a collaboration orchestrated via the NanoSafety Cluster (NSC) Work Group F on Data Management (WGF was formerly known as WG4) between a few H2020 projects (NanoCommons, Gov4Nano, RiskGONE, and NanoSolveIT), we just published an article about the use of Jupyter Notebooks to assess how FAIR (see doi:10.1162/dint_r_00024) several databases are.
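The core idea of such a notebook is that each maturity indicator becomes a small, automated check against a database's metadata. A minimal Python sketch of that idea follows; the indicator names, metadata fields, and placeholder DOI are illustrative assumptions, not the actual checks from the paper:

```python
# Hypothetical sketch of a maturity-indicator check; field names and
# indicator labels are illustrative, not the ones used in the article.

def check_f1_identifier(metadata):
    """F1-style check: the (meta)data have a globally unique, persistent identifier."""
    identifier = metadata.get("identifier", "")
    return identifier.startswith("https://doi.org/")

def check_r1_license(metadata):
    """R1.1-style check: the (meta)data carry a clear usage license."""
    return bool(metadata.get("license"))

def assess(metadata, checks):
    """Run every maturity-indicator check and return a pass/fail map."""
    return {name: check(metadata) for name, check in checks.items()}

checks = {"F1": check_f1_identifier, "R1.1": check_r1_license}
record = {
    "identifier": "https://doi.org/10.0000/example",  # placeholder DOI
    "license": "CC0-1.0",
}
print(assess(record, checks))  # e.g. {'F1': True, 'R1.1': True}
```

Because each check is an explicit function, the notebook doubles as documentation: a database maintainer can read exactly why an indicator passed or failed, which is what makes the result a map toward more FAIRness rather than a judgment.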

It is important to realize this is not meant to judge these databases, but to provide them with a map of how they can make their databases more FAIR. After all, the notebook explains in detail how the level of FAIRness was assessed and what a database can do to become more "mature". This is what the maturity indicators are about. In doing so, we also discovered that existing sets of maturity indicators do not always benefit the community, often because they currently focus more on the F and the A than on the I and the R (see What is wrong with FAIR today). Another really neat feature is the visual representation of the map, proposed by Serena (now at Transparent Musculoskeletal Research) in this paper (shown on the top right).

I would like to thank everyone involved in the project, the NSC projects involved in the discussions (Joris Quik, Martine Bakker, Dieter Maier, Iseult Lynch), Serena for starting this work (see this preprint), Laurent for reviewing the notebook, and Ammar and Jeaphianne for their hard work on turning the paper into this, now published, revision (the original paper was rejected).

Navigating the academic system. Or, will I face an evolution or a revolution?

Image from the time of the Batavian Revolution.
Img: Rijksmuseum, public domain.
When I started this blog, I was still a PhD student myself. Now, I am a tenured assistant professor ("universitair docent" in Dutch), and I have acquired funding to fund other researchers. I have not gotten used to this yet. I don't like this hierarchical system, but I am forced to get used to it. So, when I write "new paper", the paper is no longer "mine". I'm more like a sponsor, trying to give the researchers who work "for" me advice but also the opportunity to do their own thing.

Of course, this is complicated when the funding comes from grants, where the academic freedom of the person I start collaborating with is greatly reduced. Talking about navigation: balancing scientific impact and project deliverables requires a good amount of flexibility and creativity.

There is a lot happening. The system is broken. Basically, there is no upper limit, and as long as selection is based on volume and not quality (#blasphemy), more is better. So, what do I tell and teach the people working on my grants, those under my supervision? Do I teach them how to succeed in the academic system, or do I teach them how to do good research? Ideally, those would be the same. But they are not. The system is broken.

The classic example is well-known: publishing in journals with a high impact factor. That has for at least two decades been seen as a measure of quality and has therefore long been used in assessments, at many levels. Of course, the quality of any paper is not a function of the place where it gets published. It can be argued that research is done better for higher impact journals, but I have yet to see the data that confirms that. But there is a far more worrying thing, one that exactly proves my point that volume is not the same as quality: apparently, it is acceptable to submit inferior work to journals with a lower impact factor. The system is broken.

The system is breaking down. People are looking for solutions. Some solutions are questionable. Some good solutions are not taken, because they are problematic in the short term. But one way or another, the system is broken and must be fixed. I hope it can go via evolution, but when the establishment is fighting the evolution, a revolution may be the only option left.

Here are some possible evolutions and some revolutions people are talking about.


Evolutions:

  1. all research output will be recognized and rewarded
  2. people will stop using fake measures of success like the journal impact factor
  3. journal articles will actually describe the experiment in detail (reproducibility)

Revolutions:

  1. the journal will cease to exist and the venue will be the research output itself
  2. we will start recognizing and rewarding research output instead of researchers
  3. research will no longer be a competition for funding


Now, will either happen? In the whole discussion about, for example, cOAlition S, people are hiding behind "but then ..., because in X ...". For example, they argue that careers are damaged if researchers do not publish in high impact journals. This argument values researchers over research (output). As if this person would not find a job without that article. As if the research itself has no value. Sadly, this is partly true. So, following this reasoning, doing a PhD in The Netherlands damages your career too. You had better do it in the USA (or Oxford, Cambridge) in the first place.

So, what better place to give this evolution or revolution shape than in a country like the Netherlands, where the research quality (at least on average) is high and only the volume is lacking to compete with the Harvards of the world? Without billions in extra funding, that volume is not going to happen. We managed what we have by working 60 hours/week instead of 40 hours/week. Adding another 20 hours a week is not going to happen (not without deaths; if you think that's an exaggeration, just check the facts, please).

Fortunately, The Netherlands has a good track record with revolutions, and a good track record in highly impactful research. The country has shown how strong our research is: with the National Plan Open Science and strong involvement in cOAlition S, evolution is very clearly being tried. Not all solutions work equally well, and there is strong opposition from some. Some years ago, some believed it inconceivable that Nature would start allowing Open Access, but the change is coming. Willingly? Well, with the APC they are going to charge, I cannot really say they do so willingly. Evolution, not revolution.

The real evolution/revolution we need, however, is not about open access. It is about fair and open science.

Saturday, October 17, 2020

Posh Publishing and why Recognition and Reward must do without it

File:Posh.jpg from Wikimedia Commons.
Earlier this week I read an article in Nature about high APCs being a problem. Yes, the publishing system is very expensive, but we also know that some publishers increase(d) the APC based on demand. Yes, the publishers do not differentiate prices between regions. Yes, the waiver concept, just by the name, is a problem. Publishing in high-impact journals is posh publishing (there is no evidence your article actually becomes more scientifically sound).

Posh publishing is a direct result of human behavior. We like posh. We learn to dress posh, to act posh. This is strongly embedded in Western culture. It's gender independent. We all like posh. The posh fetish goes deep. Very deep.

Why do we want to be posh? Well, that answer is given in the comment: Why does a high-impact publication matter so much for a career in research? As long as we keep seeing posh as better for your career, plenty of people will be more than happy to pay for it. We're all human. We're suckers for pain.

Therefore, in the VSNU Recognition and Reward programme we must compensate for this human behavior. Is that weird? Not at all. Many academic habits exist precisely to overcome human nature: we have to force ourselves to overcome our flaws. What can we do?

  1. recognize and reward all research output (software, data, standards, policies, advice, grant deliverables, open standards)
  2. recognize and reward particularly activities that focus on removing the posh from the science
  3. learn to recognize your flaws, your biases, your poshness
Happy Saturday!

Saturday, August 29, 2020

What is wrong with FAIR today.

Image of kevlar that has nothing to do with this blog post,
except that it is Openly licensed. Source: cacyle 2005, GFDL.

In the past year, we have been working in the NanoSafety Cluster on FAIR research output (for our group, via NanoCommons, RiskGONE, and NanoSolveIT, collaborating with other projects, such as ACENano and particularly Gov4Nano), analyzing resources and deciding what the next steps are. Of course, in the context of GO FAIR (e.g. via the Chemistry Implementation Network), ELIXIR, RDA, EOSC, etc.

But something seems to be going wrong. For example, some Open Science communities are making FAIR the highest priority (though formally FAIR != Open; arguably it should be: O'FAIR), and there is the strong positioning of the data steward who will make research data FAIR. I never felt too comfortable with this, and we're about to submit an article discussing it. I would like to stress that this is not about how to interpret the guidance: the original paper defines the principles pretty well, and the recent interpretations and implementation considerations give a lot of context.

What is wrong with FAIR today is that we are losing focus. The real aim is reuse of data. Therefore, FAIR data without an Open license is silly. Therefore, data that cannot be found does not help. Therefore, we want clear access, allowing us to explain our Methods sections properly. Therefore, interoperability, because data that cannot be understood (in enough detail) is useless.

On the Data stewardship

So, when EOSC presents essential skills, I find it worrying that data stewardship is separated from research. I vehemently disagree with that separation. Data stewardship is a core activity of doing research, and something is seriously wrong if it is left to others. Otherwise we will be having p-hacking-style discussions for the next 100 years.

On the Open license manager

A second major problem is one important missing skill: an open license manager. For this one, I'm perfectly fine with leaving it to specialists. The license, after all, does not affect the research itself. But not having this explicitly in the diagram violates our Open Science ideas (e.g. inclusiveness, collaboration, etc.).

Not having open licenses at the core of Open Science just develops more paywalled Open Science. Look, there is a time and place for closed data, but that is totally irrelevant here. Bringing up that argument is a fallacy. (If you are a scholar and disagree, you just created an argument that open license management should be a core task of a researcher.)

Computable figures. eLife's Executable Research Article

When I did my PhD and wrote my articles, Ron Wehrens introduced me to R as an open source alternative to MATLAB, which was otherwise the standard in the research group. At some point, I got tired of making new plots, and I started saving the R code that made each plot. I still have this under version control (not public; it will be released 70 years after my death; I mean, that's still the industry standard at this moment </sarcasm>).

Anyway, I'm delighted that the publisher behind eLife keeps on innovating and has introduced the Executable Research Article. The idea of live figures still excites me very much; you can find many examples of it in my blog, and we actively use it in our Scholia project (full proposal). In fact, I still teach Maastricht University students this too, in the Maastricht Science Programme PRA3006 course.

I really wish we had something like this at BMC too, because I'm sure a good number of Journal of Cheminformatics authors would be excited about such functionality. This is their workflow:

Workflow of publishing an ERA in eLife. Image license: CC-BY, source.

One of the tools they mention is Stencila, which I really need to look at in detail. It is the kind of Open Science infrastructure that universities should embrace. I'm also excited to see that citation.js is mentioned in the source code; it is one of the projects Lars Willighagen has been working on, see this publication.