Saturday, August 29, 2020

What is wrong with FAIR today.

Image of kevlar that has nothing to do with this blog post,
except that it is Openly licensed. Source: cacyle 2005, GFDL.

In the past year, we have been working in the NanoSafety Cluster on FAIR research output (for our group, via NanoCommons, RiskGONE, NanoSolveIT, collaborating with other projects, such as ACENano and particularly Gov4Nano), analyzing resources, deciding where the next steps are. Of course, in the context of GO FAIR (e.g. via the Chemistry Implementation Network), ELIXIR, RDA, EOSC, etc.

But there seems to be something going wrong. For example, adoption by some Open Science communities making FAIR the highest priority (but formally FAIR != Open; arguably it should be: O'FAIR), but also in the strong positioning of the data steward that will make research data FAIR. I never felt too comfortable, but we're about to submit an article discussing this. I like to stress, this is not about how to interpret the guidance. The original paper defines these pretty well, and the recent interpretations and implementation considerations give a lot of context.

What is wrong with FAIR today is that we are loosing focus. The real aim is reuse of data. Therefore, FAIR data without an Open license is silly. Therefore, data that cannot be found does not help. Therefore, we want clear access, allowing us to explain our Methods sections properly. Therefore, interoperability, because data that cannot be understood (in enough detail) is useless.

On the Data stewardship

So, when EOSC presents essential skills, I find it worrying that data stewardship is separated from research. I vehemently disagree with that separation. Data stewardship is a core activity of doing research and something is seriously wrong if that is left to others. It will be p-hacking discussions for the next 100 years.

On the Open license manager

A second major problem is one important missing skill: an open license manager. For this one, I'm perfectly fine leaving that to specialists. The license, after all, does not effect the research. But not having this explicitly in the diagram violates our Open Science ideas (e.g. inclusiveness, collaboration, etc).

Not having open license as core of Open Science is just development of more paywalled Open Science. Look, there is time and place for closed data, but that it totally irrelevant. Bringing up that argument is a fallacy. (If you are a scholar and disagree, you just created an argument that open license management should be a core task of a researcher.)

Computable figures. eLife's Executable Research Article

When I did my PhD and wrote my articles, Ron Wehrens introduced me to R as open source alternative to MatLab which was the standard in the research group otherwise. At some point, I got tired of making new plots, and I started saving the R code to make the plot. I still have this under version control (not public; will be release 70 years after my death; I mean, that's still the industry standard at this moment </sarcasm>).

Anyway, I'm delighted that the published behind eLife keeps on innovating and introduced their Executable Research Article. The idea live figures still excited me very much and you can find many examples of that in my blog, and we actively use it in our Scholia project (full proposal). In fact, I still teach Maastricht University students this too, in the Maastricht Science Programme PRA3006 course.

I really wish we had something like this at BMC too, because I'm sure a good number of Journal of Cheminformatics authors would be excited with such functionality. This is their workflow:

Workflow of publishing an ERA in eLife. Image license: CC-BY, source.

One of the tools they mention is Stencila which I really need to look at in detail. It is the kind of Open Science infrastructure that universities should embrace. I'm also excited to see that citation.js is mentioned in the source code, one of the projects Lars Willighagen has been working on, see this publication.

Monday, August 17, 2020

Research line: Interactions and effects of nanomaterials and living matter


Because it is hard to get funded
for interdisciplinary work, in a domain
that does not regular publish in glossy
journals, I found my funding not with
national, but with the European
As a multidisciplinary researcher I had to wait long to become an expert. Effectively, you have to be expert in more than one field. An, as I am experiencing right now, staying expert is a whole other dimension. Bit with a solid chemistry, computing science, cheminformatics, and chemometrics education, I found myself versatile that I at some point landed that grant proposal (the first few failed). Though, I have had my share of travel grants and Google Summer of Code projects (microfunding).

So, while I am trying to establish a research line in metabolomics (also with some microfunding) and one PhD candidate (own funding), my main research line is nanosafety. Because my background fits in well, and while data quality for predictive toxicology leaves to be desired, there is a lot of work we can do here, to make the most of what is being measured.

Indeed, there are many interesting biological, chemical, and chemo-/bioinformatics research questions here (just to name a few):

  • does the mechanism of cell entry differ for different engineered nanomaterials?
  • does it differ from how "natural" nanomaterials enter the cell?
  • does the chemical composition of the nanomaterial change when it comes into contact with living matter? (yes, but how? is it stable?)
  • how do we represent the life cycle of nanomaterials in a meaning full way?
  • does each cell type respond in the same way to the same material? is this predominantly defined by the cell's epigenetics or by the chemical nature of the material?
  • given the sparseness of physicochemical and biological characterization of nanomaterials, what is the most appropriate representation of a material: based on physicochemical description, ontological description, or chemical graph theory?
  • can ontologies help us group data from different studies to give an over view of the whole process from molecular initiating event to adverse outcome?
  • can these insights be used to reliably and transparently advice the European people about risk?
We try to define answers to these questions in a series of FP7/H2020 projects using an Open Science approach, allowing our analyses to be updated frequently when new data or new knowledge comes in. These are the funded projects for which I am (was) PI:
  • eNanoMapper (EC FP7, ended, but since this project developed Open solutions, it is reused a lot)
  • NanoCommons (EC H2020, our work focussing on continuing the common ontology)
  • RiskGONE (EC H2020, focusing on reach regulatory advising based on scientific facts)
  • NanoSolveIT (EC H2020, computational nanosafety)
  • Sbd4Nano (EC H2020, disseminating nanosafety research to the EU industry)
In these projects (two PhD candidates, one postdoc), open science has been important to what we do. And while not all partners in all projects use Open Science approaches, what our groups does tries to be as Open as possible. Some open science projects involved:
If we want to read more about these projects in the scientific literature, check the websites which often have a page with publications and deliverables. Or check my Google Scholar profile. And for an overview of our group, see this page.