Sunday, November 13, 2016

OpenTox Euro 2016: "Data integration with identifiers and ontologies"

Results from a project by MSP students.
J. Windsor et al. (2016): Volatile Organic Compounds: A Detailed Account of Identity, Origin, Activity and Pathways. Figshare.
A few weeks ago, the OpenTox Euro 2016 meeting was held in Rheinfelden, at the German/Swiss border (which allowed me a nice stroll across the Rhine into Switzerland, past a nice x-mas countdown clock). The meeting was co-located with meetings hosted by eNanoMapper, where we discussed, among other things, the nanoinformatics roadmaps that outline where research in this area should go.

There were many interesting talks around various data initiatives, adverse outcome pathways (AOPs) and their links to molecular initiating events (MIEs), and ontologies (like the AOP ontology talk by ). In fact, I quite enjoyed the discussion with Chris Grulke about ontologies during the panel discussion. A central question was where the border lies between data and ontological concepts. Some slides are available via Lanyrd.

During the Emerging Methods and Practice session hosted by Ola Spjuth, I presented the work of the BiGCaT department on identifier mapping and the use of ontologies for linking data sets.

The presentation integrates a lot of things I have been working on over the last few years; please note the second slide, which lists all the people I have worked with on the things presented in these slides.
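The core idea of identifier mapping can be sketched as grouping identifiers from different databases that denote the same entity, and then joining data sets on those groups. Below is a minimal Python sketch under stated assumptions: the mapping table is a toy in-memory union-find, and the database names (`dbA`, `dbB`, `dbC`), the identifiers, and the `IdentifierMapper`/`link_datasets` helpers are all hypothetical, not the BiGCaT tools shown in the presentation.

```python
class IdentifierMapper:
    """Toy union-find grouping identifiers that denote the same entity.

    Identifiers are hypothetical (source, local_id) tuples.
    """

    def __init__(self):
        self._parent = {}

    def _find(self, ident):
        # Unseen identifiers start out as their own group.
        self._parent.setdefault(ident, ident)
        # Walk up to the group's root, halving the path as we go.
        while self._parent[ident] != ident:
            self._parent[ident] = self._parent[self._parent[ident]]
            ident = self._parent[ident]
        return ident

    def add_mapping(self, a, b):
        """Declare that identifiers a and b refer to the same entity."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_entity(self, a, b):
        return self._find(a) == self._find(b)

    def equivalents(self, ident, source=None):
        """All known identifiers mapped to ident, optionally from one source."""
        root = self._find(ident)
        found = {i for i in list(self._parent) if self._find(i) == root}
        if source is not None:
            found = {i for i in found if i[0] == source}
        return found


def link_datasets(mapper, left, right):
    """Pair records of two {identifier: value} data sets on mapped identifiers."""
    return [
        (lid, rid, lval, rval)
        for lid, lval in left.items()
        for rid, rval in right.items()
        if mapper.same_entity(lid, rid)
    ]


mapper = IdentifierMapper()
# Hypothetical mappings: dbA:123 = dbB:X9, and dbB:X9 = dbC:c-42.
mapper.add_mapping(("dbA", "123"), ("dbB", "X9"))
mapper.add_mapping(("dbB", "X9"), ("dbC", "c-42"))

toxicity = {("dbA", "123"): "toxic"}
assays = {("dbC", "c-42"): 0.8, ("dbC", "c-7"): 0.1}
print(link_datasets(mapper, toxicity, assays))
# [(('dbA', '123'), ('dbC', 'c-42'), 'toxic', 0.8)]
```

The union-find makes mappings transitive: dbA:123 gets linked to dbC:c-42 even though no direct mapping between the two was given. Transitivity is not always safe (for example, when a mapping is only approximate), which this toy ignores.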

Recent presentation: "Open Access: a practical perspective"

Source: Wikimedia Commons
For a local grant acquisition course, I recently gave a presentation about Open Access (OA). My interest in OA started from my Open Science background, where lack of access to literature was a serious problem. Journals were invented to make knowledge dissemination easier, but many publishers are stuck with outdated technologies, and their knowledge dissemination has not caught up with the 21st century. BTW, to me, OA means the kind that actually helps knowledge dissemination, which allows:
  1. download and use (text mining!)
  2. modification (format change!)
  3. redistribution (allow others to read it too! share your modifications!)
There are several stories around showing that fast knowledge exchange saves lives (is there an overview of well-documented examples?). Honestly, I would be surprised if people did not also die because of disseminated knowledge, but then it is because of misuse of knowledge, not because knowledge was denied. And this is what access to knowledge can mean:
It shows that you can get far with access to the right knowledge (here in the form of data). This must be a right every human has. In fact, it is part of human rights, but, as so often, legal wording complicates things. Wikipedia has a good overview. As with free speech, it tries to find a balance between the rights of all people: the rights of one cannot restrict the rights of others. Well, I don't know if "cashing in" is a human right, but surely many people believe so.

And not every human has the opportunity Pepke had. Access to knowledge is a serious problem, one I face myself every week, even though I work at Maastricht University, which has a relatively well-equipped library. A recent study found that even researchers at my university consider Sci-Hub an important resource, as can be seen in the slides below. I do not encourage the use of Sci-Hub. Its legal basis is unclear, but at least it has not been found illegal at this moment (as far as I have been able to keep up with the process). And there are many alternatives, which I blogged about earlier.

The fact is, we have a knowledge dissemination issue, and that was the main message of my presentation. As an author, you can easily solve it: don't give away your IP to publishers, and choose an Open Access license for your work (the gold OA version, as green OA is like the Rolex you buy for 10 euros on the black market).

And I'll end with this quote from John Oliver:

"Knowledge dissemination: a topic you know so little about, you think the best kind of dissemination is a Nature journal ReadCube."

Pepke, S., Steeg, G. V., Sep. 2016. Comprehensive discovery of subsample gene expression components by information explanation: therapeutic implications in cancer. bioRxiv, 043257+.

Friday, November 11, 2016

New paper: "SPLASH, a hashed identifier for mass spectra"

I'm excited to have contributed to this important (IMHO) interoperability paper around metabolomics data: "SPLASH, a hashed identifier for mass spectra" (doi:10.1038/nbt.3689, readcube:msZj). A huge thanks to everyone involved in this great collaborative project! The source code project is fully open source and coordinated by Gert Wohlgemuth, the lead author of this paper. It provides implementations of the algorithm in various programming languages, and I'm happy that the SPLASH functionality is available in the just-released Bioclipse 2.6.2 (taking advantage of the Java library). An R package by Steffen Neumann is also available.

This new identifier greatly simplifies linking between spectral databases and will, in the end, contribute to a Linked Data network. Furthermore, journals can start adopting this identifier and list the 'splash' for mass spectra in documents, simplifying dereplication and making it easier to find additional information about spectra.
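To illustrate why a hashed identifier makes spectra easy to link and dereplicate, here is a toy version in Python. It is emphatically not the SPLASH algorithm (the paper and the reference implementations define that); it only assumes the general idea of canonicalising a peak list (scaling intensities to the base peak, rounding, sorting) and hashing the result, so that the same spectrum yields the same key regardless of peak order or absolute intensity. The `toy_spectrum_key` function, its rounding parameters, the `toyspec-` prefix, and the example peaks are all hypothetical.

```python
import hashlib


def toy_spectrum_key(peaks, mz_decimals=4, intensity_scale=100):
    """Hypothetical exact-match key for a centroided mass spectrum.

    peaks: iterable of (m/z, intensity) pairs in any order.
    Illustration only; this is NOT the SPLASH specification.
    """
    peaks = list(peaks)
    if not peaks:
        raise ValueError("spectrum has no peaks")
    base = max(intensity for _, intensity in peaks)
    if base <= 0:
        raise ValueError("spectrum needs at least one positive intensity")
    # Canonicalise: scale intensities to the base peak, round both values,
    # and sort, so peak order and absolute intensities do not affect the key.
    canonical = sorted(
        (round(mz, mz_decimals), round(intensity / base * intensity_scale))
        for mz, intensity in peaks
    )
    text = " ".join(f"{mz:.{mz_decimals}f}:{rel}" for mz, rel in canonical)
    digest = hashlib.sha256(text.encode("ascii")).hexdigest()
    return "toyspec-" + digest[:20]


# The same hypothetical spectrum, recorded with a different peak order and
# a different absolute intensity scale.
spectrum_a = [(100.0, 50.0), (150.5, 1000.0)]
spectrum_b = [(150.5, 2.0), (100.0, 0.1)]

# A hypothetical local library, indexed by key, used for dereplication.
library = {toy_spectrum_key(spectrum_a): "record-1"}
print(library.get(toy_spectrum_key(spectrum_b)))  # record-1
```

Because the key is an exact-match hash, a plain dictionary lookup is enough for dereplication in this toy; choices such as rounding precision and which spectrum types are covered are exactly what a proper specification has to pin down.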

Several databases have already adopted the SPLASH, such as MassBank, HMDB, MetaboLights, and the OSDB, recently published in JCheminf (doi:10.1186/s13321-016-0170-2).

Screenshot snippet of a spectrum in the OSDB.

PS. I personally don't like the idea of ReadCubes (which I may blog about at some point) and how they have been pitched as a "legal" way of sharing papers, but this journal does not have a gold Open Access option, unfortunately.

Wohlgemuth, G., Mehta, S. S., Mejia, R. F., Neumann, S., Pedrosa, D., Pluskal, T., Schymanski, E. L., Willighagen, E. L., Wilson, M., Wishart, D. S., Arita, M., Dorrestein, P. C., Bandeira, N., Wang, M., Schulze, T., Salek, R. M., Steinbeck, C., Nainala, V. C., Mistrik, R., Nishioka, T., Fiehn, O., Nov. 2016. SPLASH, a hashed identifier for mass spectra. Nature Biotechnology 34 (11), 1099-1101.