Monday, December 17, 2018

From the "Annalen der Pharmacie" to the "European Journal of Organic Chemistry"

2D structure of caffeine, also known as theine.

One of my hobbies is the history of chemistry. It has a practical use in my current research, as a lot of knowledge about human metabolites is actually quite ancient. One thing I have trouble understanding is that, in a time when Facebook knows you better than your spouse, we have trouble finding relevant literature without expensive expert databases that are not generally available.

Hell, even the article that established that some metabolite is actually a human metabolite cannot be found within a reasonable time (less than a minute).

This is one of the reasons I started working on Scholia, and the chemistry corner of it specifically. See this ICCS conference poster. The poster outlines some of the reasons why I like it, but one is this link between chemical structures and literature, here for caffeine:

You can see the problem with our chemical knowledge here (in Wikidata): before 1950 it's pretty blank. Hence my question on Twitter about which journal to look at. A few suggestions came back, and I decided to focus on the journal that is now called the European Journal of Organic Chemistry but that started in 1832 as the Annalen der Pharmacie. I remember the EurJOC being launched by the KNCV and many other European chemistry societies.

BTW, note here that all these chemistry societies decided it was better to team up with a commercial publisher than to continue publishing it themselves. #Plan_S

Anyway, the full history is not complete, but the route from Annalen to EurJOC now is (each journal name has a different color):

That took me an hour or two, because CrossRef lists the EurJOC journal name for all articles. Technically perhaps correct, but metadata-wise the above is much better. Thanks to whoever actually created Wikidata items for each journal and linked them with "follows" and "followed by".
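
The "follows" and "followed by" links between journal items (Wikidata properties P155 and P156) make it possible to walk such a lineage programmatically. A minimal Python sketch, assuming the links have already been fetched into a dictionary; the function name is made up for this example, and the intermediate titles are placeholders, not the real chain:

```python
def journal_lineage(start, followed_by):
    """Walk 'followed by' links from start until no successor is left."""
    lineage = [start]
    seen = {start}
    current = start
    while current in followed_by:
        current = followed_by[current]
        if current in seen:  # guard against cyclic metadata
            break
        seen.add(current)
        lineage.append(current)
    return lineage


# Hypothetical data: only the first and last names come from this post,
# the intermediate titles are placeholders.
FOLLOWED_BY = {
    "Annalen der Pharmacie": "(intermediate title 1)",
    "(intermediate title 1)": "(intermediate title 2)",
    "(intermediate title 2)": "European Journal of Organic Chemistry",
}

if __name__ == "__main__":
    for name in journal_lineage("Annalen der Pharmacie", FOLLOWED_BY):
        print(name)
```

The cycle guard is there because hand-curated metadata can contain loops; without it a single bad link would make the walk run forever.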

In doing so, you quickly run into many more metadata issues. The best one I found was a paper by Crasts and Friedel, known for the Friedel-Crafts reaction :) Other gems are researcher names like Erlenmeyer-Heidelberg and Demselben and Von Demselben.

Back to caffeine: this active chemical in coffee, which many of us must have in the morning, is actually the same as theine. Tea drinkers also get their dose of caffeine. We all know that. What I did not know, but discovered while doing this work, is that this old literature had already established that :caffeine owl:sameAs :theine (doi:10.1002/jlac.18380250106). Cool!
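
The :caffeine owl:sameAs :theine statement is a subject-predicate-object triple. As a small illustration in Python (a sketch with plain tuples rather than an RDF library; the helper name is made up for this example), sameAs links can be used to collect all names that refer to the same compound:

```python
from collections import defaultdict

SAME_AS = "owl:sameAs"

# The single triple from the text above, written as a plain tuple.
triples = [
    (":caffeine", SAME_AS, ":theine"),
]


def same_as_group(resource, triples):
    """Return all resources linked to `resource` via owl:sameAs."""
    neighbours = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == SAME_AS:
            neighbours[subject].add(obj)
            neighbours[obj].add(subject)  # owl:sameAs is symmetric
    group = {resource}
    todo = [resource]
    while todo:
        for other in neighbours[todo.pop()]:
            if other not in group:  # follow links transitively
                group.add(other)
                todo.append(other)
    return group


if __name__ == "__main__":
    print(sorted(same_as_group(":theine", triples)))
```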

Saturday, November 17, 2018

Join me in encouraging the ACS to join the Initiative for Open Citations

My research is into abstract representation of chemical information, important for other research to be performed. Indeed, my work is generally reused, but knowing which research fields my work is used in, or which societal problems it is helping solve, is not easily retrieved or determined. Efforts like WikiCite and Scholia do allow me to navigate the citation network, so that I can determine which research fields my output influences and which diseases are studied with methods I proposed. Here's a network of topics of articles citing my work:

Graphs like this show how people are using my work, which in turn allows me to support that use further. But this relies on open citations.

In my opinion, citations are an essential part of our research process. They give us access to important prior work on which a study is based, and reflect how a work influences other research or is even essential to that other work. For example, they allow us to not repeat earlier published work, while preserving the ability to reproduce the full work. The Initiative for Open Citations encourages making these citations publicly available to benefit research, by removing barriers to accessing this critical part of scholarly communication. While many societies and publishers have joined this initiative, the American Chemical Society (ACS) has not yet. By not joining, they limit the sharing of knowledge for unclear reasons.

I would really like to see the ACS join this initiative, and have proposed this a few times already. Because they still have not joined the initiative, I have started this petition. If you agree, please sign and share it with others.

New paper: "Explicit interaction information from WikiPathways in RDF facilitates drug discovery in the Open PHACTS Discovery Platform"

Figure from the article showing the interactive Open PHACTS documentation to access

Ryan, a PhD candidate in our group, is studying how to represent and use interaction information in pathway databases, and WikiPathways specifically. His paper "Explicit interaction information from WikiPathways in RDF facilitates drug discovery in the Open PHACTS Discovery Platform" (doi:10.12688/f1000research.13197.2) was recently accepted in F1000Research. It extends work started by, among others, Andra (see doi:10.1371/journal.pcbi.1004989).

The paper describes the application programming interface (API) methods of the Open PHACTS REST API for accessing interaction information, e.g. to learn which genes are upstream or downstream in the pathway. This information can be used in pharmacological research. The paper discusses example queries and demonstrates how the API methods can be called from HTML+JavaScript and Python.
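
To give an idea of what calling such a REST method from Python can look like, here is a minimal sketch using only the standard library. The base URL, the method path, and the parameter names are placeholders invented for illustration, not the actual Open PHACTS API; the paper and the API documentation give the real ones.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder values: NOT the real Open PHACTS endpoint or method path.
BASE_URL = "https://api.example.org"
METHOD_PATH = "/pathway/interactions"


def build_request_url(pathway_uri, direction, app_id, app_key):
    """Build the GET URL for a (hypothetical) interaction method."""
    params = {
        "uri": pathway_uri,      # assumed parameter name
        "direction": direction,  # assumed: "upstream" or "downstream"
        "app_id": app_id,        # assumed credential parameters
        "app_key": app_key,
        "_format": "json",       # assumed response format switch
    }
    return BASE_URL + METHOD_PATH + "?" + urlencode(params)


def fetch_interactions(url):
    """Perform the HTTP GET and decode the JSON response body."""
    with urlopen(url) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the URL; no network request is made here.
    print(build_request_url("http://example.org/pathway/WP1",
                            "upstream", "my_id", "my_key"))
```

Keeping the URL construction separate from the network call makes the query easy to inspect and test before any request is sent.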

Sunday, November 04, 2018

Programming in the Life Sciences #23: research output for the future

A random public domain picture with 10 in it.

Ensuring that you and others can understand your research output five years from now requires effort. This is why scholars tend to keep lab notebooks. The computational age has perhaps made us a bit lazy here, but we still make an effort. A series of Ten Simple Rules articles outlines some of the things to think about:
  1. Goodman A, Pepe A, Blocker AW, Borgman CL, Cranmer K, Crosas M, et al. Ten Simple Rules for the Care and Feeding of Scientific Data. Bourne PE, editor. PLOS Computational Biology. 2014 Apr 24;10(4):e1003542.
  2. List M, Ebert P, Albrecht F. Ten Simple Rules for Developing Usable Software in Computational Biology. Markel S, editor. PLOS Computational Biology. 2017 Jan 5;13(1):e1005265.
  3. Perez-Riverol Y, Gatto L, Wang R, Sachsenberg T, Uszkoreit J, Leprevost F da V, et al. Ten Simple Rules for Taking Advantage of Git and GitHub. Markel S, editor. PLOS Computational Biology. 2016 Jul 14;12(7):e1004947.
  4. Prlić A, Procter JB. Ten Simple Rules for the Open Development of Scientific Software. PLOS Computational Biology. 2012 Dec 6;8(12):e1002802.
  5. Sandve GK, Nekrutenko A, Taylor J, Hovig E. Ten Simple Rules for Reproducible Computational Research. Bourne PE, editor. PLOS Computational Biology. 2013 Oct 24;9(10):e1003285.
Regarding licensing, I can highly recommend reading this book:
  1. Rosen L. Open Source Licensing [Internet]. 2004. Available from:
Regarding Git, I recommend these two resources:
  1. Wiegley J. Git From the Bottom Up [Internet]. 2017. Available from:
  2. Task 1: How to set up a repository on GitHub [Internet]. 2018. Available from:

Saturday, November 03, 2018

Fwd: "We challenge you to reuse Additional Files (a.k.a. Supplementary Information)"

Download statistics of J. Cheminform. Additional Files show a clear growth.

Our challenge to you to reuse additional files has been posted on the BMC (formerly BioMed Central) Research in progress blog:
    Since our open-access portfolio in BMC and SpringerOpen started collaborating with Figshare, Additional Files and Supplementary Information have been deposited in journal-specific Figshare repositories, and files available for the Journal of Cheminformatics alone have been viewed more than ten thousand times. Yet what is the best way to make the most of this data and reuse the files? Journal of Cheminformatics challenges you to think about just that with their new upcoming special issue.
We already know you are downloading the data frequently and more every year, so let us know what you're doing with that data!

For example, I would love to see more data from these additional files end up in databases, such as Wikidata, but any reuse in RDF form would interest me.