Monday, December 31, 2018

Wikidata-Taxonomy: class and instance hierarchies on the command line

For some time I had a 2017 Tweet from Dan Brickley on my todo list (I use Todoist), and now that it is the holidays, I finally had time to play with Wikidata-Taxonomy. Here it is in action for a class of five phytocassanes:

$ node wdtaxonomy.js  Q60224961 -i

Give it a try.
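If you wonder what such a hierarchy listing boils down to, here is a minimal JavaScript sketch that prints an indented tree from child/parent pairs. This is not how Wikidata-Taxonomy itself is implemented; the function name `renderTree` and the placeholder member names are made up for illustration, and only Q60224961 comes from the command above.

```javascript
// Sketch only: render a class hierarchy as an indented tree.
// `edges` is a list of [child, parent] pairs, for example taken from
// "subclass of" (P279) or "instance of" (P31) statements.
// Assumes the hierarchy contains no cycles.
function renderTree(root, edges, indent = '') {
  const lines = [indent + root]
  const children = edges
    .filter(([, parent]) => parent === root)
    .map(([child]) => child)
    .sort()
  for (const child of children) {
    lines.push(...renderTree(child, edges, indent + '  '))
  }
  return lines
}

// Placeholder members; the real class has five phytocassanes.
const edges = [
  ['member-B', 'Q60224961'],
  ['member-A', 'Q60224961']
]
console.log(renderTree('Q60224961', edges).join('\n'))
```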

Friday, December 28, 2018

Replacing BibTeX with Citation.js

As part of replacing LaTeX with Markdown for my Groovy Cheminformatics book (now Open Access), I also needed to replace BibTeX. Fortunately, Citation.js supports Wikidata and the solution by Lars was simpler than I hoped. Similar to LaTeX, I have citations annotated in the Markdown, but the reference code does not refer to a BibTeX file entry, but to Wikidata (see also Wikidata-powered citation lists with citation.js).

The setup is as follows:
  1. extract the Wikidata Q-codes (which creates references.qids)
  2. use Citation.js to format the references as plain text
  3. number the citations and create the bibliography
The first step uses a Groovy script, and the second a very short JavaScript script:

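const fs = require('fs')
const Cite = require('citation-js')
fs.readFile('references.qids', 'utf8',
            async function (err, file) {
  // assumption: the dropped prefix was the item's ID (the Q-code)
  const data = Array.from(await Cite.async(file)).map(
    item => item.id + '=' + Cite(item).format(
      'bibliography', {template: 'vancouver'}
    )
  )
  fs.writeFile('references.dat', data.join(''),
    function() {}
  )
})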

The result looks like:

I still have some things left to do, like adding the DOI and some Markdown formatting. The toolkit allows that, but it is not urgent.
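The third step (numbering the citations and creating the bibliography) could look roughly like the sketch below. This is not the script I actually use: the `<cite>Q…</cite>` syntax, the function names, and the assumption that references.dat holds one `QID=formatted reference` entry per line are all illustrative.

```javascript
// Sketch only. Assumptions (not taken from the real scripts):
//  - citations appear in the Markdown as <cite>Q12345</cite>
//  - references.dat has one "QID=formatted reference" entry per line
function parseReferences(dat) {
  const refs = new Map()
  for (const line of dat.split('\n')) {
    const pos = line.indexOf('=')
    if (pos > 0) refs.set(line.slice(0, pos), line.slice(pos + 1).trim())
  }
  return refs
}

function numberCitations(markdown, refs) {
  const order = [] // QIDs in order of first appearance
  const text = markdown.replace(/<cite>(Q\d+)<\/cite>/g, (match, qid) => {
    if (!order.includes(qid)) order.push(qid)
    return '[' + (order.indexOf(qid) + 1) + ']'
  })
  const bibliography = order.map(
    (qid, i) => (i + 1) + '. ' + (refs.get(qid) || 'missing reference for ' + qid)
  )
  return { text, bibliography }
}

// Usage with made-up placeholder data
const refs = parseReferences('Q1=First example reference.\nQ2=Second example reference.\n')
const result = numberCitations(
  'See <cite>Q2</cite> and <cite>Q1</cite>, and again <cite>Q2</cite>.', refs)
console.log(result.text)
console.log(result.bibliography.join('\n'))
```

Numbering by first appearance keeps the bibliography in reading order, which is what a Vancouver-style list expects.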

Thursday, December 27, 2018

Creating nanopublications with Groovy

Compound found in Taphrorychus bicolor. Published in Liebigs Annalen, see this post about the history of that journal.
Yesterday I struggled a bit with creating nanopublications with Groovy. My first attempt was an utter failure, but then I discovered Tobias Kuhn's NanopubCreator and it got much easier from there.

There are two good things about this. First, I now have a code base that I can easily repurpose to make trusty nanopublications (doi:10.1007/978-3-319-07443-6_63) about anything structured as a table (so can you).

Second, I now have almost 1200 CCZero nanopublications that tell you in which species a certain metabolite has been found. They are sourced from Wikidata, using its SPARQL endpoint. This collection is a bit boring at this moment, and most of them are human metabolites, where the source is either Recon 2.2 or WikiPathways. But I expect (hope) to see more DOIs show up. Think "We challenge you to reuse Additional Files".
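To give an idea of the kind of query involved, here is a small JavaScript sketch that builds a request URL for the Wikidata SPARQL endpoint. The exact query behind the nanopublications is not shown in this post, so the query below is an assumption: it uses P703 ("found in taxon") and P248 ("stated in"), and the function names are made up.

```javascript
// Sketch only: build a request URL for the Wikidata SPARQL endpoint.
// P703 is "found in taxon"; P248 is "stated in". The query shape is an
// assumption, not the query actually used for the nanopublications.
const ENDPOINT = 'https://query.wikidata.org/sparql'

function buildQuery(limit) {
  return [
    'SELECT ?compound ?taxon ?source WHERE {',
    '  ?compound p:P703 ?statement .',
    '  ?statement ps:P703 ?taxon ;',
    '             prov:wasDerivedFrom/pr:P248 ?source .',
    '}',
    'LIMIT ' + Number(limit)
  ].join('\n')
}

function buildUrl(limit) {
  return ENDPOINT + '?format=json&query=' + encodeURIComponent(buildQuery(limit))
}

console.log(buildUrl(10))
```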

Finally, you are probably interested in learning what one of the created nanopublications looks like, so I put a Gist online:
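Independent of that Gist, the general shape of a nanopublication (a head plus assertion, provenance, and publication info graphs) can be sketched in a few lines of JavaScript. This is not the Groovy code I used and it does not compute a trusty URI; the base URI, the function name `rowToNanopub`, and the use of wdt:P703 in the assertion are assumptions for illustration.

```javascript
// Sketch only: turn one table row into one nanopublication in TriG.
// The base URI is a placeholder and no trusty URI hash is computed.
function rowToNanopub(baseUri, row) {
  const { compound, taxon, source } = row
  return [
    '@prefix : <' + baseUri + '#> .',
    '@prefix np: <http://www.nanopub.org/nschema#> .',
    '@prefix prov: <http://www.w3.org/ns/prov#> .',
    '@prefix dct: <http://purl.org/dc/terms/> .',
    '@prefix wdt: <http://www.wikidata.org/prop/direct/> .',
    '',
    ':Head {',
    '  <' + baseUri + '> a np:Nanopublication ;',
    '    np:hasAssertion :assertion ;',
    '    np:hasProvenance :provenance ;',
    '    np:hasPublicationInfo :pubinfo .',
    '}',
    ':assertion {',
    '  <' + compound + '> wdt:P703 <' + taxon + '> .',
    '}',
    ':provenance {',
    '  :assertion prov:wasDerivedFrom <' + source + '> .',
    '}',
    ':pubinfo {',
    '  <' + baseUri + '> dct:license <https://creativecommons.org/publicdomain/zero/1.0/> .',
    '}'
  ].join('\n')
}

// Usage with placeholder IRIs
const trig = rowToNanopub('http://example.org/np/1', {
  compound: 'http://example.org/compound',
  taxon: 'http://example.org/taxon',
  source: 'http://example.org/source'
})
console.log(trig)
```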

Wednesday, December 26, 2018

Groovy Cheminformatics rises from the ashes

Cover of the last print version of the book.
Like a phoenix (Phenix aegyptus), my Groovy Cheminformatics rises from the ashes. About a year ago I blogged that I could no longer maintain my book, at least not in print form. The hardest part was actually resizing the cover each time the book got thicker. I actually started the book about 10 years ago, but the wish to make it Open Access grew bigger with the years.

So, here we go. It's based on CDK 2.0, but somewhere in the coming weeks I'll migrate to the latest version. It will take some weeks to migrate all content, and your chapter priority requests are welcome here.

The making of...
Over the past months I have been playing with some ideas on how to make the transition. I wanted to preserve the core concept of the book: all scripts are compiled and executed with each release, and all output of scripts is autogenerated (including many of the diagrams). I wanted to publish the next iteration of the book as Markdown, but also pondered the idea of still being able to generate a PDF with LaTeX. That means I have a lot of stuff to upgrade.

I ended up somewhere in between. Its source is Markdown, but not entirely. It's source code that looks like Markdown with snippets of XML. This makes sure the source looks formatted when on GitHub:
But you can see that this is not processed yet. CreateAtom1 and CreateAtom2 refer to code examples, and the above screenshot shows the source of a source code inclusion (for CreateAtom1 and CreateAtom2) and an output inclusion (for CreateAtom2). After processing, the actual page looks like this:

That looks pretty close to what the print book had. An extra here is that you can click the link to the code (hard in a print book). That is something I improved on along the way, and it leads to a (new) Markdown page that shows the full sources and the output (should I add the @Grab instructions, or is that too obvious?):
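The inclusion step described above can be sketched as a simple text substitution. The real build scripts are not shown in this post, so take this JavaScript version as an illustration only: the tag names `<code>` and `<out>`, the function name `expandInclusions`, and the lookup tables are assumptions.

```javascript
// Sketch only. The tag names <code> and <out> and the lookup maps are
// assumptions for illustration; the real build may work differently.
function expandInclusions(markdown, sources, outputs) {
  const fence = '`'.repeat(3) // avoids a literal fence inside this example
  return markdown
    .replace(/<code>(\w+)<\/code>/g, (match, name) =>
      sources.has(name)
        ? '**Script** ' + name + '\n' + fence + 'groovy\n' + sources.get(name) + '\n' + fence
        : match) // unknown names are left untouched
    .replace(/<out>(\w+)<\/out>/g, (match, name) =>
      outputs.has(name)
        ? fence + 'plain\n' + outputs.get(name) + '\n' + fence
        : match)
}

// Usage with placeholder content for the two examples named in the text
const sources = new Map([['CreateAtom1', '// placeholder for the CreateAtom1 script']])
const outputs = new Map([['CreateAtom2', 'placeholder output of CreateAtom2']])
const page = expandInclusions(
  '<code>CreateAtom1</code>\n\n<out>CreateAtom2</out>\n\n<out>Unknown</out>',
  sources, outputs)
console.log(page)
```

Leaving unknown names untouched makes missing scripts easy to spot in the generated page.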

If you check the first online version (🎶 On the first day of xmas, #openscience got from me ... 🎶), I have quite some content to migrate. First, back to doing the reference sections properly, as if I was still working with BibLaTeX.

Happy holidays!

Saturday, December 22, 2018

About Frontiers

Frontiers is getting a lot of critique at this moment: about very low rejection rates (only ~10%), reviewers who seemingly cannot reject articles, the use of the impact factor (sad), their almost pyramid-like gaming of recruiting editors, reviewers, etc., which is questionable to me (focused on continuous growth of literature, which we must not want), and, perhaps most important, questionable lobbying around Plan S. Also, they are just expensive and I see little real publishing innovation.

For Marvin Martens' paper we received reviews of fair quality. But with the above points in mind, I want to comment in retrospect. For this paper we had a reviewer who withdrew, and while they provided feedback, we could not reply to this reviewer directly; we had to direct our replies and updates based on that review to the editor instead.

But I note that the "major + withdrew + minor" we received could just as well have been (my personal interpretation based on the reviewers' comments) a "major + reject + minor". The third review was based on our revision, in which we took into account the comments of both the major and the reject review. For me as editor, a "major + reject" often results in a "back to the drawing board" decision. For this paper we were lucky, and the reject was mostly about the excellent note by the reviewer that our article was wrongly submitted as a review article, which we corrected (it should have been "Hypothesis and Theory" for a positioning paper).

I'll carefully monitor where Frontiers is going, but their prominent use of the impact factor and the intention to keep increasing the volume of journal article literature are reason enough for me to not quickly consider them again. We have a second paper under review with Frontiers, but I will have a moratorium on Frontiers until further notice.

BTW, if you would like to see journals publish their rejection rates, please RT this tweet:

New paper: "Introducing WikiPathways as a Data-Source to Support Adverse Outcome Pathways for Regulatory Risk Assessment of Chemicals and Nanomaterials"

An adverse outcome pathway (AOP) links molecular initiating events (MIEs) via key events (KEs) to the adverse outcome (AO). Each event is a biological process and it should be possible to link them to normal biological pathways (PWs). Figure from the paper.
Marvin Martens published his vision on the integration of adverse outcome pathways with biological pathways (doi:10.3389/fgene.2018.00661). Specifically, he looked into our options to link the AOPWiki with WikiPathways, taking input from various people around the world (see the list of co-authors). The paper looks into how links can be made, and some statistics are calculated for genes mentioned in AOPs and biological pathways, as well as seeing which molecular initiators are found in biological pathways (see the figure on the right).

The paper started out as a positioning paper, but I was happy to see that Marvin could not resist getting some actual data and including that as well. The code is available from GitHub and archived on Zenodo (doi:10.5281/ZENODO.1306408). The next step is to formalize this integration, and the first bits of data are being produced and they look very exciting!

BTW, if you like where this is going, also make sure to read this paper by Dr. Penny Nymark (A Data Fusion Pipeline for Generating and Enriching Adverse Outcome Pathway Descriptions).

Monday, December 17, 2018

From the "Annalen der Pharmacie" to the "European Journal of Organic Chemistry"

2D structure of caffeine, also known as theine.
One of my hobbies is the history of chemistry. It has a practical use for my current research, as a lot of knowledge about human metabolites is actually quite ancient. One thing I have trouble understanding is that in a time where Facebook knows you better than your spouse, we have trouble finding relevant literature without expensive expert databases that are not generally available.

Hell, even the article that established that some metabolite is actually a human metabolite is not found within reasonable time (less than a minute).

This is one of the reasons I started working on Scholia, and the chemistry corner of it specifically. See this ICCS conference poster. The poster outlines some of the reasons why I like it, but one is this link between chemical structures and literature, here for caffeine:

You can see the problem with our chemical knowledge here (in Wikidata): before 1950 it's pretty blank. Hence my question on Twitter what journal to look at. A few suggestions came back, and I decided to focus on the journal that is now called the European Journal of Organic Chemistry but that started in 1832 as the Annalen der Pharmacie. I remember the EurJOC being launched by the KNCV and many other European chemistry societies.

BTW, note here that all these chemistry societies decided it was better to team up with a commercial publisher than to continue publishing it themselves. #Plan_S

Anyway, the full history is not complete, but the route from Annalen to EurJOC now is (each journal name has a different color):

That took me an hour or two, because CrossRef has the EurJOC journal name for all articles. Technically perhaps correct, but metadata-wise the above is much better. Thanks to whoever actually created Wikidata items for each journal and linked them by follows and followed by.
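Walking such a chain of "followed by" (P156) links is simple once the items are connected. Here is a small JavaScript sketch; the function name `followChain` is made up, and the map holds simplified placeholder data in which the intermediate name changes between the Annalen and EurJOC are deliberately omitted.

```javascript
// Sketch only: follow "followed by" (P156) links from a journal to its successors.
// The map below is simplified placeholder data: intermediate journal
// names are omitted, so this is not the full history.
function followChain(start, followedBy) {
  const chain = [start]
  const seen = new Set(chain)
  let current = start
  while (followedBy.has(current)) {
    current = followedBy.get(current)
    if (seen.has(current)) break // guard against circular metadata
    seen.add(current)
    chain.push(current)
  }
  return chain
}

const followedBy = new Map([
  ['Annalen der Pharmacie', 'Liebigs Annalen'],
  ['Liebigs Annalen', 'European Journal of Organic Chemistry']
])
console.log(followChain('Annalen der Pharmacie', followedBy).join(' -> '))
```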

In doing so, you quickly run into many more metadata issues. The best one I found was a paper by Crasts and Friedel, known for the Friedel-Crafts reaction :) Other gems are researcher names like Erlenmeyer-Heidelberg and Demselben and Von Demselben.

Back to caffeine: this active chemical in coffee, which many of us must have in the morning, is actually the same as theine. Tea drinkers also get their dose of caffeine. We all know that. What I did not know, but discovered while doing this work, is that it had already been established that :caffeine owl:sameAs :theine (doi:10.1002/jlac.18380250106). Cool!