Wednesday, January 27, 2016

Adding chemical compounds to Wikidata

Adding chemical compounds to Wikidata is not difficult. You can store the chemical formula (P274), (canonical) SMILES (P233), InChIKey (P235) (and InChI (P234), of course), as well as various database identifiers (see what I wrote about that here). Wikidata also allows storing the provenance of each statement, and has predicates for that too.

So, to enter a new structure for a compound, you should enter the compound information into Wikidata. Of course, make sure to create the needed accounts, particularly one for Wikidata (create account) (I am not sure if the next steps need a more general Wikimedia account too).

Entering the research paper
Magnus Manske pointed me to this helper tool. If you have the DOI of the paper, it is easy to add a new paper. This is what the tool shows for doi:10.1128/AAC.01148-08 (but no longer when you try!):

You need permission to run this script; the tool will alert you about that and give instructions on how to get permission. After I clicked Open in QuickStatements, I got this output, showing that an entry was created in Wikidata for this paper:

Later, I can use the new Q-code (Q22309806) as the source for statements about the compound (formula, etc).

Draw your compound and get an InChIKey
The next step is to draw a compound and get an InChIKey. This can be done with many tools, including Bioclipse. Rajarshi opted for alternatives:

Then check if the compound is not already in Wikidata. You can use this SPARQL query for that using the InChIKey of the compound (it's for acetic acid, so it will be found):

For convenience, here is the copy/pastable SPARQL:
    PREFIX wdt: <http://www.wikidata.org/prop/direct/>
    SELECT ?compound WHERE {
      ?compound wdt:P235 "QTBSBXVTEAMEQO-UHFFFAOYSA-N" .
    }
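The same lookup can be prepared from a script. What follows is a minimal sketch, not the tooling used for this post: it assumes the public Wikidata Query Service endpoint at query.wikidata.org, and the function names are made up for illustration. It only builds the query text and a request URL (no network call is made), and it checks that the InChIKey has the standard 14-10-1 letter layout first.

```python
import re
from urllib.parse import urlencode

# Assumption: the public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Standard InChIKey layout: blocks of 14, 10 and 1 uppercase letters.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def build_inchikey_query(inchikey: str) -> str:
    """Return a SPARQL query that finds items with this InChIKey (P235)."""
    if not INCHIKEY_PATTERN.fullmatch(inchikey):
        raise ValueError(f"not a well-formed InChIKey: {inchikey!r}")
    return (
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
        "SELECT ?compound WHERE {\n"
        f'  ?compound wdt:P235 "{inchikey}" .\n'
        "}"
    )


def build_query_url(inchikey: str) -> str:
    """Return a GET URL that asks the query service for JSON results."""
    params = {"query": build_inchikey_query(inchikey), "format": "json"}
    return f"{WDQS_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    # Acetic acid, the same example as in the query above.
    print(build_inchikey_query("QTBSBXVTEAMEQO-UHFFFAOYSA-N"))
    print(build_query_url("QTBSBXVTEAMEQO-UHFFFAOYSA-N"))
```

An empty result set for your InChIKey suggests the compound still needs to be added; a non-empty one gives you the existing item to extend instead.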
Entering the compound
If the compound is not already in Wikidata, it is time to add it. The minimal information you should provide is the following:
  • mark the new entry as 'instance of' (P) 'chemical compound' (Q)
  • the chemical formula and SMILES (use as reference the paper)
    • add the reference to the paper you entered above
  • add the InChIKey and/or InChI
The first step is to create a new Wikidata entry. The Create new item menu in the left side panel can be used, showing a page like this:

As a label you can use the name used in the paper for the compound, even if a code, and as description 'chemical compound' will do for now; it can be changed later.
Feel free to add as much information about the compound as you can find. There are some chemically rich entries in Wikidata, such as that for acetic acid (Q47512).
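The minimal statements listed above can also be written down as a QuickStatements batch instead of being typed into the web form. The sketch below is one way to do that, under some stated assumptions: it uses the tab-separated QuickStatements version 1 syntax, it fills in P31 for 'instance of' and Q11173 for 'chemical compound', and it uses S248 ('stated in') to point at the paper item. The function name and the example values are placeholders for illustration, not something to submit as-is.

```python
import re


def quote(text: str) -> str:
    """Wrap a string value in double quotes for QuickStatements."""
    if any(ch in text for ch in '"\t\n'):
        raise ValueError(f"value cannot be quoted safely: {text!r}")
    return f'"{text}"'


def compound_commands(label, formula, smiles, inchikey, paper_qid):
    """Return QuickStatements V1 commands that create a compound item.

    Assumed IDs: P31 = instance of, Q11173 = chemical compound,
    S248 = 'stated in' used as the reference to the paper item.
    """
    if not re.fullmatch(r"Q[1-9][0-9]*", paper_qid):
        raise ValueError(f"not a Wikidata item ID: {paper_qid!r}")
    source = ["S248", paper_qid]
    rows = [
        ["CREATE"],
        ["LAST", "Len", quote(label)],                # English label
        ["LAST", "Den", quote("chemical compound")],  # English description
        ["LAST", "P31", "Q11173"],
        ["LAST", "P274", quote(formula)] + source,    # chemical formula
        ["LAST", "P233", quote(smiles)] + source,     # SMILES
        ["LAST", "P235", quote(inchikey)],            # InChIKey
    ]
    return "\n".join("\t".join(row) for row in rows)


if __name__ == "__main__":
    # Placeholder values: acetic acid's structure data with a made-up label.
    print(compound_commands(
        label="example compound",
        formula="C2H4O2",
        smiles="CC(=O)O",
        inchikey="QTBSBXVTEAMEQO-UHFFFAOYSA-N",
        paper_qid="Q22309806",
    ))
```

The printed lines can be pasted into QuickStatements after checking them by hand; the formula and SMILES rows carry the paper as their reference, matching the list above.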

Wednesday, January 13, 2016

Publishing H2020 Proposals

Figure from the RIO paper.
Over a year ago Daniel Mietchen invited me to join writing a H2020 proposal around Open Science. Well, that combines two of my current worlds, so interesting indeed. But there was more: Daniel wanted to do the writing openly, and that was certainly new to me. But since I see piles of benefits in open science, this is sort of the next step. Not obvious, perhaps, but certainly a step I wanted to try.

The proposal that resulted from this was "Enabling Open Science: Wikidata for Research (Wiki4R)", as said, led by Daniel Mietchen. It was drafted fully in the open, and we got a lot of feedback from people not involved in the anticipated consortium. Of course, we did not get it; you would have heard from me about it earlier if we had.

Part of open writing is, of course, an open license, to ensure everyone who participates has equal IP on the proposal. (Some seem to forget that an Open Access license is not giving away your IP; you're just licensing it!) The final proposal was posted on ZENODO (see below) just after submission. Some weeks back, Daniel also submitted it to the Research Ideas and Outcomes journal (ISSN 2367-7163), which, of course, the open license allows too! This new journal covers not just the end product of research (a research paper), but also other things, including project proposals (full reference below). Mind you, not everything in this "journal" is peer-reviewed before publication, and the proposal is indeed not reviewed. Post-publication review is most welcome, BTW! Just head off to PubPeer or Publons and start ranting about the proposal ;)

Now, the journal seems to have blogged about this H2020 proposal publication - Daniel is involved in setting up the journal - and sent it out as a press release-like thing, which is actually being picked up by news outlets :) That's new to me too.

All in all, it's an interesting experiment, and I am grateful to Daniel for having been able to be part of this. Writing H2020 proposals openly is a new phenomenon, and I cannot commit myself to use this approach for all my proposals, but I think I may do this more often in the future.

Mietchen, D., Hagedorn, G., Willighagen, E., Rico, M., Gomez-Perez, A., Aibar, E., Rafes, K., Germain, C., Dunning, A., Pintscher, L., Kinzler, D., Anonymous, Jan. 2015. Enabling open science: Wikidata for research.
Mietchen, D., Hagedorn, G., Willighagen, E., Rico, M., Gómez-Pérez, A., Aibar, E., Rafes, K., Germain, C., Dunning, A., Pintscher, L., Kinzler, D., Dec. 2015. Enabling open science: Wikidata for research (Wiki4R). Research Ideas and Outcomes 1, e7573+.

Sunday, January 03, 2016

ELIXIR is setting up a Tools and Data Services Registry

ELIXIR is setting up a Tools and Data Services Registry. Recently, they organized a workshop in Amsterdam that I attended and where I learned how to add tools and services to their database. I played with the entry for WikiPathways, and one of the nice things is that it inherits from past European registry projects and allows encoding the input and output formats, for tools and services alike. Here's what it gives for WikiPathways now:

The record editing facility is pretty straightforward and uses a number of tabs where you can add information.

A summary:

The publications:


Where documentation is found:

And information that is not really supplementary, such as the license terms:

Here, the collections are of particular interest. During the meeting, a few people from the Dutch Techcenter for Life Sciences decided to use an ELIXIR-NL group for all Dutch services that benefit the full ELIXIR network. Furthermore, the BIGCAT-UM collection was set up to indicate all services by our research group, which may eventually serve as a folder towards supporting the Dutch ZonMW Enabling Technologies Hotels calls.

Mind you, the registry can distinguish various services. The above entry is for the web interface, not for the web services. That entry in the registry is not that well populated yet, and that's for a reason. (Actually, more than one, one being that I did not create that entry and cannot change it).

But the WikiPathways Webservices are nicely exposed via a Swagger configuration file. Moreover, the registry supports JSON too, for both export and import. The format is pretty simple and we only need to create a Swagger 2.0 config file converter. I just need to find a bit of time to finish my draft implementation.
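To give an idea of what such a converter might look like, here is a rough sketch and not my draft implementation. It reads the standard Swagger 2.0 fields (info, host, basePath, schemes, produces, paths) and condenses them into a small record. The output keys (name, base_url, operations, output_formats) are hypothetical placeholders and would have to be mapped to the registry's actual JSON schema; the example spec is made up and is not the WikiPathways file.

```python
import json

# HTTP verbs that Swagger 2.0 allows as keys of a path item.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch"}


def swagger_to_record(spec: dict) -> dict:
    """Condense a Swagger 2.0 document into a small, flat record.

    The output keys are hypothetical; map them to the registry's schema.
    """
    if spec.get("swagger") != "2.0":
        raise ValueError("expected a Swagger 2.0 document")

    info = spec.get("info", {})
    # Assumption: fall back to plain http when the spec lists no schemes.
    scheme = (spec.get("schemes") or ["http"])[0]
    base_url = f"{scheme}://{spec.get('host', '')}{spec.get('basePath', '')}"
    default_produces = spec.get("produces", [])

    operations = []
    for path, item in sorted(spec.get("paths", {}).items()):
        for method, operation in sorted(item.items()):
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            operations.append({
                "name": f"{method.upper()} {path}",
                "summary": operation.get("summary", ""),
                # Operation-level "produces" overrides the global default.
                "output_formats": operation.get("produces", default_produces),
            })

    return {
        "name": info.get("title", ""),
        "description": info.get("description", ""),
        "version": info.get("version", ""),
        "base_url": base_url,
        "operations": operations,
    }


if __name__ == "__main__":
    # Made-up example spec, for illustration only.
    example = {
        "swagger": "2.0",
        "info": {"title": "Example service", "version": "1.0"},
        "host": "example.org",
        "basePath": "/api",
        "schemes": ["https"],
        "produces": ["application/json"],
        "paths": {"/items": {"get": {"summary": "List items"}}},
    }
    print(json.dumps(swagger_to_record(example), indent=2))
```

Keeping the Swagger parsing separate from the registry-specific field names means only the final dictionary needs to change once the real schema is plugged in.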

Ison, J., Rapacki, K., Ménager, H., Kalaš, M., Rydza, E., Chmura, P., Anthon, C., Beard, N., Berka, K., Bolser, D., Booth, T., Bretaudeau, A., Brezovsky, J., Casadio, R., Cesareni, G., Coppens, F., Cornell, M., Cuccuru, G., Davidsen, K., Vedova, G. D., Dogan, T., Doppelt-Azeroual, O., Emery, L., Gasteiger, E., Gatter, T., Goldberg, T., Grosjean, M., Grüning, B., Helmer-Citterich, M., Ienasescu, H., Ioannidis, V., Jespersen, M. C., Jimenez, R., Juty, N., Juvan, P., Koch, M., Laibe, C., Li, J.-W., Licata, L., Mareuil, F., Mičetić, I., Friborg, R. M., Moretti, S., Morris, C., Möller, S., Nenadic, A., Peterson, H., Profiti, G., Rice, P., Romano, P., Roncaglia, P., Saidi, R., Schafferhans, A., Schwämmle, V., Smith, C., Sperotto, M. M., Stockinger, H., Vařeková, R. S., Tosatto, S. C. E., de la Torre, V., Uva, P., Via, A., Yachdav, G., Zambelli, F., Vriend, G., Rost, B., Parkinson, H., Løngreen, P., Brunak, S., Jan. 2016. Tools and data services registry: a community effort to document bioinformatics resources. Nucleic Acids Research 44 (D1), D38-D47.

Open Spectral Database

Stuart Chalk wrote on the CHMINF-L mailing list about Open Spectral Database (OSDB). This new database is more of an idea than something with critical mass yet. But the idea seems right: it has a CCZero waiver for the data, is Open Source, and has an API. The web interface looks good too:

It supports various spectral types and maybe it can be seeded with data from one of the Massbank instances. That said, it does seem popular enough to already attract some spamming in the collections corner; that also means it needs curators who keep an eye on what enters. Perhaps registration via ORCID may be an option to fight spam, but I do not have experience with setting that up. Another feature request I can think of is links out to Wikidata, in addition to the existing three databases.

Now I really have a good reason to dig out my past NMRShiftDB contributions and submit them here (see also these past blog posts about NMRShiftDB).

Saturday, January 02, 2016

Project "Chemical Safety Library", aka redistributable MSDS data

Public Domain, Wikipedia.
The Pistoia Alliance has an interesting project proposal:
    This project consists of two distinct phases:
    1. development of a collaborative system for sharing information about known laboratory hazard based on an existing system developed and implemented by a member of the Pistoia Alliance 
    2. working with chemical suppliers and publishers to define and adopt standards for hazard information making MSDS and handbooks more accessible and more easily used
One reason this has never happened, I was told (IANAL), is that in certain jurisdictions (re)distributing such information introduces a liability for the person or organization doing that (re)distribution. Looking forward to this project: whether it will be open, how they will handle redistribution (needed if you want to have it show up in ELNs), etc.