
Wednesday, January 27, 2016

Adding chemical compounds to Wikidata

Adding chemical compounds to Wikidata is not difficult. You can store the chemical formula (P274), (canonical) SMILES (P233), InChIKey (P235) (and InChI (P234), of course), as well as various database identifiers (see what I wrote about that here). Wikidata also allows you to store provenance, and has predicates for that too.
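The chemical formula (P274) is stored as a plain string. As a small illustration, here is a minimal sketch that writes element counts in Hill order (carbon first, then hydrogen, then the other elements alphabetically; without carbon, all elements alphabetically). Whether a given entry uses plain digits or Unicode subscripts is something to check against an existing entry such as acetic acid (Q47512); the `subscript` flag is my own addition for that purpose.

```python
# Maps ASCII digits to Unicode subscript digits (e.g. "2" -> "₂").
SUBSCRIPTS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")


def hill_formula(counts: dict[str, int], subscript: bool = False) -> str:
    """Format element counts as a molecular formula in Hill order.

    With carbon present: C, then H, then the rest alphabetically.
    Without carbon: all elements alphabetically. A count of 1 has no digit.
    """
    present = [element for element, n in counts.items() if n > 0]
    if "C" in present:
        order = ["C"] + (["H"] if "H" in present else [])
        order += sorted(e for e in present if e not in ("C", "H"))
    else:
        order = sorted(present)
    formula = "".join(e if counts[e] == 1 else f"{e}{counts[e]}" for e in order)
    return formula.translate(SUBSCRIPTS) if subscript else formula


print(hill_formula({"O": 2, "H": 4, "C": 2}))  # acetic acid: C2H4O2
```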

So, to add a new compound structure, you enter the compound information into Wikidata yourself. Of course, first make sure to create the needed accounts, particularly one for Wikidata (I am not sure whether the next steps also need a more general Wikimedia account).

Entering the research paper
Magnus Manske pointed me to this helper tool. If you have the DOI of the paper, adding a new paper is easy. This is what the tool showed for doi:10.1128/AAC.01148-08 (you will no longer see this when you try it, because the paper has since been added):


You need permission to run this script; the tool will alert you about that and give instructions on how to get it. After I clicked Open in QuickStatements, I got this output, showing that a Wikidata entry had been created for this paper:


Later, I can use the new Q-code (Q22309806) as the source for statements about the compound (formula, etc.).
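Before using the tool, you could first check whether the paper already has an item. A minimal sketch that builds such a SPARQL query, assuming P356 is Wikidata's DOI property and that DOIs are stored upper-cased there (both worth confirming on Wikidata itself):

```python
import json


def doi_query(doi: str) -> str:
    """Build a SPARQL query for items whose DOI (P356) matches `doi`."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]  # drop a leading "doi:" prefix
    # Assumption: Wikidata stores DOIs in upper case, so normalise first.
    doi = doi.upper()
    return (
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
        "SELECT ?paper WHERE {\n"
        # json.dumps wraps the DOI in double quotes and escapes " and \.
        f"  ?paper wdt:P356 {json.dumps(doi)} .\n"
        "}"
    )


print(doi_query("doi:10.1128/AAC.01148-08"))
```

The printed query can be pasted into the Wikidata Query Service; an empty result suggests the paper still needs to be added.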

Draw your compound and get an InChIKey
The next step is to draw the compound and get its InChIKey. This can be done with many tools, including Bioclipse. Rajarshi opted for alternatives.
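Before searching, it can help to sanity-check the InChIKey a drawing tool gave you. This sketch only checks the layout of an InChIKey (27 characters: a 14-letter block, an 8-letter block followed by a standard/non-standard flag and a version letter, and a single protonation letter). It cannot tell whether the key actually belongs to the structure you drew.

```python
import re

# 14 letters (hash of the connectivity layer), hyphen, 8 letters (hash of
# the remaining layers) + 'S' (standard) or 'N' (non-standard) + version
# letter, hyphen, 1 protonation letter.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{8}[SN][A-Z]-[A-Z]")


def is_inchikey(key: str) -> bool:
    """True if `key` has the layout of an InChIKey (format check only)."""
    return INCHIKEY_PATTERN.fullmatch(key.strip()) is not None


def same_connectivity(key_a: str, key_b: str) -> bool:
    """Compare only the first block, which e.g. stereoisomers share."""
    return key_a[:14] == key_b[:14]


print(is_inchikey("QTBSBXVTEAMEQO-UHFFFAOYSA-N"))  # acetic acid: True
```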

Then check whether the compound is already in Wikidata. You can use this SPARQL query for that, with the InChIKey of the compound (this one is for acetic acid, so it will be found):


For convenience, here is the copy/pastable SPARQL:
    PREFIX wdt: <http://www.wikidata.org/prop/direct/>
    SELECT ?compound WHERE {
      ?compound wdt:P235 "QTBSBXVTEAMEQO-UHFFFAOYSA-N" .
    }
    
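The same check can be scripted. A minimal sketch using only the Python standard library: it builds the query above for any InChIKey and sends it to the public Wikidata Query Service endpoint (https://query.wikidata.org/sparql), asking for JSON results. The User-Agent string is a hypothetical placeholder; replace it with your own details.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def inchikey_query(inchikey: str) -> str:
    """The SPARQL query from this post, for an arbitrary InChIKey."""
    return (
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
        "SELECT ?compound WHERE {\n"
        f"  ?compound wdt:P235 {json.dumps(inchikey)} .\n"
        "}"
    )


def query_url(sparql: str) -> str:
    """GET URL asking the endpoint for SPARQL JSON results."""
    params = urllib.parse.urlencode({"query": sparql, "format": "json"})
    return f"{WDQS_ENDPOINT}?{params}"


def find_compounds(inchikey: str) -> list[str]:
    """Return entity URIs of items with this InChIKey (needs network)."""
    request = urllib.request.Request(
        query_url(inchikey_query(inchikey)),
        # Hypothetical User-Agent; put your own tool name and contact here.
        headers={"User-Agent": "compound-check-sketch/0.1 (you@example.org)"},
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    # SPARQL JSON results: one binding per row, keyed by variable name.
    return [row["compound"]["value"] for row in data["results"]["bindings"]]
```

With network access, `find_compounds("QTBSBXVTEAMEQO-UHFFFAOYSA-N")` should list the acetic acid item; an empty list suggests the compound is not in Wikidata yet.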
Entering the compound
If the compound is not in Wikidata yet, it is time to add it. The minimal information you should provide is the following:
  • mark the new entry as 'instance of' (P) 'chemical compound' (Q)
  • the chemical formula and SMILES (with the paper as reference)
    • add the reference to the paper you entered above
  • add the InChIKey and/or InChI
The first step is to create a new Wikidata entry. The Create new item link in the left side panel opens a page like this:


As a label, you can use the name the paper uses for the compound, even if it is just a code; as description, 'chemical compound' will do for now, and it can be changed later.
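Instead of the web form, the same minimal statements could be written as a QuickStatements batch. This is a sketch based on my understanding of the tab-separated QuickStatements V1 syntax (CREATE, LAST, Len/Den for label and description, S-prefixed properties for sources), so double-check it against the tool's own help. I assume P31 for 'instance of', Q11173 for 'chemical compound' and P248 ('stated in') for the reference to the paper; verify these IDs on Wikidata.

```python
# Assumed Wikidata IDs; verify before use.
INSTANCE_OF, CHEMICAL_COMPOUND, STATED_IN = "P31", "Q11173", "P248"


def quoted(value: str) -> str:
    """Wrap a string value in double quotes, rejecting batch separators."""
    if any(ch in value for ch in '\t\n"'):
        raise ValueError(f"value not safe for a tab-separated batch: {value!r}")
    return f'"{value}"'


def quickstatements_batch(label: str, description: str, formula: str,
                          smiles: str, inchikey: str, paper_qid: str,
                          lang: str = "en") -> str:
    """Commands creating a compound item; formula and SMILES cite the paper."""
    source = ["S" + STATED_IN[1:], paper_qid]  # e.g. S248 <paper item>
    rows = [
        ["CREATE"],
        ["LAST", "L" + lang, quoted(label)],
        ["LAST", "D" + lang, quoted(description)],
        ["LAST", INSTANCE_OF, CHEMICAL_COMPOUND],
        ["LAST", "P274", quoted(formula)] + source,
        ["LAST", "P233", quoted(smiles)] + source,
        ["LAST", "P235", quoted(inchikey)],
    ]
    return "\n".join("\t".join(row) for row in rows)
```

Calling `quickstatements_batch(...)` with your compound's label, formula, SMILES and InChIKey, plus the Q-code of the paper (Q22309806 in the example above), returns text to paste into QuickStatements.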
Feel free to add as much information about the compound as you can find. There are some chemically rich entries in Wikidata, such as that for acetic acid (Q47512).