Sunday, March 25, 2018

SPLASHes in Wikidata

Mass spectrum from the OSDB (see also this post).
A bit over a year ago I added EPA CompTox Dashboard IDs to Wikidata. Since an entry in that database likely means that something is known about the adverse properties of that compound, the identifier can be used as a proxy for that. Better, once the EPA team starts supporting RDF with a SPARQL endpoint, we will be able to do some cool federated queries.

For metabolomics, the availability of mass spectra is of interest for metabolite identification. A while ago the SPLASH was introduced (doi:10.1038/nbt.3689) and adopted by several databases around the world. After the recent metabolomics winter school it became apparent that it is now adopted widely enough to be used in Wikidata. So, I proposed a new SPLASH Wikidata property, which was approved last week (see P4964). The MassBank of North America (MoNA; Fiehn's lab) team made available, as CCZero, a mapping that links the InChI of each compound to the SPLASH identifiers of the spectra for that compound.
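To give an idea of what the new property makes possible, here is a minimal sketch of a query against the Wikidata Query Service, wrapped in a bit of Python that only builds the query text. P4964 is the SPLASH property mentioned above; pairing it with P235 (the InChIKey property) and the function name `splash_query` are my own choices for illustration, not part of the original workflow.

```python
def splash_query(limit=10):
    """Build a SPARQL query listing compounds with a SPLASH (P4964).

    The wdt: prefix is predefined by the Wikidata Query Service,
    so no PREFIX declarations are needed when pasting it there.
    P235 (InChIKey) is included as an illustrative extra column.
    """
    return (
        "SELECT ?compound ?inchikey ?splash WHERE {\n"
        "  ?compound wdt:P4964 ?splash ;\n"
        "            wdt:P235 ?inchikey .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    # Print the query so it can be pasted into the query service.
    print(splash_query(25))
```

The printed text can be pasted into the query service as is. Because a compound can have many spectra, the same compound may show up on several rows.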

So, over the weekend I pushed some 37 thousand SPLASHes into Wikidata :)

This is for about 4800 compounds.

Yes, technically, I used the same Bioclipse script approach as with the CompTox identifiers, resulting in QuickStatements. Next up are the SPLASHes from Chalk's aforementioned OSDB.
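For readers who have not seen QuickStatements: each statement is a tab-separated line of item, property, and value, with string values in double quotes. The following is a rough Python sketch of that last step only, not the actual Bioclipse script. It assumes the compounds have already been matched to Wikidata items, and the function name `splash_quickstatements`, the input dictionary, and the SPLASH values in the example are hypothetical placeholders (Q4115189 is the Wikidata sandbox item).

```python
def splash_quickstatements(splashes_by_qid):
    """Turn {QID: [SPLASH, ...]} into QuickStatements (V1) lines.

    Each output line is: item <TAB> P4964 <TAB> "splash".
    Duplicate SPLASHes for one item are emitted only once, and the
    output is sorted so repeated runs give the same result.
    """
    lines = []
    for qid, splashes in sorted(splashes_by_qid.items()):
        for splash in sorted(set(splashes)):
            lines.append(f'{qid}\tP4964\t"{splash}"')
    return lines


if __name__ == "__main__":
    # Placeholder data: the SPLASH strings below are not real spectra.
    example = {
        "Q4115189": [
            "splash10-0000-0000000000-00000000000000000000",
            "splash10-0000-0000000000-11111111111111111111",
        ],
    }
    print("\n".join(splash_quickstatements(example)))
```

The resulting lines can be pasted into the QuickStatements tool in one go, which is what makes this approach practical for tens of thousands of identifiers.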