Tuesday, May 21, 2019

Scholia: an open source platform around open data

Some 2.5 years ago Finn Nielsen started Scholia. I have blogged about it a few times, and thanks to Finn, Lane Rasberry, and Daniel Mietchen, we were awarded a grant by the Alfred P. Sloan Foundation to continue working on it (grant: G-2019-11458). I'll tweet more about how it fits the infrastructure to support our core research lines, but for now I just want to mention that we published the full proposal in RIO Journal.

Oh, just as a teaser and clickbait, here's one of the use cases, the dissemination of knowledge about metabolites and chemicals in general (full poster):

Saturday, May 18, 2019

LIPID MAPS: mass spectra and species annotation from Wikidata

Part of the LIPID MAPS classification scheme in Wikidata (try it).
A bit over a week ago I attended the LIPID MAPS Working Group meeting in Cambridge, as I became a member of Working Group 2: Tools and Technical Committee in autumn. That followed a fruitful effort by Eoin Fahy to make several LIPID MAPS pathways available in WikiPathways (see this Lipids Portal), e.g. the Omega-3/Omega-6 FA synthesis pathway. It was a great pleasure to attend the meeting and meet everyone, and I learned a lot about the internals of the LIPID MAPS project.

I showed them how we contribute to WikiPathways, particularly in the area of lipids. Denise Slenter and I have been working on having more identifier mappings in Wikidata, among which those for lipids. Some results of that work were part of this presentation. One of the nice things about Wikidata is that you can make live Venn diagrams, e.g. compounds in LIPID MAPS for which Wikidata also has a statement about which species they are found in (try it):

SELECT ?lipid ?lipidLabel ?lmid ?species ?speciesLabel
       ?source ?sourceLabel ?doi
WHERE {
  ?lipid wdt:P2063 ?lmid ;
         p:P703 ?speciesStatement .
  ?speciesStatement prov:wasDerivedFrom/pr:P248 ?source ;
                    ps:P703 ?species .
  OPTIONAL { ?source wdt:P356 ?doi }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".
  }
}

A second query finds lipids for which mass spectra are also available in MassBank (try it):

SELECT
  ?lipid ?lipidLabel ?lmid
  (GROUP_CONCAT(DISTINCT ?massbanks) as ?massbank)
WHERE {
  ?lipid wdt:P2063 ?lmid ;
         wdt:P6689 ?massbanks .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".
  }
} GROUP BY ?lipid ?lipidLabel ?lmid
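
Queries like the two above can also be run outside the query editor. Below is a minimal Python sketch, assuming the public Wikidata Query Service endpoint at https://query.wikidata.org/sparql and the standard SPARQL JSON results format. The function names and the user agent string are made up for illustration, and the label language is fixed to "en" on the assumption that the [AUTO_LANGUAGE] placeholder is only filled in by the query editor.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# The MassBank query from this post. The label language is hard-coded to "en"
# (assumption: "[AUTO_LANGUAGE]" is only substituted by the query editor).
MASSBANK_QUERY = """
SELECT ?lipid ?lipidLabel ?lmid
  (GROUP_CONCAT(DISTINCT ?massbanks) AS ?massbank)
WHERE {
  ?lipid wdt:P2063 ?lmid ;
         wdt:P6689 ?massbanks .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en".
  }
} GROUP BY ?lipid ?lipidLabel ?lmid
"""


def build_request(query, endpoint=WDQS_ENDPOINT):
    """Build a GET request that asks for SPARQL JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    headers = {
        "Accept": "application/sparql-results+json",
        # Hypothetical user agent; replace it with your own contact details.
        "User-Agent": "lipid-query-sketch/0.1 (example)",
    }
    return urllib.request.Request(url, headers=headers)


def flatten_bindings(results):
    """Turn SPARQL JSON results into a list of plain {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in results["results"]["bindings"]
    ]


def run_query(query):
    """Send the query to the endpoint and return flattened rows (needs network)."""
    with urllib.request.urlopen(build_request(query)) as response:
        return flatten_bindings(json.load(response))
```

Calling run_query(MASSBANK_QUERY) should then give one dictionary per lipid, with keys such as lipid, lipidLabel, lmid, and massbank.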


Saturday, May 04, 2019

Wikidata, CompTox Chemistry Dashboard, and the DSSTOX substance identifier

The US EPA published a paper recently about the CompTox Chemistry Dashboard (doi:10.1186/s13321-017-0247-6). Some time ago I worked with Antony Williams and we proposed a matching Wikidata identifier. When it was accepted, I used an InChIKey-DSSTOX identifier mapping data set by Antony (doi:10.6084/M9.FIGSHARE.3578313.V1) to populate Wikidata with links. Over time, as more InChIKeys were found in Wikidata, I would use this script to add additional mappings. That resulted in this growth graph:

Source: Wikidata.
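
The idea of adding mappings over time can be sketched in a few lines of Python. This is not the script mentioned above, just an illustration under assumptions: the mapping is available as a dictionary from InChIKey to DSSTOX substance identifier, the Wikidata items come as (item, InChIKey, existing identifier) tuples, the output is tab-separated QuickStatements-style lines, and P3117 is the property for the DSSTOX substance identifier (please verify that before use). The function name is hypothetical.

```python
def new_dsstox_statements(inchikey_to_dtxsid, wikidata_items, property_id="P3117"):
    """Yield QuickStatements-style lines for items still lacking a DSSTOX identifier.

    inchikey_to_dtxsid: dict mapping InChIKey -> DSSTOX substance identifier
    wikidata_items: iterable of (qid, inchikey, existing_dtxsid_or_None)
    property_id: assumed Wikidata property for the DSSTOX substance identifier
    """
    for qid, inchikey, existing in wikidata_items:
        if existing is not None:
            continue  # this item is already linked, nothing to add
        dtxsid = inchikey_to_dtxsid.get(inchikey)
        if dtxsid is None:
            continue  # the mapping data set does not cover this InChIKey
        # Tab-separated item, property, and quoted string value.
        yield f'{qid}\t{property_id}\t"{dtxsid}"'


if __name__ == "__main__":
    # Made-up identifiers, only to show the shape of the data.
    mapping = {"AAAAAAAAAAAAAA-BBBBBBBBBB-N": "DTXSID0000001"}
    items = [
        ("Q1", "AAAAAAAAAAAAAA-BBBBBBBBBB-N", None),
        ("Q2", "CCCCCCCCCCCCCC-DDDDDDDDDD-N", None),
    ]
    for line in new_dsstox_statements(mapping, items):
        print(line)
```

Rerunning something like this whenever new InChIKeys show up in Wikidata only produces statements for items that do not have the identifier yet.
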
Now, about a week ago Antony informed me he worked with someone from Wikipedia to have the DSSTOX identifier automatically show up in the ChemBox, which I find awesome. It's cool to see your work on about 38 thousand (!) Wikipedia pages :)
Part of the ChemBox of perfluorooctanoic acid.
(I'm making the assumption that all 38 thousand Wikidata pages for chemicals have Wikipedia equivalents, which may be a false assumption.)