Saturday, July 02, 2016

Harmonized identifiers in the WikiPathways RDF

Biological knowledge should not only be captured
in nice graphics, but should be machine readable.
Public domain image from Wikipedia.
WikiPathways described biological processes. Entities in these processes are genes, gene products, like miRNAs, proteins, and metabolites. The pathways do not describe what these entities are, but only provide identifiers in external databases allowing you to study the identity in those databases. Therefore, for metabolites you will not find chemical graphs but identifiers from HDMB, CAS, KEGG, ChEBI, and others.

To ensure experimental data can be mapped to these pathways, independent of whatever identifiers are used, BridgeDb was developed. WikiPathways uses a BridgeDb webservice, Open PHACTS embeds BridgeDb technologies in their Identifier Mapping Service (particularly developed by Carole Goble's team), and PathVisio uses local BridgeD ID mapping files.

The WikiPathways SPARQL end point is not using the Open PHACTS IMS and Andra introduced harmonized identifiers and provides these as additional triples in the WikiPathways RDF. For example:

SELECT DISTINCT ?gene fn:substring(?ensId,32) as ?ensembl
  ?gene a wp:GeneProduct ;
    wp:bdbEnsembl ?ensId .

Now, the gene resource IRIs actually use the Ensembl identifier when available, so this query returns redundant information, but there are other harmonized identifiers available:

  ?entity a ?type; ?pred [] .
  FILTER (regex(?pred,'bdb'))

That results in a table like this:

Therefore, for these databases it is easy to make links between those identifiers and the pathways in which entities with those identifiers are found. For example, to create a link between Ensembl identifiers and pathways, we could do something like:

  ?pathwayRes str(?wpid) as ?pathway
  str(?title) as ?pathwayTitle
  fn:substring(?ensId,32) as ?ensembl
  ?gene a wp:GeneProduct ;
    dcterms:identifier ?id ;
    dcterms:isPartOf ?pathwayRes ;
    wp:bdbEnsembl ?ensId .
  ?pathwayRes a wp:Pathway ;
    dcterms:identifier ?wpid ;
    dc:title ?title .

I am collecting a number of those queries in the WikiPathways help wiki's page with many example SPARQL queries. For example, check out the federated SPARQL queries listed there.