Friday, February 27, 2009

Solubility Data in Bioclipse #3: Finding ChEBI IDs

With the RDF functionality set up in Bioclipse (see Solubility Data in Bioclipse #2: handling RDF), we can start mining the Chemical RDF space. Check out this mashup:
var ons = rdf.createStore()
// output: RDFStore: 0 triples

rdf.importURL(ons,
"http://github.com/egonw/onssolubility/raw/master/ons.solubility.rdf/ons.rdf")
// output: RDFStore: 1206 triples

var results = rdf.sparql(ons, "PREFIX owl: <http://www.w3.org/2002/07/owl#> " +
"PREFIX ons: <http://spreadsheet.google.com/plwwufp30hfq0udnEmRD1aQ/onto#> " +
"SELECT DISTINCT ?same WHERE { " +
" ?solvent a ons:Solvent . " +
" ?solvent owl:sameAs ?same" +
"}"
)

for (var i=0; i<results.size(); i++) {
  var row = results.get(i);
  for (var j=0; j<row.size(); j++) {
    // use the owl:sameAs to find more triples
    var uri = row.get(j);
    if (uri.startsWith("http://rdf.openmolecules.net/?")) {
      print("Added " + uri + "...\n");
      rdf.importURL(ons, uri);
    }
  }
}

rdf.sparql(ons, "PREFIX owl: <http://www.w3.org/2002/07/owl#> " +
"PREFIX ons: <http://spreadsheet.google.com/plwwufp30hfq0udnEmRD1aQ/onto#> " +
"PREFIX rdfonm: <http://rdf.openmolecules.net/#> " +
"PREFIX dc: <http://purl.org/dc/elements/1.1/> " +
"SELECT DISTINCT ?title ?chebi WHERE { " +
" ?solvent a ons:Solvent . " +
" ?solvent dc:title ?title . " +
" ?solvent owl:sameAs ?same ." +
" ?same rdfonm:chebiid ?chebi" +
"}"
)


What happens in this script is the following:
  1. load the ONS Solubility data (lines 4-5)
  2. ask for all owl:sameAs relations to navigate (lines 8-14)
  3. load the RDF for the rdf.openmolecules.net resources (lines 16-26)
  4. query for all solvents which have a ChEBI identifier (lines 28-38)
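The join in the last step can be illustrated outside Bioclipse with a few hand-written triples. This is only a minimal sketch in plain JavaScript, not the Bioclipse rdf manager: the triples array, the ex: subject names, the placeholder rdf.openmolecules.net URI, the water entry, and the objectsOf and solventsWithChebi helpers are all made up for illustration. Only the ethanol ChEBI identifier comes from this post.

```javascript
// Hypothetical in-memory triples: [subject, predicate, object].
// The subject names and the openmolecules URI are made-up placeholders.
var triples = [
  ["ex:ethanol", "rdf:type", "ons:Solvent"],
  ["ex:ethanol", "dc:title", "ethanol"],
  ["ex:ethanol", "owl:sameAs", "http://rdf.openmolecules.net/?ethanol"],
  ["http://rdf.openmolecules.net/?ethanol", "rdfonm:chebiid", "CHEBI:16236"],
  // A solvent without an owl:sameAs link; it cannot be matched to ChEBI.
  ["ex:water", "rdf:type", "ons:Solvent"],
  ["ex:water", "dc:title", "water"]
];

// Return the objects of all triples with the given subject and predicate.
function objectsOf(subject, predicate) {
  return triples
    .filter(function (t) { return t[0] === subject && t[1] === predicate; })
    .map(function (t) { return t[2]; });
}

// Mimic the pattern of the final query:
// solvent -> dc:title, and solvent -> owl:sameAs -> rdfonm:chebiid.
// (No DISTINCT handling here; the sample data has no duplicates.)
function solventsWithChebi() {
  var rows = [];
  triples.forEach(function (t) {
    if (t[1] !== "rdf:type" || t[2] !== "ons:Solvent") return;
    var solvent = t[0];
    objectsOf(solvent, "dc:title").forEach(function (title) {
      objectsOf(solvent, "owl:sameAs").forEach(function (same) {
        objectsOf(same, "rdfonm:chebiid").forEach(function (chebi) {
          rows.push([title, chebi]);
        });
      });
    });
  });
  return rows;
}

var chebiRows = solventsWithChebi();
```

In this toy data the water entry drops out because it has no owl:sameAs link to follow, which is the same reason the real query only returns solvents for which the extra RDF was loaded.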
The output of the script will look like the following (in the future this will be opened as a spreadsheet in Bioclipse):
[[ethanol 40C, CHEBI:16236], 
[acetonitrile, CHEBI:38472],
[chloroform, CHEBI:35255],
[methanol 30C, CHEBI:17790],
[THF, CHEBI:26911],
[ethanol, CHEBI:16236],
[ethanol 30C, CHEBI:16236],
[methanol 40C, CHEBI:17790],
[methanol, CHEBI:17790]]
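Several titles share one ChEBI identifier, apparently because the spreadsheet labels the same solvent with different temperatures. Grouping the rows by identifier makes that easier to see. The following is a minimal sketch in plain JavaScript, assuming the rows are available as an array of [title, chebi] pairs (copied here by hand from the output above, not taken from the object that rdf.sparql returns); the groupByChebi helper is hypothetical.

```javascript
// Rows copied by hand from the output listed in this post.
var outputRows = [
  ["ethanol 40C", "CHEBI:16236"],
  ["acetonitrile", "CHEBI:38472"],
  ["chloroform", "CHEBI:35255"],
  ["methanol 30C", "CHEBI:17790"],
  ["THF", "CHEBI:26911"],
  ["ethanol", "CHEBI:16236"],
  ["ethanol 30C", "CHEBI:16236"],
  ["methanol 40C", "CHEBI:17790"],
  ["methanol", "CHEBI:17790"]
];

// Hypothetical helper: collect all titles that map to the same ChEBI id.
function groupByChebi(rows) {
  var groups = {};
  rows.forEach(function (row) {
    var title = row[0];
    var chebi = row[1];
    if (!groups[chebi]) {
      groups[chebi] = [];
    }
    groups[chebi].push(title);
  });
  return groups;
}

var grouped = groupByChebi(outputRows);
// grouped["CHEBI:16236"] now holds the three ethanol titles.
```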
Now, this script shows a simple yet powerful feature of how RDF is used nowadays: the ChEBI identifier was not part of the original Solubility spreadsheet at Google Docs. But by taking advantage of the unique and resolvable URIs for molecules, we can simply look them up.

Nice, isn't it?

Update: the embedded gist did not show up nicely, so I replaced it with a pre block.