Wednesday, February 11, 2009

DBPedia: lookup and autocomplete of chemistry

On the DBPedia discussion mailing list there was a post about a nice web page which allows you to look things up, and which features an autocomplete edit field. The screenshot below shows a lookup of molecular structures:

If you are not aware of this, adding content to DBPedia is as easy as adding something to WikiPedia. Literally: DBPedia is the RDF flavour of WikiPedia. It extracts the information from the info boxes, as I discussed before (see Molecules in Wikipedia).

BTW, one can take advantage of DBPedia to see what WikiPedia has to offer in terms of chemistry. For example, to list all molecules which have a SMILES, one can use this simple SPARQL query:
Or, to list those which have an InChI:
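A minimal sketch of what such queries might look like, written as a small Python helper that only builds the SPARQL text. The namespace `http://dbpedia.org/property/`, the property names `smiles` and `inchi`, the `dbpprop` prefix and the helper name `has_property_query` are all assumptions made for illustration; check them against the live dataset before relying on them.

```python
# Sketch only: namespace and property names are assumptions, not confirmed
# against the DBPedia dataset.
PROPERTY_NS = "http://dbpedia.org/property/"  # assumed home of raw infobox properties


def has_property_query(prop, limit=100):
    """Build a SPARQL query listing resources that carry the given infobox property."""
    return (
        f"PREFIX dbpprop: <{PROPERTY_NS}>\n"
        f"SELECT ?compound ?value WHERE {{\n"
        f"  ?compound dbpprop:{prop} ?value .\n"
        f"}} LIMIT {limit}"
    )


smiles_query = has_property_query("smiles")  # "smiles" is an assumed property name
inchi_query = has_property_query("inchi")    # "inchi" is an assumed property name

print(smiles_query)
print(inchi_query)
```

The printed text can be pasted into a SPARQL endpoint form; the `LIMIT` is only there to keep a first test run small.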
And this is actually quite useful, e.g. it can be used in quality control. Running the above queries will show up several broken SMILES and InChIs. I have not had time to fix those yet, so please go ahead and beat me to those fixes, and get some WikiPedia Fame :) Alternatively, invert the queries and add missing InChIs, PubChem CIDs or SMILES. When I have a bit more free time again, after the new stable CDK and Bioclipse releases, I'll run these analyses again, and summarize them in a web page.
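Inverting a query, as suggested above, could be sketched along the following lines: list entries that have one identifier but lack another, using the standard SPARQL `OPTIONAL` plus `!bound` pattern for negation. As before, the namespace, the property names (`smiles`, `inchi`), the `dbpprop` prefix and the helper name `missing_property_query` are assumptions for illustration only.

```python
# Sketch only: namespace and property names are assumptions, not confirmed
# against the DBPedia dataset.
PROPERTY_NS = "http://dbpedia.org/property/"  # assumed home of raw infobox properties


def missing_property_query(present, missing, limit=100):
    """Build a SPARQL query for resources that have `present` but lack `missing`."""
    return (
        f"PREFIX dbpprop: <{PROPERTY_NS}>\n"
        f"SELECT ?compound ?value WHERE {{\n"
        f"  ?compound dbpprop:{present} ?value .\n"
        # Try to match the other property, then keep only rows where it stayed unbound.
        f"  OPTIONAL {{ ?compound dbpprop:{missing} ?other . }}\n"
        f"  FILTER (!bound(?other))\n"
        f"}} LIMIT {limit}"
    )


# Entries with a SMILES but no InChI: candidates for adding the missing InChI.
no_inchi_query = missing_property_query("smiles", "inchi")
print(no_inchi_query)
```

Swapping the two arguments gives the opposite list, i.e. entries with an InChI but no SMILES.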