Tuesday, March 19, 2013

New Paper: "Applications of the InChI in cheminformatics with the CDK and Bioclipse"

Last week, Ola, Sam Adams, Arvid, and I published a paper (doi:10.1186/1758-2946-5-14) on the InChI functionality in Bioclipse, which uses Sam's JNI-InChI and the Chemistry Development Kit (CDK) under the hood.

The paper partly describes Sam's earlier work on JNI-InChI itself and its integration into the CDK, but it also covers the recent support for CDK's IStereoElement, the OSGi bundles for JNI-InChI made by Arvid, and a few new applications in Bioclipse.

These applications demonstrate what you can do with the InChI in Bioclipse. The obvious one is creating an InChI for any structure drawn in Bioclipse (that is not new). What is new is that the inchi manager now also supports creating InChIs with particular layers, for example the fixed-hydrogen layer:

mol = cdk.fromSMILES("OC=O");                 // formic acid
stdInchi = inchi.generate(mol);               // standard InChI
fixedHInchi = inchi.generate(mol, "FixedH");  // non-standard InChI with the fixed-H layer
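
To make "particular layers" concrete: an InChI is one string of slash-separated layers. After the version (`1S` marks a standard InChI) comes the formula, and each further layer starts with a one-letter prefix, such as `c` for connections and `h` for hydrogens. In the standard InChI of formic acid, `InChI=1S/CH2O2/c2-1-3/h1H,(H,2,3)`, the group `(H,2,3)` says that one hydrogen is mobile between atoms 2 and 3 (the two oxygens); the FixedH option adds a layer that pins it to one atom. The sketch below is plain JavaScript (run, for example, with Node.js rather than through the Bioclipse managers), and the helper `splitInChILayers` is hypothetical, not part of Bioclipse or JNI-InChI.

```javascript
// Hypothetical helper (not part of Bioclipse or JNI-InChI): split an InChI
// string into its version, its formula, and its prefixed layers.
function splitInChILayers(inchi) {
  if (!inchi.startsWith("InChI=")) {
    throw new Error("not an InChI string: " + inchi);
  }
  // Drop the "InChI=" prefix, then split on the layer separator.
  const parts = inchi.slice("InChI=".length).split("/");
  const version = parts[0];  // "1S" for a standard InChI, "1" for a non-standard one
  const formula = parts[1];  // the formula layer carries no prefix letter
  // Every further layer starts with a one-letter prefix; keep them in order,
  // because a fixed-H section ("/f...") can repeat prefixes such as "h".
  const layers = parts.slice(2).map(function (part) {
    return { prefix: part.charAt(0), value: part.slice(1) };
  });
  return { version: version, formula: formula, layers: layers };
}

const parsed = splitInChILayers("InChI=1S/CH2O2/c2-1-3/h1H,(H,2,3)");
console.log(parsed.version);  // 1S
console.log(parsed.formula);  // CH2O2
console.log(parsed.layers.map(function (l) { return l.prefix; }).join(","));  // c,h
```

Keeping the layers as an ordered list rather than a map keyed by prefix is deliberate: in a non-standard InChI the fixed-H section can repeat a prefix such as `h`, which a map would silently overwrite.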

But the more interesting bits come next. For example, the InChI is ideal for lookup, and can be used for decision support with knowledge bases.
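
The lookup use case relies on the standard InChI being canonical: different drawings or SMILES of the same structure should give the same string, so the InChI can serve directly as an exact-match key. Below is a minimal sketch of a knowledge base keyed that way, in plain JavaScript; the `knowledgeBase` map and the `lookUp` function are hypothetical, and the entries hold only names as placeholders for whatever facts a real knowledge base would store.

```javascript
// Toy knowledge base keyed by standard InChI. The entries are illustrative
// placeholders, not data taken from Bioclipse or any real database.
const knowledgeBase = new Map([
  ["InChI=1S/CH2O2/c2-1-3/h1H,(H,2,3)", { name: "formic acid" }],
  ["InChI=1S/C6H6O/c7-6-4-2-1-3-5-6/h1-5,7H", { name: "phenol" }],
]);

// Because the InChI is canonical, lookup is a plain exact-match key lookup:
// return the stored entry, or null when the structure is unknown.
function lookUp(inchi) {
  return knowledgeBase.has(inchi) ? knowledgeBase.get(inchi) : null;
}

console.log(lookUp("InChI=1S/C6H6O/c7-6-4-2-1-3-5-6/h1-5,7H").name);  // phenol
console.log(lookUp("InChI=1S/CH4/h1H4"));  // null (methane is not in the knowledge base)
```

For web-scale lookup one would rather use the fixed-length InChIKey, which is what Google searches work with; computing it requires the InChI library, so this sketch sticks to the full InChI string.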

And as Christopher Southan showed in his paper "InChI in the wild: an assessment of InChIKey searching in Google" (doi:10.1186/1758-2946-5-10), the InChI is good for finding useful information on the web. With Isbjørn I have taken a different approach: it does not use Google, but Linked Data to find information on the web. This semantic search is seeded with the InChI.

The third example exposes work by Mark Rijnbeek, formerly in the group of Christoph Steinbeck, who implemented tautomer generation for the CDK using the InChI library. This functionality is now available in Bioclipse too. Naturally, it is limited to the tautomers that the InChI library itself can generate. If you would like to try it, you can do so with:

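// use a Kekulé SMILES without aromatic bonds, so that
// it is easy to see where the double bonds are
inputSMILES = "OC1=CC=CC=C1";
inputName = "phenol";
// generate the tautomers with the InChI-based generator
tautomers = cdk.getTautomers(
  cdk.fromSMILES(inputSMILES)
);

// write all tautomers to a single SD file
file = "/Virtual/" + inputName + ".sdf";
cdk.saveSDFile(file, tautomers);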
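
What the InChI library contributes here is its mobile-hydrogen perception: in the `h` layer, a group such as `(H,2,3)` lists atoms that share movable hydrogens. As a simplified illustration of roughly the first step of such a generator (not Mark Rijnbeek's actual CDK code), the plain JavaScript sketch below parses one such group and enumerates every placement of its hydrogens over its atoms. The helper names `parseMobileGroup` and `hydrogenPlacements` are hypothetical, and charged mobile groups are not handled.

```javascript
// Hypothetical helper: parse a mobile-H group such as "(H,2,3)" or "(H2,4,5,6)"
// from the h layer of an InChI into a hydrogen count and a list of atom numbers.
function parseMobileGroup(group) {
  const m = /^\(H(\d*),([\d,]+)\)$/.exec(group);
  if (m === null) throw new Error("unsupported mobile-H group: " + group);
  return {
    hydrogens: m[1] === "" ? 1 : Number(m[1]),  // "H" alone means one hydrogen
    atoms: m[2].split(",").map(Number),
  };
}

// Enumerate every way to put k hydrogens on k distinct atoms of the group
// (the k-combinations of the atom list). Each result is one candidate
// H-distribution; a real tautomer generator must still rearrange the double
// bonds and discard placements that give no valid structure.
function hydrogenPlacements(atoms, k) {
  if (k === 0) return [[]];
  if (atoms.length < k) return [];
  const [first, ...rest] = atoms;
  const withFirst = hydrogenPlacements(rest, k - 1).map((p) => [first, ...p]);
  return withFirst.concat(hydrogenPlacements(rest, k));
}

const g = parseMobileGroup("(H,2,3)");  // the mobile group of formic acid
console.log(JSON.stringify(hydrogenPlacements(g.atoms, g.hydrogens)));  // [[2],[3]]
```

For formic acid the two placements correspond to the acidic hydrogen sitting on either oxygen, which is exactly the ambiguity that the FixedH layer resolves.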

Details on how to try all this in practice can be found on this page. I am looking forward to hearing what you think of it, and how you would like to use it or are already using it. If you would like to extend it, the source code is on GitHub.
Spjuth, O.; Berg, A.; Adams, S.; Willighagen, E. Applications of the InChI in cheminformatics with the CDK and Bioclipse. Journal of Cheminformatics 2013, 5, 14. doi:10.1186/1758-2946-5-14