Tuesday, March 19, 2013

New Paper: "Applications of the InChI in cheminformatics with the CDK and Bioclipse"

Last week, Ola, Sam Adams, Arvid, and I published a paper (doi:10.1186/1758-2946-5-14) on the InChI functionality in Bioclipse, which uses Sam's JNI-InChI and the Chemistry Development Kit underneath.

This paper partly describes the earlier work by Sam on JNI-InChI itself and the integration into the CDK, but also includes the recent support for CDK's IStereoElement, OSGi bundles for JNI-InChI by Arvid, and a few new applications in Bioclipse.

These applications demo what you can do with the InChI in Bioclipse. Obviously, this involves creating InChIs for any structure drawn in Bioclipse (that is old). New is that the manager now also supports creating InChIs with particular layers. For example, with fixed hydrogens:

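mol = cdk.fromSMILES("OC=O");
// standard InChI
sinchi = inchi.generate(mol);
// InChI with the fixed-hydrogen layer; stored under a new name
// so the inchi manager itself is not overwritten
fixedHInchi = inchi.generate(mol, "FixedH");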
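
To make the difference between the two results a bit more tangible, here is a minimal sketch in plain JavaScript that splits an InChI into its layers and reports whether a fixed-H block is present. It does not use the Bioclipse managers: the function name parseInchi is hypothetical, the parsing is deliberately simplified (it ignores, for example, the reconnected layer), and the two strings are what I would expect for formic acid, so please check them against your own output.

```javascript
// Split an InChI string into version, formula, and single-letter layers.
// Everything after an "/f" marker is collected separately as the fixed-H block.
function parseInchi(inchi) {
  if (!inchi.startsWith("InChI=")) {
    throw new Error("Not an InChI: " + inchi);
  }
  const parts = inchi.slice("InChI=".length).split("/");
  const version = parts[0]; // "1S" marks a standard InChI, "1" a non-standard one
  const formula = parts.length > 1 ? parts[1] : "";
  const main = {};
  const fixedH = {};
  let target = main;
  let hasFixedH = false;
  for (const layer of parts.slice(2)) {
    const key = layer.charAt(0);
    const value = layer.slice(1);
    if (key === "f") {
      // Start of the fixed-H block; it may carry its own formula.
      hasFixedH = true;
      target = fixedH;
      if (value) {
        fixedH.formula = value;
      }
      continue;
    }
    target[key] = value;
  }
  return {
    standard: version.endsWith("S"),
    formula: formula,
    main: main,
    fixedH: fixedH,
    hasFixedH: hasFixedH
  };
}

// Assumed example strings for formic acid (OC=O).
const standardExample = "InChI=1S/CH2O2/c2-1-3/h1H,(H,2,3)";
const fixedHExample = "InChI=1/CH2O2/c2-1-3/h1H,(H,2,3)/f/h2H";

console.log(parseInchi(standardExample));
console.log(parseInchi(fixedHExample));
```

In the fixed-H variant the mobile hydrogen from the main hydrogen layer is pinned to one specific atom, which is why that string stops being a standard InChI.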

But the more interesting bits are next. For example, the InChI is ideal for look up, and can be used in decision support with knowledge bases.

As Christopher Southan showed in his "InChI in the wild: an assessment of InChIKey searching in Google" paper (doi:10.1186/1758-2946-5-10), the InChI is also good for finding useful information on the web. I have taken a different approach with Isbjørn, which does not use Google but Linked Data approaches to find information on the web. This semantic search is seeded with the InChI.

The third example exposes work done by Mark Rijnbeek, formerly in the group of Christoph Steinbeck, who implemented a method for the CDK that uses the InChI library for tautomer generation. This functionality is now exposed in Bioclipse too. Obviously, it is limited by the InChI library's own ability to generate those tautomers. But if you would like to try it, you can do this with:

// no aromatic rings that make it hard to
// see where the double bonds are

inputSMILES = "c1ccccc1O";
inputName = "phenol";
tautomers = cdk.getTautomers(
  cdk.fromSMILES(inputSMILES)
);

file = "/Virtual/" + inputName + ".sdf";
cdk.saveSDFile(file, tautomers);

Details on how to try all this in practice can be found on this page. I am looking forward to hearing what you think of it, and how you would like to use it or are already using it. If you would like to extend it, the source code is on GitHub.
Spjuth, O.; Berg, A.; Adams, S.; Willighagen, E. Journal of Cheminformatics 2013, 5, 14.


  1. Just wondering, which release of CDK was the first to provide some InChI support? In the article you mention version 1.4.13 but I'm pretty sure it's been around a lot longer than that, right?

    1. Noel, not sure about the version, but it was introduced in August 2006:

      commit b3d9e8340fd6f2276c1cf5076cdb8c740294c247
      Author: sea36
      Date: Thu Aug 31 15:00:55 2006 +0000

      Added glue to JNI-InChI, allowing generation of InChIs from IAtomContainers, and IAtomContainers from InChIs.

      git-svn-id: eb4e18e3-b210-0410-a6ab-dec725e4b171

    2. Thanks. I'm putting together a simple timeframe for InChI. In general I'm using release dates so I'll see if I can find the next release after this date that has the code. Otherwise I'll just put this in as a date for hitting the devel code.

    3. This comment has been removed by the author.

    4. OK, the oldest release I can find right now is 0.99.1:

      From 2007-02-09

      (which suggests there has been a 0.99 too...)

  2. The initial JNI-InChI was Aug 2006, so that seems about right.