Pages

Monday, June 29, 2015

#metsoc2015 Converting SMILES annotation into InChIKey annotation

One of the questions I had in the hackathon today is about how to use the CDK to convert SMILES string into InChIs and InChIKeys (see doi:10.1186/1758-2946-5-14). So, here goes. This is the Groovy variant, though you can access the CDK just as well in other programming languages (Python, Java, JavaScript). We'll use the binary jar for CDK 1.5.10.  We can then run code, say test.groovy, using the CDK with:

groovy -cp cdk-1.5.10.jar test.groovy

With that out of the way, let's look at the code. Let's assume we start with a text file with one SMILES string on each line, say test.smi, then we parse this file with:

new File("test.smi").eachLine { line ->
  mol = parser.parseSmiles(line)
}

This already parses the SMILES string into a chemical graph. If we pass this to the generator to create an InChIKey, we may get an error, so we do an extra check:

gen = factory.getInChIGenerator(mol)
if (gen.getReturnStatus() == INCHI_RET.OKAY) {
  println gen.inchiKey;
} else {
  println "# error: " + gen.message
}

If we combine these two bits, we get a full test.groovy program:

import org.openscience.cdk.silent.*
import org.openscience.cdk.smiles.*
import org.openscience.cdk.inchi.*
import net.sf.jniinchi.INCHI_RET

parser = new SmilesParser(
  SilentChemObjectBuilder.instance
)
factory = InChIGeneratorFactory.instance

new File("test.smi").eachLine { line ->
  mol = parser.parseSmiles(line)
  gen = factory.getInChIGenerator(mol)
  if (gen.getReturnStatus() == INCHI_RET.OKAY) {
    println gen.inchiKey;
  } else {
    println "# error: " + gen.message
  }
}

Update: John May suggested an update, which I quite like. If the result is not 100% okay, but the InChI library gave a warning, it still yields an InChIKey which we can output, along with the warning message. For this, replace the if-else statement with this code:

if (gen.returnStatus == INCHI_RET.OKAY) {
  println gen.inchiKey;
} else if (gen.returnStatus == INCHI_RET.WARNING) {
  println gen.inchiKey + " # warning: " + gen.message;
} else {
  println "# error: " + gen.message

}