Sunday, September 26, 2010

Using PubChem to create CDK unit tests

In 2008 I posted about Wicked chemistry and unit testing, and at the time I used BeanShell to convert a structure on PubChem into CDK source code. Since I now prefer Groovy, I have updated the code. It uses CDK 1.3.6 and now reads the PubChem XML format:
import org.openscience.cdk.Molecule;
import org.openscience.cdk.io.*;

// Require a PubChem compound ID (CID) as the first argument.
if (args.length == 0) {
  System.err.println("Syntax: pc2ut.groovy [CID]");
  System.exit(1);
}

String cid = args[0];
String urlString =
  "http://pubchem.ncbi.nlm.nih.gov/summary/" +
  "summary.cgi?disopt=SaveXML&cid=" + cid;

URL url = new URL(urlString);

PCCompoundXMLReader reader =
  new PCCompoundXMLReader(url.openStream());
Molecule mol = reader.read(new Molecule());

StringWriter stringWriter = new StringWriter();
CDKSourceCodeWriter writer =
  new CDKSourceCodeWriter(stringWriter);
writer.write(mol);
writer.close();

System.out.print(stringWriter.toString());
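
The script above passes the command-line argument straight into the download URL. As a minimal sketch, the URL construction could be pulled into a small helper that rejects anything that is not a plain number before any network access happens. The helper name `pubchemXmlUrl` is hypothetical and not part of the original script or of the CDK; CID 2244 is used purely as an example input.

```groovy
// Hypothetical helper (an assumption, not from the original script):
// builds the PubChem XML download URL for a compound ID (CID) and
// rejects input that does not consist of digits only.
String pubchemXmlUrl(String cid) {
  if (cid == null || !(cid ==~ /\d+/)) {
    throw new IllegalArgumentException("CID must consist of digits only: " + cid)
  }
  // Same URL pattern as in the script above.
  return "http://pubchem.ncbi.nlm.nih.gov/summary/" +
    "summary.cgi?disopt=SaveXML&cid=" + cid
}

// Example call with an arbitrary CID.
println pubchemXmlUrl("2244")
```

With such a helper in place, the script could call `new URL(pubchemXmlUrl(args[0]))` instead of concatenating the string inline, so a mistyped CID fails early with a readable message.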

Update: An observant reader will have noticed that the current CDKSourceCodeWriter produces code that does not compile. The CDK API has changed, but the generated output was not updated accordingly. Apparently, no one is actually using this class, or those who do were not interested enough in that piece of functionality to file a bug report.