Thursday, August 01, 2013

CTR #8: Unique SMARTS matches against a SMILES string

Of course, I had hardly numbered CTR #7 when I realized that I should solve the SMARTS matching CTR first. But because I had already numbered #7, I had to name this one #8. You know, for historical consistency and not meddling with your lab notebook... life sucks.

Anyway, Rajarshi wrote a convenient SMARTSQueryTool for the CDK, which makes this CTR rather trivial. The hardest bit is the workaround for a limitation of the edge-based graph matching used by the CDK UniversalIsomorphismTester (cyclopropane and isobutane are indistinguishable at the edge level, but are easily told apart by the number of matched atoms):

import org.openscience.cdk.interfaces.*;
import org.openscience.cdk.smiles.*;
import org.openscience.cdk.smiles.smarts.*;
import org.openscience.cdk.silent.SilentChemObjectBuilder;

SmilesParser sp = new SmilesParser(SilentChemObjectBuilder.getInstance());
atomContainer = sp.parseSmiles("C1CC12C3(C24CC4)CC3");
querytool = new SMARTSQueryTool("*1**1");

found = querytool.matches(atomContainer);
if (found) {
  mappings = querytool.getMatchingAtoms()
  hits = 0
  for (int i = 0; i < mappings.size(); i++) {
    atomIndices = mappings.get(i);
    if (atomIndices.size() == 3) {
      // work around the cyclopropane / isobutane equivalence
      hits++
    }
  }
  println "hits: $hits"

  mappings = querytool.getUniqueMatchingAtoms()
  uniqueHits = 0
  for (int i = 0; i < mappings.size(); i++) {
    atomIndices = mappings.get(i);
    if (atomIndices.size() == 3) {
      // work around the cyclopropane / isobutane equivalence
      uniqueHits++
    }
  }
  println "unique hits:$uniqueHits"
}
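
To see why the atom count check in the script above is enough, here is a minimal sketch in plain Java (no CDK involved; the class and method names are made up for this illustration). It assumes hydrogen-suppressed graphs, with each bond written as a pair of atom indices. Both cyclopropane and isobutane have three bonds in which every pair of bonds shares an atom, so a matcher that only compares bonds and their adjacency sees the same picture. Only the number of atoms involved differs:

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class, not part of the CDK.
class EdgeMatchingDemo {

    /** Counts the distinct atoms touched by a list of bonds (pairs of atom indices). */
    static int atomCount(int[][] bonds) {
        Set<Integer> atoms = new TreeSet<>();
        for (int[] bond : bonds) {
            atoms.add(bond[0]);
            atoms.add(bond[1]);
        }
        return atoms.size();
    }

    /** True if the two bonds have at least one atom in common. */
    static boolean sharesAtom(int[] a, int[] b) {
        return a[0] == b[0] || a[0] == b[1] || a[1] == b[0] || a[1] == b[1];
    }

    /** Counts the pairs of bonds that share an atom, which is all an edge-level view can see. */
    static int adjacentBondPairs(int[][] bonds) {
        int pairs = 0;
        for (int i = 0; i < bonds.length; i++) {
            for (int j = i + 1; j < bonds.length; j++) {
                if (sharesAtom(bonds[i], bonds[j])) pairs++;
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Heavy atoms only: a three-membered ring versus a central atom with three neighbours.
        int[][] cyclopropane = {{0, 1}, {1, 2}, {2, 0}};
        int[][] isobutane = {{0, 1}, {0, 2}, {0, 3}};

        System.out.println("cyclopropane: bonds=" + cyclopropane.length
                + " adjacent bond pairs=" + adjacentBondPairs(cyclopropane)
                + " atoms=" + atomCount(cyclopropane));
        System.out.println("isobutane:    bonds=" + isobutane.length
                + " adjacent bond pairs=" + adjacentBondPairs(isobutane)
                + " atoms=" + atomCount(isobutane));
    }
}
```

Three bonds with three adjacent bond pairs is the same triangle of bonds in both cases, while the atom counts are 3 and 4. That is why keeping only the mappings with exactly three atoms filters out the isobutane-like false positives for the `*1**1` query.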


To see all solutions, check the full list of problems on my blog.