Thursday, August 01, 2013

CTR #8: Unique SMARTS matches against a SMILES string

Of course, I had hardly numbered CTR #7 when I realized that I should have solved the SMARTS matching CTR first. But because I had already numbered #7, I had to name this one #8. You know, for historical consistency and not meddling with your lab notebook... life sucks.

Anyway, Rajarshi wrote a convenient SMARTSQueryTool for the CDK, which makes this CTR rather trivial. The hardest bit is working around a limitation of the edge-based graph matching used by the CDK UniversalIsomorphismTester: cyclopropane and isobutane are indistinguishable at the edge level (three bonds, each adjacent to the other two), but they are easily separated by the number of matched atoms, three for the ring versus four for isobutane:

import org.openscience.cdk.interfaces.*;
import org.openscience.cdk.smiles.*;
import org.openscience.cdk.smiles.smarts.*;
import org.openscience.cdk.silent.SilentChemObjectBuilder;
 
SmilesParser sp = new SmilesParser(SilentChemObjectBuilder.getInstance());
// target molecule: contains four three-membered rings
atomContainer = sp.parseSmiles("C1CC12C3(C24CC4)CC3");
// SMARTS query: any three-membered ring
querytool = new SMARTSQueryTool("*1**1");
 
found = querytool.matches(atomContainer);
if (found) {
  // all mappings, including symmetry-equivalent ones
  mappings = querytool.getMatchingAtoms();
  hits = 0;
  for (int i = 0; i < mappings.size(); i++) {
    atomIndices = mappings.get(i);
    if (atomIndices.size() == 3) {
      // work around the cyclopropane / isobutane equivalence:
      // only count mappings that cover exactly three atoms
      hits++;
    }
  }
  println "hits: $hits"
 
  // mappings with symmetry-equivalent duplicates removed
  mappings = querytool.getUniqueMatchingAtoms();
  uniqueHits = 0;
  for (int i = 0; i < mappings.size(); i++) {
    atomIndices = mappings.get(i);
    if (atomIndices.size() == 3) {
      // same cyclopropane / isobutane workaround as above
      uniqueHits++;
    }
  }
  println "unique hits: $uniqueHits"
}
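
To get a feel for why the two counts differ, here is a rough, CDK-free sketch in plain Java. It is not how the SMARTSQueryTool works internally. It assumes a bond list for the molecule that I wrote out by hand from the SMILES string, and the class and method names (TriangleMatches, allMappings, uniqueMappings) are made up for this illustration. It brute-forces every ordered atom triple that closes a three-membered ring and then collapses the triples that cover the same set of atoms:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class TriangleMatches {

    // Bonds of C1CC12C3(C24CC4)CC3 as atom-index pairs in SMILES order.
    // Written out by hand for this sketch (an assumption, not CDK output).
    static final int[][] BONDS = {
        {0, 1}, {1, 2}, {2, 0}, {2, 3}, {3, 4}, {4, 2},
        {4, 5}, {5, 6}, {6, 4}, {3, 7}, {7, 8}, {8, 3}
    };
    static final int ATOM_COUNT = 9;

    // Build a symmetric adjacency matrix from the bond list.
    static boolean[][] adjacency(int[][] bonds, int atomCount) {
        boolean[][] adj = new boolean[atomCount][atomCount];
        for (int[] bond : bonds) {
            adj[bond[0]][bond[1]] = true;
            adj[bond[1]][bond[0]] = true;
        }
        return adj;
    }

    // Every ordered triple (a, b, c) whose atoms are pairwise bonded,
    // i.e. every way of laying a three-membered ring query onto the target.
    static List<int[]> allMappings(int[][] bonds, int atomCount) {
        boolean[][] adj = adjacency(bonds, atomCount);
        List<int[]> mappings = new ArrayList<>();
        for (int a = 0; a < atomCount; a++) {
            for (int b = 0; b < atomCount; b++) {
                for (int c = 0; c < atomCount; c++) {
                    if (adj[a][b] && adj[b][c] && adj[c][a]) {
                        mappings.add(new int[] {a, b, c});
                    }
                }
            }
        }
        return mappings;
    }

    // Collapse mappings that cover the same atoms by using the sorted
    // atom indices as the identity of a match.
    static Set<List<Integer>> uniqueMappings(List<int[]> mappings) {
        Set<List<Integer>> unique = new LinkedHashSet<>();
        for (int[] mapping : mappings) {
            int[] sorted = mapping.clone();
            Arrays.sort(sorted);
            unique.add(Arrays.asList(sorted[0], sorted[1], sorted[2]));
        }
        return unique;
    }

    public static void main(String[] args) {
        List<int[]> all = allMappings(BONDS, ATOM_COUNT);
        System.out.println("hits: " + all.size());
        System.out.println("unique hits: " + uniqueMappings(all).size());
    }
}
```

In this sketch each ring is found 3! = 6 times as an ordered triple, so the four rings give 24 ordered mappings but only 4 distinct atom sets. The CDK may enumerate its non-unique matches differently, so treat this purely as a picture of the deduplication idea.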

To see all solutions, check the full list of problems on my blog.