Sunday, December 17, 2006

Counting constitutional isomers from the molecular formula

We all know the combinatorial explosion in the number of possible constitutional isomers (see wp:structural isomerism) of a given molecular formula. For example, C2H6 has only one constitutional isomer (ethane, InChI=1/C2H6/c1-2/h1-2H3), and C4H10 has only two. Breaking symmetry, by replacing one carbon with another element or a single bond with a double bond, increases the number sharply. For example, C7H16 has only nine constitutional isomers, while replacing two single bonds by two double bonds, creating C7H10, increases this number to 499! Replacing one carbon in that last formula by an oxygen adds more still, totaling 747 isomers.

Now, C8H8NBr has at least 649 thousand constitutional isomers, and I would like to be able to know the number of isomers beforehand, without having to generate the structures themselves (for example, using CDK's GENMDeterministicGenerator). InChI=1/C8H8BrN/c9-7-1-2-8-6(5-7)3-4-10-8/h1-2,5,10H,3-4H2 is one of the isomers.

So, my question: is anyone aware of free code (in order of preference: 1. LGPL, 2. BSD/MIT, 3. open source, 4. free) to calculate or estimate the number of constitutional isomers for a given molecular formula? An estimate would already be nice. Ideally, I would implement this bit of code in the CDK, but otherwise, just knowing the number of isomers for C8H8NBr would be nice :)

Additionally, any relevant, recent literature recommendations are most welcome. I am aware of the use of polynomials, but the literature I have seen so far focuses on molecules of a certain architecture, and is not able to come up with a guess based on the molecular formula alone.
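To illustrate what architecture-specific counting looks like, here is a minimal Python sketch for the one easy case, the acyclic alkanes CnH2n+2. It is not CDK or MOLGEN code, and the function names are made up for this example. It counts carbon skeletons as trees with at most four neighbours per carbon: first rooted branches (alkyl radicals), then whole molecules split by whether the tree has a single central carbon or a central bond. It says nothing about formulas with rings, double bonds, or heteroatoms such as C8H8NBr.

```python
from functools import lru_cache
from itertools import combinations_with_replacement
from math import comb


def _size_tuples(k, total, limit):
    """Yield non-decreasing k-tuples of branch sizes in 0..limit that sum to total."""
    for sizes in combinations_with_replacement(range(limit + 1), k):
        if sum(sizes) == total:
            yield sizes


def _ways(sizes):
    """Number of distinct multisets of branches having exactly these sizes."""
    ways = 1
    for size in set(sizes):
        m = sizes.count(size)
        # m interchangeable branches chosen from alkyl_radicals(size) kinds
        ways *= comb(alkyl_radicals(size) + m - 1, m)
    return ways


@lru_cache(maxsize=None)
def alkyl_radicals(n):
    """Rooted skeletons with n carbons; the root carbon carries up to 3 branches."""
    if n == 0:
        return 1  # the empty branch, i.e. a hydrogen
    return sum(_ways(s) for s in _size_tuples(3, n - 1, n - 1))


def alkane_isomers(n):
    """Constitutional isomers of the acyclic alkane CnH2n+2."""
    # Case 1: one central carbon, every branch smaller than n/2.
    limit = (n - 1) // 2
    total = sum(_ways(s) for s in _size_tuples(4, n - 1, limit))
    # Case 2: a central bond joining two halves of n/2 carbons each.
    if n % 2 == 0:
        half = alkyl_radicals(n // 2)
        total += half * (half + 1) // 2
    return total


for n in (2, 4, 7, 10):
    print(n, alkane_isomers(n))
```

For n = 2, 4, 7 and 10 this prints 1, 2, 9 and 75. The brute-force loop over size tuples is chosen for readability; it is fast enough for a few dozen carbons but is not meant as an efficient implementation.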

10 comments:

  1. I just had MolGen have a go at it. The MolGen demo for Windows runs just fine on Linux using Wine. It counted 1,223,013 isomers for C8H8BrN! Realize that C10H22 only has 75 constitutional isomers :) That's the same number of heavy atoms! That's what I call combinatorial explosion :)

  2. How long did MolGen take (under Wine) to calculate the 1.2 million compounds?
    Typically, it's very fast and comparable with our CDK stuff :-)
    Cheers, Chris

  3. The following reference might be useful, though I think it attempts to construct the set of isomers (rather than just count them)

    Algorithm for Exhaustive and Nonredundant Organic Stereoisomer Generation
    Contreras, M. L.; Alvarez, J.; Guajardo, D.; Rozas, R.
    J. Chem. Inf. Model; (Article); 2006; 46(6); 2288-2298. DOI: 10.1021/ci6002762

  4. Christoph,

    about 2 minutes, so about 10k structures per second. But that is just counting, so can hardly be compared with the CDK generator. Would be nice to extend that to just count the isomers, and not to create CDK Molecules for them, as an option.

    Rajarshi,

    I spoke with Martin Ott (CMBI, ru.nl) yesterday, who did the same for up to C20H42. Is their code open source?

  5. Hi Egon,

    I have no answer to your question :-( But some remarks :-)

    I would ask the people from the MOLGEN team.
    However, you probably mean structural isomers and not stereoisomers?
    The number of stereoisomers can be different for each structural isomer.

    A possible reference would be:
    Combinatorial Enumeration in Chemistry
    D. BABIC, D.J. KLEIN, J. VON KNOP AND N. TRINAJSTIC

    http://www.rsc.org/images/CM003004_tcm18-15988.pdf
    (get the whole filename with *.PDF)

    1)
    It is also important to mention that different atoms can have different valences.
    N=3,5; P=3,5; S=2,4,6;

    You can mimic that by using the formulae
    C8H8BrZ1-3 - which refers to valence N=3; constructed isomers: 1,223,013
    C8H8BrZ1-5 - which refers to valence N=5; constructed isomers: 3,645,017

    Especially if you deal with nitrogen or sulfur and many halogen atoms these
    mixed valences become important. No such approach has been
    implemented into MOLGEN or CDK's GENMDeterministicGenerator.

    2)
    It is important to mention that there can be aromatic duplicates.
    So the real number of structural isomers can be even lower than
    1,223,013 for C8H8N1Br1 (valence = 3)

    3)
    MOLGEN demo counts the C8H8NBr isomers in 2 seconds on a 2.8 GHz Opteron.

    Tobias

    Tobias Kind
    fiehnlab.ucdavis.edu
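One way to see why the assumed nitrogen valence changes the count in the comment above: it changes how many rings plus double bonds every isomer must contain. The sketch below uses the standard valence-sum rule; the function name and the dictionary layout are assumptions made for this example, not part of MOLGEN or the CDK.

```python
def rings_plus_double_bonds(counts, valences):
    """Rings plus double bonds implied by a formula (a triple bond counts as two).

    counts maps element symbol -> number of atoms,
    valences maps element symbol -> assumed valence of that element.
    """
    # Each atom contributes (valence - 2); a saturated acyclic molecule sums to -2.
    excess = sum(n * (valences[element] - 2) for element, n in counts.items())
    if excess % 2:
        raise ValueError("odd valence sum: no closed-shell structure for this formula")
    return 1 + excess // 2


formula = {"C": 8, "H": 8, "Br": 1, "N": 1}
base = {"C": 4, "H": 1, "Br": 1}

print(rings_plus_double_bonds(formula, {**base, "N": 3}))  # 5
print(rings_plus_double_bonds(formula, {**base, "N": 5}))  # 6
```

With trivalent nitrogen, every C8H8BrN isomer has five rings or double bonds; with pentavalent nitrogen, six. The two settings therefore describe different structure spaces, which is consistent with the different isomer counts reported above.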

  6. Hi,
    I just found another list (no code) written by Markus Meringer (it is contained in his extremely large and interesting(!) dissertation) - it lists the number of all structural isomers for CHNO up to 150 u on page 315.
    "Isomere nach Bruttoformel und Masse"
    http://www.mathe2.uni-bayreuth.de/markus/pdf/pub/dis/MathModKombChemMolStrukt.pdf

    Kind regards
    Tobias

    Tobias Kind
    fiehnlab.ucdavis.edu

  7. Tobias, thanx for the comments and references!

  8. Hi All,

    This might be too late, but I recently downloaded the cdk-1.4.7.jar file. My aim is to find all the constitutional isomers for a formula or SMILES string, and I noticed that there is no GENDeterministicGenerator.java file in it. Can anyone help me find the most recent version of that file? Or is there an alternative algorithm to find all the constitutional isomers?

    Thanks.

    Replies
    1. Hi Anonymous, Tobias (one of the commenters) found a few problems in the generator, which we tried to fix, but were unable to. In the end we removed the code.
