We all know the combinatorial explosion that occurs when calculating the number of possible constitutional isomers (see wp:structural isomerism) of a given molecular formula. For example, C2H6 has only one constitutional isomer (ethane, InChI=1/C2H6/c1-2/h1-2H3), and C4H10 has only two. Breaking symmetry, by replacing one carbon with another element or a single bond with a double bond, increases the number sharply. For example, C7H16 has only nine constitutional isomers, while replacing two single bonds by two double bonds, creating C7H10, increases this number to 499! Then, replacing one carbon in that last formula by an oxygen adds another few, totaling 747 isomers.
Now, C8H8NBr has at least 649 thousand constitutional isomers, and I am quite interested in being able to know the number of isomers beforehand, without having to generate the structures themselves (for example, using CDK's GENMDeterministicGenerator). InChI=1/C8H8BrN/c9-7-1-2-8-6(5-7)3-4-10-8/h1-2,5,10H,3-4H2 is one of the isomers.
So, my question: is anyone aware of free code (in order of preference: 1. LGPL, 2. BSD/MIT, 3. open source, 4. free) to calculate or estimate the number of constitutional isomers for a given molecular formula? An estimate would already be nice. Ideally, I would implement this bit of code into the CDK, but otherwise, just knowing the number of isomers for C8H8NBr would be nice :)
Additionally, any relevant, recent literature recommendations are most welcome. I am aware of the use of polynomials, but the literature I have seen so far focuses on molecules of a certain architecture, and is not able to come up with a guess based on the molecular formula alone.
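To illustrate what counting for "a certain architecture" looks like, here is a minimal Python sketch that counts acyclic alkanes (CnH2n+2) without generating any structures. It is not MolGen's or the CDK's method, just the textbook centroid argument: count alkyl radicals (rooted carbon trees with at most three branches per carbon), then assemble alkanes around a central carbon or a central bond. The function names are my own, and the approach does not extend to rings, double bonds, or heteroatoms, which is exactly the limitation described above.

```python
from math import comb


def count_multisets(radicals, max_branches, max_size, total):
    """Count multisets of at most max_branches non-empty alkyl radicals,
    each with at most max_size carbons, whose sizes sum to total."""
    # ways[b][s]: number of ways to pick b branches holding s carbons in total
    ways = [[0] * (total + 1) for _ in range(max_branches + 1)]
    ways[0][0] = 1
    for size in range(1, min(max_size, total) + 1):
        new = [row[:] for row in ways]
        for b in range(max_branches + 1):
            for s in range(total + 1):
                if ways[b][s] == 0:
                    continue
                # add j branches of this size; order does not matter,
                # so choose a multiset of j out of radicals[size] kinds
                for j in range(1, max_branches - b + 1):
                    if s + j * size > total:
                        break
                    new[b + j][s + j * size] += ways[b][s] * comb(radicals[size] + j - 1, j)
        ways = new
    return sum(ways[b][total] for b in range(max_branches + 1))


def alkyl_radical_counts(n_max):
    """radicals[k] = number of alkyl radicals with k carbons (radicals[0] is H)."""
    radicals = [1] + [0] * n_max
    for n in range(1, n_max + 1):
        # the attachment carbon carries up to three further branches
        radicals[n] = count_multisets(radicals, 3, n - 1, n - 1)
    return radicals


def count_alkanes(n):
    """Number of constitutional isomers of the acyclic alkane CnH2n+2."""
    if n < 1:
        raise ValueError("n must be at least 1")
    radicals = alkyl_radical_counts(n // 2)
    # one central carbon: up to four branches, each smaller than n/2
    centroid = count_multisets(radicals, 4, (n - 1) // 2, n - 1)
    # one central bond: two halves of exactly n/2 carbons (even n only)
    bicentroid = 0
    if n % 2 == 0:
        half = radicals[n // 2]
        bicentroid = half * (half + 1) // 2
    return centroid + bicentroid
```

Calling `count_alkanes` for n = 2, 4, 7 and 10 gives 1, 2, 9 and 75, matching the alkane numbers quoted in this post.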
I just had MolGen have a go at it. The MolGen demo for Windows runs just fine on Linux using Wine. It counted 1,223,013 isomers for C8H8BrN! Realize that C10H22 has only 75 constitutional isomers :) That's the same number of heavy atoms! That's what I call combinatorial explosion :)
How long did MolGen take (under Wine) to calculate the 1.2 million compounds?
Typically, it's very fast and comparable with our CDK stuff :-)
Cheers, Chris
The following reference might be useful, though I think it attempts to construct the set of isomers (rather than just count them):
Algorithm for Exhaustive and Nonredundant Organic Stereoisomer Generation
Contreras, M. L.; Alvarez, J.; Guajardo, D.; Rozas, R.
J. Chem. Inf. Model; (Article); 2006; 46(6); 2288-2298. DOI: 10.1021/ci6002762
Christoph,
About 2 minutes, so about 10k structures per second. But that is just counting, so it can hardly be compared with the CDK generator. It would be nice to extend the CDK generator with an option to just count the isomers, without creating CDK Molecules for them.
Rajarshi,
I spoke with Martin Ott (CMBI, ru.nl) yesterday, who did the same for up to C20H42. Is there open source code?
Hi Egon,
I have no answer to your question :-( But some remarks :-)
I would ask the people from the MOLGEN team.
However, you probably mean structural isomers and not stereoisomers?
The number of stereoisomers can be different for each structural isomer.
A possible literature reference would be:
Combinatorial Enumeration in Chemistry
D. BABIC , D.J. KLEIN, J. VON KNOP AND N. TRINAJSTIC
http://www.rsc.org/images/CM003004_tcm18-15988.pdf
(get the whole filename with *.PDF)
1)
It is also important to mention that different atoms can have different valences.
N=3,5; P=3,5; S=2,4,6;
You can mimic that by using the formulae
C8H8BrZ1-3 - which refers to valence N=3, constructed isomers: 1,223,013
C8H8BrZ1-5 - which refers to valence N=5, constructed isomers: 3,645,017
Especially if you deal with nitrogen or sulfur and many halogen atoms, these
mixed valences become important. No such approach has been
implemented in MOLGEN or CDK's GENMDeterministicGenerator.
2)
It is important to mention that there can be aromatic duplicates.
So the real number of structural isomers can be even lower than
1,223,013 for C8H8N1Br1 (valence = 3).
3)
MOLGEN demo counts the C8H8NBr isomers in 2 seconds on a 2.8 GHz Opteron.
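To make the valence remark in point 1 concrete: the number of rings plus double bonds implied by a formula depends on the valences assumed, which is one reason the isomer count changes between N=3 and N=5. Below is a small Python sketch of the usual unsaturation formula, 1 + sum of n_i * (v_i - 2) / 2. The valence table and function names are assumptions chosen for illustration, not anything taken from MOLGEN or the CDK.

```python
import re

# Assumed default valences, for illustration only; override per call as needed
DEFAULT_VALENCES = {"C": 4, "H": 1, "N": 3, "O": 2, "S": 2, "P": 3,
                    "F": 1, "Cl": 1, "Br": 1, "I": 1}


def parse_formula(formula):
    """Parse a simple formula such as 'C8H8BrN' into element counts."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def unsaturation(formula, valences=None):
    """Rings plus double bonds: 1 + sum(n_i * (v_i - 2)) / 2."""
    table = dict(DEFAULT_VALENCES)
    table.update(valences or {})
    counts = parse_formula(formula)
    return (2 + sum(n * (table[element] - 2) for element, n in counts.items())) / 2


print(unsaturation("C8H8BrN"))             # nitrogen with valence 3
print(unsaturation("C8H8BrN", {"N": 5}))   # nitrogen with valence 5
```

With trivalent nitrogen the formula allows five rings or double bonds, with pentavalent nitrogen six, so the second case leaves more ways to connect the same atoms.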
Tobias
Tobias Kind
fiehnlab.ucdavis.edu
Hi,
I just found another list (no code) written by Markus Meringer. It is contained in his extremely large and interesting(!) dissertation, which lists the number of all structural isomers for CHNO up to 150 u on page 315.
"Isomere nach Bruttoformel und Masse"
http://www.mathe2.uni-bayreuth.de/markus/pdf/pub/dis/MathModKombChemMolStrukt.pdf
Kind regards
Tobias
Tobias Kind
fiehnlab.ucdavis.edu
Tobias, thanx for the comments and references!
Wish I had an answer
Hi All,
This might be too late, but I recently downloaded the cdk-1.4.7.jar file. My aim is to find all the constitutional isomers for a formula or SMILES string, and I noticed that there is no GENDeterministicGenerator.java file in it. Can anyone help me find the latest version of that file? Or is there an alternative algorithm to find all the constitutional isomers?
Thanks.