Sunday, October 02, 2011

CDK's getNaturalExactMass(): isotopic and atomic weights

A recurrent question for the CDK project is the about the AtomContainerManipulator.getNaturalExactMass() and why it return NaN for some elements. There are various incarnations of the issues here, but the key here is the difference between the various weights a molecule can have.

Monoisotopic molecular weight
The weight of a single molecule is well-defined, but requires you to know which isotopes are present in the molecule, which you typically do not. Each isotope of an element has a natural abundance. For example, carbon isotopes found on earth are mostly 12C, and a bit of 13C. In fact, 12C is about 99 times more abundant than 13C. The same goes for 1H and 2H: the first is way more abundant than the latter. This means that the molecular weight of a single methane molecule can only be calculated which isotopes are present in the molecule. But, the most occurring combination, is the molecule with just the major isotopes, SMILES: [1H][12C]([1H])([1H])[1H]. A few elements have two isotopes which are almost equally abundant, such as bromine.

Natural molecular weight
However, in experimental chemistry we are not looking at individual molecules (well, we can nowadays), but typically at substances of those molecules. Substances contains very many single molecules (remember the Avogadro's constant). So, we have a mixture of molecules with different isotope ratios. As indicated earlier, the majority of those molecules contain only major isotopes, but as a substance we rather just take the natural mass of the molecular, which is an average weight for that element (called atomic weights), giving the natural abundances of the element's isotopes. Mind you, these atomic weights are not constant, as recently made the news. Only isotopic weights are constant, the atomic weights depend on the ratio of isotopes, which varies around the world.

But, returning to AtomContainerManipulator.getNaturalExactMass() method, it should be noted that atomic weights are also only defined for elements which in fact are found on earth; elements which are naturally found, and for which the isotopic abundances can be calculated. Thus, atomic weights are not defined for synthetic elements, which we have more and more.

So, if you call this getNaturalExactMass() method for a molecular structures which has one or more elements with that do not occur naturally, you will get an NaN answer. In fact, I guess the method should throw a CDKException with a message like "Hey dude, you have non-natural elements in this molecule!".

Getting practical: technetium
Technetium (Tc) is such an element which does not have any naturally occurring isotopes, so the CDK cannot calculate a atomic weight. And that is the answer to this bug report.