Sunday, November 11, 2012

CDK 1.4.15: the changes, the authors, and the reviewers

At some point I had thought that I could finally concentrate on master. We have enough regressions there, of various kinds, some 40-50 unit tests that did pass properly in the past. Various core changes that increase the accuracy of our library have the nasty side effect that they uncover certain assumptions. But let's not talk about master yet, and focus on the 1.4.15 release (download here). Unlike I had hoped, a lot changed since the 1.4.14 release. On the bright side, CDK 1.4 is getting more and more reliable with every minor release.

I major addition in this release is that of a data model for double bond stereochemistry, making the CDK now handle to two most common forms of stereochemistry for small molecules. It must be stressed that not all IO classes are reading data into this data model yet. The interface looks like (the full JavaDoc is found here):

  public interface IDoubleBondStereochemistry
  extends IStereoElement {
    public enum Conformation {
        TOGETHER,  //  as in Z-but-2-ene
        OPPOSITE   //  as in E-but-2-ene
    public IBond[] getBonds();
    public IBond getStereoBond();
    public Conformation getStereo();

Other new functionality include an alternative aromaticity checker, which is happy to mark rings aromatic even if the ring has double bonds pointing outside the ring (e.g. benzoquinone). That means, we now how two algorithms in the CDK to perceive aromaticity.

Otherwise, there is a truck load of fixes. One really important one, is the fix that ensures that stereochemistry is also cloned(). Other fixing include minor atom typing work, including new selenium atom types, the further generalization of the IO accepts() methods, and a fix in the SDG code to not delete bridging hydrogens before doing structure clean up. There are many more small fixes and tunes, and as always, the full list is given below.

The changes
  • Fixed a bug present in many readers: it would not accept a subclass if ChemFile (e.g. NNChemFile) even if ChemFile itself was accepted bc30798
  • Fixed loading of the right class when reporting possible alternative constructors 420533e
  • [bug:1275] added check to ensure that when String.substring is called the string is long enough 5a7baa4
  • [bug:1274] added conditional to ensure that when multiple bond stereo is specified as attributes and characters only one is used. This is achieved by using the existing flag to determine if a stereo bond value has already been provided. 0e97eff
  • Updated Gilleain's code to hook in with the other two new selenium atom types 9fb2dd3
  • Missing Se.2 atom type and test case 21186c0
  • Added finally cause to ensure the file is closed a50c303
  • - added unit test to demonstrate the bug a6d3d6e
  • added unit test to demonstrate bug and correct bug id's for two recents tests 0ea90e8
  • Adding CMLReaderTest of io module test suite fefc82d
  • Resolved NoNotify fails on AtomParity. Error was due to subclassing of AtomParity. Also the assertEquals params were swapped as the assertion was the wrong way. 46ffbb4
  • Added unit tests for bugs 1270 (removalAllElements should remove stereo elements) and 1273 (double bond stereo chemistry constructor should throw an exception on wrong input). Added @cdk.bug tag for bug 1264 (stereo element cloning) 938306a
  • Used return covariance on clone() to provide cleaner front-end API c3d4af0
  • Added deep cloning of stereo element to atom containers and polymers (atom container subclass) a46545d
  • Added stereo element shallow copy 7a1f243
  • Added a 'map' method on all IStereoElements. The map method allows a stereo element on one container/molecule to mapped to a stereo element on another. This mapping is achieved using two symbol tables, one for atoms and one for bonds. All methods are null safe and the mapping will not fail if any content in the stereo elements is null. The mapping simplifies the cloning of molecules/atom containers but could also be used when comparing isomorphic graphs. 267ec79
  • Reworked AtomContainer.clone() so it is clear what is going on. We now use a HashMap between the original and cloned atoms to avoid to a linear search each time the atom mapping is needed. This is also useful when we add StereoElement cloning (not yet implemented). We also store a bond mapping as well - we will need the bond mapping for double bond stereo chemistry. The stereo elements in the clone need to be set to an empty array on the clone so we don't remove elements from the original (cloning odditity). It was also clear we need to change the clone() method on Polymer which currently undoes all the cloning work we do in the AtomContainer. For all clone instances I added some code to correctly create HashMaps that won't need to resized. The default HashMap implementation works best at 0.75 capacity - we therefore need to do some simple arithmetic to ensure we don't get a resize. The implemented method is what is used in the Guava library. f98b04a
  • Added unit tests for DoubleBond and AtomParity cloning a7e4255
  • Added ability to setStereoElements - this was required due to clone() being shall on List. We need to be able to set a new array when have cloned a AtomContainer 84f8f0e
  • Added unit test for tetrahedral chirality stereo element 83333d7
  • Fixed closing (fixes #1265) a9a27ed
  • Added cdk-silent dependency for test-renderextra 2afae5c
  • Added renderextra to dist-large and test-all targets 8178890
  • Removed redundant code from ChemObj clone - the existing code did exactly what the copy constructor of HashMap does and thus provides a cleaner implementation 8f7c0ab
  • Added removal of stereo elements in 'removeAllElements()' - documentation has been updated d1e8fa6
  • Added check to ensure a DoubleBondStereochemistry is never created with more then 2 bonds - this would cause errors with some methods. f8f98fe
  • Removed print to standard out from ChemObjectBuilders ab6c308
  • Removed redundant code - we don't need to check whether the bond is already in the container as we create a new instance. We also don't need to check the array size as this is done by addBond(IBond) c7786c9
  • Moved TetrahedralChirality from data to core. 0e41b05
  • Added unit test and @TestMehtod annotaitons for new 'isEmpty' mehtods aa0f969
  • add isEmpty() to classes/interfaces ChemModel, AtomContainerSet, ReactionSet, Crystal and AtomContainer f0c14fe
  • Be more informative when the test fails 8b8a848
  • Added missing test annotation bbf00f6
  • Documenting new method and extended unit test 74d206e
  • Removed SVN tags, as suggested by the reviewer 4e35feb
  • Removed cdk.create dates, as suggested by the reviewer 49b6541
  • Testing that benzoquinone is perceived as aromatic using the alternative detection method. 702d8ba
  • Because the placement of double bonds is not deterministic, we cannot be sure we always get them at the same location. Better is to just test that all carbons are perceived as C.sp2 and that they are aromatic. 77a778f
  • Added an alternative aromaticity perception model, which is happy about double bonds pointing outwards from aromatic rings b5bf695
  • Added the missing S.2minus atom type for selenide 367ff4f
  • There is no Se.2 atom type in the CDK; the perception seems to match Se atoms with two neighbors; I added two unit tests for the changed code, assuming one and two implicit hydrogens 48e6060
  • Added a missing import and dependency f400a5e
  • Added aromaticity-based perception: N.planar3 67c27bd
  • Added a null check and return immediately (fixes #1260) 0a981a4
  • Ignore this failing test case; it was one of the original points at which we decided new tools were needed 8adfd90
  • Added double bond stereo 5b644f9
  • Added a data model for double bond stereochemistry 76577b4
  • Added similar testing to reader and writers to fix four further unit tests for support of matching against some IChemObject interface class 47f35b9
  • Fixed the readers and writers to also accept the matching interfaces (fixes #3553780) bfc674a
  • Test that the reader and writers also "accept" the interfaces they support, see bug #3553780 f30f9a3
  • Added a unit test for the JCP bug report for the SDG about briding hydrogens 4d3db46
  • simplify by calling getConnectedBondsCount() 320a21d
  • only delete non-multibond H's; fixes JCP issue 8 d0d785c
  • Fixed the unit test, similar to commit d1da5276dae4a21a4c45d9fa41816be5eb646b4aa: the compound is aromatic. 81d7be7
  • Adds a unit test and fix for the loading of atom pair descriptors. a6ab39c
The authors

26  Egon Willighagen
25  John May
 3  Ralf Stephan
 2  Stephan Beisken

The reviewers

25  Egon Willighagen 
17  John May 
 2  Rajarshi  Guha