Monday, January 16, 2012

CDK 1.5.0: the changes, the authors, and the reviewers

Yeah, I did it: I made the first development release (1.5.0) of the CDK since the stable 1.4.x series was forked. It had to happen after the removal of IMolecule and IMoleculeSet. Although the list below only includes the patches specific to the current master branch, it is still fairly long. Then again, quite a few of my 'commits' are probably just merges.

Just to be clear, this is a development release: until we freeze this branch, we expect, and actually intentionally introduce, API changes. Of course, we intend those to improve things, so please shout out your wishes.



First of all, this release removes IMolecule and IMoleculeSet. That was a big effort, which explains why Rajarshi and I have so many commits in this release. This release also adds the LINGO fingerprint and an atomic signature-based fingerprint. It removes the nonotify module; the silent module should be used instead. IChemObjectIO now extends Closeable, making it more Java 7-friendly. Also noteworthy is the new API for raw fingerprints, which do not have a fixed length but consist of keys with counts. On the implementation side, QueryAtomContainer now uses local, custom implementations of IAtom and IBond. This makes its module independent of the data module and frees the rest of the library for further code cleanup.
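To make the raw-fingerprint idea concrete, here is a minimal, self-contained Java sketch with no CDK dependency. It builds a LINGO-style fingerprint (every 4-character substring of a SMILES string, with its count) as a feature-to-count map, and compares two such maps with the continuous Tanimoto coefficient, one common generalization of Tanimoto to counts. The class and method names are hypothetical, not the CDK API. Real LINGO implementations typically normalize the SMILES first (for example, ring-closure digits), which is skipped here, and the similarity computed by CDK's own LINGO and Tanimoto code may differ in detail.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: LINGO-style raw fingerprints compared with a count-based Tanimoto.
class RawFingerprintSketch {

    // Counts every substring of length q in the SMILES string (q = 4 in the LINGO paper).
    // No SMILES normalization is done here; that is a simplification of this sketch.
    static Map<String, Integer> lingos(String smiles, int q) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + q <= smiles.length(); i++) {
            counts.merge(smiles.substring(i, i + q), 1, Integer::sum);
        }
        return counts;
    }

    // Continuous Tanimoto a.b / (a.a + b.b - a.b), with missing keys counted as zero.
    static double tanimoto(Map<String, Integer> a, Map<String, Integer> b) {
        Set<String> keys = new HashSet<>(a.keySet());
        keys.addAll(b.keySet());
        double dot = 0, aa = 0, bb = 0;
        for (String key : keys) {
            int x = a.getOrDefault(key, 0);
            int y = b.getOrDefault(key, 0);
            dot += x * y;
            aa += x * x;
            bb += y * y;
        }
        double denominator = aa + bb - dot;
        // Only two empty fingerprints give a zero denominator; return 0 instead of dividing.
        return denominator == 0 ? 0.0 : dot / denominator;
    }

    public static void main(String[] args) {
        Map<String, Integer> propanol = lingos("CCCO", 4); // {CCCO=1}
        Map<String, Integer> butanol = lingos("CCCCO", 4); // {CCCC=1, CCCO=1}
        System.out.println(tanimoto(propanol, butanol));   // 0.5
    }
}
```

The counts are what set this apart from a bit fingerprint: lingos("CCCCCC", 4) maps CCCC to 3, which a fixed-length bit string would flatten into a single set bit.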


Mind you, this release shows an increased number of failing unit tests on Nightly.

The changes

  • This commit puts a single comment line for the molecule @param in the placeAliphaticHeavyChain() method. 61377fb
  • Fixed JUnit version 41075f3
  • Upgraded PMD to 4.3 79e4532
  • Upgraded JUnit to 4.10 67a1f23
  • Updated the @TestMethod to match the unit test method renames that recently happened 0ad0fd0
  • Fixed comparison, to use the proper double assertEquals method 1ecc22b
  • Renamed the overriding methods to match the method rename of the methods in AbstractChemModelTest they are overriding. This is why one should use @Override. 1d34aac
  • Switched order of IRing and IAtomContainer matching, to fix construction of IRing classes 4741983
  • Removed IMolecule 2fc6b61
  • Replaced more IMolecule by IAtomContainer, fixing more unit test regressions 611aaca
  • A set of casting fixes to reduce the number of unit test regressions 2599bc2
  • Proper typing of the DefaultIteratingChemObjectReader, so that other classes can safely extend it (thanx to Nina) de67121
  • Removed some more usages of Molecule b84587f
  • Updated casting to remove usage of Molecule 04b1542
  • Typed the iterator, removing the need for casting when used 6e6cf8e
  • Two s/IMolecule/IAtomContainer/ patches to fix a few unit test regressions f6ed3d7
  • More Molecule use removed 0a04d3f
  • Update some return types to not use Molecule 1d8ee25
  • Removed field declarations of Molecule 88a2935
  • Removes instanceof and .class usage of Molecule 436dba5
  • Another batch of s/IMolecule/IAtomContainer/ 5da1d1b
  • Made SmilesParser independent from IMolecule 6bf4aa2
  • Got rid of debug and silent Molecule implementations and respective tests 9f05e5c
  • Refactored classes extending Molecule to use AtomContainer 2ccbe95
  • Removed the last traces of the IMoleculeSet interface a8c03af
  • Got rid of MoleculeSet from the debug module 50aa11f
  • Got rid of MoleculeSet from the silent module 2250763
  • Updated IPolymer to extend IAtomContainer rather than IMolecule 5b7ad0f
  • cleaned up a bunch of MoleculeSet usages 150fee9
  • Updated various reaction classes to remove all usage of Molecule* 0a98a7f
  • Updated IReactionMechanism to use IAtomContainerSet. Also fixed some castings. 738eece
  • Convert IMolecule* to IAtomContainer* across many more classes c9847d2
  • Some IMolecule/IAtomContainer fixes, to get things to compile here c950448
  • Convert IMolecule* to IAtomContainer* across some charges and test classes f74a3ad
  • Removed the nonotify module (replaced by silent) 51af326
  • A bit more IMolecule removal d8d6eee
  • Fixed castings, to solve failing unit tests d273aa4
  • A quick fix. This class will soon be removed, so no peer review needed. 9e97f2d
  • Fixed a casting 3c1086a
  • Removed methods, to get the number of unit test failures down; these two classes will be removed completely, so skipping peer review 1bbfe5a
  • More modifications to use AtomContainer rather than Molecule f4647d8
  • Moved TetrahedralChirality from data to core. 46e3ddc
  • Replace IMolecule* with IAtomContainer* 27c6f7d
  • Updated io classes to use IAtomContainer* 68f2a65
  • Converted template handler to use atomcontainerset f0cf950
  • Refactored the setters from IReaction to use IAtomContainer* 04cbb29
  • Refactored the last getter from IReaction to use IAtomContainer* 91f8083
  • Made sure that we use IAtomContainerSet rather than casting back to Molecule* 104ecd1
  • Modified IReaction and related classes to migrate from IMoleculeSet to IAtomContainerSet 19b7673
  • Replaced IMolecule* with IAtomContainer* 7f6617d
  • Replaced IMolecule with IAtomContainer for some IReaction methods 3d3037f
  • Replaced use of addMolecule() with addAtomContainer() 2c1b31a
  • Replaced use of getMoleculeCount() with getAtomContainerCount() b27ee12
  • Replace use of IMoleculeSet by IAtomContainerSet in ConnectivityChecker.partitionIntoMolecules() 42861a9
  • Replaced IMoleculeSet with IAtomContainerSet, fixing a compile issue 2ac19f1
  • Patched bpol descriptor with cheminf ID a4db4a2
  • Make IChemObjectIO extend Closeable as suggested at http://sourceforge.net/tracker/?func=detail&atid=370024&aid=3437936&group_id=20024 a250cb3
  • Makes the smiles module independent from the data module a7920c1
  • More use of interfaces rather than implementations for the smiles module c3040e8
  • Removed non-existing dependency 3eaeaa7
  • Work with AtomContainers (fixes #2788763) c72d442
  • Allow IChemModels to contain IAtomContainerSets (addresses #2784940) 4cc5ff8
  • Replaces junit.framework.Assert from JUnit3 with org.junit.Assert from JUnit4 (fixing #2831081) 6ff1f78
  • Fix to permutation unit tests 45a7599
  • Split AtomContainerPermutorTest into one for the Atom and Bond permutors 40fe774
  • Unit tests for the Permutor class and fix to permutor. 88ceb38
  • ISBN number for the reference 217977c
  • Re-write of the atom container permutation classes. An extra class 'Permutor' is now the base class for permutations, and contains some more functionality than before. acd8e9b
  • Re-write of the atom container permutation classes 04adfca
  • Removed commented out code 16efbc0
  • Removed needless explicitly qualified vecmath types c1a69a4
  • Convenience method for getting an atom container directly from a formula string a1894f2
  • Handle a CDKException thrown by the Parser by throwing an IllegalArgumentException in the constructor face5c5
  • Removed hardcoded String use of InvPair constants in favor of their actual constants 4777958
  • Compile fix: get a builder from the input parameter 98f16f8
  • Removed dependency of io on data again df6ea2e
  • Corrected spelling from CanonicalLable to CanonicalLabel c0e59c8
  • Updated test case to note that SMARTS lexical errors now throw an IllegalArgumentException dc60bc3
  • Added missing dependency fb4ba45
  • failing test for Tanimoto on "raw fingerprints" b42fbae
  • Implemented missing method, copied from Fingerprinter 2731d6f
  • Updated SMARTS parser JJT file, to drop the use of CDKException when an invalid SMARTS is provided. 1ef89fb
  • fix bug #3305550 - moved SMARTSQuery to where used dfb3e86
  • changed from no-op logger call to syserr 4c5beba
  • spelling ac6a9eb
  • Updated Copyright dates 51f2db8
  • Added another missing dependency 416a4cc
  • Added missing dependency 6fc4803
  • Split out unit tests only for fixed-length fingerprints 37e81e8
  • Added a signature-based fingerprinter 688da8b
  • MDL reader deals with query bond types, plus tests c95541f
  • QueryAtomContainer to allow atoms and bonds 6a8f4c0
  • Another MOL file with query bond types b6bf3b5
  • CTFile query bond class b0227ba
  • MOL file with query bond types 3546ba4
  • Use DefaultChemObjectBuilder directly, not using getBuilder() method d0dc1ef
  • Added constructor for QueryChemObject to prevent nullpointers on flags property 6a75767
  • Cleaning up testing of the isomorphism module, e.g. classes in the proper test module bc1e3e6
  • Minor clean up: use interfaces, depend on Query implementations, and now compiles without dependency on the data module 6913b9d
  • Reworked QueryAtomContainer to not depend on the data module either a5439dc
  • Use interfaces instead of implementations 076a870
  • More QueryBond 0e14c77
  • Extend QueryAtom instead of Atom 0ee08a6
  • Extend the new QueryChemObject b0a97a6
  • A shared IChemObject implementation for QueryAtom and QueryBond b98f650
  • Added constructors for QueryAtom 3708c8d
  • Cleaned up the code a bit 1654e94
  • Added an abstract QueryAtom class ae347ce
  • Match the IBond interface, as I get a compile error here otherwise 43cb829
  • Query bonds b7e8e3a
  • Placed the LingoFingerprinterTest in the rest module, and added missing dependency 16e456d
  • Fixed wrong module assignment 81db713
  • moved lingo fps to smiles module 1eb9590
  • Replaced the NotImplemented exception with UnsupportedOperation 6071796
  • Updated Javadocs 15be38b
  • Updated issues in Javadocs, copyright and method names dee515e
  • Added similarity method for LINGOs 133f314
  • Updated Tanimoto similarity to handle feature,count type fingerprints f487c30
  • Added implementation of LINGO 72bbae3
  • Added copyright and ref 9707f66
  • Added implementation of LINGO 572018f
  • Updated fingerprinter interface to support access to raw fingerprint of the form Map. Updated implementing classes b0f4050
  • Replaced the broken links with new ones (fixes #3108471) dc504db
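Commit a250cb3 above makes IChemObjectIO extend java.io.Closeable. Since Closeable extends AutoCloseable as of Java 7, readers and writers can then be used in a try-with-resources statement, which closes them even when reading fails. The sketch below shows that pattern with a hypothetical SketchReader standing in for a CDK reader; it is not CDK code, and readRecord() is an invented method.

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical stand-in for an IO interface that extends Closeable.
interface SketchChemObjectIO extends Closeable {
}

// Hypothetical reader returning one record (here: one line) per call, null at the end.
class SketchReader implements SketchChemObjectIO {
    private final BufferedReader input;
    boolean closed = false;

    SketchReader(Reader input) {
        this.input = new BufferedReader(input);
    }

    String readRecord() throws IOException {
        return input.readLine();
    }

    @Override
    public void close() throws IOException {
        input.close();
        closed = true;
    }
}

class CloseableDemo {
    // try-with-resources calls close() when the block exits, normally or via an exception.
    static int countRecords(SketchReader reader) {
        int count = 0;
        try (SketchReader r = reader) {
            while (r.readRecord() != null) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        SketchReader reader = new SketchReader(new StringReader("CCO\nc1ccccc1\n"));
        System.out.println(countRecords(reader)); // 2
        System.out.println(reader.closed);        // true
    }
}
```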
The authors


89  Egon Willighagen
37  Rajarshi Guha
 9  Mark Rijnbeek
 8  Gilleain Torrance
 4  Jonathan Alvarsson
 1  Daniel Szisz
 1  John May
 1  Ola Spjuth
 1  Christoph Steinbeck

The reviewers

66 Rajarshi Guha 
30 Egon Willighagen
 4 Gilleain Torrance
 1 Jonathan Alvarsson
 1 Nina Jeliazkova

Sunday, January 15, 2012

Groovy Cheminformatics 4th edition

Six months was not quite the amount of time I had anticipated between the third and fourth editions, but I finally managed to upload edition 1.4.7-0 of my Groovy Cheminformatics book. The first three editions sold 37 copies, including two for myself: enough to feel supported and to continue working on it.

So, this new edition is again thicker, now totaling 152 pages, 28 more than the 3rd edition. The table of contents alone is more than half a page longer, though it still just barely fits on four pages. In fact, I had to remove one (new) subsection title, because otherwise it would have taken up two more pages.

The new content is again a mix of sections and chapters. While writing new chapters, I find myself realizing I need to cover more basics. Those typically get added as new sections. I did not get many feature requests, except for one email pointing out that the text promised to explain how to interpret and handle failing atom type perception, which explains one of the new sections. The full list of new content is:
  • Section 2.1.4: explaining the three flavors of atomic coordinates
  • Extended Section 2.2: added detail about electron counts of bonds (partly in reply to this post by Rich)
  • Chapter 5 "Protein and DNA": four pages, mostly about PDB files, and the matching CDK data structure
  • Chapter 6 "IChemObjectBuilders": four pages explaining the four alternative builders CDK 1.4.7 has
  • Section 7.8: a new section with recipes on how to post-process read input, currently discussing MDL molfiles only. It talks about what information is present in the file format, and what steps must be undertaken to add missing information
  • Section 8.2.4 "No atom type perceived?!"
  • Section 11.4: describes how to depict aromatic rings
  • Section 11.5: describes how to change the background color of depictions
  • Section 13.4: explains how to calculate the Van der Waals volume of molecules
  • Section 18.1.3: discussing the API improvement in the iterating readers
  • Appendix C: a list of all descriptors provided by the CDK
  • Appendix D: a list of file formats known by the CDK, indicating which have readers and writers
On top of that, I improved other bits of the book too, such as the resolution of the molecule depictions and of various diagrams. The number of scripts has also gone up considerably, from 94 to 134!

Appendix C is a prelude to a chapter I am already writing, but have not finished yet: a chapter about descriptor calculation. But since I just started a new post-doc position, it may take another six months for that chapter to make it into print.

The paperback is available from Lulu.com, an on-demand publisher, as is this ebook version.

Sunday, January 08, 2012

CDK-JChemPaint #10: background color

I found in the Groovy JChemPaint repository a script I had not blogged about yet, explaining how to change the default background color. It's fairly simple, and just uses parameters. Starting from the common pattern to set up a renderer, you set the background parameter:

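import java.awt.Color
import org.openscience.cdk.renderer.generators.BasicSceneGenerator

// 'renderer' comes from the common renderer set-up pattern
backgroundColor = Color.lightGray
model = renderer.getRenderer2DModel()
model.set(
  BasicSceneGenerator.BackgroundColor.class,
  backgroundColor
)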

The full script can be found here. The resulting output looks like that given below.
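The model.set(...) call in the script uses the parameter's class as the key. Below is a toy Java analogue of that design that assumes nothing about CDK internals: the parameter class doubles as a type-safe key, so the compiler rejects, say, a String for a colour parameter, and an unset parameter falls back to a default. ToyParameter, ToyBackgroundColor and ToyRendererModel are hypothetical names; this is not the CDK RendererModel implementation.

```java
import java.awt.Color;
import java.util.HashMap;
import java.util.Map;

// Hypothetical parameter type: the subclass itself is the key, T is its value type.
abstract class ToyParameter<T> {
    abstract T getDefault();
}

// Hypothetical background-colour parameter, defaulting to white.
class ToyBackgroundColor extends ToyParameter<Color> {
    @Override
    Color getDefault() {
        return Color.WHITE;
    }
}

// Toy model storing parameter values keyed by parameter class.
class ToyRendererModel {
    private final Map<Class<?>, Object> values = new HashMap<>();

    // The compiler only accepts a value whose type matches the parameter's T.
    <T> void set(Class<? extends ToyParameter<T>> type, T value) {
        values.put(type, value);
    }

    // Returns the value set earlier, or else the parameter's default.
    @SuppressWarnings("unchecked")
    <T> T get(Class<? extends ToyParameter<T>> type) {
        if (values.containsKey(type)) {
            return (T) values.get(type);
        }
        try {
            return type.getDeclaredConstructor().newInstance().getDefault();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        ToyRendererModel model = new ToyRendererModel();
        System.out.println(model.get(ToyBackgroundColor.class)); // prints the white default
        model.set(ToyBackgroundColor.class, Color.LIGHT_GRAY);
        System.out.println(model.get(ToyBackgroundColor.class).equals(Color.LIGHT_GRAY)); // true
        // model.set(ToyBackgroundColor.class, "red"); // would not compile
    }
}
```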

Saturday, December 31, 2011

CDK 1.4.7: the changes, the authors, and the reviewers

In preparation for the next (4th) edition of my Groovy Cheminformatics book on cheminformatics with the CDK, I found a show-stopper bug, fixed it, and sent in the patch, which Rajarshi quickly reviewed and applied to the cdk-1.4.x branch. This particular bug was a null pointer exception that had been fixed not so long ago in the log4j implementation, but turned out to be present in the logger that writes to STDOUT too.

This release also fixes the reading of aliased atoms in MDL V2000 molfiles, thanx to another bug fix patch from John May (thanx!), and formally deprecates the nonotify implementation, which has already been removed from the master branch. The silent module should be used instead; it has the same functionality, but cleaner and faster code.

However, one important change to take note of is an API change in the IIteratingChemObjectReader interface. The change is minor, but useful. The interface is now typed, and implementing classes implement IIteratingChemObjectReader<IChemModel> (IteratingPCSubstancesXMLReader) or IIteratingChemObjectReader<IAtomContainer> (IteratingMDLReader, IteratingPCCompoundASNReader, IteratingPCCompoundXMLReader, IteratingSMILESReader). This means that the iterator's next() method now returns an IChemModel or an IAtomContainer, so casting in the calling code is no longer needed.
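The effect of the typing is easiest to see in a small, self-contained sketch. Here TypedIteratingReader<T> is a hypothetical stand-in for IIteratingChemObjectReader<T>, and ToyMolecule for IAtomContainer; neither is CDK code. Because the reader is declared with a type argument, next() already returns a ToyMolecule and the loop needs no cast.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Toy stand-in for IAtomContainer: it just remembers its SMILES.
class ToyMolecule {
    final String smiles;

    ToyMolecule(String smiles) {
        this.smiles = smiles;
    }
}

// Hypothetical typed reader interface, in the spirit of IIteratingChemObjectReader<T>.
interface TypedIteratingReader<T> extends Iterator<T> {
}

// Toy reader that turns each input line into a ToyMolecule.
class LineMoleculeReader implements TypedIteratingReader<ToyMolecule> {
    private final Iterator<String> lines;

    LineMoleculeReader(List<String> lines) {
        this.lines = lines.iterator();
    }

    @Override
    public boolean hasNext() {
        return lines.hasNext();
    }

    @Override
    public ToyMolecule next() {
        if (!lines.hasNext()) {
            throw new NoSuchElementException();
        }
        return new ToyMolecule(lines.next());
    }
}

class TypedIteratorDemo {
    static List<String> readAll(List<String> input) {
        List<String> result = new ArrayList<>();
        TypedIteratingReader<ToyMolecule> reader = new LineMoleculeReader(input);
        while (reader.hasNext()) {
            ToyMolecule molecule = reader.next(); // typed: no (ToyMolecule) cast needed
            result.add(molecule.smiles);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(readAll(Arrays.asList("CCO", "c1ccccc1"))); // [CCO, c1ccccc1]
    }
}
```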

The changes
  • Another hot fix: use @link with the fully qualified class name, and removed the import, to fix a dependency issue 0e71cba
  • Added a @deprecated tag on the nonotify data classes, pointing to the silent implementation d283686
  • Fixed dependencies 5ef20b1
  • Extend the abstract suite, so to run the test for the null pointer exception 269c84c
  • Work with the interface 106e5ec
  • Check for a null input fb35047
  • Removed unneeded deps on CMLXOM for JNI-InChI (thanx to Dmitry Katsubo). 8524891
  • Added missing imports of IAtomContainer, needed by the last two patches, but which were not needed in master because we did all that IMolecule/IAtomContainer refactoring already 856f83c
  • Proper typing of the DefaultIteratingChemObjectReader, so that other classes can safely extend it (thanx to Nina) 6de90d3
  • Typed the iterator, removing the need for casting when used 44b7e76
  • Added John May as author 1142dc6
  • Also check that there are two such R1 atoms 962b7d2
  • Added modifications and unit test for alias atom naming patch bd4b094
  • Corrected alias atom naming in MDLV2000Reader and added test 23132a0
The authors

13  Egon Willighagen
  2  John May

The reviewers

6  Rajarshi Guha
2  Egon Willighagen
1  Nina Jeliazkova

CDK 1.4.6: the changes, the authors, and the reviewers

OK, I forgot to write those up again :(

Release 1.4.6, from about a month ago, fixes a few bugs, including broken JavaDoc, atom type perception when SMILES are parsed while keeping the lower-case notation as aromaticity indicator (I will not discuss the pros and cons of that here), and the Chi index descriptors for sulphurs. This release also introduces a new fingerprint, based on an extensive list of biologically relevant substructures identified by Klekota and Roth in 2008 (doi:10.1093/bioinformatics/btn479). Jonathan backported this functionality from the PaDEL-Descriptor software by Yap Chun Wei. The rest is a bunch of small code and dependency cleanups, as well as new unit tests.
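At its core, a substructure-key fingerprint like the Klekota-Roth one is a fixed dictionary of patterns in which bit i is set when pattern i matches the molecule. The Java sketch below shows only that bookkeeping. KeyFingerprinter and its methods are hypothetical; the real fingerprinter matches SMARTS patterns against a molecule, whereas here each pattern is a toy Predicate<String> substring test on a SMILES string, which is not substructure matching. The patternAt(i) method loosely mirrors the idea of looking up the SMARTS behind a given bit.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical dictionary-based fingerprinter: bit i is set when pattern i matches.
class KeyFingerprinter {
    private final List<String> patterns;
    private final List<Predicate<String>> matchers;

    KeyFingerprinter(List<String> patterns, List<Predicate<String>> matchers) {
        if (patterns.size() != matchers.size()) {
            throw new IllegalArgumentException("need exactly one matcher per pattern");
        }
        this.patterns = patterns;
        this.matchers = matchers;
    }

    // Fixed length: one bit per dictionary entry.
    int size() {
        return patterns.size();
    }

    // Looks up the pattern that defines a given bit.
    String patternAt(int index) {
        return patterns.get(index);
    }

    BitSet fingerprint(String smiles) {
        BitSet bits = new BitSet(size());
        for (int i = 0; i < size(); i++) {
            if (matchers.get(i).test(smiles)) {
                bits.set(i);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        // Toy matchers: substring tests on SMILES, NOT real SMARTS substructure matching.
        KeyFingerprinter fp = new KeyFingerprinter(
            Arrays.asList("Cl", "O", "c"),
            Arrays.<Predicate<String>>asList(
                s -> s.contains("Cl"), s -> s.contains("O"), s -> s.contains("c")));
        System.out.println(fp.fingerprint("Oc1ccccc1")); // {1, 2}
        System.out.println(fp.patternAt(2));             // c
    }
}
```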

The changes
  • Added missing unit tests 9119aa2
  • Added get-methods for information needed for extensions 4525cbe
  • A few missing unit tests in the 'qsar' module 2356a10
  • Added further methods needed for CDK-JChemPaint 804a6f5
  • Added missing JavaDoc. e316210
  • No longer complain about missing testing for abstract classes 284ff84
  • Typo: there → their 640b6e6
  • Added unit testing 05216f0
  • Throw a descriptive exception when 2D coordinates are missing (fixes #3355921) cdc4cbd
  • Fixed the cheminf.bibx well-formedness (fixes #3435367) 7bc0772
  • Added missing @cdk.githash. fdd3d22
  • Updated chi index util to correctly evaluate deltav for sulphurs. Fixes bug 3434741. Added unit test 8225175
  • Use interfaces instead of implementations 434c9b1
  • Use interfaces instead of implementations 76dcdf7
  • Use interfaces instead of implementations 5bef796
  • Use interfaces instead of implementations b6ed6a7
  • Moved the pi-contact descriptor (atom-pair) to qsarmolecular, removing the dependency of qsar on reaction d902312
  • Added a missing dependency; it now finds PDBPolymer f03eb3c
  • Fixed test method names f6562cb
  • Added a missing test a853c91
  • Fixed TestClass annotation d37961d
  • Added tests for the isomorphism module to the proper suite 657c3a7
  • fixed dependency for fingerprint tests 135edeb
  • added test for getSubstructure d1eb951
  • lookup SMARTS at index in Substructurefingerprint 65602db
  • Wrote test for KlekotaRothFingerprinter 2b1288b
  • adapted to CDK 84aed70
  • Import from the source code of PaDEL-descriptor (doi:10.1002/jcc.21707) 291def4
  • Fix to use interfaces as argument instead of classes 5e828f5
  • Perceive atom types also when aromaticity from the SMILES is kept 2e76ff6
  • Added unit test to make sure atom types are also perceived when aromaticity from SMILES is kept b315ee6
The authors

25  Egon Willighagen
  5  Jonathan Alvarsson
  2  Rajarshi Guha
  1  Nina Jeliazkova
  1  Yap Chun Wei

The reviewers

13  Rajarshi Guha
  8  Egon Willighagen 
  1  Nina Jeliazkova