Sunday, May 27, 2012

Finding where to put double bonds...

SMILES has a convenient feature: elements from the organic subset can be written in lower case to mark them as aromatic. The locations of the double bonds are then not given explicitly, reflecting the delocalized nature of those systems; benzene, for example, is simply c1ccccc1.

However, there are many situations where you do want to know the positions of those double bonds, or at least one solution out of the set of possible combinations.

Finding the positions of the double bonds is one of the core algorithms in cheminformatics. The CDK has long had two algorithms for this: one looking at ring systems (DeduceBondSystemTool) and one tackling the more general problem (SaturationChecker). The first was recently found to be slow because it uses the AllRingsFinder (which is slow because of the combinatorial number of ring combinations); the second never really worked that well, because it did not use the CDK atom type perception code.
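To make the problem concrete: once you know which aromatic atoms still need a double bond (an aromatic carbon does, a pyrrole-type [nH] nitrogen does not), placing the double bonds amounts to pairing each such atom with exactly one bonded neighbour, a perfect matching. The sketch below is a hypothetical, CDK-free illustration that solves this with a simple backtracking search over integer atom indices; it is not the algorithm used by DeduceBondSystemTool or FixBondOrdersTool, and the class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch, not the CDK algorithm: place Kekule double bonds by a
 * backtracking search for a perfect matching over the atoms that need one.
 */
final class KekuleSketch {

    /**
     * adj: adjacency lists of the aromatic atoms, indexed 0..n-1.
     * needsDouble[i]: true when atom i must get exactly one double bond.
     * Returns partner[i] = j when bond i=j is double (-1 when atom i gets none),
     * or null when no valid assignment exists.
     */
    static int[] assign(List<List<Integer>> adj, boolean[] needsDouble) {
        int[] partner = new int[adj.size()];
        Arrays.fill(partner, -1);
        return search(adj, needsDouble, partner) ? partner : null;
    }

    private static boolean search(List<List<Integer>> adj, boolean[] needsDouble, int[] partner) {
        // Pick the first atom that still lacks its double bond.
        int atom = -1;
        for (int i = 0; i < partner.length; i++) {
            if (needsDouble[i] && partner[i] == -1) {
                atom = i;
                break;
            }
        }
        if (atom == -1) {
            return true; // every atom that needs a double bond has one
        }
        // Any valid assignment pairs this atom with one of its free neighbours: try each.
        for (int neighbour : adj.get(atom)) {
            if (needsDouble[neighbour] && partner[neighbour] == -1) {
                partner[atom] = neighbour;
                partner[neighbour] = atom;
                if (search(adj, needsDouble, partner)) {
                    return true;
                }
                partner[atom] = -1; // undo and try the next neighbour
                partner[neighbour] = -1;
            }
        }
        return false; // dead end: the caller backtracks
    }

    /** Builds adjacency lists for a single ring of n atoms (0-1-...-(n-1)-0). */
    static List<List<Integer>> ring(int n) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            adj.add(Arrays.asList((i + 1) % n, (i + n - 1) % n));
        }
        return adj;
    }

    public static void main(String[] args) {
        // Pyrrole, [nH]1cccc1: atom 0 is the [nH] and takes no double bond.
        boolean[] needs = {false, true, true, true, true};
        int[] partner = assign(ring(5), needs);
        System.out.println(Arrays.toString(partner)); // prints "[-1, 2, 1, 4, 3]"
    }
}
```

The search is exhaustive, so it also reports when no Kekulé structure exists (for example, five ring atoms that all need a double bond), but its worst case is exponential; production tools use smarter strategies, which is exactly where speed becomes an issue.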

Recently, Kevin and Klas set off in parallel to develop new implementations: Kevin focused on improving the DeduceBondSystemTool, while Klas started from the more general use case.

Kevin's new code was tested by Nina and found to behave pretty well, with an error rate well below 1%. Klas' code is still being developed, but I am very much looking forward to it, as it is not limited to ring systems.

That said, Kevin's code has been merged into the cdk-1.4.x branch, will be part of the next release, and is ready to be used now. Basic use is pretty simple when starting from SMILES:

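  import org.openscience.cdk.DefaultChemObjectBuilder;
  import org.openscience.cdk.interfaces.IMolecule;
  import org.openscience.cdk.smiles.FixBondOrdersTool;
  import org.openscience.cdk.smiles.SmilesParser;
  // carbazole, with all ring atoms written as aromatic (lower case)
  String smiles = "c2ccc3n([H])c1ccccc1c3(c2)";
  SmilesParser smilesParser = new SmilesParser(DefaultChemObjectBuilder.getInstance());
  IMolecule molecule = smilesParser.parseSmiles(smiles);
  FixBondOrdersTool fbot = new FixBondOrdersTool();
  molecule = fbot.kekuliseAromaticRings(molecule); // assigns explicit single/double bonds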

Thanx to Kevin for this great tool!

Friday, May 25, 2012

#harmony2012 hackathon: rapid prototyping (aka SBML support in Bioclipse)

This is what hackathons are about: sitting in the same room, hacking new stuff, to boldly go where no one has gone before. Who knows where you end up? This morning, I hacked up an SBML plugin for Bioclipse. OK, it does not show a nice diagram yet, but not bad for two hours of hacking.

Saturday, May 19, 2012

CDK 1.4.10: the changes, the authors, and the reviewers

Some six weeks after the 1.4.9 release, here's the next stable CDK release. It does not yet include the recent work by Kevin and Klas on improving the bond order deduction (which you can test here).

The first thing you'll notice is the long list of patches. This is primarily caused by the inclusion of the renderextra module from the CDK-JChemPaint branch, by Arvid, Stefan, Gilleain, Niels, and me. This part of the rendering engine adds rendering functionality beyond IAtomContainers, such as renderers for molecule sets, reactions, and chem models. The controller code for editing is still excluded.

Otherwise, the release is mostly bug fixes, a few of them quite interesting. For example, the long-standing SMARTS parsing bug for the Sc element is fixed. Other bug fixes concern the use of interfaces, and one concerns parallel computing.

The changes
  • Any JUnit version is fine d71024e
  • Only draw + signs when there is more than one entity fe6807d
  • Avoid NPE when trying to calculate diagram bound on an empty molecule set. 8cae7b2
  • Always needs to register the parameters from the generators to the renderer model fd4ec82
  • Use paint(drawVisitor,diagram) from ChemModelRenderer and not AbstractRenderer d947407
  • Refactored to make use of BoundsCalculator methods for calculating bounds. a613ca2
  • (renderextra). Pass the RendererModel to contained renderers. a3f2642
  • More descriptive variable names b55da37
  • Removed an unused parameter from the private API 4da3504
  • Added last bits of missing JavaDoc 56dcf57
  • Added missing JavaDoc or inherit it 07ccf7a
  • Made two more methods private that are not required by the interface 8899eaf
  • Removed an unused method, made a few methods private not required by the interface, and added a missing test method annotation; also extend the test class of the superclass of the tested class a72cba1
  • Removed an unused method and made another private 4c82b0b
  • Added missing TestMethod annotation 49043c9
  • Extend the test class for the super class of the class tested by this test b1b5364
  • Removed an unused method d34f8b6
  • Inherit JavaDoc where possible 8e7300f
  • Added missing @cdk.githash annotation 54f54ce
  • Added TestMethod annotation for the getParameter() methods 729e73e
  • Added testing in the abstract test class for general pattern that getParameters() should return a non-null List cab344a
  • Added the last three missing test classes 23f9842
  • Removed PrintVisitor f134813
  • Added two more missing dependencies: interface implementations. 2e1c7fa
  • Removed DistanceSearchVisitor be0c544
  • Missing dep for new renderextra tests f525381
  • A bunch of test classes with minimal testing b397579
  • Added missing *RendererTest classes, with basic tests for constructors 2ca2d5a
  • Another check if the WillDrawAtomNumbers param is registered. 2e352c3
  • Check if the WillDrawAtomNumbers param is registered 4dedcb0
  • Fixing PMD warnings: more descriptive variable names (renderextra). f37be8f
  • Uses RendererModel to get all GeneratorParameters 0e28dd3
  • Proper initialization at the primary renderer too. (renderextra). 1f729ca
  • Because the renderers are now typed itself, we can use this.generators instead, fixing a NullPointerException while rendering reactions 965c651
  • Implemented the generateDiagram and use that in the paint methods, which fixes reaction rendering as the ReactionRenderer.generateDiagram() uses MoleculeSetRenderer.generateDiagram() 2ffc5a1
  • Fixed NullPointerException by creating an empty list of generators, and return a list too, rather than a null d13b1a2
  • Fixed registering of rendering parameters (renderextra). 32ca540
  • Updated for API changes in CDK 1.3.6 (renderextra). 640e26a
  • Using IRenderer interface instead of implementation 4f0ac1d
  • More types in renderers, and associations between renderers 067e4dc
  • Renamed the Renderer to ChemModelRenderer, following the naming pattern for the other IRenderer implementations 743455f
  • Refactored into IRenderer with modules implementations for IAtomContainer (the existing AtomContainerRenderer), IMoleculeSet, IReaction, IReactionSet, and IChemModel (all for from the existing, overloaded Renderer) (renderextra). cfcfb28
  • Added null check to avoid NPE when no model is present 185f6f1
  • Mirror structures on Y-axis to compensate for different coordinate systems. 023fde0
  • Fixed some PMD warnings: longer variable names (renderextra). 8ca883f
  • Added rendering parameters for coloring the atom numbers by some scheme, e.g. by element type 4c8b255
  • Converted the offset from a constructor parameter into a proper IRenderingParameter (renderextra). d0f718a
  • Also return the parameters of the super class -a (renderextra). b132ebb
  • Renamed get/setRenderingParameter methods to shorter names (renderextra). 150e1ed
  • Removed scale and zoom from renderer and replaced with values from the generator parameters 8b97071
  • Fixed instantiation of rendering parameter fields, which resulted in NullPointerException (renderextra). 3d8b420
  • Changed wedge width to rendering parameter 63b253c
  • Removed IAtomContainerGenerator and IReactionGenerator in favor of using generics: IGenerator (renderextra). ec9a21e
  • Converted mappingColor and mappingLineWidth to the new rendering parameter API 506377f
  • Converted boundsColor to the new rendering parameter API f771866
  • Converted zoomFactor, scale, bondLength, and arrowHeadWidth to the new rendering parameter API bc7ce45
  • the RadicalGenerator can render atoms with more than one single electron. c59aa18
  • Extend IGenerator which now provides getParameters() e11b061
  • Updated for the IGenerator/IAtomContainerGenerator refactoring df620ea
  • Refactored reaction boxes into a IGeneratorParameter 841fb90
  • IReactionGenerator now has parameters too cb86521
  • Refactored foreground color into a IGeneratorParameter 242c165
  • Refactored showIm/ExplicitHydrogens as IGeneratorParameter 5b65078
  • Converted FontName and FontStyle into IGeneratorParameters 2dd9085
  • Converted margin into the RenderingParameter variant 5475abd
  • Updated for more IGeneratorParameter changes in BasicAtomGenerator 7eea476
  • Added ability to generate atom number at an offset. b8576d7
  • Moved the atom radius into the new rendering parameter API (renderextra module) 8bf0d4e
  • Set up a API to retrieve a IRenderingParameter from the RenderModel fde3785
  • Updated for the move of the rendering parameters for atom draw colors to the new IGeneratorParameter API 71daa52
  • Changed getParameters() to use IGeneratorParameter 1414fc3
  • Fixed the Renderer to implement IRenderer 1f30556
  • Updated for the ILoggingTool patch. 748343c
  • Added missing dependency on cdk-annotation, needed for TestClass 980548c
  • Updated for the new IGenerator.getParameters() interface method 4621407
  • Render extra module with rendering functionality beyond IAtomContainers. b3335de
  • Added reading of the single electron counts for the CDK atom types cf897ab
  • Two more missing test classes, and missing annotation in ArrowElement 403ac57
  • Added testing in the abstract test class for general pattern that getParameters() should return a non-null List b245c6f
  • Patch for @cdk.bug 3523247 174493c
  • Also the last parameter must be an IAtom 3d0b0e5
  • Fixes the code to actually test the second parameter, instead of the first one twice (fixes #3526870) 2a2aecc
  • Unit test: passing an incorrect second new Bond(IAtom, IAtom) should throw an IllegalArgumentException 4205ab5
  • Wrong amide N Java regex correction d7338ec
  • Remove static variable allPaths so that getAllPaths method can be used in a threaded environment. ea828a9
  • This commit solves the nightly OpenJavaDocCheck report at line 82, adding a period at the end of the comment line. This commit is intended to go into cdk-1.4.x. 0fbe8cc
  • This commit solves the nightly OpenJavaDocCheck report at line 82. This commit is intended to go into cdk-1.4.x. a19a59e
  • This commit solves the nightly OpenJavaDocCheck report at line 72. becb2c9
  • Backport patch for 'Updated TPSA descriptor to properly cast to IRing rather than different Ring implementations. Added unit test' d5948c7
  • Updated TPSA descriptor to properly cast to IRing rather than different Ring implementations. Added unit test 090afa9
  • Added code to make suggestions to the silent builder too 1e017a2
  • Added a unit test to see if we get suggestions 862d9d9
  • Patch to report candidate constructors when failing to instantiate a class because of an incorrect number of passed parameters. (fixes #2987186) 2de5acc
  • Upgraded to JavaCC 5.0 6b9c1bd
  • Added test case for bug 3513954, OOM in equivalent class partitioner 7477f99
  • Fixed the tests to have the right match counts (patch by Dazhi Jiao) 0aa84d7
  • Fixes recursion errors (patch by Dazhi Jiao) 92604d8
  • Updates with the SMARTS specs, among others to fix 'Sc' parsing (patch by Dazhi Jiao; fixes #2786624) 117377e
  • ForceFieldConfigurator checkForceFieldType correction 2c4f244

The authors

As is commonly the case, my count is inflated. Also notable is how strongly Stefan is underrated here: he did a lot of coding on the CDK-JChemPaint branch in the past, much of which was lost in a needed rebase (which also applied to patches from Niels, Gilleain, and Arvid).

77  Egon Willighagen
11  Arvid Berg
 6  Daniel Szisz
 3  Gilleain Torrance
 3  Rajarshi Guha
 1  Stefan Kuhn
 1  Yap Chun Wei

The reviewers

10  Rajarshi Guha 
 9  Egon Willighagen 
 1  Nina Jeliazkova
 1  Arvid Berg 

Sunday, May 06, 2012

Defining nitrogen and oxygen count descriptors

In reply to a question on the CDK user mailing list, I wrote up, for my CDK API book, how to define custom descriptors (nitrogenCount and oxygenCount) using the parameter approach:

The Java code is a bit more complex, as it requires more Java-specific boilerplate, but the key is simply to use the setParameters() method.
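In the CDK, AtomCountDescriptor takes the element symbol to count as its parameter, so nitrogenCount and oxygenCount are simply two differently configured instances of one descriptor. To show that pattern without a CDK dependency, here is a hypothetical plain-Java sketch: the ElementCountDescriptor class, its method shapes, and the list-of-element-symbols molecule are simplifications made up for illustration, not the CDK API.

```java
import java.util.List;

/**
 * Hypothetical, CDK-free sketch of the parameter approach: a single descriptor
 * class that counts atoms of the element given through setParameters().
 * A molecule is simplified to a list of element symbols.
 */
final class ElementCountDescriptor {
    private String element = "*"; // "*" means: count every atom

    /** Expects exactly one parameter, the element symbol to count. */
    void setParameters(Object[] params) {
        if (params.length != 1 || !(params[0] instanceof String)) {
            throw new IllegalArgumentException("expected one element symbol");
        }
        element = (String) params[0];
    }

    int calculate(List<String> atomSymbols) {
        if ("*".equals(element)) {
            return atomSymbols.size();
        }
        int count = 0;
        for (String symbol : atomSymbols) {
            if (symbol.equals(element)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Heavy atoms of caffeine, C8H10N4O2 (hydrogens left out).
        List<String> caffeine = List.of(
            "C", "C", "C", "C", "C", "C", "C", "C", "N", "N", "N", "N", "O", "O");

        ElementCountDescriptor nitrogenCount = new ElementCountDescriptor();
        nitrogenCount.setParameters(new Object[] {"N"});
        ElementCountDescriptor oxygenCount = new ElementCountDescriptor();
        oxygenCount.setParameters(new Object[] {"O"});

        System.out.println(nitrogenCount.calculate(caffeine) + " " + oxygenCount.calculate(caffeine)); // prints "4 2"
    }
}
```

The design choice is the same one the CDK makes: rather than writing one class per element, the element becomes a parameter, and new "descriptors" are just new parameter values.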

Saturday, May 05, 2012

Compiling R code, and speed up your computation

I just ran into this interesting post on the R-bloggers Planet. The functionality it describes lets you compile R code to byte code, so that it is no longer interpreted as plain R source but run by R's byte-code engine instead. That can be quite a performance boost. I guess in due time we will see R use JIT technologies, so that the difference will disappear, but for now it is a good thing to know about. Here are the numbers from that post (roughly a 14-fold speed-up in elapsed time):

> system.time( myFunction() )
   user  system elapsed 
 10.002   0.014  10.021 
> system.time( myCompiledFunction() )
   user  system elapsed 
  0.692   0.008   0.700 

Compiling a function seems pretty easy, and I will give that a try soon, but not today:

> library(compiler)
> myCompiledFunction <- cmpfun(myFunction)

Thanx to Måns at Uppsala University :)