Saturday, March 30, 2013

Dr Evan Bolton to speak at Maastricht University: "PubChem: A platform for chemical biology"

On Monday 13 May, Dr Evan Bolton of PubChem will visit our BiGCaT research group and give a presentation on the PubChem small compound-bioassay database at 11:00am.

I would very much appreciate an email confirming your attendance (firstname dot lastname @ maastrichtuniversity dot nl).

PubChem: A platform for chemical biology

PubChem is an open archive for small molecules and their biological activities located at the National Center for Biotechnology Information (NCBI). Despite humble beginnings, PubChem continues to receive broad community support through a continued influx of new information and new information resource types. Over the past eight years, PubChem has seen dramatic growth in participation by the chemical biology community, in terms of contributors (averaging 25% year-over-year growth) and users (averaging 20% year-over-year growth). PubChem contents include more than 45 million small molecules (+1.8 million with biological testing results), 115 million substance descriptions, 640 thousand biological assay descriptions (against +5800 declared unique protein sequence targets), and over 200 million assay outcomes (a substance tested in an assay is an outcome) from more than 200 contributors. PubChem provides a number of tools to help navigate this vast corpus of information. In addition, PubChem integrates its contents with other resources of chemical biology interest. As the needs of the community have changed, so too has PubChem adapted. This talk gives an overview of the PubChem resource as it exists today, with an emphasis on recently introduced features.

Saturday, March 23, 2013

CDK 1.4.17: the changes, the authors, and the reviewers

Hot on the heels of the CDK 1.4.16 changelog, here are the changes in CDK 1.4.17. The primary purpose of this release is to fix a regression in 1.4.16 in the sorting of IAtomContainers, which turned out to be a bit nasty.

Readers who are still using 1.4.16 or earlier are strongly encouraged to upgrade to this release.

The Changes
  • Identical copies of the same method in AbstractAtomContainerSet, but using IMolecule, solving an exception complaining about IMoleculeSet to contain only IMolecule [only for cdk-1.4.x] 67bed79
  • Added a unit test for the sorting of the multiplier ff2d820
  • updated sort to also sort multiplier values - bug:1291 d48d30d
  • renamed variable 'minimum' and added explanation as to why it is used. ececcc6
  • replacing full array sort with a range sort 1577ea7
  • Fixed null condition to ensure that null values get pushed back in the array. 76dedb5
  • unit tests ensure broken comparators don't put null to the start of the set and that an empty set is never sorted 5b6a2d9
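
To illustrate the range sort and the null handling mentioned in the changes above, here is a minimal sketch in plain Java. It is not the CDK code: the class RangeSortDemo and the methods nullsToBack and sortUsed are hypothetical names, and it assumes a backing array that is larger than the number of items actually stored, with the unused slots (and possibly some used ones) being null.

```java
import java.util.Arrays;
import java.util.Comparator;

class RangeSortDemo {

    // Wraps a comparator so that null entries always sort after non-null ones.
    static <T> Comparator<T> nullsToBack(Comparator<T> delegate) {
        return (a, b) -> {
            if (a == null) return b == null ? 0 : 1;
            if (b == null) return -1;
            return delegate.compare(a, b);
        };
    }

    // Sorts only the first 'count' slots instead of the full backing array.
    static <T> void sortUsed(T[] items, int count, Comparator<T> order) {
        if (count == 0) return; // an empty set is never sorted
        Arrays.sort(items, 0, count, nullsToBack(order));
    }

    public static void main(String[] args) {
        // Four used slots (one of them null) followed by two unused slots.
        String[] backing = {"CCO", null, "C", "CC", null, null};
        Comparator<String> byLength = Comparator.comparing(String::length);
        sortUsed(backing, 4, byLength);
        System.out.println(Arrays.toString(backing));
        // prints [C, CC, CCO, null, null, null]
    }
}
```

Sorting only the used range means the comparator never sees the unused trailing slots, and the wrapper keeps any null inside the range from ending up at the front.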
The Authors

4  Egon Willighagen
4  John May
1  Stephan Beisken

Note that I always have a few patches that come from doing the actual release, like setting proper version numbers. Thus, this release is really courtesy of the EBI developers.

The Reviewers

4  Egon Willighagen 
2  John May 

CDK 1.4.16: the changes, the authors, and the reviewers

Yes, quite overdue, but here's the changelog of CDK 1.4.16. This release is about bug fixing. The first CDK 1.4 release is long behind us, and the CDK project is working hard on stabilizing the 'master' branch for the first beta releases. Because this release introduced a nasty bug, you are not supposed to use this version, but the newer 1.4.17 version instead!

The list is quite long, and includes a mix of fixes (also of unit tests), improvements, and better documentation. Things to look out for include a distribution that includes the unit tests, important fixes in the comparators used in sorting (affecting canonical SMILES generation; mind the note above, and use 1.4.17!), and a fix in the output length of the ProtonTotalPartialChargeDescriptor descriptor. As always, you are recommended to upgrade, but skip this version, and go immediately for 1.4.17.

The Changes
  • Now also has a task to create a source distrib that includes the unit tests (per user request) 26384a5
  • Expanded explanation of Sigma Electronegativity Descriptor 3f02b74
  • Added missing literature reference, and minimal additional information requested in bug #1285 4cf0f31
  • Correct unit test name and added missing unit tests and annotations. 601d0af
  • Added unit test and annotation for moment generation (3D Similarity) 33c99d1
  • Added explicit atom typing and aromaticity detection to several failing fingerprinter tests. ff8d374
  • Changed saturation conditional to also check that a bond is not aromatic. f37cfa7
  • Throw an error when a non-ring is attempted to be closed 9c4bd5e
  • C1C1 is an invalid SMILES; expect a thrown error fc63102
  • Properly pad the resulting descriptor value list to MAX_PROTON_COUNT, fixing two failing unit tests 7627d84
  • Small fixes: remove STDOUT use and proper int-based assertEquals() method 7a81ea3
  • Typed the input class and added support for IChemFile.class input 51cf991
  • Added a missing dependency on the test data 76ade35
  • Added a missing dependency 2e6fc94
  • Properly implemented for data and silent 09868f4
  • And the Groovy deps for the CDKSourceCodeWriterTest 2e62bde
  • Also updated the Eclipse .classpath 3f88256
  • Also updated the other two PMD config files for 5.0.1 fa99bc8
  • Updated PMD to version 5.0.1 44572c6
  • Updated comparator on AtomPlacer. The comparator now uses the built-in Integer comparator. To avoid null pointer exceptions, the access to the weight value is wrapped in a private method which returns the minimum integer value when the weight is not set (i.e. null). 0d658df
  • Corrected null handling on AtomContainer 2D centre comparison. Previously, if either container was null the comparator would incorrectly return '0', indicating they were equal. The method was changed to always provide a minimum Point2d when an atom container is null. This will sort all null containers lower than non-null containers. ed399df
  • Replaced subtraction based comparator. This comparator is unlikely to overflow but the safety of using the equalities ensures the proper behaviour of this comparator. e5b1bc6
  • Replaced subtraction based comparison. When the difference (which is a double) is converted to an integer the result may overflow. This overflow can occur in rare cases but would cause the comparator to be non-transitive. 425c795
  • OK, simple fix: reset the test graph before each test method call 40536af
  • Make sure the tests are also run with JUnit4 6e8533a
  • Corrected canonical label sorting in the SmilesGenerator. The canonical labels are longs, and thus when the difference between two values is taken the result may be larger than the largest possible integer. In rare cases this causes the value to overflow and thus makes the comparator non-transitive. 704280a
  • Corrected handling of null values in the TreeNodeComparator. Previously, if any node or atom was null the comparator would return '0', which is incorrect. If one is null and the other is not, the objects are not equal. d7a55d3
  • Generified comparators, removing redundant casts 5a5db14
  • Corrected comparator in InvPair sorting. The current comparison was in violation of the comparable contract which was throwing an IllegalArgumentException on JRE 7. 2f92c91
  • A few unit tests uncovering some issues with == comparisons 46d48f9
  • Added accessor for the bond length in StructureDiagramGenerator 75b5fe6
  • Updated the @cdk.bugs taglet to support the new SourceForge project pages fbebbe0
  • Replaced IBond.Order.ordinal() usages with IBond.Order.numeric() 427fac6
  • Added a field to the bond order enumeration to allow access to the numeric value for that order. 8c34bfe
  • unit test 7d5ce83
  • unit test 5bc24fa
  • unit test for bug 1269 (atom... not placed by SDG, causing NPE), remove catch statements, set expected e3c3590
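
Several of the changes above replace subtraction-based comparators. The following minimal sketch, in plain Java rather than the actual CDK code, shows why: when the difference of two long values is narrowed to an int, the sign can flip. The class and field names are hypothetical and only serve the illustration.

```java
import java.util.Comparator;

class ComparatorOverflowDemo {

    // Broken: the long difference is narrowed to int, which can flip its sign.
    static final Comparator<Long> BY_SUBTRACTION = (a, b) -> (int) (a - b);

    // Safer: Long.compare only uses comparisons, so nothing can overflow.
    static final Comparator<Long> BY_COMPARISON = Long::compare;

    public static void main(String[] args) {
        long small = 0L;
        long large = 1L << 31; // one more than Integer.MAX_VALUE

        // Both calls return a negative number: each value claims to be
        // smaller than the other, which violates the Comparator contract.
        System.out.println(BY_SUBTRACTION.compare(large, small));
        System.out.println(BY_SUBTRACTION.compare(small, large));

        // The comparison-based version gives opposite signs, as it should.
        System.out.println(BY_COMPARISON.compare(large, small));
        System.out.println(BY_COMPARISON.compare(small, large));
    }
}
```

A comparator that gives inconsistent answers like this is what makes sorting unpredictable, and on JRE 7 it can trigger the IllegalArgumentException mentioned in the InvPair fix.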
The Authors

21  Egon Willighagen
14  John May
 4  Ralf Stephan

The Reviewers

12  Egon Willighagen 
12  John May 
 1  Gilleain Torrance

Tuesday, March 19, 2013

New Paper: "Applications of the InChI in cheminformatics with the CDK and Bioclipse"

Last week, Ola, Sam Adams, Arvid, and I published a paper (doi:10.1186/1758-2946-5-14) on the InChI functionality in Bioclipse, which uses Sam's JNI-InChI and the Chemistry Development Kit underneath.

This paper partly describes the earlier work by Sam on JNI-InChI itself and the integration into the CDK, but also includes the recent support for CDK's IStereoElement, OSGi bundles for JNI-InChI by Arvid, and a few new applications in Bioclipse.

These applications demo what you can do with the InChI in Bioclipse. Obviously, this involves creating InChIs for any structure drawn in Bioclipse (that is old). New is that the manager now also supports creating InChIs with particular layers. For example, with fixed hydrogens:

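mol = cdk.fromSMILES("OC=O");
// standard InChI
sinchi = inchi.generate(mol);
// InChI with the fixed hydrogen layer; stored under a new name so that
// the 'inchi' manager is not overwritten
fixedHInchi = inchi.generate(mol, "FixedH");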

But the more interesting bits are next. For example, the InChI is ideal for look up, and can be used in decision support with knowledge bases.

But as Christopher Southan showed in his "InChI in the wild: an assessment of InChIKey searching in Google" paper (doi:10.1186/1758-2946-5-10), the InChI is good for finding useful information on the web. I have taken a different approach with Isbjørn, which does not use Google, but Linked Data approaches to find information on the web. This semantic search is seeded with the InChI.

The third example exposes work done by Mark Rijnbeek, formerly in the group of Christoph Steinbeck, who implemented a method for the CDK that uses the InChI library for tautomer generation. This functionality is now exposed in Bioclipse too. Obviously, it is limited by the ability of the InChI library to generate those tautomers. But if you like to try it, you can do this with:

// no aromatic rings that make it hard to
// see where the double bonds are
inputSMILES = "c1ccccc1O";
inputName = "phenol";
// parse the SMILES and generate the tautomers of the resulting molecule
tautomers = cdk.getTautomers(
  cdk.fromSMILES(inputSMILES)
);
file = "/Virtual/" + inputName + ".sdf";
cdk.saveSDFile(file, tautomers);

Details on how to try all this in practice can be found on this page. And I am looking forward to hearing what you think of it, how you like to use it or are using it. If you like to extend it, the source code is on GitHub.
Spjuth, O.; Berg, A.; Adams, S.; Willighagen, E. Journal of Cheminformatics 2013, 5, 14+.

Tuesday, March 05, 2013

Source Code documentation: what JavaDoc should look like, and why.

Over the years, I have blogged quite a bit about JavaDoc. JavaDoc is the system commonly used to annotate Java source code with human-oriented documentation, and it complements source code comments. I will not attempt to summarize what I wrote in the past, but will give some links. The primary user of the JavaDoc is, of course, the user:
And we take effort in producing good JavaDoc (despite it being the unpaid, least favorite task):
And we use tools here to support us:
And using JavaDoc taglets and doclets, we enrich the regular JavaDoc with more detailed information:
The CDK uses the JavaDoc not just for documentation, but also for compiler instructions:
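
As a small illustration of the kind of JavaDoc discussed here, the sketch below documents a hypothetical class that is not part of the CDK. The @param, @return, and @throws tags are standard JavaDoc. The @cdk.module line is my assumption of what a project-specific tag on a class can look like; the compiler ignores tags it does not know, so the example compiles without any CDK tooling.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Counts heavy (non-hydrogen) atoms in a list of element symbols.
 *
 * <p>This class is a hypothetical illustration and not part of the CDK.
 *
 * @cdk.module standard
 */
class HeavyAtomCounter {

    /**
     * Returns how many of the given element symbols are not hydrogen.
     *
     * @param  symbols element symbols such as {@code "C"} or {@code "H"};
     *                 must not be {@code null}
     * @return the number of entries that differ from {@code "H"}
     * @throws IllegalArgumentException if {@code symbols} is {@code null}
     */
    static int count(List<String> symbols) {
        if (symbols == null) {
            throw new IllegalArgumentException("symbols must not be null");
        }
        int heavy = 0;
        for (String symbol : symbols) {
            if (!"H".equals(symbol)) {
                heavy++;
            }
        }
        return heavy;
    }

    public static void main(String[] args) {
        // Ethanol-like fragment: one carbon, one oxygen, four hydrogens.
        System.out.println(count(Arrays.asList("C", "H", "H", "H", "O", "H")));
        // prints 2
    }
}
```

The point of the sketch is that the comment states what the caller needs to know (accepted input, return value, failure mode) without repeating how the loop works.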