Sunday, July 29, 2012

@cdk.githash (JavaDoc HTML links to the source code again)

The CDK customizes quite a few things in the build process. One aspect of that is custom JavaDoc tags, such as @cdk.githash (source of the Taglet). This tag replaced a similar tag for Subversion (@cdk.svnrev) and links the JavaDoc of a class to the matching source code, something we have not found another way to achieve. This linking functionality was broken for a while, but is now fixed again:

The last line now also shows the branch name, instead of always master, thanks to GitHub's link-to-friendly URIs for Git repository content.

Additionally, not all classes have this tag yet, and I have created a Junior Job for that.

Tuesday, July 24, 2012

The CDK reaction API (and chemistry in LaTeX)

Because I had two people asking about it, I decided to write up some material on the CDK API for handling reactions. Here's a very brief preview:

The reaction equation, BTW, was created with the mhchem package for LaTeX, which Debian ships in the texlive-science package, and which I found thanks to this Writing Chemistry with LaTeX series.

Sunday, July 22, 2012

CACE: computer-aided code evaluation of CDK code

Computer-aided code evaluation (CACE) is an important part of scientific code development projects. There are many ways to do peer-review of source code (Maven, Gerrit, ...), and I won't go into details here. Instead, I focus on CDK's Nightly build system.

Nightly reports
Making sure the source code compiles is one of the most basic requirements. Given that it compiles, we get a full report with a lot of information:

The lead of the report contains useful links to a precompiled binary Java ARchive (jar file), a link to the latest git commit, source code, and the JavaDoc. Also very useful is the keyword list, which acts as an index to CDK functionality, using @cdk.keyword hashtags in the class JavaDoc.

Unit testing
Below the horizontal bar are the code evaluation reports. First are the results for the unit tests (for which JUnit is used):

In the middle we get the JUnit test results for each module separately, and behind the 'Stable' link there is a summary, giving a quick glance at all modules:

Full reports are again available for individual modules, but we also get statistics per module on the number of unit tests run, the number of failures and errors, and the number of methods not tested.

JavaDoc quality
For JavaDoc we also run evaluations. For this, we use OpenJavaDocCheck, for which alternative solutions are available too, as I learned later. The front page section of Nightly looks like:

The summary is quite like that of the unit testing, and a single report for a module looks like this:

Many of these tests are general for JavaDoc, but we also have CDK-specific tests, such as shown below (along with the summary at the bottom of the page):

There are a lot of small code fixes here for those who would like to contribute to the CDK project and learn git skills along the way.

Code evaluation
We use PMD for general code evaluation, which is most useful in computer-aided code evaluation. It often highlights the more interesting bits of code and, importantly, those bits where errors may occur. Another set of tests checks code readability, which is very important too, as it allows your peers to review your code more efficiently. The PMD section of the Nightly front page looks very much the same as the other parts:

For example, we get warnings like these:

Various things get reported here. For example, short variable names, like 'st'. Really short variable names often make the code harder to read, because they are less informative. Is 'ac' referring to the old or the new atom container?

We also get a warning about incorrect use of the StringBuffer.append() method, indicating where we can improve the code (making it faster in this case). We also see a CDK-specific test here (sources are here), warning us about a bad practice: interfaces should take data model interfaces, rather than implementations.
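To make the StringBuffer.append() warning a bit more concrete, here is a minimal sketch in plain Java. The report above does not say which PMD rule fired, so this assumes one common case: string concatenation inside append(), which PMD's string rules (such as InefficientStringBuffering) are designed to catch. The class and method names are made up for illustration and are not CDK code.

```java
// Hypothetical illustration, not taken from the CDK sources.
class AppendSketch {

    // The pattern that tends to get flagged: the argument is first
    // concatenated into a temporary String, which is then appended.
    static String concatInsideAppend(String symbol, int count) {
        StringBuffer buffer = new StringBuffer();
        buffer.append(symbol + ":" + count);
        return buffer.toString();
    }

    // The usual fix: chain append() calls, and use a char literal
    // for a single character, so no temporary String is needed.
    static String chainedAppend(String symbol, int count) {
        StringBuffer buffer = new StringBuffer();
        buffer.append(symbol).append(':').append(count);
        return buffer.toString();
    }

    public static void main(String[] args) {
        // Both variants produce the same text; only the amount of work differs.
        System.out.println(concatInsideAppend("C", 6));
        System.out.println(chainedAppend("C", 6));
    }
}
```

Both methods return the same string, so a fix like this is safe to make without changing behavior, which is what makes these warnings nice small patches.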

As will be clear, the Nightly reports provide a wealth of information that helps code review. I hope this post has popularized this useful resource a bit more, and I invite you to visit it frequently. For example, it is a useful tool to validate your own code before you send it for review. For the latter, it is useful to know you do not have to install a full Nightly. Mind you, for larger patch-writing efforts, we can set up a Nightly crontab on a specific branch, as we have done frequently before.

But you can also run these code evaluations from the command line with:

$ ant clean dist-all test-dist-all jarTestdata
$ ant -Dmodule=io qa-module

This will run the JavaDoc, JUnit, and PMD tests, and store the results in the reports/ subfolder.

Sunday, July 15, 2012

Groovy Cheminformatics 6th edition

I have uploaded a new revision of my Groovy Cheminformatics book, based on the CDK 1.4.11 and CDK-JChemPaint 26. Slowly I am becoming confident in uploading PDFs to and perhaps the frequency of updates will increase. I would love to have revisions of the book at least follow the stable releases, but the previous book version was already based on 1.4.7.

But there is interesting new content, and I am happy this version is out, so that I can now prepare a larger update to be released in September or so. This version has 176 pages, I think an increase of twelve, but the next release will pass 200 pages, making it reach my minimal page count for releasing on

So, the new content in this release:
I also fixed a number of small cosmetic irregularities. You can download a print copy and an eBook version (PDF).

Friday, July 06, 2012

CDK-JChemPaint #11: coloring selections

This is something I have been asked about many times. I had to find out myself, as I had no experience with this corner of the CDK rendering stack. In fact, I think there will be a second, follow-up post on that later, where I will explain I did it all wrong :)

Anyway, here is example code for how to mark a substructure. It is a variation of the triazole examples I have given earlier. First thing is to add the proper generator:

// generators make the image elements
List<IGenerator> generators = new ArrayList<IGenerator>();
generators.add(new BasicSceneGenerator());
generators.add(new ExternalHighlightGenerator());
generators.add(new BasicBondGenerator());
generators.add(new BasicAtomGenerator());

And we then also configure things to, for example, make the selection halos a bit larger and make them red:

model = renderer.getRenderer2DModel();

Finally, we set the selection we like to color:

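IAtomContainer selection = new AtomContainer();
for (int i=0; i<2; i++) {
  bond = triazole.getBond(i);
  // the rest of the loop is missing from this copy of the post;
  // adding the bond to the selection is an assumed reconstruction
  selection.addBond(bond);
}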

And the result then looks like this:

The full script can be downloaded here. A downside of this script is that the background of the symbol is not in the same color as the selection highlight. Also, I do not think you can color multiple selections at the same time. But I guess it is the start of an answer.

BTW, the new JChemPaint applet/application can be downloaded from its new hangout at

Wednesday, July 04, 2012

Isbjørn #7: Linked Data

OK, the advantage of Linked Data is that it is Linked Data. So, when a link is made to, for example, side effects, as reported below by the Free University of Berlin (using SIDER, doi:10.1038/msb.2009.98), we do not just get a link to a new resource, but we can actually look up the label for that resource, and show that in the Isbjørn results instead of the URL:

Of course, we also do have the link, so notice the link icons behind the side effect names.

And because it's all using common standards (rdfs:label, dc:title, skos:prefLabel, skos:altLabel) it works for any database, thus my DBpedia support got upgraded too:
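The label lookup described above can be sketched in a few lines of plain Java. This is not the Isbjørn code: the class name, the map-based representation of a resource's properties, and in particular the order in which the four predicates are tried are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of label resolution; not taken from Isbjørn.
class LabelLookupSketch {

    // Assumed preference order over the predicates named in the post.
    static final List<String> LABEL_PREDICATES = Arrays.asList(
        "skos:prefLabel", "rdfs:label", "dc:title", "skos:altLabel");

    // Returns the first non-empty label found, or the URI itself
    // when the resource carries none of the known label predicates.
    static String displayName(String uri, Map<String, String> properties) {
        for (String predicate : LABEL_PREDICATES) {
            String value = properties.get(predicate);
            if (value != null && !value.isEmpty()) {
                return value;
            }
        }
        return uri;
    }

    public static void main(String[] args) {
        // Made-up resource URI and literals, for demonstration only.
        Map<String, String> sideEffect = new HashMap<String, String>();
        sideEffect.put("rdfs:label", "Headache");
        sideEffect.put("skos:altLabel", "Cephalalgia");
        System.out.println(displayName("http://example.org/se/1", sideEffect));
        System.out.println(displayName("http://example.org/se/2",
            new HashMap<String, String>()));
    }
}
```

Falling back to the URI means a resource without any label still shows up in the results, just less readably, which matches the behavior before labels were looked up.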

Isbjørn #6: DBpedia and Freebase support

Still on a tight schedule, and you must be getting tired of my updates, but I'm still beefing up Isbjørn a bit more. First, I added DBpedia and Freebase support, which means it knows about the ontologies they use. But I also played with inline images and set the encoding so that the page not only looks nice in Bioclipse, but you can also email it and it will still look nice in Chrome and Firefox:

For Freebase it looks similar. Note that I had to cut out spidering from that resource, as it links each translated page back to DBpedia, and DBpedia is not always very fast in responding. But after I added an additional read timeout to the RDFManager, it seems to be working.

Isbjørn #5: extended Bio2RDF support

Wrapping up the first release of Isbjørn, I am adding further data extraction from the databases, such as for Bio2RDF (doi:10.1016/j.jbi.2008.03.004). Bio2RDF does not (yet) use standard ontologies, so I added support for their ontology:

Monday, July 02, 2012

New content in the Groovy Cheminformatics book


During the summer holidays I plan to extend my Groovy Cheminformatics book to reach 200 pages, but before that I plan to upload an updated PDF for CDK 1.4.11. This upcoming 6th edition will have a few new things, including the above section.

Posted via email from Egon's posterous