Friday, July 31, 2009

Maintaining the JChemPaint-Primary patch

Not so long ago, I finished porting the JChemPaint-Primary branch to be a patch on top of CDK master from our git repository. This means frequent rebasing, to incorporate the latest changes in the CDK master branch. Today, I did such a rebase, after the CDK 1.3.0 release. Hoping that at least some find this informative, this is what I did. Remember that the patch is organized around the render and control modules, which is why we have so many branches, even though they simply form a linear chain.

$ # to download all new patches from origin to the master branch:
git pull origin master
# then rebase all patches in the desired order (which makes absolutely no numerical sense)
# 0, 1, 2, 9, 6, 7, 3, 8, 4, 5, 11, 10
git checkout 0-other
git rebase master
git checkout 1-render
git rebase 0-other
git checkout 2-renderbasic
git rebase 1-render
git checkout 9-rendercontrol
git rebase 2-renderbasic
git checkout 6-control
git rebase 9-rendercontrol
git checkout 7-controlbasic
git rebase 6-control
git checkout 3-renderextra
git rebase 7-controlbasic
git checkout 8-controlextra
git rebase 3-renderextra
git checkout 4-renderawt
git rebase 8-controlextra
git checkout 5-rendersvg
git rebase 4-renderawt
git checkout 11-controlawt
git rebase 5-rendersvg
git checkout 10-unsorted
git rebase 11-controlawt
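Because every branch is simply rebased onto the one before it, the sequence above could also be scripted as a loop. The following is only a sketch: the function name rebase_chain and the DRY_RUN switch are made up for illustration and are not part of the repository. It stops at the first rebase that fails, so conflicts can be resolved by hand.

```shell
#!/bin/sh
# Hypothetical helper (not part of the CDK or JChemPaint repositories).
# Rebases a linear stack of branches, each one onto its predecessor.
# Usage: rebase_chain BASE BRANCH...
# With DRY_RUN=1 the git commands are printed instead of executed.
rebase_chain() {
    prev=$1
    shift
    for branch in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "git checkout $branch"
            echo "git rebase $prev"
        else
            git checkout "$branch" || return 1
            # stop here on conflicts, so they can be fixed manually
            git rebase "$prev" || return 1
        fi
        prev=$branch
    done
}

# Example call, using the branch order given above:
# rebase_chain master 0-other 1-render 2-renderbasic 9-rendercontrol \
#     6-control 7-controlbasic 3-renderextra 8-controlextra \
#     4-renderawt 5-rendersvg 11-controlawt 10-unsorted
```

Running it first with DRY_RUN=1 prints the checkout and rebase pairs, which makes it easy to compare the order against the list above before anything is rewritten.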
This gives me, again, a clean patch against the latest CDK master.

Things to check before you consider submitting a (final) CDK patch #1

Mark the final in the above title; if you merely seek advice on your patch, feel free to send it in whatever state. However, if you bring up your patch for peer review, make sure you have gone through the following steps, in no particular order:
  • be prepared for peer review feedback
  • realize your code will have to be LGPL or LGPL-compatible
  • make sure the copyright lines are properly updated (see Making patches; Attribution; Copyright and License.)
  • your code is fully unit tested
  • your code does not cause PMD failures
  • your code is fully JavaDoc-umented
    • no empty templates
    • JavaDoc for every class field, method and class
    • use of {@link}
    • use of CDK tags @cdk.bug, @cdk.cite, etc
    • period at the end of the first sentence
    • ...
  • make sure all the code still compiles
  • make your code readable
    • 80 characters per line
    • variable names that reflect their purpose and nature
    • no code complexity errors with PMD
    • camelCasing, as is customary in Java
    • comment your code where appropriate, explaining what your code is supposed to do
    • ...
Each of these is a reason to reject your patch, so make sure you do not have to be reminded of them. The build environment comes with some code to make these checks easier (though not the fixing). For example, say I introduced a new module uff (for the UFF force field):
$ cd cdk/
$ ant clean dist-all test-dist-all
$ ant -Dmodule=uff test-module
$ ant -f javadoc.xml -Dmodule=uff doccheck-module
$ ant -f pmd.xml -Dpmd.test=custom -Dmodule=uff test-module
$ ant -f pmd.xml -Dpmd.test=custom -Dmodule=test-uff test-module
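The same five commands apply to any module, so they could be wrapped in a small function. This is a sketch only: the name cdk_module_checks and the ANT override are assumptions for illustration, while the ant targets are exactly the ones listed above. The chain stops at the first failing step.

```shell
#!/bin/sh
# Hypothetical wrapper around the ant calls listed above.
# Usage: cdk_module_checks MODULE   (run from inside the cdk/ directory)
# ANT overrides the command to run; ANT="echo ant" gives a dry run.
cdk_module_checks() {
    module=$1
    ant_cmd=${ANT:-ant}
    # $ant_cmd is left unquoted on purpose, so "echo ant" splits into words
    $ant_cmd clean dist-all test-dist-all &&
    $ant_cmd -Dmodule="$module" test-module &&
    $ant_cmd -f javadoc.xml -Dmodule="$module" doccheck-module &&
    $ant_cmd -f pmd.xml -Dpmd.test=custom -Dmodule="$module" test-module &&
    $ant_cmd -f pmd.xml -Dpmd.test=custom -Dmodule="test-$module" test-module
}

# Example: cdk_module_checks uff
```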

New Blogs #11

Not that the last two weeks have seen a boost in blog submissions to Chemical blogspace; just that I was not really finished with New Blogs #10. Happy reading!

Monday, July 20, 2009

Updating my bioclipse.qsar fork with Ola's main branch

GitHub makes forking cheap, and I have a fork of the bioclipse.qsar repository (see Bioclipse moving to GitHub: CIA hooks enabled), so that I can easily share my patches with Ola for review. Ola can review them and apply them back into his main version.

I was wondering how I could synchronize my fork with Ola's version again, and found the answer in this guide on GitHub. It turns out that all I have to do, in my local clone, is:
$ git remote add olas git://
$ git pull olas master
This gets me into the following state:

This gitk output shows that my local master branch is identical to Ola's master branch on GitHub, while both are three commits ahead of my current master branch at GitHub.

Right after this, I updated my fork at GitHub with a simple git push, resulting in this gitk output:

Friday, July 17, 2009

ELN vendor: "The Open Source stuff just works better"

Simon Coles is CTO of Amphora Research Systems (a company I do not know) and in the business of Electronic Lab Notebooks. I know nothing about their products but would like to propagate the statements he just made on Open Source in reply to a question on LinkedIn (btw, my LinkedIn account):
We use a lot of Open Source components in our products, and I know we’re not alone.
He also explains why they do so, quoting from the full arguments in his blog:
  • The Open Source stuff just works better
  • Support is better
  • Licensing issues go away
  • It is dramatically cheaper for our customers to deploy
  • We have much more latitude in deployment options
I am not sure if this involves open source cheminformatics, but I asked about that. The whole article is worth reading.

Wednesday, July 15, 2009

Bioclipse moving to GitHub: CIA hooks enabled

Following the CDK and JChemPaint Primary, Bioclipse moved to Git just after the 2.0.0 release.

We decided to split up the repositories, and have one benevolent dictator, or dr. Who, for each repository who will maintain the plugins defined in the repository and coordinate development: Several plugins are still in the SVN world, but a good deal is now Git-ready. BTW, this move also adds several new accounts to watch on GitHub (see Rich' 17 GitHub accounts to watch on Cheminformatics).

GitHub turns out to be our big friend here, not SourceForge, which only supports one Git repository. GitHub must have added hooks recently, and I am really happy to see those. The above Bioclipse repositories have hooks turned on for CIA (so that commit messages end up on our #bioclipse IRC channel) and email (as indicated by the green color):

The splitting up was rather interesting indeed. We wanted to keep the complete commit history, but still reduce the size of the git repositories considerably. This means removing the history of the plugins which should not end up in the repository. Git allows this! Git rules! This time, git filter-branch is our friend and there are basically two options: constructive and destructive. The first copies plugins bit by bit from the old to the new repository. The second does the opposite, and removes bit by bit the stuff you do not want. Depending on the ratio of plugins you want to keep to those you want to remove, one or the other is more appropriate. I have summarized the git commands I used in detail on this Bioclipse wiki page.
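To give an idea of what the destructive option can look like, here is a minimal sketch. It is not the exact procedure from the wiki page: the function name drop_plugin_history is made up for illustration, and it only uses the documented --index-filter and --prune-empty options of git filter-branch. It rewrites the history of the current branch so that one directory never existed in it.

```shell
#!/bin/sh
# Hypothetical helper, not taken from the Bioclipse wiki page.
# Removes one directory from every commit reachable from HEAD.
# Usage: drop_plugin_history DIRECTORY   (run in the repository root)
# Assumes DIRECTORY contains no single quotes.
drop_plugin_history() {
    plugin_dir=$1
    # FILTER_BRANCH_SQUELCH_WARNING only silences the notice that newer
    # git versions print; older versions ignore it.
    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force --prune-empty \
        --index-filter \
        "git rm -r --cached --ignore-unmatch --quiet -- '$plugin_dir'" \
        -- HEAD
}

# Example, with a made-up plugin directory name:
# drop_plugin_history plugins/some.unwanted.plugin
```

Commits that only touched the removed directory become empty and are dropped by --prune-empty. Since this rewrites history, it is best tried on a fresh clone first.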

Wednesday, July 08, 2009

JChemPaint-Primary moving to Git

I knew it was going to be painful, and making the jchempaint-primary branch a proper patch to the CDK master branch is indeed painful. I am working my way towards setting up a git repository (IMPORTANT: these patches are not final yet, and their history will change, as I am rebasing regularly to make cleaner patches! Making copies is safe, but please hold off on any forking and/or branching on top of it until it is final. Thanx.) for the patch, with split ups of the various parts into reviewable blobs:

As you can see (when you click on the image to enlarge it), I have more or less finished the first drafts of the patch sets (see this wiki page) 0-other, 1-render, 2-renderbasic, 9-rendercontrol, and 6-control. The last one does not actually compile properly yet, as I need to abstract an IRenderer interface first.

There are several patch sets that I am still porting, but I hope to finish that this week, after which I'll continue working on the new IEdit framework in the controller modules recently set up by Arvid.

It will take some time before these patches actually get submitted for review, as there is quite some PMD, DocCheck and unit testing work to be done, as is clear from the Nightly running on the SVN branch.

Finally, I would like to note that this git repository collapses a lot of work done by developers at both Uppsala University (Arvid, Ola and me) and the EBI (Gilleain, Stefan and now Mark). While the above git history will not reflect those contributions, you can recover this information from the copyright headers. I would also like to thank Lars and Sam for their valuable testing!

Tuesday, July 07, 2009

Bioclipse-JChemPaint #2

Recently, I blogged about Bioclipse-JChemPaint, part of the imminent Bioclipse 2.0.0 release, a complete rewrite of the Bioclipse application as published in doi:10.1186/1471-2105-8-59. I also blogged about the feature to browse large MDL SD files (Bioclipse 2.1 will support one or more CML conventions for chemical tables). Ola did some profiling on processing SD files, but also notes that such processing may be more suitable for the StructureDB.

Browsing a large set of structures with their properties gives a quick overview of the data set. It also makes bugs shallow, such as the one shown below, found when I was browsing the StarLite database:

The MDL molfile for structure 55 is available from the bug report I filed against Bioclipse.

Knowledge Management - Ontologies

Chemistry has a bit of background in ontologies, and ChemAxiom is certainly not the first (though I think it is rather promising...). Three years ago I gave a presentation at the CUBIC (now existing as LinkedIn Alumni Group), which is not so extensive, but does have a few interesting citations on the use of ontologies in chemistry on slide 16:

Thursday, July 02, 2009

Bioclipse for CDK Developers #2

I reported earlier how Bioclipse allows you to use a script to perceive atom types for the content of the JChemPaint RCP editor. This functionality is now available in the outline, and indicates directly if Bioclipse (and the underlying CDK) understands the chemistry you are drawing. In a future Bioclipse release, these problems will be visualized more prominently, likely using the Errors/Problems Views available from Eclipse, or otherwise.