Tuesday, September 30, 2008

Git mirror for the CDK

While slowly merging with Sweden, and ADSL which should reach my house in some two weeks, I am enjoying my new office space and Git to upload patches to the CDK. Christoph wondered if we should switch CDK from SVN to Git. A few developers objected, for various reasons: no native Windows clients (though msysgit might be the solution), no (stable) plugins for Eclipse, IDEA(?), etc.

I made the switch, and am really happy about it.

Anyway, one thing holding me back from switching the full CDK project was the lack of a central place where we could host our Git repository. Now, GitHub does just that, and after inquiring with them about the 100MB limit, Tom emailed me:
    Hi Egon,

    We'd love to have your open source project on GitHub. The 100MB is currently a soft limit, so you won't have any problems uploading a larger repo. We hope you enjoy GitHub!

    Tom Preston-Werner
So, I created an account (I'm happy there are so few Egons in the world :), and uploaded the CDK 1.2 branch, which, for now at least, will serve as a mirror only, while SVN will be the primary repository.

You can easily check it out with:
    $ git clone git://
I am not sure how you can email me your patches, but I know it is possible and will report on this later. This mirror is important to those who want to play with Git, as one no longer requires git-svn, dropping one dependency.
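
One candidate for the emailing part is git format-patch, which writes a commit to a mail-ready patch file that can be attached to a message. Below is a minimal sketch in a throwaway repository; the file name, the commit messages, and the demo identity are made up for illustration and have nothing to do with the actual CDK tree.

```shell
set -e

# Throwaway repository, so nothing real is touched (all names are hypothetical).
work=$(mktemp -d)
cd "$work"
git init -q
git config user.name "Demo Developer"
git config user.email "demo@example.org"

# Two commits: a starting point and the change we want to send.
echo "first version" > README.txt
git add README.txt
git commit -q -m "Add README"
echo "second version" > README.txt
git commit -q -am "Update README"

# Write the most recent commit as a patch file into ./patches
git format-patch -1 -o patches
```

The resulting file carries the author, date, commit message, and diff, and the receiving side can apply it with git am.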

Now, this does add some extra work on my side, as I need to keep the CDK SVN repository (or, better, my git-svn copy of it) synchronized with the Git repository, but this turned out to be fairly easy:
    $ cd GitHub/cdk
    $ git pull ../../SourceForge/git-svn/cdk my-local-1.2
    $ git push
So, does this mean no goodies for people who stick to SVN? No, there are some, like this PunchCard:

Wednesday, September 24, 2008

Moved to Sweden: Post-doc in the Bioclipse group of Prof. Jarl Wikberg

The reason I have not been able to blog much lately is that my family and I have been moving to Uppsala, Sweden, where I'll start a postdoc in the group of Jarl Wikberg @ BMC @ Uppsala University. There I'll work on chemoinformatics in drug design, and the use of CDK and Bioclipse in particular.

More blogging when I have more frequent internet access again...

Tuesday, September 09, 2008

FriendFeed for the Chemistry Development Kit

FriendFeed is a nice aggregation service allowing discussion of items posted from delicious, blogs, and any other RSS-based feed (e.g. my feed). It also has a room concept, where people can post items around a topic, such as a conference like Science Blogging 2008 London, or the CDK:

I have associated the RSS feed of the CDK bug tracker, the CDK News ASAP, and will shortly add the commit messages feed.

Sunday, September 07, 2008

CDK development with branches using Git

Christoph pointed me to a video on Git by Linus. CDK is now using branches extensively in development, and I just set up a branch for the upcoming 1.2.0 release later this year (end of October, see cdk-1.2.x). Christoph has just reviewed the branch containing the API move to Iterable. This patch now allows you to do this (which would really deserve a blog item by itself):
for (IAtom atom : molecule.atoms()) {
    System.out.println("Symbol: " + atom.getSymbol());
}
Now, while branching in SVN is easy (svn copy), merging is a pain, something Miguel and I found out in the last half year, where he and I experimented with using branches in development (see also Comparing Branches). We discovered that porting bug fixes from trunk to a branch, or just keeping the branch synchronized with trunk, simply does not work. And merging itself, after a while, became a tedious process. So, when watching Linus' movie on Git where he mentions being able to merge several branches a day, I knew I had to switch. A full switch for the CDK depends on an always accessible repository (I have been thinking about GitHub; anyone with an opinion on that?).

However, you can start using Git without a central Git repository, including branch support. This blog by Bart has the juicy details, which I'll apply here to CDK, for easy copy/pasting. This replaces the earlier writing on Offline CDK development using git-svn.

First step is to get yourself a Git mirror of SVN (which will take a long time; do it overnight(s)):
$ git svn clone -T trunk -b branches -t tags
$ git gc
The second command compresses commits to reduce the size of your local Git copy, resulting in a cdk folder of about 300MB. Enter the directory, and check that it has the default master branch:
$ git branch
* master
In SVN one must always do an svn update before one starts coding. Similarly, in Git you do (and I found this important to keep your local repository consistent):
$ git svn rebase
Committing has not changed, and a simple change would go via:
$ nano build.xml
$ git commit -m "Changed something, but too lazy to write up what I actually changed" build.xml
$ git svn dcommit
Now, before we move to setting up branches, one must realize that there are SVN branches and (local) Git branches. Keep that in mind, and remember that we rely on Git to keep them synchronized. To check the Git branches one uses git branch as shown above; to view the SVN branches, however, we type (which should produce quite a long list for the CDK; only a few listed below):
$ git branch -r
Here, the first is CDK trunk, the second a tag tags/cdk-2003-Oct-17, and the last two are the branches cdk-1.2.x and mesprague-iterators (no longer existing). I am not sure why the branches/ is missing here; some git-svn magic I presume.

Now, to create local Git branches that are synchronized with the SVN cdk-1.2.x and cdk-1.0.x branches, we type:
$ git checkout -b my-local-1.2 cdk-1.2.x
$ git checkout -b my-local-1.0 cdk-1.0.x
$ git branch
  master
* my-local-1.0
  my-local-1.2
You can now easily change branches with git checkout <BRANCH>, and check which SVN path you are working against with git log -1:
$ git checkout my-local-1.2
$ git log -1
commit 93bd0b22bbad31897eed6686e5b208c5e23505f7
Author: egonw
Date: Sun Sep 7 08:13:38 2008 +0000

Fixed inline citation (closes #1987947)

git-svn-id: eb4e18e3-b210-0410-a6ab-dec725e4b171
Inspection of the output shows the git-svn-id line, which indicates that the patch was indeed committed against cdk/branches/cdk-1.2.x.

With this set up, I can easily switch between trunk and branches, and backport patches from trunk to the cdk-1.2.x branch (using git cherry-pick) and merge all commits to the branch into trunk using:
$ git checkout master
$ git merge cdk-1.2.x

Git does an excellent job here. It recognizes when the branch was last merged with trunk, and will not attempt to apply patches twice. Even better, it also recognizes patches that were backported from trunk to the branch, and will not attempt to merge those either.
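
To see this behaviour in isolation, here is a small sketch in a throwaway repository, without SVN or the CDK involved. The branch name release, the file names, and the commit messages are all hypothetical. A fix is made on the main line, backported with git cherry-pick, and then the branch is merged back; the fix should end up in the file exactly once.

```shell
set -e

# Throwaway repository (all names below are made up for this demo).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo Developer"
git config user.email "demo@example.org"

# Initial commit on the default branch, standing in for trunk.
echo "line 1" > file.txt
git add file.txt
git commit -q -m "Initial commit"
trunk=$(git rev-parse --abbrev-ref HEAD)
git branch release

# A bug fix lands on trunk first.
echo "fix" >> file.txt
git commit -q -am "Fix a bug"
fix=$(git rev-parse HEAD)

# Backport the fix to the release branch, then add a branch-only change.
git checkout -q release
git cherry-pick "$fix" > /dev/null
echo "release note" > notes.txt
git add notes.txt
git commit -q -m "Add release notes"

# Merge the branch back into trunk; the backported fix is not applied twice.
git checkout -q "$trunk"
git merge -q --no-edit release

# Count how often the fix line occurs after the merge.
grep -c "^fix$" file.txt
```

The merge goes through without conflicts because both sides made the same change to file.txt, and only notes.txt is actually new to trunk.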

The result: I can easily merge branches now, generally speeding up CDK development! For example, it reduces the time between someone submitting a patch and my applying it to trunk (or cdk-1.2.x in case of a bug fix). I just set up a local branch, apply the patch, and tune until I am happy; trunk stays stable, as I am doing this in a separate branch. Similarly, if people develop their patch in an SVN branch, I can just as easily switch branches (as described above) and check things before I merge.

Setting up new SVN branches
As far as I know, git-svn cannot create or delete SVN branches. But this is easy enough with the svn command:
$ svn copy
$ git svn fetch
$ git checkout -b my-local-newbranch egonw-mynewbranch
$ # hack in my-local-newbranch
$ git commit -a
$ git svn dcommit
$ git checkout master
$ git merge my-local-newbranch
$ svn remove
Enough for now.

Monday, September 01, 2008

Ubiquity fun: entering semantic markup as easy as running a Ubiquity command

Now, the DOI Ubiquity scripts I just blogged about were just the beginning of things: me exploring the environment and learning the JavaScript language.

It starts to become really interesting when we use these technologies to improve things. I am still not sure people will like the command line nature, but at least I will be a happy user. This is the setting: I'm blogging about some chemistry, and would like to add an InChI (or InChIKey) together with that cool sechemtic markup people have been blogging about, but I do not know (or want to know) the HTML details for that.

Well, no worries, no more. Here comes sechemtic-inchi (installer here)!

Step 1:
I type in the InChI I want in my blog (example showing that of methane):

And, I select the InChI:

After which I hit the Ubiquity shortcut (ALT-SPACE on Linux) and I type sechemtic-inchi:

And, voilà, there is my RDFa HTML code for chemistry:

Now, with only minor amounts of fantasy, you can imagine where this is going: SMILES, InChIKey, etc, etc. Hook it up with chemistry webservices to autoconvert SMILES to InChIKeys, and Bob's your uncle.

Ubiquity fun: resolving DOIs

Now, I'm really after something else, but here is my first Ubiquity script. It allows you to select a DOI on any web page (which really only makes sense if it is not already a hyperlink), hit ALT-SPACE (Linux), CTRL-SPACE (Windows), or whatever the shortcut is on your operating system, and type resolve-doi; it will then automatically convert the DOI into a hyperlink to look up the paper.

What I am actually interested in, is being able to use this command in a blog editing environment; however, I have not managed to get that working in one command. And because I am apparently not able to put in two ubiquity commands in blog items, you need to go to this page.

Second warning. I have only tried them with Ubiquity 0.1, not 0.1.1, or even later.

For the curious, the script looks like:
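CmdUtils.CreateCommand({
  name: "resolve-doi",
  homepage: "",
  author: { name: "Egon Willighagen", email: ""},
  description: "Resolves a DOI into a URL",
  license: "GPL",
  takes: {"doi": noun_arb_text},

  // Show what the command is about to do, using the typed argument or the current selection.
  preview: function( pblock, doi ) {
    var msg = 'Inserts a URL for the DOI: ${doi}';
    var d = doi.text || CmdUtils.getSelection();
    pblock.innerHTML = CmdUtils.renderTemplate(msg, {doi: d});
  },

  // Replace the selection with a hyperlink built from the DOI.
  execute: function( doi ) {
    var msg = '<a href="${doi}">${doi}</a>';
    var d = doi.text || CmdUtils.getSelection();
    var newText = CmdUtils.renderTemplate(msg, {doi: d});
    CmdUtils.setSelection(newText); // assumed final step; the listing is cut off at this point
  }
});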
Comments on this code most welcome! It's GPL. Details can be found in this tutorial and examples in Rajarshi's blog