Saturday, August 20, 2011

Cleaner CDK Code #10: clean patches

The CDK code base is not just a regular dump of Java source code; it is an annotated dump of Java source code. You might have heard about git blame, and if you did, this would be a good time to start reading up on git, e.g. using this great book: Git from the bottom up. However, that will not tell you about git blame, but The Git Community Book will, and the man page will give you all the details.

We take advantage of the history of a file, as it helps us understand the full picture, complementing JavaDoc, inline comments, proper variable names, etc, etc. The annotation links each code line to a commit message. And that also explains that CDK reviewers are strong on good commit messages. No useless messages like 'fixed a bug', but a message that actually describes what has been fixed, and how. That's hard to do, but we are all increasingly trained twitterers, so we are trained to say much in 140 chars. Well, some are. So, I always hope to see something like "made the creation of morgan numbers N times faster, where N is the number of atoms in the AtomContainer" (like today).

What I do not like to see, is line changes that do not actually change something, for example, because they 'fix' whitespace. First, they ruin the git line annotation, by linking a random commit to a particular code line. Second, the reviewer does not know if the line has code changes, or just whitespace changes, and has to check the line in detail anyway. Waste of precious time, where code review is already quite a bottleneck in the CDK development process.

So, no stuff like this in your next patch please (it's extracted from a larger patch):

-        mol.addBond(0,1, CDKConstants.BONDORDER_SINGLE);
-        mol.addBond(1,2, CDKConstants.BONDORDER_DOUBLE);
-        mol.addBond(2,3, CDKConstants.BONDORDER_SINGLE);
-        mol.addBond(3,4, CDKConstants.BONDORDER_DOUBLE);
-        mol.addBond(4,0, CDKConstants.BONDORDER_SINGLE);
+        mol.addBond(0, 1, CDKConstants.BONDORDER_SINGLE);
+        mol.addBond(1, 2, CDKConstants.BONDORDER_DOUBLE);
+        mol.addBond(2, 3, CDKConstants.BONDORDER_SINGLE);
+        mol.addBond(3, 4, CDKConstants.BONDORDER_DOUBLE);
+        mol.addBond(4, 0, CDKConstants.BONDORDER_SINGLE);

Update: It's fine to have whitespace changes as separate patches; if one knows only whitespace changes, that requires a different kind of reviewing. Just don't mix it with functional changes that require more in-depth reviewing.


  1. I am a bit confused here. Does this mean that you want the CDK to remain ugly in the areas where it already is ugly or are you trying to say that it already is perfect?

    In my humble opinion these sort of commits should be allowed but the commit message should be: "fixed whitespace"

  2. I'm fine with dedicated 'fixed whitespace' patches, but only as dedicated patches. They still complicate the 'blame' history, but I can most certainly live with that in this case.