Thursday, September 01, 2011

Community development

LWN.net is on my short list of obligatory reads, and always has great analyses of Open Source projects (I can recommend getting a subscription). Last week there was a good post on collaborative development by Jake Edge, based on a talk by Clay Shirky. It discusses some observations on how large collaborative projects work. This awareness applies to smaller cheminformatics projects too, and will help a project grow. Three principles are outlined, and one goes like this:


That might appear to be a very large-scale collaboration, but it's not, he said. If you graph the contributions, you soon see that the most active contributors are doing the bulk of the work, with the top contributor doing around 500 edits of their own. The tenth highest contributor did 100 edits, and the 100th did 10 edits. Around 75% of contributors did only one edit ever.

That same pattern shows up in many different places, he said, including Linux kernel commits. These seemingly large-scale collaboration projects are really run by small, tight-knit groups that know each other and care about the project. That group integrates lots of small fixes that come from the wider community. Once we recognize that, we can plan for it, Shirky said.

This should be familiar to many of us, at least to those in the CDK project, which has a very small core and a much larger group of people who make small edits. What the above analysis does not describe is that those small commits can often be crucial to the impact of the project. For example, the commit by Thorsten Flügel that led to a significant speed-up: a small fix, but a major impact.
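For anyone curious whether their own project follows the pattern from the quote, here is a minimal sketch in Python that summarizes the output of `git shortlog -sn`. The function names, the `core_size` parameter, and the sample data are made up for illustration and do not come from the CDK repository or the LWN.net article.

```python
def parse_shortlog(text):
    """Parse `git shortlog -sn` style output into (author, commits) pairs.

    Each non-empty line is expected to look like "   500\tAuthor Name".
    """
    counts = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first run of whitespace: count first, then the name.
        number, author = line.split(None, 1)
        counts.append((author.strip(), int(number)))
    return counts


def contribution_summary(counts, core_size=10):
    """Summarize how concentrated the commits are.

    core_size is a hypothetical cut-off for what counts as the "core" group.
    """
    total = sum(n for _, n in counts)
    ranked = sorted(counts, key=lambda pair: pair[1], reverse=True)
    core = sum(n for _, n in ranked[:core_size])
    single = sum(1 for _, n in counts if n == 1)
    return {
        "contributors": len(counts),
        # Fraction of all commits made by the top `core_size` authors.
        "core_share": core / total,
        # Fraction of authors who contributed exactly one commit.
        "single_commit_share": single / len(counts),
    }


# Hypothetical sample data, not real project statistics.
SAMPLE = (
    "   500\tAlice\n"
    "   100\tBob\n"
    "    10\tCarol\n"
    "     1\tDave\n"
    "     1\tEve\n"
    "     1\tFrank\n"
)

if __name__ == "__main__":
    print(contribution_summary(parse_shortlog(SAMPLE), core_size=2))
```

To try it on a real repository, the output of `git shortlog -sn HEAD` can be fed to `parse_shortlog` instead of the sample string. The numbers only count commits, so they say nothing about the impact of a single small fix.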

But, at the same time, we core CDK developers have to accept this, and live with it. This is one of the reasons that code must be peer reviewed: after the patch is supplied, the maintenance rests mostly on the shoulders of these core developers. I learned that the hard way. It's a bit like the learning process used by StackExchange, also outlined in the LWN.net write-up.

Therefore, it's up to the core developers to educate potential contributors and make contributing as simple as possible. GitHub, also discussed in the write-up, does great work here. Fixing spelling errors (or adding a missing period after the first sentence of a JavaDoc comment) is as simple as getting a GitHub account, hitting the 'Edit this file' button on the page showing a CDK source file, and starting to work.

Otherwise, the CDK community is very helpful in creating patches. You just have to ask, e.g. on our #cdk IRC channel. And, if there is enough interest, I am more than happy to organize a 'Making a CDK patch from scratch with Git and Ant' crash course.