Tuesday, November 07, 2006

When is open source chemoinformatics successful?

Open source chemoinformatics has become a common phenomenon, though many projects are small: the source code is developed by only a few developers, or even in a closed manner and released only when considered done. Within open source software there is room to distinguish a subset of open development chemoinformatics, that is, Bazaar-like instead of Cathedral-like (see ESR's famous essay).

The importance of an open source project can be measured in many ways, such as the number of people on the user and developer mailing lists, the number of downloads, the number of source lines of code [wp:SLOC], the number of independent development locations, and rankings on, for example, SourceForge or Google, just to name a few.

The scientific importance of an open source project can sometimes be measured by a citation index, but only when there is a landmark article for the project. Rasmol is such a project: a first article was published in 1995 (DOI:10.1016/S0968-0004(00)89080-5), and a follow-up in 2000 (DOI:10.1016/S0968-0004(00)01606-6). The first has been cited 1190 times, and the second 65 times (according to Web of Science). Quite successful indeed.

OK, it is not even 100+, but I am quite happy with the scientific impact of the CDK so far: the 2003 CDK article (DOI:10.1021/ci025584y) has been cited 24 times so far, and the just-published 2006 article (DOI:10.2174/138161206777585274) once.


  1. Have you searched for references to the software (rather than citations)? For example, GaussSum has no 'landmark paper', but I ask users to cite it. By full-text searching on the RSC, ACS and ScienceDirect websites I can find a dozen citations.

  2. No, I haven't. It is not really an exhaustive search. OpenBabel and Jmol are two other programs that are used a lot, but lack such a landmark article.