Tuesday, July 22, 2008

Peer reviewed Chemoinformatics: Why OpenSource Chemoinformatics should be the default

The battle for scientific publishing is continuing: open access, peer review, how much it costs, who should pay for it, whether the data in papers is copyrighted, etc.

The battle for chemoinformatics, however, has not even started yet. The Blue Obelisk paper (doi:10.1021/ci050400b) has gotten a lot of attention and citations. But closed source chemoinformatics is doing fine, and has not really openly taken a standpoint against open source chemoinformatics. Actually, CambridgeSoft just received a good investment. I wonder how this investment will be used, and where the ROI will come from. More closed data and closed algorithms? Focus on services? Early access privileges? At least they had something convincing.

There are many degrees of openness, and many business models. I value open source chemoinformatics, or chemblaics, as I call it. There is a striking similarity between publishing and chemoinformatics. Both play an important role in the progress of science. A big difference is that (independent) peer review of published results is standard in scientific publishing, but generally not in chemoinformatics. Surely, algorithms are published... Ah, no; they are not. They are described. Ask any chemoinformatician why this subtle difference is causing headaches...

Let me just briefly stress the difference between core chemoinformatics and GUI applications. The first *must* be opensource, to allow independent peer review; the latter is just nice to have as opensource. Bioclipse is the GUI (doi:10.1186/1471-2105-8-59), while the CDK is our peer-reviewed chemoinformatics library (pmid:16796559). I would also like to stress that the CDK is LGPL, allowing the opensource chemoinformatics library to be used in proprietary GUI software. We deliberately chose this license to allow embedding in proprietary code. The Java Molecular Descriptor Library of iCODONS is an example of this (that is, AFAIK it's not opensource).

So, getting back to that CambridgeSoft investment. I really hope they seek the ROI in the added value of the user-friendly GUI, and not in the chemoinformatics algorithm implementations, which, IMHO, should be peer-reviewed, thus open source. Meanwhile, I will continue working on the CDK project to provide open source chemoinformatics algorithm implementations, for use in opensource *and* proprietary chemoinformatics GUIs.


  1. I am not sure if the CDK or Open Source community has any complaints about the way we are using CDK with JMDL (iCODONS). As a matter of fact, CDK is one of the cores which can be used with JMDL (support for other core APIs will also be available very soon). JMDL uses core APIs as dependencies to access the basic and core functionalities, because nobody wants to reinvent the wheel. I must say the use of CDK as core depends on the end user: he/she needs to opt for it, download it from the CDK website, and use it with JMDL. It's all about user preferences, and we have just given an option if you cannot afford commercial APIs. Hope this clears the cloud about any violation related to the CDK LGPL. We would like to hear about the same from the CDK community.

  2. Hi little budha, the use of CDK in JMDL is no problem at all. The CDK project deliberately chose the LGPL license to allow use in proprietary software (making the assumption that JMDL remains closed-source).

    The LGPL license does also allow you to distribute the CDK along with your software, as long as you point on your website where people may download the CDK source code; for this, you may simply point to the CDK SourceForge project page.

    Furthermore, please contact me offline to discuss how we may improve collaboration, to ensure the best possible user experience for people using your software in combination with the CDK.

  3. Can u drop me an email @ about ur offline contacts (phone no or skype id, whichever u r comfortable with). We have plans to induct one or two JMDL developers into the CDK development community on a regular basis.