Saturday, October 28, 2006

Opensource Chemistry and Opensource Chemoinformatics

The Blue Obelisk mailing list has seen an interesting discussion on the ambiguity of the term 'open source', triggered by a study by Beth Ritter Guth. For example, Jean-Claude Bradley performs 'open source' science (see his Useful Chemistry blog) but is not opposed to using closed source software, while the Blue Obelisk is about 'open source' software. This seemed contradictory, and Peter Murray-Rust wrote up a lengthy overview of the use of the term 'open'.

Now, I have been giving the 'open source' ambiguity some thought (well, for about a month or so...), and came to the following conclusions:

  1. open source has the exact same meaning in both Bradley-like open source chemistry, and BO-like open source chemoinformatics
  2. both have the same goal
  3. it's just the research topic that is different

Ad 1: same meaning of 'open source'

I think 'open source' just means that everyone has the right to reproduce (and distribute, in the same or a modified form) products created from the source.

In 'open source chemistry' (Bradley-like, sorry for the term :) the source is the details about the chemical reactions to perform, the product being the ability to run the whole reaction pathway.

In 'open source chemoinformatics' (Blue Obelisk-like) the source is the procedure that describes how to get from one set of bits to another, really quite like getting from one molecule to another. Chemoinformatics, being an IT science, just makes it a lot easier to distribute the algorithm to do that. (Sure, CMLReact is getting along quite nicely.)

The analogy goes even further: neither science depends on open source alone. Just as Bradley-like open source science allows embedding proprietary stuff (glassware, closed-source software, chemicals bought from Acros (now Fisher), ...), so does BO-like open source science, which uses tons of proprietary stuff too (computers, Sun's JVM, MS-Windows).

Ad 2: same goal

I can be short on this one. For both 'open source' initiatives the goal is to share knowledge and make science reproducible.

Ad 3: different topic

So, the confusion just came from the question of to what extent 'open source' tools are being used. Can you do open source science without using open source chemoinformatics? Sure. In a utopian situation, all tools and small bits are 'open source' (though some are agnostic to this). But the fact is that many Blue Obelisk members use 'closed source' tools all the time, even if they do not have to. At least everyone is doing 'open source' in their own specialisms, both in open source chemistry and in open source chemoinformatics.

I guess we should just stop shortening 'open source software' to 'open source', to remove any ambiguity of the term 'open source'. As a spin-off, this would make Bradley's work fit in nicely with ODOSOS: open data, open source, open standards.

Thursday, October 26, 2006

Running single JUnit tests in Eclipse

Unit testing is important when developing source code. JUnit provides a library to facilitate this in Java, and Eclipse has the functionality to run JUnit tests. Even better, it allows you to run single JUnit tests, even in debug mode:

Just open the Java class in your Package Explorer, right-click on the JUnit method you want to run, then pick 'Run As' or 'Debug As', and then 'JUnit test'.
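To make clear what 'running a single test' boils down to, here is a minimal sketch in plain Java. It is not how JUnit or Eclipse is implemented; it only assumes that a single test run amounts to calling one test method on a fresh instance of the test class. The names Counter, CounterTest and runSingle are made up for this illustration.

```java
import java.lang.reflect.Method;

public class SingleTestSketch {

    /** Hypothetical class under test. */
    static class Counter {
        private int value;
        void increment() { value++; }
        int get() { return value; }
    }

    /** Hypothetical test class; methods follow the testXxx naming habit. */
    public static class CounterTest {
        public void testStartsAtZero() {
            if (new Counter().get() != 0) throw new AssertionError("expected 0");
        }

        public void testIncrement() {
            Counter counter = new Counter();
            counter.increment();
            if (counter.get() != 1) throw new AssertionError("expected 1");
        }
    }

    /**
     * Runs exactly one named, public, no-argument method on a fresh instance.
     * Returns true if the method completed without throwing, and false if it
     * threw or could not be found.
     */
    public static boolean runSingle(Class<?> testClass, String methodName) {
        try {
            Object instance = testClass.getDeclaredConstructor().newInstance();
            Method method = testClass.getMethod(methodName);
            method.invoke(instance);
            return true;
        } catch (ReflectiveOperationException e) {
            // Covers a missing method as well as a failure inside the test.
            return false;
        }
    }

    public static void main(String[] args) {
        // Only testIncrement is run; testStartsAtZero is left alone.
        System.out.println(runSingle(CounterTest.class, "testIncrement"));
    }
}
```

In Eclipse, right-clicking testIncrement in the Package Explorer plays the role of the runSingle call here: the other methods in the class are simply not run.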

Wednesday, October 25, 2006

Being a good opensource user

There are many ways to contribute to opensource software (OSS), programming only being one of them. I develop OSS, but use OSS too. For example, I am a big user of the Linux kernel, the KDE desktop, Kubuntu, Debian (I have unstable in a chroot), Firefox, Eclipse, Classpath, and many, many others. What these have in common, is that I generally have no time to look into the source code of these projects. A small patch excluded, I am really a regular user of these projects.

However, I try not to leech (see also Peter's related comment on that): I care about these projects and, therefore, I file bug reports. Sometimes, I even join the developers and talk to them via the commonly used IRC channels and mailing lists. Every now and then I even get this itch, and then I do look up source code and contribute a patch. But filing bug reports is the least one can do, the least everyone should do.


Classpath is the GNU project to provide a free Java library, i.e. the set of java.* classes that come with the Sun JVM. It is not a virtual machine, though; several opensource implementations of those are available, many of which use Classpath as library provider. They have a very nice chat channel, called #classpath. Their wiki provides a platform for giving feedback on how well software runs. A bug tracking system (BTS) is available too. An overview of the bugs that I filed can be found at my account: bugreports+Classpath.

Needless to say, Classpath is important in making our Java-based chemoinformatics truly opensource.


Things are different for Debian and Kubuntu: these are distributions and, except for some patching, are generally not involved in the software development as done by upstream. However, they generally do appreciate knowing about bugs too, so there is some duplication of bug reports here.

That said, they do provide nice tools for bug reporting which work for all packages that they distribute. Debian has reportbug and Kubuntu has Launchpad. An overview of bugs I reported with Debian can be found at bugreports+debian. I do not have bug reports in Launchpad yet, but two can be found in mailing list archives, see bugreports+ubuntu.


I also tracked back two bugs I reported with KDE, see bugreports+KDE.


Surely, I filed many more bugs with many other projects. A long list of bug reports can be found on SourceForge. However, it does not seem possible to make an easy list of those :(

Wednesday, October 11, 2006

Are chemogenomics and proteochemometrics the same?

Joerg Wegner recently blogged about Chemogenomics: structuring the drug discovery process to gene families by C.J. Harris and A. P. Stevens in Drug Discov Today (DOI: 10.1016/j.drudis.2006.08.013). This review article provides a nice overview of a trend in mathematical modelling of the interaction of small organic molecules with proteins, often referred to as QSAR. What the article does not discuss is the work by the group of Jarl Wikberg, who coined the term proteochemometrics (see PubMed: 11342268).

Friday, October 06, 2006

Google's new search engine: /* Code Search */

Google has set up a new search engine specifically for source code: /* Code Search */. An important difference from their normal search engine is that it allows restricting your search by programming language, license, filename and package. I have not been able to figure out how to use 'package' yet, but the others are pretty clear. For example: AtomContainer license:LGPL lang:java should do it. The search results show filenames, licenses and programming languages.
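If you want to put such queries together from code, the example query above is just a search term followed by operator:value pairs, separated by spaces. The sketch below assembles exactly that string. The class and method names are made up for this illustration, and only the two operators shown above (license: and lang:) are used.

```java
import java.util.ArrayList;
import java.util.List;

public class CodeSearchQuerySketch {

    /**
     * Joins a search term with optional license and language restrictions.
     * A null restriction is simply left out of the query.
     */
    public static String build(String term, String license, String lang) {
        List<String> parts = new ArrayList<>();
        parts.add(term);
        if (license != null) {
            parts.add("license:" + license);
        }
        if (lang != null) {
            parts.add("lang:" + lang);
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        // prints "AtomContainer license:LGPL lang:java"
        System.out.println(build("AtomContainer", "LGPL", "java"));
    }
}
```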

Alternatively, you can use Koders, which is a source code search engine too. It has been around for quite some time now, and shows the copyright notice too. Additionally, Koders offers a plugin for Eclipse which adds a search 'view' which will show the HTML from the website in an editor window inside Eclipse.

Wednesday, October 04, 2006

Bioinformatics: Open Source or Open Access??

I have heard that bioinformatics is ahead of chemoinformatics. However, while preparing for a homology modeling course I gave this week at the CUBIC, I discovered that this is not necessarily the case. Open Access is really no issue there, with open access journals and many open access databases. But it is different when it comes down to open source software.

Below is a list of bioinformatics programs which are free for academic use, but not open:

And this does not even include the many websites which do not offer the software behind them. These programs cover several steps in the whole homology modeling process. Open source homology modeling is not possible at this moment :(

But, on the bright side, there are already some open source programs involved too:

And protein structure viewers are hardly a problem at all; several open source viewers are available, among which Rasmol, PyMOL and Jmol.

In other words: we might not want to look at bioinformatics too much.