Saturday, October 28, 2006

Opensource Chemistry and Opensource Chemoinformatics

The Blue Obelisk mailing list has seen an interesting discussion on ambiguity in the term 'open source', triggered by a study by Beth Ritter Guth. For example, Jean-Claude Bradley performs 'open source' science (see his Useful Chemistry blog) but is not opposed to using closed source software, while the Blue Obelisk is about 'open source' software. This seemed contradictory, and Peter Murray-Rust wrote up a lengthy overview of the use of the term 'open'.

Now, I have been giving the 'open source' ambiguity some thought (well, for about a month or so...), and came to the following conclusions:

  1. open source has the exact same meaning in both Bradley-like open source chemistry, and BO-like open source chemoinformatics
  2. both have the same goal
  3. it's just the research topic that is different

Ad 1: same meaning of 'open source'

I think 'open source' just means that everyone has the right to reproduce (and distribute, in the same or modified form) products created from the source.

In 'open source chemistry' (Bradley-like, sorry for the term :) the source is the details about the chemical reactions to perform, the product being the ability to run the whole reaction pathway.

In 'open source chemoinformatics' (Blue Obelisk-like) the source is the procedure that describes how to get from one set of bits to another, really quite like getting from one molecule to another. Chemoinformatics, being IT science, just makes it a lot easier to distribute the algorithm to do that. (Sure, CMLReact is getting along quite nicely.)

The analogy goes even further: neither science depends only on open source. Just as Bradley-like open source science allows embedding proprietary stuff (glassware, closed-source software, chemicals bought from Acros (now Fisher), ...), so does BO-like open source science, which uses tons of proprietary stuff too (computers, Sun's JVM, MS-Windows).

Ad 2: same goal

I can be short on this one. For both 'open source' initiatives the goal is to share knowledge and make science reproducible.

Ad 3: different topic

So, the confusion just came from the question of to what extent 'open source' tools are being used. Can you do open source science without using open source chemoinformatics? Sure. In a utopian situation, all tools and small bits are 'open source' (though some are agnostic to this). But the fact is that many Blue Obelisk members use 'closed source' tools all the time, even if they do not have to. At least everyone is doing 'open source' in their own specialisms, both in open source chemistry and in open source chemoinformatics.

I guess we should just stop shortening 'open source software' to 'open source', to remove any ambiguity in the term 'open source'. As a spin-off, this would make Bradley's work fit in nicely with ODOSOS: open data, open source, open standards.


  1. Hi Egon: Thanks for this post (it allows me to cite you ;-) I like the ideas you have presented here, and I, too, see the same distinctions. I am interested to know how you compare this "philosophy of sharing" with the traditional chemistry community.

  2. Egon: >
    I guess we should just stop shortening 'open source software' to 'open source', to remove any ambiguity in the term 'open source'. As a spin-off, this would make Bradley's work fit in nicely with ODOSOS: open data, open source, open standards.

    I agree with Egon's analysis. However, there are some semantic traps. In BO the OD and OS are completely separate. OD is normally neglected by many informaticists and scientists, although it is becoming more of an issue. So an OS software person can be for, neutral on, or against OD; in the BO we are, of course, for it.

    In chemistry OD and OS (Bradley-like) overlap and are perhaps even synonymous. So in a sense Open Chemistry could be called simply OD. The added dimension in chemistry is the physical sample. Unfortunately the cost of transmission or replication is non-zero (unlike information and software). In some disciplines (e.g. microbiology) there is a real physical sharing of samples (culture types). Does J-C have views on physical samples?

  3. Peter, interesting comment on OS versus OD. The difference, as I say in my original blog item, is that chemoinformatics has the advantage that it can quite easily (well, packaging is not always that easy) share the algorithms, whereas in chemistry sharing a synthesis procedure is done via plain text, at least at this moment. While we don't consider the latter 'source code', it really is the implementation source, but we tend to see this as OD, i.e., as you put it, there is currently a strong overlap.

    Similarly, we do not focus that much on OD in OSS, though this is changing with, for example, BODR which makes OD in OS more prominently visible. However, I think OD is really where chemistry and chemoinformatics touch.

    Now, it is interesting to note that we see a reaction scheme as OD and not as OS. Maybe we should make a case for changing this, and a good (CML based) programming language to write up synthesis routes might be the key here.

    Interestingly, organic synthesis actually does have a strong history of sharing samples. Companies like Specs&BioSpecs sell organic molecules often synthesized at research institutes.

  4. Egon,
    I am working on the synthesis language right now...

    I don't think that selling samples is really in the spirit of sharing. I have nothing against it, but it isn't sharing the synthesis process, and it is normally done de facto and simply to make money or possibly obtain patents.

  5. Peter,

    Chemical samples are physical entities which have production costs, and so does sending these samples around. An intermediary company asking money for getting samples sent to you is not unreasonable, given reasonable prices. The above applies to situations where there is no mutual interest in the compound.

    Maybe the Useful Chemistry blog is about cutting out the middle man. More likely, UC is about sharing compounds to share the burden of synthesizing derivatives, or running them against assays.

    Either way, sending around those samples does cost money, and I am not sure if the UC project has the resources to send around samples to people interested in working on that compound, other than in some sort of factual cooperation. Can the UC project afford that?

    Just thoughts and questions; these are not counter-arguments...

  6. Great discussion! I posted a reply to the question of physical samples on the UsefulChem blog.