Monday, July 16, 2007

The Open Science Notebook 10 years ago

So, with all these people blogging about the Open Science Notebook (yes, each word links to a distinct blog), it is worth looking back in time. To make clear what I mean by the OSN: a notebook in which experimental details and outcomes are written down.
So, what did the OSN look like almost ten years ago?

It looked like the early open source chemoinformatics projects, such as CompChem and JMDraw, set up by Christoph (unfortunately, the SourceForge projects have been deleted, so I cannot link to the original project pages). JChemPaint and Jmol also originate from those years.

These projects were OSNs avant la lettre. An experiment in chemoinformatics consists of defining a new algorithm (or reformulating an old one), writing down the experiment (source code, in this case), uploading it to a repository (Open Science!) for everyone to comment on, possibly sending an announcement to a mailing list for discussion, and reporting the outcome (preferably in a peer-reviewed journal). While I am ranting^Wtalking about the issues: chemoinformatics is in the luxurious situation that reproducing a procedure is much easier, except for the missing data part.

Just wanted to say that the OSN is really nothing new, not to chemistry anyway. Maybe it is for lab chemists. Jean-Claude has been very successful in promoting these open science ideas among lab chemists, and I congratulate him on the exposure in all those recent magazine interviews. Cheers!

Open Science versus Open Source
Oh, and let me draw the distinction between open source in general and open science. Much of the current open source software in chemistry (and chemoinformatics) is not open science. Open science means that every step of the development process is open, whereas many chemoinformatics programs are only dumped into the open source sphere at the end. That is not how it should be.

For the lab chemists: ^W is a shortcut for 'delete the previous word'.


  1. Egon - yes, when I use the term Open Notebook Science I generally refer to physical experiments done in a laboratory. We try to do the same for completely electronic "experiments" (like docking), but it has proven difficult to do as well. This may be because it is much easier to run "experiments" on a computer than in a physical lab, and a lot of tweaking happens quickly without formally going through an objective, procedure, results, discussion and conclusion. This may be why that kind of work is better tracked using a mailing list.

    In that sense, your term "Open Science Notebook" (vs ONS) may be more appropriate for software and cheminformatics development.

  2. I think BioPerl also started producing code in the open a long time ago, rather than just dumping it after development. Still, there is quite a difference between software development and knowledge discovery. Anyone who writes code has been exposed to the open source movement and is more likely to understand why it makes sense to expose their research agenda and work. People working at the bench, or even people doing knowledge discovery with computational tools, are less exposed to the potential benefits. It is great that in a discussion we can point to all these different examples to show that people are trying it, and not just in computational work.

    Even in software development, the open fraction of the work is a tiny minority of the total. So, at least in this case, I think this meme really deserves to be hyped. We just have to make sure it stays close to a useful definition.

    Nodalpoint has set up some wiki pages to compile this type of information. It would be great if you could add some of these examples there as well, along with some historical perspective.

  3. Jean-Claude, did I mess up the acronym? Sorry about that; I was trying to refer to the same thing. I use a physical notebook for my work, at least as long as I don't have a distributed wiki online, and I keep an SVN repository for all the experiments I perform. I don't think a mailing list replaces the notebook at all; it replaces the group meetings. Many 'experiments' require one to carefully monitor the parameters, although in docking, temperature and solvent are replaced by force field parameters. People tweaking things quickly is bad for science and leads to suboptimal results. It should not happen in chemoinformatics, nor in any other science.

  4. Pedro, I agree that those working with computers have a clear advantage when it comes to exchanging software and data; just look at how easily music and movies are copied. I do want to stress, though, that a lot of open source chemo-/bioinformatics is not true open science, and Jean-Claude is showing how it should be done.

    To me, open notebook science means allowing others to look up all the parameter settings and optimizations needed to reproduce what I did. I do not see much difference between what I am doing right now and what I was doing in synthetic organic chemistry. (Yes, I used to be a lab chemist myself.)

  5. Thank you very much for sharing your thoughts. It is always a great pleasure to read your posts.