Saturday, September 22, 2012

OMG! An Open Molecule Generator!

Earlier this week an important cheminformatics paper appeared in the Journal of Cheminformatics. It is about the Open Molecule Generator (see below for the paper). This was one important piece of functionality still missing from Open Source cheminformatics. This work uses the Chemistry Development Kit, and was written by Julio Peironcely.

The Analytical Biosciences group of Prof. Hankemeier (with many others, including Theo Reijmers), funded by the Netherlands Metabolomics Centre, has been using the CDK for metabolomics for a while now, with Miguel Rojas-Chertó as the other principal user (and of course CDK developer!). I congratulate them all on this piece of work, and particularly on their choice of license!

Julio (with the other authors) has picked up a difficult algorithm, rooted in mathematics, but not in straightforward graph theory either. Others have tried to implement structure generation in the CDK, and I looked into this too, when working in Christoph Steinbeck's group back in Cologne. What the OMG team has achieved is significant.

The paper compares their results with those of MolGen, with outcomes like those in this table (from the CC-SA-BY paper):

It shows that the results are identical, when you consider the atom types it uses. And they use the CDK atom type framework I initiated, which is way cool! Julio found the tables I constructed from earlier CDK code incomplete (as did others) and extended them to match their needs.

One "problem" with their current code base is that it is quite slow compared to MolGen. This is easily compensated for by the added functionality of OMG, such as restricting the structure generation with multiple fragments. Now, the CDK data classes are known to be somewhat sluggish compared to the competition, but the community is steadily improving this.

But I also think that the OMG use of Nauty via JNI is not helping performance either, and I hope someone will soon jump in and convert that C code into Java code, which should improve performance too. Another side to this is that removing the dependency on C code will also make it easier to integrate into other tools, like Bioclipse, Taverna, and KNIME.

Julio E Peironcely, Miguel Rojas-Chertó, Davide Fichera, Theo Reijmers, Leon Coulier, Jean-Loup Faulon, & Thomas Hankemeier (2012). OMG: open molecule generator. Journal of Cheminformatics, 4. DOI: 10.1186/1758-2946-4-21

Friday, September 21, 2012

EBI visit: CDK hacking and bioassays

This week I visited the EBI for a set of meetings, among which were a CDK hackathon (results will follow later), a Blue Obelisk (doi:10.1186/1758-2946-3-37) pub meetup, and discussions about bioassays. The latter involved an Open PHACTS (doi:10.1016/j.drudis.2012.05.016) specific meeting where we talked about assay data and how it is represented in ChEMBL and ChEMBL-RDF, and me attending an EU-OPENSCREEN meeting on invitation from Janna Hastings.

The whole week was brilliant, and we made good progress on many fronts, and it got even better because Janna gave me 15 minutes to talk about Open Science. I touched on various aspects, and showed the CDK, OrChem (doi:10.1186/1758-2946-1-17), KNIME, Bioclipse and the Brunn (doi:10.1186/1471-2105-12-179) use case, and the network of Blue Obelisk projects (several of which showed up during presentations at the EU-OPENSCREEN meeting), including RDKit, Open Babel/PyBel (doi:10.1186/1758-2946-3-33), Cinfony (doi:10.1186/1752-153X-2-24), JChemPaint (doi:10.3390/50100093), the Linked Open Drug Data project of the HCLS ig (doi:10.1186/1758-2946-3-19), Open PHACTS, WikiPathways, PathVisio, SEURAT-1, OpenTox (doi:10.1186/1758-2946-2-7), and ToxBank.

I will add links to papers and websites in the blog post later, but wanted to put the slides in my blog before I catch the taxi to the airport in 20 minutes :)

I had particularly good responses on the Blue Obelisk community, where we managed to get a rich ecosystem of open source cheminformatics tools. This was one of the things I stressed in my presentation: that it is more practical not to try to make one solution, but to have all solutions interoperate. Look what that has brought cheminformatics (click the image for all details):

Wednesday, September 05, 2012

Bringing Science to the young #kennisdebat #plosone

This Monday the Dutch #kennisdebat was trending on Twitter. It was a really nicely done debate, held 100% on Twitter, where the future of science in The Netherlands was discussed around a number of themes and discussion points. Despite the upcoming elections, only a few Dutch politicians participated, and most just focused on more or less money for Science. I understood we are doing really badly on an international level when it comes to the amount of science (education) spending relative to the size of our country's economy.

Unsurprisingly, I participated actively, arguing for Open Science and the like, which put me in the top three of most active participants (some 1600 in total!). Here's a newspaper summarizing the full content. Despite it trending the full day(!), none of the newspapers spent a word on it the next day :(

Anyway, I made many statements and argued for them, and since none of that really made it into that newspaper item, I will soon write things up. One of my points is that Dutch young kids are at a disadvantage compared to kids from the USA and UK (well, decreasingly the USA, with Spanish becoming more and more important there): English is the primary language of science, and you will not see this science in NL, simply because PLoS ONE papers do not get translated into Dutch (a market?):

It would be a start if XKCD were translated into Dutch. Allowed: it's CC-BY-NC! Seriously, I like Fokke, Sigmund, etc., but for the knowledge economy, XKCD in De Volkskrant and NRC would be a much better idea! (... assuming XKCD is too difficult for the average Telegraaf and AD reader...)