Friday, September 27, 2013

Urgent Open Science needs for Drug Discovery: pKa and logP

There is quite some discussion right now on Open Source Drug Discovery, and questions about what is Open Source and what is not. But as I made clear yesterday, I do not think that a project that requires the assignment of specific rights independent from Open licenses is the way forward, and in many cases it is not even possible. In my humble opinion, an #openscience approach is critical. Hiding data pending an (open) patent does not work for me; I'm sorry. Not that I am against patents in general (primarily, the patent system is broken and misused, but the idea has merits)...

Instead, I very much prefer to focus on solutions, like the CDK, Bioclipse, BODR, CML, and many other Blue Obelisk tools. These tools are enabling drug discovery and research into computational tools to aid drug discovery. Without strings. No fuss about having to submit your precious data before you can use these tools. We contribute, we pay it forward. And seriously, I love to see a Nature Chemical Biology paper and learn it uses the CDK, even if I am not a co-author on that paper, much as I (or any of the other 75 contributors to the CDK!) could use that in an academic career.

We do get something back, beyond that aforementioned satisfaction. We see other projects donate data and donate tools built on top of the CDK, to further aid the community.

But if you would really like to know, here's my wishlist of things that we really urgently need: Open Data for training (statistical) models for chemical properties. In particular, I need CCZero experimental data (annotated with experimental method, error, etc.) for:

  1. logP (and/or logD)
  2. pKa (please use this wiki)

We have recently seen such initiatives for melting points and solubility from Jean-Claude, Andrew Lang, Antony Williams, and others.

If you have data, please make it available as Open Data by putting it online in a machine-readable format, with proper copyright and CCZero waiver information.
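
To make "machine readable, with method, error, and waiver" a bit more concrete, here is a minimal sketch in Python of what a single record could look like. The field names, the JSON layout, and the `missing_fields` helper are my assumptions for illustration only, not an existing standard or tool. The pKa shown is the familiar textbook value for acetic acid; the method is a placeholder and the uncertainty is deliberately left empty, so this is not a real measurement record.

```python
import json

# Hypothetical field names: no particular schema is prescribed here.
REQUIRED_FIELDS = ("inchi", "property", "value", "method", "uncertainty", "license")


def missing_fields(record):
    """Return the required fields that are absent or empty in a record."""
    return [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]


# Illustrative record only: method is a placeholder and uncertainty is
# left empty on purpose, to show what the check reports.
record = {
    "inchi": "InChI=1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)",  # acetic acid
    "property": "pKa",
    "value": 4.76,
    "method": "PLACEHOLDER: e.g. potentiometric titration",
    "uncertainty": None,
    "license": "CC0-1.0",
}

# List what still has to be filled in before the record is useful for modelling.
print(missing_fields(record))

# Serialize to JSON so the record can be put online as-is.
print(json.dumps(record, indent=2))
```

Whatever format you pick, the point is that the structure, the value, the method, the error, and the waiver travel together in one record, so nobody has to guess later.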