Instead, I very much prefer to focus on solutions, like the CDK, Bioclipse, BODR, CML, and many other Blue Obelisk tools. These tools are enabling drug discovery and research into computational tools to aid drug discovery. No strings attached. No fuss about having to submit your precious data before you can use these tools. We contribute, we pay it forward. And seriously, I love to see a Nature Chemical Biology paper and learn that it uses the CDK, even if I am not a co-author on that paper, and as much as I (or any of the other 75 contributors to the CDK!) could use that in my academic career.
We do get something back, beyond the aforementioned satisfaction. We see other projects donate data and tools built on top of the CDK, to further aid the community.
But if you would really like to know, here's my wishlist of things that we urgently need: Open Data for training (statistical) models for chemical properties. In particular, I need CCZero experimental data (annotated with experimental method, error, etc.) for:
- logP (and/or logD)
- pKa (please use this wiki)
We have already seen such initiatives recently for melting points and solubility, from Jean-Claude, Andrew Lang, Antony Williams, and others.
If you have data, please make it available as Open Data by putting it online in a machine-readable format, with proper copyright and CCZero waiver information.
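To make "machine-readable, with waiver information" a bit more concrete, here is a minimal sketch in Python of what such a release could look like: a CSV file with one measurement per row, plus a small JSON file stating the waiver. The column names, the file layout, and the helper functions `to_csv` and `waiver_metadata` are my assumptions for illustration, not a community standard, and the row shown is a placeholder rather than real data.

```python
import csv
import io
import json

# Hypothetical column layout (an assumption, not a standard): one measurement
# per row, annotated with the experimental method and the reported error.
FIELDS = ["structure", "property", "value", "unit", "method", "error", "source"]

# Placeholder row only, to show the layout. These are NOT measurements.
ROWS = [
    {
        "structure": "EXAMPLE-INCHI",
        "property": "logP",
        "value": "EXAMPLE-VALUE",
        "unit": "",
        "method": "EXAMPLE-METHOD",
        "error": "EXAMPLE-ERROR",
        "source": "EXAMPLE-SOURCE",
    },
]


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def waiver_metadata(creator):
    """Return a JSON document stating who releases the data and under what terms."""
    return json.dumps(
        {
            "creator": creator,
            "license": "CC0-1.0",  # SPDX identifier for the CC0 waiver
            "license_url": "https://creativecommons.org/publicdomain/zero/1.0/",
        },
        indent=2,
    )


if __name__ == "__main__":
    # Write the data and the waiver side by side, ready to be put online.
    with open("data.csv", "w", newline="") as handle:
        handle.write(to_csv(ROWS))
    with open("LICENSE.json", "w") as handle:
        handle.write(waiver_metadata("EXAMPLE-CREATOR"))
```

Keeping the waiver in its own file next to the data means anyone who downloads the CSV can see the terms without having to ask, which is the whole point.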