## Wednesday, July 23, 2008

### Molecular QSAR descriptors in the CDK

Rajarshi patched trunk last night with his work addressing a few practical issues in the molecular descriptor module of the CDK (I peer-reviewed this work yesterday). One major change is that the IMolecularDescriptor calculate() method no longer throws an Exception but returns Double.NaN instead; the Exception is stored in the DescriptorValue for convenience. This simplifies QSAR descriptor calculation considerably and, importantly, makes it more robust to the input, though only in the sense that errors now propagate into the descriptor matrix instead of aborting the calculation. Just make sure your molecular structures have explicit hydrogens and 3D coordinates, and you're fine.
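To make the behaviour change concrete, here is a minimal, self-contained sketch of the pattern: a calculation that fails yields Double.NaN and keeps the exception next to the value, so the caller never has to catch anything. The classes SketchDescriptorValue, SketchCalculation and NaNDescriptorSketch are hypothetical and only mimic the idea; they are not the CDK's IMolecularDescriptor or DescriptorValue API, and meanZ is a toy descriptor invented for illustration.

```java
import java.util.Objects;

/** Hypothetical result holder (not the CDK API): a value plus the exception, if any. */
final class SketchDescriptorValue {
    final String name;
    final double value;         // Double.NaN when the calculation failed
    final Exception exception;  // null when the calculation succeeded

    SketchDescriptorValue(String name, double value, Exception exception) {
        this.name = Objects.requireNonNull(name);
        this.value = value;
        this.exception = exception;
    }
}

/** Hypothetical descriptor calculation that is allowed to throw. */
interface SketchCalculation {
    double compute(double[][] coordinates) throws Exception;
}

final class NaNDescriptorSketch {
    /** Runs the calculation; failures become NaN and the exception is kept for inspection. */
    static SketchDescriptorValue calculate(String name, SketchCalculation calc, double[][] coords) {
        try {
            return new SketchDescriptorValue(name, calc.compute(coords), null);
        } catch (Exception e) {
            // The caller only sees NaN in the value; the cause travels along with it.
            return new SketchDescriptorValue(name, Double.NaN, e);
        }
    }

    /** Toy "3D" descriptor (hypothetical): mean z coordinate, fails on 2D input. */
    static double meanZ(double[][] coords) throws Exception {
        double sum = 0.0;
        for (double[] xyz : coords) {
            if (xyz.length < 3) {
                throw new Exception("3D coordinates required");
            }
            sum += xyz[2];
        }
        return sum / coords.length;
    }
}
```

A caller filling a descriptor matrix can store `value` directly: molecules with unsuitable input show up as NaN cells, and the stored exception explains why.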

Anyway, Rajarshi also added a new page to CDK Nightly to list the available descriptors:

1. "Just make sure your molecular structures have explicit hydrogens and 3D coordinates, and you're fine."

Which specific descriptors require 3D coordinates? It would be nice if it were possible to just calculate descriptors which are happy with 2D information.

2. Noel, yes, that would be much better. I have been working on that but never got around to completing it; it is one of my tasks in Uppsala.

3. This is actually quite easy using the DescriptorEngine API: just focus on the constitutional and topological classes (see, for example, the CDKDescUI program).

I think the dictionary at one point had a concept like requires3D. If so, that would be an easy way to get at the required set of descriptors.

4. Rajarshi, indeed... that's the work I was talking about when replying to Noel. There should be some code in the CDK for DataFeatures. An algorithm can require or provide such a feature. This allows selecting suitable descriptors, as well as other interesting things.
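The require/provide idea from the last comment can be sketched as a simple filter: each descriptor declares the data features it needs, and only those whose needs are met by the input get calculated. Everything below (the Feature enum, DescriptorSpec, FeatureSelectionSketch) is hypothetical and illustrates only the selection logic; it is not the actual DataFeatures code in the CDK.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** Hypothetical data features an input molecule may provide. */
enum Feature { CONNECTIVITY, EXPLICIT_HYDROGENS, COORDINATES_2D, COORDINATES_3D }

/** Hypothetical descriptor metadata: a name plus the features it requires. */
final class DescriptorSpec {
    final String name;
    final Set<Feature> requires;

    DescriptorSpec(String name, Set<Feature> requires) {
        this.name = name;
        // Defensive copy so later changes to the caller's set do not leak in.
        this.requires = requires.isEmpty() ? EnumSet.noneOf(Feature.class) : EnumSet.copyOf(requires);
    }
}

final class FeatureSelectionSketch {
    /** Returns the names of descriptors whose required features are all available. */
    static List<String> selectable(List<DescriptorSpec> specs, Set<Feature> available) {
        List<String> names = new ArrayList<>();
        for (DescriptorSpec spec : specs) {
            if (available.containsAll(spec.requires)) {
                names.add(spec.name);
            }
        }
        return names;
    }
}
```

With this design, a molecule that only has 2D coordinates simply skips every descriptor that declares COORDINATES_3D, which is exactly the "just calculate what 2D information allows" case Noel asked about.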