I use CiTO to keep track of how the CDK is cited and used, and just looked at a typical QSAR paper. Here are my comments on "Study of indole derivative inhibitors of Cytosolic phospholipase A2α based on Quantitative Structure Activity Relationship", by Lu et al. (doi:10.1016/j.chemolab.2011.11.011). Normally, I keep these reviews fairly short and publish them via the CDK Google+ page, briefly describing what CDK functionality is being used. But this time the post became a more substantial review, so I decided to put it here too, and to use ResearchBlogging, which I haven't done in a while.
The paper by Lu et al. is a typical QSAR paper, with fewer than 50 compounds, hundreds of descriptors, and some machine learning. They cite the CDK as a free tool to calculate descriptors, but use something else. The article compares PLS, ANN, and SVM in the typical bad way: it does not separate the effect of the kernel (RBF) from that of the regression model, which makes the comparison pretty uninformative.
If I scanned the paper correctly, they use a single test set, with LOO cross-validation to estimate the parameters of each modeling method. The test set compounds are picked at the extremes of the end point range, and no information is given on the variance in the R2 and Q2 statistics. BTW, these two statistics are surprisingly close to each other (for each method separately). I wonder if that holds for all possible test sets, and some bootstrapping seems in order here.
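To make the point about variance concrete, here is a minimal sketch of what I mean. It uses repeated random train/test splits (a simple resampling stand-in for proper bootstrapping) on synthetic data with one made-up descriptor and a plain least-squares line. Nothing here comes from the paper: the data, the function names, and the choice of 8 test compounds out of 40 are all my assumptions for illustration.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(ys, preds):
    """Coefficient of determination of predictions against observed values."""
    my = statistics.fmean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def resampled_test_r2(xs, ys, n_test=8, n_rounds=500, seed=42):
    """Test-set R2 for many random train/test splits of the same data."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    scores = []
    for _ in range(n_rounds):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [a + b * xs[i] for i in test]
        scores.append(r_squared([ys[i] for i in test], preds))
    return scores


# Synthetic stand-in for a small QSAR data set (40 "compounds", 1 descriptor).
data_rng = random.Random(0)
xs = [data_rng.uniform(0, 10) for _ in range(40)]
ys = [1.5 + 0.8 * x + data_rng.gauss(0, 1.0) for x in xs]

scores = resampled_test_r2(xs, ys)
print("mean R2:", statistics.fmean(scores))
print("stdev R2:", statistics.stdev(scores))
print("min/max R2:", min(scores), max(scores))
```

The spread between the minimum and maximum is the number I would like to see reported: a single test set gives just one draw from that distribution.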
Also, stepwise MLR was used for descriptor selection, thus prior to statistical modeling, and it seems to me PLS, ANN, and SVR were all performed on this subset! Well, that makes the comparison even less relevant, as PLS does not require such prior selection. Moreover, it is known that stepwise MLR easily gets stuck in local minima and does not necessarily find the optimal combination of descriptors.
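The local-minimum problem is easy to reproduce on a toy case. The sketch below is not the procedure from the paper: it uses plain greedy forward selection (a simpler cousin of stepwise MLR) against an exhaustive search over all two-descriptor models, on three synthetic descriptors I made up. The end point is exactly `x1 - x2`, but neither of those correlates well with it on its own, while the decoy `x3` does.

```python
import itertools
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rss(columns, y, subset):
    """Residual sum of squares of an MLR model (with intercept) on a descriptor subset."""
    rows = [[1.0] + [columns[j][i] for j in subset] for i in range(len(y))]
    p = len(subset) + 1
    xtx = [[sum(r[u] * r[v] for r in rows) for v in range(p)] for u in range(p)]
    xty = [sum(r[u] * yi for r, yi in zip(rows, y)) for u in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(bk * rk for bk, rk in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def forward_selection(columns, y, size):
    """Greedy: repeatedly add the single descriptor that lowers the RSS most."""
    chosen = []
    while len(chosen) < size:
        rest = [j for j in range(len(columns)) if j not in chosen]
        chosen.append(min(rest, key=lambda j: rss(columns, y, chosen + [j])))
    return tuple(sorted(chosen)), rss(columns, y, chosen)


def exhaustive_selection(columns, y, size):
    """Try every subset of the given size and keep the one with the lowest RSS."""
    best = min(itertools.combinations(range(len(columns)), size),
               key=lambda s: rss(columns, y, s))
    return best, rss(columns, y, best)


# Synthetic descriptors (hypothetical, for illustration only).
rng = random.Random(1)
z = [rng.gauss(0, 1) for _ in range(40)]
w = [rng.gauss(0, 5) for _ in range(40)]
x1 = [zi + wi for zi, wi in zip(z, w)]      # signal buried in a large shared term
x2 = w                                      # the shared term itself
x3 = [zi + rng.gauss(0, 0.5) for zi in z]   # decoy: noisy copy of the signal
y = [a - b for a, b in zip(x1, x2)]         # end point depends only on x1 and x2

columns = [x1, x2, x3]
greedy_subset, greedy_rss = forward_selection(columns, y, 2)
best_subset, best_rss = exhaustive_selection(columns, y, 2)
print("forward selection:", greedy_subset, greedy_rss)
print("exhaustive search:", best_subset, best_rss)
```

The greedy search commits to the decoy in its first step and can never recover the pair that explains the end point exactly. With hundreds of descriptors an exhaustive search is not feasible, of course, which is exactly why it matters to say how far the selected subset may be from the optimum.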
Lu, X., Ji, D., Chen, J., Zhou, X., & Shi, H. (2012). Study of indole derivative inhibitors of Cytosolic phospholipase A2α based on Quantitative Structure Activity Relationship. Chemometrics and Intelligent Laboratory Systems. DOI: 10.1016/j.chemolab.2011.11.011