Sunday, August 12, 2012

Creating QSAR models in #Bioclipse with #OpenTox

Of course, the Bioclipse team in Uppsala has been working on QSAR and proteochemometrics in Bioclipse from the start. But OpenTox (doi:10.1186/1758-2946-2-7) can generate (predictive) regression models too (it can do a lot). And we integrated Bioclipse and OpenTox before (doi:10.1186/1756-0500-4-487).

So, when Nina asked me about exposing the QSAR model building functionality of OpenTox in Bioclipse, I had a look at it. Because I had not hacked on the Bioclipse-OpenTox code much recently, I set out to add a few more unit tests. These are automatically run by the Jenkins installation. The number of unit tests doubled to some 52 tests, and the new tests also uncovered two regressions. One problem was that listCompounds() was not working anymore, and the other was that addMolecules(List<IMolecule>) was incorrectly named in the implementation, which made it impossible to call that method. Both are fixed now.

However, at the end of last night, I was feeling comfortable with the code again, and hacked up a function to be able to create QSAR models with OpenTox:

When I tested this on Nina's AMBIT installation (doi:10.1186/1758-2946-3-18), it nicely created this model:

The opentox.createModel() method takes four parameters. The first one is the regression method to use, and the second is the data set to use as training data. The third parameter refers to the features to be used as independent variables (x data), while the last parameter is the feature with the dependent variable (y data).
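To make the parameter order concrete, here is a minimal sketch in Python. It is not the Bioclipse code: create_model_request() is a hypothetical stand-in that only checks and bundles the four arguments in the order described above, and it does not talk to any server. I am assuming here that all four arguments are given as URIs, and the example.org addresses are placeholders rather than real AMBIT resources.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelRequest:
    """The four inputs of a model-building call, in the order described above."""
    algorithm: str          # regression method to use
    training_dataset: str   # data set used as training data
    x_features: List[str]   # independent variables (x data)
    y_feature: str          # dependent variable (y data)


def create_model_request(algorithm, training_dataset, x_features, y_feature):
    """Hypothetical stand-in for opentox.createModel().

    It only validates and bundles the arguments; no model is built.
    """
    x_features = list(x_features)
    if not x_features:
        raise ValueError("at least one independent (x) feature is needed")
    if y_feature in x_features:
        raise ValueError("the dependent (y) feature must not also be an x feature")
    return ModelRequest(algorithm, training_dataset, x_features, y_feature)


# Placeholder URIs (assumptions for illustration only).
request = create_model_request(
    "http://example.org/algorithm/regression",
    "http://example.org/dataset/1",
    ["http://example.org/feature/1", "http://example.org/feature/2"],
    "http://example.org/feature/3",
)
print(request.y_feature)
```

The two checks are my own addition: a regression needs at least one x feature, and using the y feature as one of the x features would make the model trivially circular.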

The stuff is compiled with Jenkins, and should be available as an update in your Bioclipse installation.