This blog will focus on what Bioclipse has to offer CDK developers.
While Bioclipse 1.x (doi:10.1186/1471-2105-8-59) was a prototype that showed the power of integrating different bio- and cheminformatics tools, Bioclipse2 was designed from scratch, taking advantage of the latest Eclipse RCP technologies. More importantly, the team in Uppsala decided to have all functionality work via managers, which allows all actions to be recorded and makes Bioclipse scriptable. I blogged earlier about scripting JChemPaint, and about creating UFF-optimized 3D structures from SMILES. Example scripts can be found on GitHub (this is their coverage), and are indexed on Delicious.
R for cheminformatics
The fact that we can script everything makes Bioclipse an ideal platform for doing cheminformatics: we have access to a variety of cheminformatics libraries, and the means to visualize results via JChemPaint and Jmol. It is like R for cheminformatics, with Bioclipse being the R command line and Bioclipse plugins the R packages. Eclipse provides a mechanism called Update Sites, which makes something like CRAN redundant. Back to the Chemistry Development Kit.
Over the next weeks, I will blog about scripts aimed at CDK developers and people who want to learn more about how the CDK internals work. This series assumes Bioclipse 2.0 beta2 (or better) with the CDK Feature installed. I'll be using the Gist widget to embed scripts in this blog, but you can always download the Gist directly into Bioclipse, with the GUI as described here.
|Feature||Manager||Description|
|Bioclipse Feature||ui||Bioclipse UI interaction|
|Cheminformatics Feature||cdk||CDK functionality|
|CDK Feature||cdx||CDK Developer functionality|
Debugging CDK's Atom Type
As I wrote last week with the email on the first CDK 1.2 release candidate, the new CDK atom typer is a core component of the new CDK. The new implementation covers all atom types used in CDK 1.0, and many more. In particular, Miguel boosted support for charged and radical atom types.
CDK developers can use this functionality to rule out atom typing as the reason why a certain algorithm fails. CDK atom typing is used by a variety of algorithms, including the counting of implicit hydrogens, on which many other algorithms depend.
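To make the dependency concrete, here is a minimal Python sketch of the idea, not CDK code: once an atom type is known, the implicit hydrogen count follows from the number of neighbours that type expects minus the neighbours actually drawn. The table below is a small illustrative excerpt that I wrote by hand, and the function name `implicit_hydrogens` is hypothetical. The point is only that without a perceived atom type there is nothing to compute from.

```python
# Illustrative excerpt of an atom type table (assumption, hand-written):
# atom type name -> number of neighbours that type is expected to have.
FORMAL_NEIGHBOURS = {
    "C.sp3": 4,
    "C.sp2": 3,
    "N.sp3": 3,
    "O.sp3": 2,
}


def implicit_hydrogens(atom_type, explicit_neighbours):
    """Return the implicit hydrogen count for an atom of the given type.

    atom_type is the perceived type name, or None if perception failed.
    explicit_neighbours is the number of atoms bonded to it in the graph.
    """
    if atom_type not in FORMAL_NEIGHBOURS:
        # No atom type means no expected neighbour count, so every
        # algorithm that needs hydrogen counts fails from here on.
        raise ValueError("no usable atom type: %r" % (atom_type,))
    return FORMAL_NEIGHBOURS[atom_type] - explicit_neighbours


# Ethanol (CCO): methyl carbon, methylene carbon, hydroxyl oxygen.
ethanol = [("C.sp3", 1), ("C.sp3", 2), ("O.sp3", 1)]
print([implicit_hydrogens(t, n) for t, n in ethanol])  # prints [3, 2, 1]
```

If a later algorithm gives odd results, checking whether every atom received a type at all is therefore a cheap first step.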
How does the CDK read a SMILES?
A use case for people who want to know whether a particular SMILES feature is read, or who want to make sure it is read correctly. This script uses the diff functionality introduced in CDK 1.2, and shows two aspects of the SMILES specification: 1. the isotope information given in the second SMILES is picked up; 2. the second SMILES does not include the implicit hydrogen count, which the SMILES specification then defaults to zero.
The CDK managers in Bioclipse (cdk and cdx) expose functionality of the CDK, and allow using it in Bioclipse's rich visual workbench environment.
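Both points can be illustrated outside Bioclipse with a small, self-contained Python sketch. It is not the CDK SMILES parser and does not use the CDK diff classes: it handles only a single bracket atom (isotope, element symbol, hydrogen count, charge), and the two input strings `[CH4]` and `[13C]` are my own choice for illustration. The function names `parse_bracket_atom` and `diff_atoms` are hypothetical.

```python
import re

# Simplified bracket atom: [ isotope? symbol hcount? charge? ]
BRACKET_ATOM = re.compile(
    r"\[(?P<isotope>\d+)?"
    r"(?P<symbol>[A-Z][a-z]?)"
    r"(?P<hcount>H\d*)?"
    r"(?P<charge>[+-]\d*)?\]"
)


def parse_bracket_atom(smiles):
    """Parse one bracket atom into a dict of the fields compared below."""
    match = BRACKET_ATOM.fullmatch(smiles)
    if match is None:
        raise ValueError("not a single bracket atom: %r" % smiles)
    hcount = match.group("hcount")
    if hcount is None:
        hydrogens = 0  # no H given inside brackets: the count is zero
    elif hcount == "H":
        hydrogens = 1  # a bare H means exactly one hydrogen
    else:
        hydrogens = int(hcount[1:])
    isotope = match.group("isotope")
    return {
        "symbol": match.group("symbol"),
        "mass_number": int(isotope) if isotope else None,
        "hydrogens": hydrogens,
    }


def diff_atoms(first, second):
    """Return only the fields that differ, as (first, second) pairs."""
    return {
        key: (first[key], second[key])
        for key in first
        if first[key] != second[key]
    }


methane = parse_bracket_atom("[CH4]")
carbon13 = parse_bracket_atom("[13C]")
print(diff_atoms(methane, carbon13))
# prints {'mass_number': (None, 13), 'hydrogens': (4, 0)}
```

The diff reports the mass number that only the second string specifies, and a hydrogen count of zero for it, because a bracket atom without an explicit H has no hydrogens.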