Sunday, April 23, 2006

Download statistics for chemblaics components

Here are some quick download statistics for some of the chemblaics components. First, Jmol. The new stable Jmol 10.2 was released just over a week ago, and this obviously boosted downloads, breaking the monthly download total of two earlier this year (source):

Statistics for the CDK include download numbers for the CDK library itself, but also for JChemPaint, CDK News, and several other packages. Totals are at about one third of Jmol's. This is another new record, breaking an earlier one set in February 2003 (source):

Finally, I want to mention that the overall download count for kfile_chemical is much higher than I ever would have hoped for: 1125 in 7 months! Maybe I should ask to get this into KDE extragear.

Update: fixed Jmol link.

Protein support in Bioclipse using Jmol and the CDK

I have not blogged for about a week now, as I have been too busy with other things, like finishing my PhD articles/manuscript and my new job at the CUBIC, where I continued the work on proper protein support in Bioclipse using the CDK and Jmol:

The latter involves getting the CdkJmolAdapter, the interface between the CDK and Jmol, updated for the changes made since the "Jmol as 3D viewer for CDK" article in CDK News, the open access journal for CDK related projects.

The screenshot does not show the actual status: the CdkJmolAdapter does not propagate all information to Jmol correctly. As you can see in the BioPolymerTree and Property views in the screenshot, the CDK now reads the structure information from the PDB file, and I verified that Jmol really extracts this using the StructureIterator, but the secondary structure does not show up yet. I believe the problem is in the AtomIterator: issuing the 'select protein' script selects zero atoms.

The above screenshot is using a workaround, and was made by using Jmol's own IO instead of the CdkJmolAdapter. But I'm very close and think I will be able to fix this soon.

Friday, April 14, 2006

maps upcoming conferences

Conference season is nearing. And just in time, added a conferences map showing locations of upcoming and recently finished conferences. Oh boy, do I want to set this up for chemoinformatics too! makes use of the rel="conference" attribute for the <a> element. I'm not sure how they distinguish between upcoming and finished conferences (I will need to check the source code). But I think some manual processing is done, for example, to extract conference details like title, location and dates. I assume the URL is used as the unique identifier. Additionally, the conferences are not 'tagged' yet, which should be possible too, as already associates tags from blog items with articles mentioned in that item. But this is likely a temporary omission.
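To make the rel="conference" idea a bit more concrete, here is a minimal sketch of how such links could be picked out of a blog post. The class name, the regular expressions and the example.org URLs are all made up for illustration; this is not the code the map actually uses, and a real aggregator would likely use a proper HTML parser rather than regular expressions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: collects the href of every <a> element tagged rel="conference". */
class RelLinkSketch {
    // Attribute order inside <a ...> varies, so the tag is matched first
    // and the rel and href attributes are looked up separately.
    private static final Pattern ANCHOR =
        Pattern.compile("<a\\s+([^>]*)>", Pattern.CASE_INSENSITIVE);
    private static final Pattern REL =
        Pattern.compile("\\brel\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);
    private static final Pattern HREF =
        Pattern.compile("\\bhref\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    static List<String> conferenceLinks(String html) {
        List<String> urls = new ArrayList<>();
        Matcher anchor = ANCHOR.matcher(html);
        while (anchor.find()) {
            String attributes = anchor.group(1);
            Matcher rel = REL.matcher(attributes);
            Matcher href = HREF.matcher(attributes);
            if (rel.find() && href.find()) {
                // rel may hold several space-separated values
                List<String> values =
                    Arrays.asList(rel.group(1).toLowerCase().split("\\s+"));
                if (values.contains("conference")) {
                    urls.add(href.group(1));
                }
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        String post = "<p>I will attend <a href=\"http://example.org/conf\" "
            + "rel=\"conference\">a conference</a> and read "
            + "<a href=\"http://example.org/paper\">a paper</a>.</p>";
        System.out.println(conferenceLinks(post));
    }
}
```

Only the first link in the example carries the attribute, so only its URL is collected; that URL could then serve as the unique identifier mentioned above.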

I already saw ChemConf2006 picked up from an earlier post of mine. Unfortunately, because it is an online conference, it does not show up on the map :( The following two conferences do have a physical location, and I hope they will appear on the map. If you wonder why I mention only these two: they are the two I will attend in the next 8 weeks, and open source bio- and chemoinformatics software developers will be present (at least one, me).

Wednesday, April 12, 2006

The CDK data classes and change notifications

The data classes of the Chemistry Development Kit are mutable, unlike those of Octet. This means that other classes, for example a render class, may need to respond when the content updates. CDK's ChemObject provides the notifyChanged() and addListener() methods for this. However, as was recently pointed out, while this is useful in editors such as JChemPaint, it is a performance killer in high-throughput situations, such as descriptor calculation or structure diagram generation runs.

To address this, the IChemObject interface has been extended with the methods setNotification(boolean) and getNotification(), which allow change notifications to be disabled temporarily. There are no helper methods yet to disable it for a complete data structure, like ChemModelManipulator.setNotification(ChemModel, boolean), but I expect these to be written soon.
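To illustrate the idea of a switchable notification flag, here is a minimal, self-contained sketch. The classes NotifyingAtom and SketchListener are hypothetical stand-ins and not the CDK's own classes; only the method names setNotification(boolean), getNotification(), addListener() and notifyChanged() are borrowed from the text above, and their real CDK signatures may differ.

```java
import java.util.ArrayList;
import java.util.List;

class NotificationSketch {
    /** Applies a number of charge updates and returns how many events the listener saw. */
    static int countEvents(boolean notify, int updates) {
        NotifyingAtom atom = new NotifyingAtom();
        int[] events = {0};
        atom.addListener(source -> events[0]++);
        atom.setNotification(notify);
        for (int i = 0; i < updates; i++) {
            atom.setCharge(i * 0.1);
        }
        return events[0];
    }

    public static void main(String[] args) {
        System.out.println("events with notification on:  " + countEvents(true, 1000));
        System.out.println("events with notification off: " + countEvents(false, 1000));
    }
}

/** Hypothetical listener interface (an assumption, not the CDK listener type). */
interface SketchListener {
    void stateChanged(Object source);
}

/** Hypothetical mutable data class with a switchable notification flag. */
class NotifyingAtom {
    private final List<SketchListener> listeners = new ArrayList<>();
    private boolean notification = true; // notifications are on by default
    private double charge;

    void addListener(SketchListener listener) { listeners.add(listener); }
    void setNotification(boolean flag) { notification = flag; }
    boolean getNotification() { return notification; }
    double getCharge() { return charge; }

    void setCharge(double charge) {
        this.charge = charge;
        notifyChanged();
    }

    /** Informs all listeners, unless notification is switched off. */
    void notifyChanged() {
        if (!notification) return;
        for (SketchListener listener : listeners) {
            listener.stateChanged(this);
        }
    }
}
```

With the flag off, the setter still updates the data but skips the loop over the listeners, which is where the cost sits when an object changes many times in a row.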

Alternatively, special data classes may be used if notification is never needed for a particular setup, for example, in the case of QSAR descriptor calculation. In such cases, the new NoNotificationChemObjectBuilder can be used:

IChemObjectReader reader = new MDLReader(new FileInputStream(new File("some.mol")));
IChemObjectBuilder builder = NoNotificationChemObjectBuilder.getInstance();
IMolecule molecule = (IMolecule) reader.read(builder.newMolecule());
// then perform some operation in which the molecule changes a lot

The advantage is that you do not have to manually disable notification for each class you instantiate. This should give a considerable speedup, and I hope to give some statistics soon.
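The builder approach can be sketched in plain Java as follows. This is only an illustration of the pattern under my own assumptions: MoleculeBuilder, DefaultBuilder, SilentBuilder, SketchMolecule and SilentMolecule are hypothetical names, and the sketch makes no claim about how NoNotificationChemObjectBuilder is implemented inside the CDK.

```java
import java.util.ArrayList;
import java.util.List;

class BuilderSketch {
    /** Creates a molecule via the builder, adds atoms, and returns the number of events fired. */
    static int eventsWhileBuilding(MoleculeBuilder builder, int atomCount) {
        SketchMolecule molecule = builder.newMolecule();
        int[] events = {0};
        molecule.addListener(() -> events[0]++);
        for (int i = 0; i < atomCount; i++) {
            molecule.addAtom("C");
        }
        return events[0];
    }

    public static void main(String[] args) {
        System.out.println(eventsWhileBuilding(DefaultBuilder.getInstance(), 500));
        System.out.println(eventsWhileBuilding(SilentBuilder.getInstance(), 500));
    }
}

/** Hypothetical stand-in for a builder interface. */
interface MoleculeBuilder {
    SketchMolecule newMolecule();
}

/** Hypothetical molecule that notifies its listeners on every change. */
class SketchMolecule {
    private final List<Runnable> listeners = new ArrayList<>();
    private final List<String> atoms = new ArrayList<>();

    void addListener(Runnable listener) { listeners.add(listener); }
    int getAtomCount() { return atoms.size(); }

    void addAtom(String symbol) {
        atoms.add(symbol);
        notifyChanged();
    }

    void notifyChanged() {
        for (Runnable listener : listeners) listener.run();
    }
}

/** Variant that never notifies, so no per-object flag has to be set. */
class SilentMolecule extends SketchMolecule {
    @Override
    void notifyChanged() { /* intentionally empty */ }
}

class DefaultBuilder implements MoleculeBuilder {
    private static final DefaultBuilder INSTANCE = new DefaultBuilder();
    static DefaultBuilder getInstance() { return INSTANCE; }
    public SketchMolecule newMolecule() { return new SketchMolecule(); }
}

class SilentBuilder implements MoleculeBuilder {
    private static final SilentBuilder INSTANCE = new SilentBuilder();
    static SilentBuilder getInstance() { return INSTANCE; }
    public SketchMolecule newMolecule() { return new SilentMolecule(); }
}
```

The calling code is identical for both builders; the choice between notifying and silent objects is made once, where the builder is picked.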

Monday, April 10, 2006

Getting Jmol's 'cartoon on' to work in Bioclipse

Bioclipse 1.0 is to be released in May, and the cartoon on script command is still not working in the Jmol viewer. For those who do not know yet, Bioclipse is a cool Eclipse RCP based Java chemo- and bioinformatics workbench. To get a better idea of what goes on inside Bioclipse, I wrote a new BioPolymer tree to show me the strands in the protein. After Ola wrote code to show properties for IChemObjects, I extended this with PDB properties for the atoms, strands and monomers.

The contents of the BioPolymerTree view on the right and the Properties view below that look fine:

So I'll have to dig a bit further.

Tuesday, April 04, 2006

Mining the KEGG pathway database with self-organizing maps

The self-organizing map (SOM) is a popular (again) and intuitive non-linear mapping method: it transforms a multidimensional space into two dimensions (normally, because these are so easy to visualize). Latino and Aires-de-Sousa published a paper that uses this method to analyze the whole KEGG pathway database: Genome-Scale Classification of Metabolic Reactions: A Chemoinformatics Approach (DOI: 10.1002/anie.200503833).

The method is based on earlier work by Zhang and Aires-de-Sousa: Structure-Based Classification of Chemical Reactions without Assignment of Reaction Centers (DOI: 10.1021/ci0502707). A non-trivial feature of the suggested method is the use of two SOMs. The first maps the reaction onto a fixed-length vector (coined MOLMAP), which is used as the input vector for the second map. This latter map is used to cluster the KEGG reactions on a purely chemical basis. The resemblance with the EC numbering system is striking.
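The two-map idea can be sketched with a toy example. Note the assumptions: the feature vectors and map weights below are made up, the neurons sit on a one-dimensional grid for brevity, and the activation count used here is only a simplified stand-in for the MOLMAP encoding described in the cited papers.

```java
class SomSketch {
    /** Index of the neuron whose weight vector is closest (squared Euclidean) to the input. */
    static int bestMatchingUnit(double[][] weights, double[] input) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int n = 0; n < weights.length; n++) {
            double dist = 0.0;
            for (int d = 0; d < input.length; d++) {
                double diff = weights[n][d] - input[d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = n;
            }
        }
        return best;
    }

    /** One training step: the winner and its grid neighbours are pulled towards the input. */
    static void trainStep(double[][] weights, double[] input, double rate, double radius) {
        int winner = bestMatchingUnit(weights, input);
        for (int n = 0; n < weights.length; n++) {
            double gridDist = Math.abs(n - winner); // distance on the 1D neuron grid
            double influence = Math.exp(-(gridDist * gridDist) / (2.0 * radius * radius));
            for (int d = 0; d < input.length; d++) {
                weights[n][d] += rate * influence * (input[d] - weights[n][d]);
            }
        }
    }

    /** First stage: count how often each neuron wins; the result has one entry per neuron. */
    static double[] activationPattern(double[][] weights, double[][] items) {
        double[] pattern = new double[weights.length];
        for (double[] item : items) {
            pattern[bestMatchingUnit(weights, item)] += 1.0;
        }
        return pattern;
    }

    public static void main(String[] args) {
        // Hypothetical, already trained first map with four neurons.
        double[][] firstMap = { {0, 0}, {0, 1}, {1, 0}, {1, 1} };
        // Two "reactions" described by a different number of made-up feature vectors.
        double[][] reactionA = { {0.1, 0.0}, {0.9, 1.0}, {1.0, 0.9} };
        double[][] reactionB = { {0.0, 0.9} };
        double[] vectorA = activationPattern(firstMap, reactionA);
        double[] vectorB = activationPattern(firstMap, reactionB);
        // Both vectors have length four, so either can be presented to a second map.
        double[][] secondMap = { {1, 0, 0, 2}, {0, 1, 0, 0} };
        System.out.println("reaction A lands on neuron " + bestMatchingUnit(secondMap, vectorA));
        System.out.println("reaction B lands on neuron " + bestMatchingUnit(secondMap, vectorB));
    }
}
```

The point of the first stage is that reactions with different numbers of features all end up as vectors of the same length, which is what the second map needs as input.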

Update: Fixed DOI link and added Technorati tags.

Sunday, April 02, 2006

Uncertainty in NMR based 3D protein models

While I was working on implementing proper author-given chain IDs in PDB structures for Jmol's mmCIF reader today, I thought it was interesting to mention the recent article Traditional Biomolecular Structure Determination by NMR Spectroscopy Allows for Major Errors by Nabuurs (DOI: 10.1371/journal.pcbi.0020009, open access), working at the CMBI, two floors away from my former working location at the Radboud University Nijmegen.

Nabuurs discusses in this article the uncertainties that come with NMR derived 3D molecular structures of proteins. These studies do not give factual data on atomic coordinates, but generally give facts about interatomic distances. Solving the 3D geometry is then an optimization problem where the task is to find the 3D geometry that best reproduces the factual interatomic distances. Now, this optimization has many minima that lie close together in terms of matching the experimental data, but that possibly correspond to quite different structures.
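A toy example shows why distance data alone can leave the geometry underdetermined. The sketch below is not the method from the article; the three-point "molecule", the restraint list and the squared-error function are my own simplifications, chosen only to show two clearly different geometries that match the same distances equally well.

```java
class RestraintSketch {
    /** Euclidean distance between two points. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (a[i] - b[i]) * (a[i] - b[i]);
        }
        return Math.sqrt(sum);
    }

    /** Sum of squared deviations between model distances and the restraint distances. */
    static double restraintError(double[][] coords, int[][] pairs, double[] targets) {
        double error = 0.0;
        for (int k = 0; k < pairs.length; k++) {
            double d = distance(coords[pairs[k][0]], coords[pairs[k][1]]);
            error += (d - targets[k]) * (d - targets[k]);
        }
        return error;
    }

    public static void main(String[] args) {
        // Restraints only between neighbouring points: 0-1 and 1-2, both of length 1.
        int[][] pairs = { {0, 1}, {1, 2} };
        double[] targets = { 1.0, 1.0 };

        double[][] extended = { {0, 0, 0}, {1, 0, 0}, {2, 0, 0} };
        double[][] bent     = { {0, 0, 0}, {1, 0, 0}, {1, 1, 0} };

        System.out.println("error extended: " + restraintError(extended, pairs, targets));
        System.out.println("error bent:     " + restraintError(bent, pairs, targets));
        System.out.println("end-to-end extended: " + distance(extended[0], extended[2]));
        System.out.println("end-to-end bent:     " + distance(bent[0], bent[2]));
    }
}
```

Both geometries reproduce the two restraints perfectly, yet their end-to-end distances differ. An optimizer that only sees the restraint error has no reason to prefer one over the other; additional restraints, for example between points 0 and 2, are needed to tell them apart.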

This is nicely demonstrated in the article, by comparing the folds of 1Y4O and 1TGQ, as shown in the figure below (CCAL license):

It is interesting to note that 1TGQ got replaced by 2B95 about the same time the article by Nabuurs was published, which shows a 3D model that is homologous with that of 1Y4O, and different from that in the Nabuurs article.

Free online ChemConf 2006 conference

The Internet has the nice feature of bringing people together. This has helped many open source projects in the past. But it is also a convenient and cheap way to have conferences. Next month, the ChemConf 2006 conference will be held, and interested people only need to subscribe to a mailing list to participate.

The topic of this year's ChemConf is Web-Based Applications for Chemical Education. At least three posters will show the use of Java applets in chemistry education, using Jmol, JChemPaint and JSpecView. I am (co-)author of two of them.

Again, participation is free. So join in!