Monday, April 30, 2007

Improved CMLRSS feed for Chemical blogspace

While adding the 116th blog (David Bradley's Chemistry News) to Chemical blogspace (see also New Blogs #5), I noticed that David is using semantic markup for InChIs (thanx!).

That prompted me to finally clean up my InChI extension for the software. One important step was to create a PHP page with only one InChI, such as this one. That would solve the sometimes-broken links in the CMLRSS feed, which were caused by characters in InChIs that Apache cannot handle in the way the PHP page expects. Once that was done, I also pimped up the CMLRSS feed itself: I added a human-friendly name, the title of the blog item discussing the molecule, and the picture Cb downloads from PubChem:

Of course the feed is still CML enabled.
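The broken-link problem described above is typical for identifiers like InChIs, which contain characters such as "/" and "=" that have a special meaning in URLs. The actual fix was made in PHP and is not shown here; the following is only a minimal Python sketch of the general idea, percent-encoding the InChI before putting it in a link. The host name, page name and the `inchi` parameter are hypothetical.

```python
from urllib.parse import quote, unquote


def inchi_to_query_value(inchi: str) -> str:
    """Percent-encode an InChI so it can be used as a URL query value.

    safe="" makes quote() also encode "/", which it would otherwise
    leave untouched.
    """
    return quote(inchi, safe="")


# Methane, written in the InChI version 1 style used in 2007.
inchi = "InChI=1/CH4/h1H4"
encoded = inchi_to_query_value(inchi)

# Hypothetical link layout: one page, one InChI, passed as a parameter.
url = "http://example.org/inchi.php?inchi=" + encoded
print(url)

# Decoding on the receiving side gives the original InChI back.
assert unquote(encoded) == inchi
```

Encoding everything outside the unreserved character set is the conservative choice here: it does not depend on knowing which particular characters the web server trips over.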

Friday, April 27, 2007

Ex-CUBIC get-together

Yesterday and today I was in Cologne to meet with other ex-CUBIC researchers from Christoph's research group on chemoinformatics (and with Alexandr). Not all former group members were there, but on the other hand we were complemented with Pascal:

(Yes, the sun was very bright :)

The program consisted of a couple of group things, like making a short list of articles to write up in the next few months. Yesterday evening ended in a very nice Biergarten called the Altenberger Hof.

Tuesday, April 24, 2007

Bioclipse now allows QSAR descriptor selection

In preparation for the Embrace Workshop for Bioclipse in May, I am working on the QSAR functionality of Bioclipse. A nice extension point got set up some time ago, called DescriptorProvider, and implemented by plugins to allow calculation of one or more descriptors for the selected molecules. Now, the functionality for the resulting matrix has been around for some time too.

What was not available yet was a GUI to select the descriptors to calculate, and the actual calculation. While the latter is yet to be hooked up, the selection of descriptors is now available:

Interesting here is the use of OWL. CDK's DescriptorEngine provides a simple API written by Rajarshi that interfaces to the dictionary support for OWL (which CDK offers in addition to CML based dictionaries). All CDK descriptors are written up in OWL (the source file and the HTML version). You'll notice the weird characters in the screenshot; something goes wrong there with the encoding when reading the OWL.
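The cause of the weird characters has not been tracked down here. One common way to get this kind of output, though, is reading UTF-8 bytes with a single-byte encoding such as Latin-1. The small Python sketch below only illustrates that general mechanism; it is an assumption, not a diagnosis of what happens in Bioclipse or the CDK.

```python
# A non-ASCII character as it might appear in a dictionary entry.
original = "é"

# The file on disk holds the UTF-8 bytes of that character.
raw = original.encode("utf-8")

# Decoding with the wrong (single-byte) encoding turns one character
# into two unrelated ones.
wrong = raw.decode("latin-1")

# Decoding with the encoding the file was written in gives the text back.
right = raw.decode("utf-8")

print(repr(wrong), repr(right))

# Text garbled this way can be repaired by reversing the wrong step.
repaired = wrong.encode("latin-1").decode("utf-8")
assert repaired == original
```

If this is indeed the mechanism, the fix is to state the encoding explicitly when opening the file, instead of relying on the platform default.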

Monday, April 23, 2007

CDK 1.0: a milestone after 7 years of development

Last night, I released CDK 1.0, as the previous release candidate did not turn up any new major problems. It is far from a perfect release (see these remaining TODOs and Nightly, run by Rajarshi), but the core is pretty solid.

I would like to warmly thank everyone who has contributed to the project in one way or another (I worked more on maintenance than on implementing functionality), as it has been a great pleasure to make CDK releases. OHLOH runs a rather nice developer hall of fame for the CDK. You'll see that Christoph's research group is the major contributor. User contributions, however, are equally important and played a big role in the quite large set of JUnit tests we have now (3300+).

Another reason why this is an important milestone, is that it is the last release I am creating. I wrote on the user list:

In advance of the actual CDK 1.0 release, thanx very much to all that contributed big *and* small ! It was a great 7 years of open source chemoinformatics development!

Hey, that actually sounds like I am stepping down... Well, it *is* time for a new generation to step up indeed. I won't leave the project, but being CDK News editor, CDK release manager, CDK code developer is a bit much for doing outside office hours. I feel that I have clearly enough made my point for open source chemoinformatics, and it is time for something else... which will very likely involve the CDK, but likely more as user only... I was hoping in the past few years, that the transition would go smoothly, and have been trying to get people interested in various emails, including this one; however, being humans, we wait for the catastrophe and only after that we're shocked and start doing something about it. So, yeah, I'm forced to make this drastic announcement: CDK 1.0 will be the last CDK release *I* will make.

So, who wants to take over? Someone will have to. I, however, will put my focus on other things. Very likely involving the CDK, as there are still many things I want to do. Some things I have on my list:
  • the Java2D based 2D renderer/editor
  • more accurate atom type perception
  • more articles for CDK News
  • the book "CDK for Dummies"
  • improved structure generator
  • validation
  • ...

Saturday, April 21, 2007

Clustering web search results

The Dutch Intermediair magazine of this week had a letter sent by a reader introducing Clusty, a web search engine that clusters the results. It does a pretty good job for 'egon willighagen':

It seems to use other engines to do the searching and to focus on the clustering itself. The source engines exclude Google, and include Gigablast, MSN and Wikipedia.

For chemoinformatics it comes up with the following top 10 clusters: 'Drug Discovery', 'Structure', 'Cheminformatics', 'Research', 'Books', 'Conference, German', 'Textbook, Gasteiger', 'Laboratory', 'Handbook of Chemoinformatics', and 'School'. Quite acceptable and useful clustering.

This might be the next step in googling. Rich, it also might solve your problem: searching for 'ruby chemoinformatics' does not give a 'Depth First' or 'Rich Apodaca' cluster :)

Friday, April 06, 2007

CUBIC period is over

The end of CUBIC has come, and with it the end of my 1-year postdoc in the group of Christoph Steinbeck. It would have been much better if the group could have continued for one or two more years, so that we could harvest the fruit of the work done in the past years. Having been a group member only since April 1, 2006, I mostly contributed work to Bioclipse (doi:10.1186/1471-2105-8-59), CMLSpect (submitted), and integrating Miguel's mass spectrum prediction toolkit into SENECA (doi:10.1021/ci000407n) for structure elucidation. The latter topic is rather exciting, and if the method proves powerful enough, it will have a major impact on the field of metabolomics.

BTW, importantly, my CUBIC email address is no longer valid, so please use one of my many other email addresses, e.g. my SourceForge one, or my Gmail account.