Saturday, February 25, 2006

Hacking InChI support into

Earlier I reported about, and needed some diversion from my manuscript work (could no longer think straight about the article I'm working on). So time for some reading up on new technologies. Timing was perfect, because the source code of got just uploaded to SourceForge SVN.

Though the author marks it as not well documented and alpha, I was quite happy to see a clear modularisation, and good enough docs to get me started with InChI support: if it can mine blogs for papers by their DOIs, then it can mine them for InChIs too. Here's the result:

It does not show which blog items cite this compound, nor does it extract any molecular info from PubChem, but I'm happy with the result of four hours of hacking. BTW, the first two InChIs are leftovers from bad regular expressions :)
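For the curious, here is a minimal Python sketch of the kind of pattern matching involved. It assumes InChIs show up in blog text as plain `InChI=1/...` strings; the regular expression and the `find_inchis` helper are my own illustration, not the code from the actual hack. A pattern that is too greedy drags along punctuation or markup, which is one way to end up with leftovers like the ones mentioned above.

```python
import re

# Assumed pattern: "InChI=1" (optionally "1S"), a slash, then the layers.
# The layers are taken to be any run of characters that is not whitespace,
# a quote, or an angle bracket. This is an illustration, not a validator.
INCHI_PATTERN = re.compile(r'InChI=1S?/[^\s"\'<>]+')


def find_inchis(text):
    """Return the unique InChI strings found in text, in order of appearance."""
    found = []
    for match in INCHI_PATTERN.findall(text):
        # Drop trailing sentence punctuation, but keep ")" because
        # InChI layers such as "(H,3,4)" legitimately end with it.
        candidate = match.rstrip(".,;:")
        if candidate not in found:
            found.append(candidate)
    return found


post = (
    "Acetic acid is InChI=1/C2H4O2/c1-2(3)4/h1H3,(H,3,4). "
    "Methane: <b>InChI=1/CH4/h1H4</b>"
)
print(find_inchis(post))
```

Stopping at whitespace and angle brackets keeps HTML tags out of the match; anything more robust would need to check the layers themselves.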

Open source Jmol taking over the world

Earlier I already reported that student text books were picking up Jmol as 3D viewer. Now, Nature Structural & Molecular Biology reports (DOI: 10.1038/nsmb0206-93) that they picked it up too, using FirstGlance in Jmol (thanx Peter, for reporting this on the Blue Obelisk mailing list!). And, thanx Eric, for acknowledging the hard work of the Jmol developers.

An example article in this Nature publication is Crystal structure of the essential N-terminal domain of telomerase reverse transcriptase by Jacobs et al. (DOI: 10.1038/nsmb1054) about the structure of a part of the telomerase reverse transcriptase (FirstGlance: 2B2A). You can easily google for more articles as they get indexed.

Note that FirstGlance is certainly not the only web interface using Jmol! An overview of websites using Jmol can be found in the Jmol wiki. Those who are not convinced yet, please check out PubMed and search for Jmol there.

And, yes, this makes me a proud Jmol developer!

Friday, February 24, 2006

Novel QSAR and QSPR descriptors?

For the past few weeks I have been working on a review article, which will contain a section with new QSAR/QSPR descriptors published in the period 2000-now. Here are a few:

If you know additional new descriptors, or feel like discussing one or more of the above, please leave a comment.

Wednesday, February 22, 2006

BlueObelisk: OpenSource, OpenData and OpenStandards

OpenSource, OpenData and OpenStandards are not as strong in chemoinformatics as they are in bioinformatics, where it is common knowledge that sharing is a good thing. Today, the JCIM published on the web an article about the Blue Obelisk movement, which promotes these three ideologies.

Several open source projects participate, amongst which the CDK, Jmol, JOELib, OpenBabel, Chemical Markup Language, Bioclipse and Kalzium.

Saturday, February 18, 2006

Blogging chemistry on

You might have read earlier posts in this blog on CMLRSS, and I received a question today on how to integrate CMLRSS with blogs on Now, current CMLRSS feeds are normally generated with customized scripts, often directly from a database.

So, here's my attempt to include CML in a blog. OpenBabel 2.0 can create good CML, for example for acetic acid [1]:

Nothing much to see, right? Well, that's good, because it's inserted as CML, not as anything readable, like this equivalent:

<cml:molecule xmlns:cml="">
<cml:atomArray atomID="a1 a2 a3 a4" elementType="C C O O" formalCharge="0 0 0 0"/>
<cml:bondArray atomRef1="a1 a2 a2" atomRef2="a2 a3 a4" order="1 2 1"/>
</cml:molecule>
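To show how the array-style attributes above map onto atoms and bonds, here is a small Python sketch that reads such a fragment with the standard library. The namespace URI `http://example.org/cml` is a placeholder I made up so the example parses (the real one is not shown in this post), and `read_cml_arrays` is a hypothetical helper, not part of OpenBabel or the CDK.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI, assumed for this example only.
CML_NS = "http://example.org/cml"

cml = f"""<cml:molecule xmlns:cml="{CML_NS}">
<cml:atomArray atomID="a1 a2 a3 a4" elementType="C C O O" formalCharge="0 0 0 0"/>
<cml:bondArray atomRef1="a1 a2 a2" atomRef2="a2 a3 a4" order="1 2 1"/>
</cml:molecule>"""


def read_cml_arrays(xml_text, ns_uri):
    """Split the whitespace-separated CML arrays into atoms and bonds."""
    ns = {"cml": ns_uri}
    root = ET.fromstring(xml_text)
    atom_array = root.find("cml:atomArray", ns)
    bond_array = root.find("cml:bondArray", ns)

    # Map each atom ID to its element symbol.
    atoms = dict(zip(atom_array.get("atomID").split(),
                     atom_array.get("elementType").split()))

    # Each bond is (first atom ID, second atom ID, numeric bond order).
    bonds = [(begin, end, int(order))
             for begin, end, order in zip(bond_array.get("atomRef1").split(),
                                          bond_array.get("atomRef2").split(),
                                          bond_array.get("order").split())]
    return atoms, bonds


atoms, bonds = read_cml_arrays(cml, CML_NS)
for begin, end, order in bonds:
    print(f"{atoms[begin]}({begin}) bond order {order} {atoms[end]}({end})")
```

The three bonds are the C-C single bond, the C=O double bond and the C-O single bond of the heavy-atom skeleton of acetic acid; hydrogens are not listed in the fragment.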

I am curious how this will come out in the RSS feed. Maybe it is useful; please read the comments for additional notes.


Friday, February 17, 2006

Chemical reactions in CML

Gemma Holliday's article on CMLReact was published in the January issue of the JCIM (DOI 10.1021/ci0502698), which seems to be marked as a sample issue right now. She used CMLReact as the data format for MACiE (see DOI 10.1093/bioinformatics/bti693), a database of 100 enzyme reactions with fully annotated reaction mechanisms, making this a remarkable and insightful database.

Now, the nice thing is that this CML should be readable and renderable by the CDK, though the web interface uses SVG and can be viewed with Firefox too.

Update: fixed DOI of CMLReact article.

Wednesday, February 15, 2006

Hot articles; mining the semantic web

Roland Krause discussed today in his blog Notes from the Biomass an interesting website: This website, still marked BETA, mines blogs in the field of genomics and extracts noteworthy statistics from them: which articles are cited in those blogs.

For example, the most discussed article is Kai Wang's Gene-function wiki would let biologists pool worldwide resources in Nature. Additionally, the site links to the DOI and PubMed, and shows which blogs discuss the article.

Wow. This really shows what happens when you start doing things in a semantic way!

Now, what does this mean to the molecular web? We already have chemistry-enriched blogs, i.e. CMLRSS. Now, let's make a website that mines chemoinformatics blogs in the same way that site does, and not stick with statistics for article citations, but add statistics for cited molecules too! Start discussing the molecules we find in our CMLRSS feeds!
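The citation statistics themselves are not hard to picture. Below is a rough Python sketch, assuming posts are available as plain text and that DOIs can be spotted with a simple pattern; the blog names, the `DOI_PATTERN` and the `count_citations` helper are all hypothetical, and this is not how the website discussed above actually works. Swapping the DOI pattern for one that matches InChIs would give the molecule statistics proposed here.

```python
import re
from collections import Counter

# Assumed, simplified DOI pattern: "10.", a registrant code, a slash, a suffix.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def count_citations(posts):
    """Count, for every DOI, the number of blogs that cite it at least once.

    posts maps a blog name to a list of post texts.
    """
    counts = Counter()
    for blog, texts in posts.items():
        cited = set()  # each blog counts once per DOI
        for text in texts:
            for doi in DOI_PATTERN.findall(text):
                # Drop sentence punctuation picked up by the pattern.
                cited.add(doi.rstrip(".,;)"))
        counts.update(cited)
    return counts


# Hypothetical blogs, citing DOIs that appear elsewhere on this page.
posts = {
    "blog-a": ["Jmol in NSMB (DOI: 10.1038/nsmb0206-93).",
               "See 10.1021/ci0502698 on CMLReact."],
    "blog-b": ["CMLReact paper: 10.1021/ci0502698"],
}
counts = count_citations(posts)
print(counts.most_common())
```

Counting per blog rather than per mention keeps one enthusiastic blogger from dominating the ranking; counting per post would be an equally defensible choice.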

Monday, February 06, 2006

Tagging blog items

If you have read my previous post and visited that other blog, you might have noted the Technorati keywords. Or tags, really, as explained in this rel="tag" microformat. Adding them to blog items will enable indexing by Technorati, one of the bigger blog search engines. So, from now on, you'll see these tags in my items too, and I hope they don't get annoying. No idea, btw, how blog planets respond to them... For the record, the tags I list below are general for my blog, and not for this blog item specifically.
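For those who want to add such tags to their own items, here is a small Python sketch that generates the links. In the rel="tag" microformat the tag is the last path segment of the link target; the Technorati base URL and the `tag_links` helper are assumptions of mine for illustration, so check the tag space you actually want to point at.

```python
from html import escape
from urllib.parse import quote

# Assumed tag space; any URL whose last path segment is the tag would do.
TAG_BASE = "http://technorati.com/tag/"


def tag_links(tags, base=TAG_BASE):
    """Return a comma-separated list of rel="tag" links for the given tags."""
    links = []
    for tag in tags:
        href = base + quote(tag)  # percent-encode spaces and the like
        links.append(f'<a href="{escape(href)}" rel="tag">{escape(tag)}</a>')
    return ", ".join(links)


print(tag_links(["chemoinformatics", "semantic web"]))
```

The resulting snippet can be pasted at the bottom of a blog item as is.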

A blog about bioinformatics, semantic web, comics and social networks.

I never got around to mentioning this blog, but YAKAFOKON is a nice blog about, as the title already says, bioinformatics, the semantic web and social networks. Nice to read, with interesting comments on the function and features of the internet and how they relate to bioinformatics, and science in general. Recommended!

A test suite for free, open source JVMs

This weekend I continued my work on getting the CDK and Jmol to run with free, open source JVMs. Really, a lot works fine, as reported earlier in this blog: JChemPaint works and Jmol almost works (see the Classpath's FreeSwingTestApps wiki page), and well over 95% of the CDK JUnit tests run without trouble too. So it comes down to identifying what does not run properly, and filing bugs for this. For example, 26101 and 26108.

To make finding bugs in Classpath and the free virtual machines easier, I have set up a CDK-based test suite: the CDK OpenSource JVM Test Suite. The idea is that it can be used for regression testing and for identifying bugs in the virtual machines. It can also be used to do timing benchmarks, and I will report on both of these soon.

But I first need to write some scripts to make nice XHTML pages. And, I have tweaked the CDK tests to skip known bugs, so that all reported bugs are actually caused by the virtual machine and the Java library that it uses, and not by a bug in the CDK itself.

Saturday, February 04, 2006

Skype on Kubuntu using a Tiptel USB telephone

Because I wanted to test internet telephony I downloaded Skype and tried to get it to work on my Kubuntu system. Unfortunately, the Skype version is only, and it does not work well with arts :( That is, using artsdsp it crashes with segfaults whenever I start even a chat, let alone a phone call. This could be worked around by disabling sound in my KDE session, and then the /dev/dsp is open again.

Better even, I bought a USB telephone yesterday: a reasonably cheap Tiptel 115, with Skype support. Kubuntu Breezy recognized the USB device and added a /dev/dsp1, and after running alsamixer to raise the sound levels, it seems to work fine, though I have not had an actual phone call yet :) I enabled KDE sound again, which is on the first device, and Skype runs on the second. No more segfaults, it seems.

Thursday, February 02, 2006

Dutch Google News themes messed up

Recently, a Dutch version of Google News was started, and might mean a replacement for I do not like the verbose layout much, because it makes it more difficult to scan headlines. I do like the themes. Except for one.

The English theme 'Sci/Tech' is Wetenschap in the Dutch version, or plain Science. And it annoys me to read IT headlines when looking up scientific news. Is an IE 7 beta really science, or did the translators mess up? (If any Google employee is reading this: please split up those two themes.)

Wednesday, February 01, 2006

Open source Jmol hits student text book Biochemistry

Today I received news on the Jmol user list that Lubert Stryer's Biochemistry replaced the proprietary Chime with the open source Jmol. The third edition from which I learned biochemistry in my first year at the university did not feature a CD with live figures, but I am very thrilled to see a program on which I have actively programmed hit a text book I used myself in the past.