## Wednesday, November 30, 2005

### Getting Started with Eclipse and the SWT

Getting Started with Eclipse and the SWT is a very nice set of introductory tutorials on working with SWT and Eclipse in general. The tutorials cover the basic and advanced SWT widgets, SWT layouts, and several other interesting topics.

Now that Bioclipse is gaining speed, it is a must-read.

### KDE 3.5 is out

KDE 3.5 was released with lots of changes. SuperKaramba is now a standard KDE application and is neatly integrated. It allows embedding themelets on your desktop background:

It shows several themelets: the weather, a calendar, a toolbar with applications, a FoldingAtHome monitor, the contents of the clipboard, the music that is playing (Cake) and a simple todo list. All customizable down to the pixel.

And before I forget: a nice new Kalzium release!

## Monday, November 28, 2005

### A Blue Obelisk blog Planet

Today I set up a blog planet for Blue Obelisk members. First I tried Chumpologica, but it did not read Atom feeds.

Next in line was Planet, which turned out to be used by many big planet sites, like Planet Debian. It also works with Atom feeds in general, but not well with Atom 1.0 feeds, like that of Carsten. After some googling I found a patched version which did the job.

The result is at http://www.woc.science.ru.nl/planetbo/, but I hope that someone can arrange a http://planet.blueobelisk.org/.

## Sunday, November 27, 2005

### Open Source Swing: Jmol renderer runs!

Having mentioned earlier that JChemPaint now runs with free (as in open source) Java virtual machines, I just tried to run the core Jmol renderer, using the Integration.java that comes as an example:

The screenshot was made with jamvm 1.3.3 and classpath 0.19.

It is very slow, however. I have not tried it with other free virtual machines, which are supposedly faster. It is a good start nevertheless: it means that a Jmol-based Bioclipse plugin will work with free virtual machines too.

## Wednesday, November 23, 2005

### Machine crash; SVN went along

It doesn't happen often, but my machine crashed two hours ago. Not a big deal, because I have my important files in SVN. Oh wait, SVN had a commit in progress during the crash. So, svnadmin recover. Mmmm... doesn't work either. OK, SVN FAQ: try db_recover. That worked. No, it did not: svn commit was still not working for the files I was trying to commit. Fortunately, I make regular SVN db backups, so I created a brand new SVN repository from scratch and restored the backup. That worked. Really.

## Monday, November 21, 2005

### Bioclipse: the chemo-/bioinformatics workbench

Some weeks back there was the CDK5AW, the CDK 5th anniversary workshop. A small group of international open source chemo- and bioinformatics software developers met, among them two from Sweden. It was then decided to generalize their work, resulting in Bioclipse:

http://www.bioclipse.net/

It makes heavy use of the Eclipse Rich Client Platform, making additional plugins trivial. OK, if this does not convince you: check the screenshots on the Bioclipse website.

It's a killer, really! Ola, Martin: great work!

PS. I am going to try to run it with free Java virtual machines this weekend, but if you have a working solution earlier than that, please leave a comment with a screenshot.

## Sunday, November 20, 2005

### Open Source Swing: JChemPaint runs!

Thanks to Mark's encouragement, I tried to run Jmol and JChemPaint with jamvm.

Jmol fails with a NullPointerException, but JChemPaint runs! And note that this was not even running with the latest of the latest; just recent packages from Kubuntu! Yes, there are some glitches, but I'm happy nevertheless!

## Friday, November 18, 2005

### The goal: a live chemblaics CD

This evening I have been looking at the KNOPPIX customization howto, and ran many of the interesting commands. I've set up an environment with Kalzium, OpenBabel, CDK, jython, PyMOL, and for development I included gcj and Eclipse. At some later point I will include kfile_chemical too, but I want to make a deb package first.

Moreover, I also wanted it to include JChemPaint, Jmol and Taverna (with the CDK extension). However, these depend on Swing, which is not sufficiently provided by open source Java virtual machines. I attempted gij 4.0, kaffe and sablevm, all without success.

A live CD with all the open source chemo- and bioinformatics tools would be a real killer. We could take a burned live CD with us to conferences and have others run our software on their laptops! But we need to stop using Swing. Fortunately, there seems to be a serious project going on to port JChemPaint and Jmol to a free Java GUI environment, so maybe we can have the live CD up and going before the 2006 conferences start.

## Thursday, November 17, 2005

### Back from the 1st GCC

OK, just back from the first German Chemoinformatics Conference, which I enjoyed very much. A rather interesting program, and lots of interesting posters too. You can read the programme online, so I will not spend too many words on that (at least not now). But what I will do is point out some interesting posters here.

One poster was on the Molecular Query Language (MQL) by Ewgenij Proschak from Frankfurt. You can read more on this in the latest CDK News, as it is implemented for the CDK too. The open source implementation is expected next year.

Another interesting poster was on the use of ontologies to connect chemistry and biology. This poster was by Juergen Harter from BioWisdom, a Cambridge, UK-based company.

Marc Zimmermann had a poster on the chemical variant of OCR, called chemical structure recognition (CSR). This process converts images, for example scanned from literature, into a connectivity table. A difficult task, indeed. This page contains some information about this project.

There were other interesting posters as well, so I will probably report on those later. But do feel free to leave comments on this blog post, discussing other interesting posters.

## Friday, November 11, 2005

### Going to the German Chemoinformatics Conference

This Sunday the first German Chemoinformatics Conference starts in Goslar. It's an interesting programme, with presentations on the InChI, PubChem, 25 years of chemoinformatics, the chemical semantic web, and much more.

Among these presentations is mine, on comparing crystal structures (PDF) and deducing cell parameters. I also have a poster on QSAR.

I'll arrive in Goslar on Saturday afternoon, so leave a message at the conference hotel if you want to meet up, and talk about my work, or yours, or the CDK, KDE, JChemPaint, Jmol, kfile_chemical, Kat/Chemistry, BlueObelisk, Eclipse, R, or whatever else... I plan to have a modest German meal and one or two beers in the evening.

BTW, after Belém (Lisbon), Sintra, Boppard, Kinderdijk, Hoorn and Cologne, it's the 7th UNESCO world heritage site I'm visiting in just 14 months! Can't we just have conferences in Hawaii and the like, as they do in other fields?? Oh, wait, we do: EuroQSAR is on a cruise boat.

## Thursday, November 10, 2005

### Scons and bksys for kfile_chemical

Not so long ago, it was decided that KDE 4.0 will use SCons as a configuration and building tool, instead of the autotools and make: the common ./configure && make && make install which has served the open source community very well for so long.

SCons is different in several ways. One of these is that the tar.gz packages it produces are some 500kB smaller, which makes a huge difference for kfile_chemical, which is now 121kB instead of 635kB.

Now, the KDE community, or Thomas Nagy to be precise, developed a helper for KDE software, called bksys. Version 1.5.1, however, did not contain an example directory for kfile plugins, but I managed to work something out starting from the configuring scripts from kdissert, and ended up with these SConstruct and config.bks.

Now, I haven't figured out how to include the translations, but will figure that out sooner or later... for now, I'm quite happy with the new build system.

## Tuesday, November 08, 2005

### An R GUI: rkward

The great thing about open source is that... it's open.

When I was browsing the internet just now, I dropped in on KDE Dot News. In the right-hand column, there is a feed of new KDE software from KDE-apps.org. A new version of my favorite music player, amarok, lured me to the KDE-apps website, where I saw rkward as the latest announcement. The funny name, and the categorization as scientific, triggered some interest on my side, and it turned out to be a graphical frontend to my favorite statistics program, R.

Ok, they had a Debian package, and the debian/ build dir in the tar.gz so I downloaded it and started making a Kubuntu 5.10 package. While doing this I saw some notice about the R syntax highlighting used, which conflicts with the older version in the Kate packages.

Then I realized that a long time ago, I wrote such syntax highlighting for Kate, so my attention was lured again. And, indeed, they use my syntax highlighting, though extended later (somewhere down the page).

And this makes me happy. The syntax highlighting was useful to me in the past, but apparently to a lot of other people too. And because I released it as GPL, back then, it now appears in rkward! Yes, I really like open source :)

### When to stop including QSAR model variables...

Yesterday I reviewed an article which published a QSPR model which looked something like:

y = 151 + 50p1 - 12p2 - 0.006p3

with quite OK prediction results (R=0.9880). But I was not quite comfortable with the coefficient for the p3 variable. The article did not calculate significances for the coefficients, so it was not obvious from the article whether it was useful to include it. I then looked at the range for p3, which was 110-150; so, the maximal influence this variable can have is 150*0.006 = 0.9. Now, the experimental values given in the article were rounded to integers, indicating that the maximal effect of the p3 variable is smaller than the experimental error! It's even worse when you consider the difference between the min and max value (40): then the influence would be even smaller (assuming that most model methods would put the mean temperature effect in the offset, 151 in this case).
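The back-of-the-envelope check above can be sketched in a few lines of Python. This is only a minimal sketch: the function names are made up for illustration, and taking 1.0 as the experimental resolution is an assumption based on the values being rounded to integers.

```python
def max_contribution(coefficient, low, high):
    """Largest absolute value of coefficient * p for p in [low, high]."""
    return max(abs(coefficient * low), abs(coefficient * high))


def span_contribution(coefficient, low, high):
    """Change in the predicted value when p moves from low to high."""
    return abs(coefficient) * (high - low)


# Values quoted above for the p3 term of the first model.
coef_p3 = -0.006
p3_low, p3_high = 110, 150

# Assumption: experimental values were rounded to integers,
# so the resolution of the measurements is taken to be 1.0.
resolution = 1.0

largest = max_contribution(coef_p3, p3_low, p3_high)
span = span_contribution(coef_p3, p3_low, p3_high)

print(f"largest p3 term: {largest:.2f}")
print(f"effect over the p3 range: {span:.2f}")
print("below experimental resolution:", largest < resolution)
```

Both numbers stay below the assumed resolution, which is the reason for doubting the p3 term. A proper check would still be a significance test on the coefficient, or a comparison against a model fitted without p3.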

Today, I reread an article with a similar issue. The model was something like:

y = -0.81 + 0.03*p1 + 0.009*p2

Here, max(p2)-min(p2) is smaller than 100, so the maximal effect of the variable would be on the order of 0.9, which is of the same order as the root mean square error of prediction (RMSEP) for this model. Indeed, the article already states that the coefficient is only significant at the 95% level, and not at the 99% level. But, without having calculated the RMSEP for a model without the p2 variable, I would guess that leaving it out would give equally good prediction results.

Concluding, I would say that the p2 variable does not include relevant information.

Do you think it is reasonable to include the p2 variable in the second model?

## Monday, November 07, 2005

### Ubuntu Dapper will include chemistry features

I just read that the Kubuntu team wants to include Kat in the Dapper release (scheduled for April 2006). Kat is (to be) the KDE equivalent of Google's desktop search bar.

This is great news for us chem-bla-icians, as Kat has support for full text searching of chemistry files! Let's see if I can get the Kubuntu team to package up kfile_chemical too, which will extend Kat (and KDE in general), with extraction of meta data from chemical documents.

Update: Dapper will be released next year, not in 2007.

## Wednesday, November 02, 2005

### Open Source data mining in chemoinformatics

At the 7th International Conference on Chemical Structures Jeroen Kazius had a poster on finding discriminative substructures, that is, molecular fragments which can discriminate between two activity classes. The software is released as Gaston, is written in C++ and has the GPL license.

Later I encountered MoSS which has the same goal, but uses a different algorithm. MoSS is written in Java and uses the LGPL license. MoSS reads STN and SMILES as input, which might not be optimal for all users, so a CDK port comes to mind.

### R/CDK install fails on GCC 4.0 systems

Some time ago Rajarshi Guha introduced R bindings for the CDK (see his CDK News articles), and today I tried to install his rcdk package that makes it happen.

However, it requires SJava, which compiled fine on other machines, but not on my AMD64 machine. The problem seems to be related to the GNU GCC 4.0 compiler I have installed. Compiling with 3.4 works fine, but 4.0 complains with:

CtoJava.cweb:215: error: static declaration of 'std_env' follows non-static declaration

CtoJava.cweb:195: error: previous declaration of 'std_env' was here

Googling taught me that I am not the only one with this problem, but I did not find any solution. If you know how to fix this problem, please leave a message in the comments.

## Tuesday, November 01, 2005

### The annual Lunteren meeting

Most Dutch chemists have their annual Lunteren meeting, and so do I. Lunteren is a small village on the Veluwe where nothing much can be done, except for listening to the presentations. I participate in the Lunteren meeting for analytical chemists, i.e. HPLC, MS, GC and all their combinations up to and including HPLC/MS/MS, and for a few years now the Lab-on-a-Chip stuff. As such, the presentations in many cases go into a lot of detail on how to use and develop these methods.

For a computational chemist, this often is too much practical detail on too little -ics. Fortunately, proteomics, genomics, etc. are a strong upcoming funding subject, so data analysis is getting into the picture too. Which is good for someone with a chemometrics/chemoinformatics background, as funding in that area is getting smaller every year.

My presentation went reasonably well, as far as I can tell myself. I was very nervous with both my professor and some 150 other people in the audience, but managed not to wander off the main topic. However, I was told I was a bit too monotone, but that's an unfortunate effect of being so nervous.