Friday, August 25, 2006

Chemical blogspace

We all know chemical space; Chemical blogspace (Cb) is different: it is the chemistry discussed in blogspace. Cb is built on the open source software I blogged about before. The currently running Cb aggregates 19 blogs and, like the original, extracts linked (cited or reviewed) articles from the literature.

The system is still beta, but I am already happy enough with it to mention it now. There are rough edges: for example, some article titles are not properly recognized, and some journals show up in the statistics in several formats. And, more importantly, I have not yet hooked in the InChI support I developed earlier.

So, if you like the idea, or know other scientifically interesting chemistry blogs, leave a comment or send me an email.

R News special issue on chemistry

R News just released a special issue on the use of the versatile statistics program R in chemistry. It features six articles amongst which one by Rajarshi Guha on the CDK-R bridge, and one by my supervisor and me on the use of self-organizing maps to cluster crystal structures.

Tuesday, August 22, 2006

Bioclipse gets a new extension point

I hacked in a new extension point for Bioclipse yesterday, based on a proposal I made earlier. The new extension point (EP) is called ChildResourceCreator and allows creating child resources for a given IBioResource. One application where this is very useful is the CMLRSS application (earlier blog), or any RSS or Atom feed enriched with another XML language. Here, a child resource is created for each feed entry resource, with the foreign XML as its content, e.g. the CML bits in the blog.

Other applications involve complex documents, which covers most existing documents. Take, for example, the PDB format from the PDB database. These PDB files contain a plethora of information, including one or more protein structures, sequences and bibliographic information. Bioclipse supports each of those using the CDK, BioJava and JabRef libraries.

By writing an extension for the ChildResourceCreator EP, I am able to set up a general PDBResource (with Bioclipse's syntax-highlighted PDB editor), and child resources for the different bits of information. Bioclipse 1.0, however, only allows looking at the molecular structure(s) in the file, not at the sequence or the references. I will post the obligatory screenshot asap.
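The parent/child idea can be sketched roughly as follows. This is only an illustration under loud assumptions: SketchResource, SketchChildCreator and PdbChildCreator are hypothetical stand-ins I made up for this post, not the actual Bioclipse IBioResource or ChildResourceCreator API, and the PDB lines in main are truncated, made-up records. The sketch sorts lines by their PDB record name (ATOM/HETATM, SEQRES, JRNL) into one child per kind of information.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a resource node; NOT Bioclipse's IBioResource.
class SketchResource {
    final String name;
    final String content;

    SketchResource(String name, String content) {
        this.name = name;
        this.content = content;
    }
}

// Hypothetical contract for "something that derives child resources".
interface SketchChildCreator {
    List<SketchResource> createChildren(SketchResource parent);
}

// Splits PDB text into structure, sequence and reference children.
class PdbChildCreator implements SketchChildCreator {
    @Override
    public List<SketchResource> createChildren(SketchResource parent) {
        // LinkedHashMap keeps the children in a predictable order.
        Map<String, StringBuilder> parts = new LinkedHashMap<>();
        parts.put("structure", new StringBuilder());
        parts.put("sequence", new StringBuilder());
        parts.put("references", new StringBuilder());

        for (String line : parent.content.split("\n")) {
            String key = null;
            if (line.startsWith("ATOM") || line.startsWith("HETATM")) {
                key = "structure";
            } else if (line.startsWith("SEQRES")) {
                key = "sequence";
            } else if (line.startsWith("JRNL")) {
                key = "references";
            }
            if (key != null) {
                parts.get(key).append(line).append('\n');
            }
        }

        // Only create a child when the file actually contains that kind of record.
        List<SketchResource> children = new ArrayList<>();
        for (Map.Entry<String, StringBuilder> entry : parts.entrySet()) {
            if (entry.getValue().length() > 0) {
                children.add(new SketchResource(entry.getKey(), entry.getValue().toString()));
            }
        }
        return children;
    }
}

class ChildResourceDemo {
    public static void main(String[] args) {
        // Made-up, truncated PDB records, for illustration only.
        String pdb = "SEQRES   1 A    2  GLY ALA\n"
                   + "JRNL        AUTH   A.N.OTHER\n"
                   + "ATOM      1  N   GLY A   1\n";
        SketchResource parent = new SketchResource("example.pdb", pdb);
        for (SketchResource child : new PdbChildCreator().createChildren(parent)) {
            System.out.println(child.name);
        }
    }
}
```

The parent resource keeps the full text for the editor, while each child only carries the records that its own viewer (structure, sequence or bibliography) would need.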

Monday, August 21, 2006

CML Explained

Recently, a new generation of Chemical Markup Language (CML) users seems to have hit the learning-curve wall; there seems to be a niche for explaining the use of CML, so here goes. My new (third) blog will discuss frequently and less frequently asked questions about the use of CML.

Friday, August 18, 2006

Small java applet for 2D structure drawing

Trepalin et al. published in Molecules the article A Java Chemical Structure Editor Supporting the Modular Chemical Descriptor Language (MCDL) (open access PDF). The applet is about 250kB (though the article mentions 200kB) in size and downloadable from the MCDL project on SourceForge (license: Public Domain). The article compares the applet with the JChemPaint applet and notes that their applet is much smaller. Both allow a template database for automated structure diagram generation, and the database that comes with the MCDL applet contains 105 fragments, whereas the JChemPaint applet contains a few.

The article also discusses the algorithm they use to deduce bond orders, starting from the MCDL, a problem CDK is struggling with when dealing with SMILES strings.

Monday, August 14, 2006

Classpath 0.92 has been released

Bling! Bling! Mark Wielaard announced the GNU Classpath 0.92 release, with the following changes: An alternative AWT peer implementation based on Escher that uses the X protocol directly. Various ImageIO providers for png, gif and bmp images. Support for reading and writing midi files and for reading .au and .wav files has been added. Various tools and support classes have been added for jar, native2ascii, serialver, keytool and jarsigner. A GConf-based util.prefs backend has been added. Support for using alternative root certificate authorities with the security and crypto packages. The start of runtime lang.management support. NIO channels now support scatter-gather operations.
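For readers who have not used scatter-gather NIO: a gathering write sends several buffers in one call, and a scattering read fills several buffers in order. The sketch below uses only the standard java.nio API (FileChannel implements both GatheringByteChannel and ScatteringByteChannel); it is written against a modern JDK rather than anything specific to GNU Classpath, and the class name, method name and "header plus body" layout are my own assumptions for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ScatterGatherDemo {

    // Writes header and body with one gathering write, then reads them back
    // into two separate buffers with a scattering read.
    static String[] roundTrip(String header, String body) {
        try {
            return roundTripChecked(header, body);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static String[] roundTripChecked(String header, String body) throws IOException {
        byte[] h = header.getBytes(StandardCharsets.US_ASCII);
        byte[] b = body.getBytes(StandardCharsets.US_ASCII);
        Path file = Files.createTempFile("scatter-gather", ".bin");
        try {
            // Gather: both buffers are handed to the channel as one array.
            try (FileChannel out = FileChannel.open(file, StandardOpenOption.WRITE)) {
                ByteBuffer[] sources = { ByteBuffer.wrap(h), ByteBuffer.wrap(b) };
                long remaining = (long) h.length + b.length;
                while (remaining > 0) {
                    remaining -= out.write(sources);
                }
            }

            // Scatter: the channel fills the first buffer, then the second.
            ByteBuffer headerBuf = ByteBuffer.allocate(h.length);
            ByteBuffer bodyBuf = ByteBuffer.allocate(b.length);
            try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
                ByteBuffer[] targets = { headerBuf, bodyBuf };
                while (headerBuf.hasRemaining() || bodyBuf.hasRemaining()) {
                    if (in.read(targets) < 0) {
                        break; // end of file reached early
                    }
                }
            }
            return new String[] {
                new String(headerBuf.array(), StandardCharsets.US_ASCII),
                new String(bodyBuf.array(), StandardCharsets.US_ASCII)
            };
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) {
        String[] parts = roundTrip("HEAD", "body-bytes");
        System.out.println(parts[0] + " | " + parts[1]);
    }
}
```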


This means new items on my TODO list: remove the dust from the CDK based test suite, test if Jmol, JChemPaint, Taverna still work, and report the outcome on the Classpath website. I wonder how the Cairo and Escher patches for AWT and Swing affect my favorite chemblaics tools.

BTW, that the Classpath team appreciates such testing efforts is clear from the photo in the 'Bling! Bling!' blog post by Mark mentioned above.

Thursday, August 10, 2006

Fortran and XML: FoX reads and writes CML

Mix one of the oldest and one of the latest computer technologies, and you get FoX (BSD license), a Fortran library for reading and writing Chemical Markup Language, and thus XML. Amazing, what Toby White achieved, though he did not start from scratch: "FoX evolved from the initial codebase of xmlf90, which was written largely by Alberto Garcia and Jon Wakelin." (source: cml-discuss mailing list).

Sunday, August 06, 2006

new Atom(Elements.CARBON);

Something I have not been completely comfortable with in the CDK is the way Atoms are constructed:

IAtom carbon = new Atom("C");

Not that it is horrible code, but the CDK has an Element class too. Why not reuse that? However, until revision 6755 there were no constructors that allowed something like the following:

IAtom carbon = new Atom(new Element("C"));

This afternoon I hacked in constructors for ChemObject, Element, Isotope, AtomType, Atom and PseudoAtom that allow each of them to be constructed from its own interface, or from the interface of one of its superclasses.

Additionally, in revision 6753, I added cdk.config.Elements with static IElements for all elements up to atomic number 116, taken from the Blue Obelisk Data Repository. Therefore, I can now also write:

IAtom carbon = new Atom(Elements.CARBON);
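The pattern of constructing a class from the interface of one of its superclasses can be sketched like this. To be clear, this is not the CDK source: the types below reuse the CDK names only for readability and are heavily simplified stand-ins (the fields, the two-argument constructor and the contents of Elements are assumptions made for this example).

```java
// Simplified stand-ins that mirror the CDK names; NOT the actual CDK classes.
interface IElement {
    String getSymbol();
    int getAtomicNumber();
}

interface IAtom extends IElement {
    int getFormalCharge();
}

class Element implements IElement {
    private final String symbol;
    private final int atomicNumber;

    Element(String symbol, int atomicNumber) {
        this.symbol = symbol;
        this.atomicNumber = atomicNumber;
    }

    // Accepts the interface, so any IElement implementation can be copied.
    Element(IElement element) {
        this(element.getSymbol(), element.getAtomicNumber());
    }

    @Override
    public String getSymbol() { return symbol; }

    @Override
    public int getAtomicNumber() { return atomicNumber; }
}

class Atom extends Element implements IAtom {
    private final int formalCharge;

    // Built from the interface of its superclass (IElement), not from a String.
    Atom(IElement element) {
        super(element);
        this.formalCharge = 0;
    }

    @Override
    public int getFormalCharge() { return formalCharge; }
}

// Shared constants; only two elements are listed in this sketch.
final class Elements {
    static final IElement CARBON = new Element("C", 6);
    static final IElement OXYGEN = new Element("O", 8);

    private Elements() { }
}

class AtomFromElementDemo {
    public static void main(String[] args) {
        IAtom carbon = new Atom(Elements.CARBON);
        System.out.println(carbon.getSymbol() + " " + carbon.getAtomicNumber()); // prints "C 6"
    }
}
```

In this sketch the constructor copies the values out of the IElement instead of keeping a reference to it, so the shared constants in Elements cannot be changed through an Atom built from them.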

Thursday, August 03, 2006

BlueObelisk components in Japanese

Technorati is nice in several ways, one being the feature to set up a watchlist. I have set watches on chemoinformatics, Jmol, Bioclipse and a few more. This allows me to see the latest blog items on these topics. Often, they point to Asian blogs, mostly Chinese and Japanese, which I find hard to read. Funny characters with Jmol somewhere in the sentence :)

Yesterday, I found a rather interesting Japanese blog this way, called ケムインフォマティクスに虚空投げ, which I still can't read, but which has a lot of small code fragments. (Can someone please translate the title for me??) The last 10-ish items discuss fingerprint calculation with the CDK and JOELib, some SMARTS work with JOELib, and neural network tools.

Tuesday, August 01, 2006

CDK and the Java 6 beta

Recently, a second beta of Java 6 was released, which triggered a patch for the Debian java-package package. It was a Bioclipse bug report today, however, which made me patch my java-package setup and install the beta.

So, the next thing was to try to get the CDK to compile with the Java 6 beta. Because our build system uses JavaDoc (anyone with a pointer to an easy-to-use Java parser that parses JavaDoc too?), and because this setup is different for literally every platform and Java version, the build.xml needed some tweaking (patches 6719 and 6721). Additionally, a number of source files were marked as needing Java 1.5, while the features they actually depend on were introduced in Java 5 (aka 1.5) and are present in Java 6 (aka 1.6) too, so that needed some tweaking as well (patch 6720).

I have no idea what Java 6 will change and/or introduce, but I did note some comments on it being faster, which is always a good thing. The JUnit test timings seem to agree with this. While my Java 1.5.0_06 installation needed 204 seconds (no duplicates), Java 1.6.0_beta2 needed only 168 seconds (no duplicates), an improvement of 18%.
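For the record, the 18% is the reduction in run time relative to the Java 1.5 timing, rounded up from about 17.6%. A tiny sketch of that arithmetic, with a class and method name invented for this example:

```java
import java.util.Locale;

class TimingImprovement {

    // Reduction in run time, as a percentage of the old run time.
    static double improvementPercent(double oldSeconds, double newSeconds) {
        return 100.0 * (oldSeconds - newSeconds) / oldSeconds;
    }

    public static void main(String[] args) {
        double percent = improvementPercent(204, 168);
        // Locale.ROOT keeps the decimal point independent of the system locale.
        System.out.println(String.format(Locale.ROOT, "%.1f%%", percent)); // prints "17.6%"
    }
}
```

Note that this is a single run on one machine, so it is an indication rather than a benchmark.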