Saturday, April 24, 2010

Chem4Word goes Apache 2.0

In early March I reported on Konstantin's JChemPaint-based chemistry plugin for OpenOffice, but there is (friendly) competition: Chem4Word. Being for Microsoft Word, the plugin unfortunately only works on top of proprietary software; therefore, I cannot tell you whether the Chem4Word release is any good, but judging from what Jim showed me about a year ago, it is pretty cool. Another big difference is that Microsoft gave the Chem4Word project a big grant, while Konstantin does not have such funding, AFAIK, and relies on community support.

Now, Chem4Word was released earlier this month, as announced by Joe, and I just heard from Jim that it has now been open sourced (and Peter blogged it too). Congratulations to all involved in the development! The Chem4Word project page indicates the actual license: Apache 2.0. Good choice!

Now, I said that a limitation of the plugin is that it requires proprietary software to run. This is why you will not quickly see me use it. Well, this is even why you do not see any screenshot! However, this should not spoil the news. This is for two reasons:
  1. The plug-in is Open Source: this means that the community can learn from their project, and from how they make molecular structures in Word documents semantic.
  2. The plug-in saves the chemistry in the Chemical Markup Language in the XML-based Word document: this means that anyone will be able to extract the molecular structures in a semantically meaningful way.
And that's, to me, the biggest news: if the organic chemists start using this plug-in, this will be a big win for Open Data. I am sure this is the hidden agenda of an unorthodox move of our fellow Blue Obelisk community members.

Update: made clear I was not referring to hostile competition.

Thursday, April 22, 2010

CIP rules for stereochemistry

Uniquely identifying enantiomers is an important aspect of exchanging chemical structures. The simplest, and most neglected, solution is to pass around 3D models, but a lot of people like to stick to things like SMILES or IUPAC names. Now, given that we want to uniquely represent the stereochemistry, we can use special rules. One example, for enantiomers, is the set of Cahn-Ingold-Prelog (CIP) rules.

The CDK does not have an implementation of (even part of) the CIP rules. However, we recently started a collaboration with Dr Lars Carlsson in the Computational Toxicology, Global Safety Assessment group at AstraZeneca R&D Mölndal, headed by Dr Scott Boyer. Within this collaboration I have started a partial implementation of the CIP rules. The full set of rules is quite extensive, and some subrules are outside the scope of the collaboration. For example, we will likely not look at axial or helical stereochemistry within this collaboration. The kind of thing it is able to do is distinguish between these mirror images (yeah, I should use Jmol, but ChemPedia needs more plugging right now: click the images):


The current patch is not looking into the problem of which atom is chiral; that problem is quite complex in itself, and Tim is writing up a nice set of blog posts about that. Further, the current aim focuses only on application to atoms of ligancy four; that is, carbons.

The CIP rules uniquely define the stereochemistry of such a carbon by uniquely ordering the ligands around the atom. The ligands are ordered by a sequence of rules, which define priority based on atomic number, mass number, etc. It is the recursion that makes things more interesting, but I will not delve into the details of the algorithm here (see the Wikipedia page on the CIP rules instead, or a cheminformatics book). Here, I want to introduce some of the API of the current patch for the CDK.

Ligands and their Priorities
Core to the implementation are the CIP priority rules, which allow ordering of the ligands. So, we define a molecule and two ligands:
IMolecule molecule = parser.parseSmiles("IC(Br)(Cl)[H]");
// ligand 1: the bromine (atom 2) on the central carbon (atom 1)
ILigand ligand1 = new Ligand(
  molecule, molecule.getAtom(1), molecule.getAtom(2)
);
// ligand 2: the iodine (atom 0) on the same carbon
ILigand ligand2 = new Ligand(
  molecule, molecule.getAtom(1), molecule.getAtom(0)
);
ISequenceSubRule rule = new CIPLigandRule();
Assert.assertEquals(-1, rule.compare(ligand1, ligand2));
Assert.assertEquals(1, rule.compare(ligand2, ligand1));
This JUnit test looks at the chiral compound given earlier, but without specifying the stereochemistry using the @@/@ SMILES syntax; we get to that later. Here, the example defines two ligands around atom 1 (which is the carbon; the index starts at 0). The first ligand is the bromine, the second ligand is the iodine. Because the latter takes priority according to the CIP rules, the compare(ligand1, ligand2) returns -1.
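
To make the comparison concrete, here is a minimal, CDK-free sketch of the first CIP sub-rule only: the ligand with the higher atomic number wins. The class, field, and constant names are made up for this illustration and are not part of the patch; the real rule also has to recurse into the neighbours of the ligand atoms to break ties, which this sketch does not attempt.

```java
import java.util.Comparator;

final class LigandPrioritySketch {

    /** Hypothetical stand-in for a ligand: an element symbol and its atomic number. */
    static final class SimpleLigand {
        final String symbol;
        final int atomicNumber;

        SimpleLigand(String symbol, int atomicNumber) {
            this.symbol = symbol;
            this.atomicNumber = atomicNumber;
        }
    }

    /** First sub-rule only: a higher atomic number means a higher priority. */
    static final Comparator<SimpleLigand> ATOMIC_NUMBER_RULE =
        (a, b) -> Integer.signum(Integer.compare(a.atomicNumber, b.atomicNumber));

    public static void main(String[] args) {
        SimpleLigand bromine = new SimpleLigand("Br", 35);
        SimpleLigand iodine = new SimpleLigand("I", 53);
        // bromine ranks below iodine, mirroring the assertions in the test above
        System.out.println(ATOMIC_NUMBER_RULE.compare(bromine, iodine)); // prints -1
        System.out.println(ATOMIC_NUMBER_RULE.compare(iodine, bromine)); // prints 1
    }
}
```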

The CIPTool
This CIPLigandRule is used in the CIPTool to provide more user-oriented methods. The goal, obviously, is this bit of code:
IMolecule molecule = parser.parseSmiles("ClC(Br)(I)[H]");
LigancyFourChirality chirality =
  CIPTool.defineLigancyFourChirality(
    molecule, 1, 4, 0, 2, 3, STEREO.CLOCK_WISE
  );
Assert.assertEquals(
  CIP_CHIRALITY.R,
  CIPTool.getCIPChirality(chirality)
);
Because we do not have 3D coordinates in our SMILES, we define the stereochemistry as CLOCK_WISE or ANTI_CLOCK_WISE. The former here means that, looking from the first ligand, the second, third, and fourth ligands are arranged in a clockwise turn. This uniquely defines the geometrical orientation, but the label switches between CLOCK_WISE and ANTI_CLOCK_WISE upon every exchange of two ligands. Therefore, we uniquely prioritize the ligands, project, and translate the resulting CLOCK_WISE or ANTI_CLOCK_WISE into the appropriate R or S stereochemistry.
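
The flip-on-every-exchange property can be turned into a small parity calculation. The sketch below is plain Java with made-up names, and it is a shortcut for illustration rather than the way CIPTool works internally: it sorts four ligands (represented only by distinct atomic numbers) into descending priority, flips the winding for every swap, and reads off R for a clockwise result.

```java
import java.util.Arrays;

final class CipParitySketch {

    enum Stereo { CLOCK_WISE, ANTI_CLOCK_WISE }
    enum Chirality { R, S }

    /**
     * atomicNumbers lists the four ligands in the order that defines the winding:
     * seen from the first one, the other three turn in the given direction.
     * Assumption of this sketch: the four atomic numbers are all different.
     */
    static Chirality assign(int[] atomicNumbers, Stereo stereo) {
        if (atomicNumbers.length != 4) {
            throw new IllegalArgumentException("exactly four ligands expected");
        }
        int[] ligands = Arrays.copyOf(atomicNumbers, 4);
        boolean flipped = false;
        // selection sort into descending priority; each real swap flips the winding
        for (int i = 0; i < 3; i++) {
            int best = i;
            for (int j = i + 1; j < 4; j++) {
                if (ligands[j] > ligands[best]) {
                    best = j;
                }
            }
            if (best != i) {
                int tmp = ligands[i];
                ligands[i] = ligands[best];
                ligands[best] = tmp;
                flipped = !flipped;
            }
        }
        boolean clockwise = (stereo == Stereo.CLOCK_WISE) != flipped;
        // with the ligands in descending priority, a clockwise turn corresponds to R
        return clockwise ? Chirality.R : Chirality.S;
    }

    public static void main(String[] args) {
        // H, Cl, Br, I as in the CIPTool example above
        System.out.println(assign(new int[] {1, 17, 35, 53}, Stereo.CLOCK_WISE)); // prints R
        // exchanging the first two ligands gives the mirror image
        System.out.println(assign(new int[] {17, 1, 35, 53}, Stereo.CLOCK_WISE)); // prints S
    }
}
```

The first call uses the same ligand order and winding as the CIPTool test above (H, Cl, Br, I, clockwise) and arrives at the same R.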

That's all for now. Questions, ideas, and other feedback are most welcome in the comments!

ACS Liveblogging 1st Disclosures of Drug Candidates

Carmen liveblogged via her twitter account the disclosures of drug candidates at the past ACS meeting, and later aggregated the tweets in her blog. While many of her tweets made it into the FriendFeed room, the structures she drew up and shared did not. And until just now, I was not aware she had tweeted those too. The first twitpic she pushed was:

Lead on Twitpic

and I hope they will all end up in ChemPedia. I've done the above (6-2018-7215-8416):


Each structure I have transcribed, I will tweet with the tags #acs_sf and #chempedia, like in:

@carmendrahl http://twitpic.com/1a3b3v -> http://ur1.ca/wfar #chempedia #acs_sf

Oh, and if you happen to know a drug candidate's name (or company code), please do deposit it in ChemPedia!

Sunday, April 18, 2010

CDK-JChemPaint #5: the Groovy-JChemPaint repository

Oh, I forgot to mention just earlier that I have set up a small git repository with the full Groovy demo scripts. Additionally, requests for further tutorials and/or bug reports can be filed in the matching Issues tracker.

CDK-JChemPaint #4: embedding the renderer into a Swing panel

Now that we covered the utmost basics of using the CDK-JChemPaint patch (see #1, #2, #3), it is time to move on. I am happy to hear that so many people have started using the new rendering architecture, either via the EBI JChemPaint Swing applet/application branch, or via the CDK-JChemPaint patch.

A couple of issues and questions came up (scaling not working as expected; how to lay out reactions; how to get charges to show up), and I will look at those shortly. But before I get into those matters, I'll first show how to use the renderer with a Swing JPanel (I'll do the SWT alternative later). First, we need to subclass the JPanel:
class JCPPanel extends JPanel {

  IMolecule mol;
  AtomContainerRenderer renderer;
  int width;
  int height;

  public JCPPanel(IMolecule mol, int width, int height) {
    super();
    this.setSize(width, height);
    this.mol = mol;
    this.width = width;
    this.height = height;

    // generators make the image elements
    List generators = new ArrayList();
    generators.add(new BasicSceneGenerator());
    generators.add(new BasicBondGenerator());
    generators.add(new BasicAtomGenerator());

    // the renderer needs to have a toolkit-specific font manager
    renderer = new AtomContainerRenderer(
      generators, new AWTFontManager()
    );
  }

  public Dimension getPreferredSize() {
    return new Dimension(width, height);
  }

  public void paint(Graphics graphics) {
    // the call to 'setup' only needs to be done on the first paint
    renderer.setup(mol, new Rectangle(getWidth(), getHeight()));

    // paint the background
    graphics.setColor(Color.WHITE);
    graphics.fillRect(0, 0, getWidth(), getHeight());

    // the paint method also needs a toolkit-specific renderer
    renderer.paint(mol, new AWTDrawVisitor(graphics));
  }

}
The panel does not implement resizing, and it could consider caching the image too, to speed things up a bit. But, we'll use this as a starting point.
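
For the caching idea, one possible shape is sketched below in plain Java, without any CDK classes. All names are hypothetical: the Painter interface stands in for whatever draws the molecule, and the image is only redrawn when the requested size changes, which would also take care of resizing.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

final class CachedRenderSketch {

    /** Hypothetical callback that draws the actual content. */
    interface Painter {
        void paint(Graphics2D g, int width, int height);
    }

    private final Painter painter;
    private BufferedImage cache;
    private int renderCount = 0;

    CachedRenderSketch(Painter painter) {
        this.painter = painter;
    }

    /** Returns the cached image, redrawing it only when the size has changed. */
    BufferedImage imageFor(int width, int height) {
        if (cache == null || cache.getWidth() != width || cache.getHeight() != height) {
            cache = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
            Graphics2D g = cache.createGraphics();
            try {
                // paint the background, then let the painter do its work
                g.setColor(Color.WHITE);
                g.fillRect(0, 0, width, height);
                painter.paint(g, width, height);
            } finally {
                g.dispose();
            }
            renderCount++;
        }
        return cache;
    }

    /** Number of real redraws so far; only here to show the cache at work. */
    int renderCount() {
        return renderCount;
    }

    public static void main(String[] args) {
        CachedRenderSketch cache = new CachedRenderSketch((g, w, h) -> {
            g.setColor(Color.BLACK);
            g.drawLine(0, 0, w - 1, h - 1);
        });
        cache.imageFor(600, 600);
        cache.imageFor(600, 600); // same size: served from the cache
        cache.imageFor(300, 300); // new size: drawn again
        System.out.println(cache.renderCount()); // prints 2
    }
}
```

In the panel, the paint method could then draw the image returned by imageFor(getWidth(), getHeight()) with graphics.drawImage, with a painter that wraps the renderer.setup and renderer.paint calls shown above.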

We can then embed this panel into a JFrame to make a small runnable application:
int WIDTH = 600;
int HEIGHT = 600;

// create molecule
IMolecule triazole = MoleculeFactory.make123Triazole();
StructureDiagramGenerator sdg = new StructureDiagramGenerator();
sdg.setMolecule(triazole);
sdg.generateCoordinates();
triazole = sdg.getMolecule();

// create the frame
JFrame frame = new JFrame("Swinging CDK-JChemPaint");
frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);

JCPPanel panel = new JCPPanel(triazole, WIDTH, HEIGHT);
frame.getContentPane().add(panel);

frame.pack();
frame.setVisible(true);
The result is pretty much the same as with the created PNG, just with a window. But this should get you started with using the new code base in your Swing-based application. If you need an impression of where this can get you, have a look at the applet developed by Chris' team. Likewise, an SWT-based application can be developed, of which Bioclipse is a full example. This shows one of the features of this new JChemPaint code base: it is widget set-independent. I am not aware of applications using other widget toolkits yet, though, but I am still hoping someone will use QtJambi to create a Qt-based JChemPaint port.

BitTorrents for Science

The idea has been lingering in the air for a long time now: sharing large science data sets using BitTorrent. Over the past couple of years I have seen a lot of science-related software being distributed over torrents, and their use in open source in general is abundant. Given a good network of so-called seeders, download times go down dramatically, and the overall energy consumption goes down too, as data has to follow a much shorter path.

It could very well be that the uptake of this technology for sharing data is only now coming about because only recently we started caring about Open Data licenses, which formally take care of rights of redistribution, which is obviously crucial to setting up a torrent network. Initiatives like the Panton Principles are changing this, even though we had a good deal of Open Source-licensed data for many years already.

So, with an increasing amount of Open Data, the time was now right, according to the authors Morgan and Jonathan, to set up BioTorrents and publish a paper in PLoS ONE: BioTorrents: A File Sharing Service for Scientific Data (doi:10.1371/journal.pone.0010071). I have to admit that I do not particularly like the design of the website, and I think it could do with more social web integration, but importantly, they provide a tracker. Trackers are key parts of a torrent network (well, they are being made obsolete, though I am not up-to-date with the state of that evolution), and work as a service discovery hub. Additionally, the website gives means to find data, and allows categorising torrents.

It is worth noting that the uptake has been minimal so far, since the idea was posted last October. But it is slowly being picked up, or at least blogged about.

How to make BioTorrents work?
The success of BioTorrents will very much depend on the user base. This is common to social web applications, and a recent accidental loss of torrents is unforgivable; well, personally, I was happy to upload my torrent once more, but would not have done that if I had many torrents uploaded already. Torrent content is distributed, but the tracker information is not. Backup, backup, backup. Oh, and backup :) It happens to the best of us. Additionally, it is worth realising that the service needs to give something back to the user. Traditionally, I always thought this had to be of actual use, but a recent post by Rich actually suggested that even a game mechanic may be enough. Indeed, websites like ChemPedia.com and Blue Obelisk eXchange implement this by means of personal karma, allowing people to compete in high score lists. Also, APIs to integrate with other tools are crucial, such as personal RSS feeds to allow posting my new torrents to, for example, FriendFeed and Identica.

But by far the most important feature for BioTorrents will be to set up a reliable network of seeders. I already mentioned this on FriendFeed, where I suggested that university libraries get involved. Ideally, every library will act as a seeder in torrent networks, so that typically you will download the data directly from your local library, instead of from the other end of the world. For data sets of GB size or larger, this is going to have an important environmental impact, on top of the much higher download speed.

Update: to make my message a bit more clear, please start uploading your torrents!

Langille, M., & Eisen, J. (2010). BioTorrents: A File Sharing Service for Scientific Data. PLoS ONE, 5(4). DOI: 10.1371/journal.pone.0010071

Monday, April 05, 2010

CDK-JChemPaint #3: rendering parameters

OK, one last CDK-JChemPaint tutorial for today (see #1 and #2). Rendering would not be much fun if you could not tune it to your needs. JChemPaint has long had many rendering parameters, and one by one we are converting them to the new API. The following code is a modification of the first example, and adds some code to list all rendering parameters for the three generators used:
// generators make the image elements
List generators = new ArrayList();
generators.add(new BasicSceneGenerator());
generators.add(new BasicBondGenerator());
generators.add(new BasicAtomGenerator());

// the renderer needs to have a toolkit-specific font manager
IRenderer renderer = new AtomContainerRenderer(
  generators, new AWTFontManager()
);

// dump all parameters
for (IGenerator generator : renderer.getGenerators()) {
  for (IGeneratorParameter parameter : generator.getParameters()) {
    println "parameter: " +
      parameter.getClass().getName().substring(40) +
      " -> " +
      parameter.getValue();
  }
}
The output will look something like:
parameter: BasicSceneGenerator$BackGroundColor -> java.awt.Color[r=255,g=255,b=255]
parameter: BasicSceneGenerator$Margin -> 10.0
parameter: BasicSceneGenerator$UseAntiAliasing -> true
parameter: BasicSceneGenerator$UsedFontStyle -> NORMAL
parameter: BasicSceneGenerator$FontName -> Arial
parameter: BasicBondGenerator$BondWidth -> 1.0
parameter: BasicBondGenerator$DefaultBondColor -> java.awt.Color[r=0,g=0,b=0]
parameter: BasicAtomGenerator$AtomColor -> java.awt.Color[r=0,g=0,b=0]
parameter: BasicAtomGenerator$AtomColorer -> org.openscience.cdk.renderer.color.CDK2DAtomColors@49aacd5f
parameter: BasicAtomGenerator$AtomRadius -> 8.0
parameter: BasicAtomGenerator$ColorByType -> true
parameter: BasicAtomGenerator$CompactShape -> SQUARE
parameter: BasicAtomGenerator$CompactAtom -> false
parameter: BasicAtomGenerator$KekuleStructure -> false
parameter: BasicAtomGenerator$ShowEndCarbons -> false
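
The substring(40) in the script works because the package prefix org.openscience.cdk.renderer.generators. happens to be exactly 40 characters long, so it would mangle the names of parameters living in any other package. A small plain-Java sketch of a variant that does not depend on the prefix length (the class and method names are made up for this illustration):

```java
final class ParameterNameSketch {

    /** Strips everything up to and including the last dot of a class name. */
    static String shortName(String className) {
        // lastIndexOf returns -1 when there is no package, so the full name is kept
        return className.substring(className.lastIndexOf('.') + 1);
    }

    public static void main(String[] args) {
        String name = "org.openscience.cdk.renderer.generators.BasicSceneGenerator$Margin";
        System.out.println(shortName(name)); // prints BasicSceneGenerator$Margin
        // the same helper works directly on a Class object's name
        System.out.println(shortName(java.util.Map.Entry.class.getName())); // prints Map$Entry
    }
}
```

In the loop above, this would be called as shortName(parameter.getClass().getName()).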

CDK-JChemPaint #2: rendering reactions

I posted earlier today a Groovy script to render molecules with CDK-JChemPaint 8. Now, the new JChemPaint rendering engine also contains the functionality to render reactions. So, I can also do:
$ groovy renderReaction.groovy

The matching script:
import java.util.List;

import java.awt.*;
import java.awt.image.*;

import javax.imageio.*;
import javax.vecmath.*;

import org.openscience.cdk.*;
import org.openscience.cdk.geometry.*;
import org.openscience.cdk.interfaces.*;
import org.openscience.cdk.layout.*;
import org.openscience.cdk.renderer.*;
import org.openscience.cdk.renderer.font.*;
import org.openscience.cdk.renderer.generators.*;
import org.openscience.cdk.renderer.visitor.*;
import org.openscience.cdk.templates.*;

int WIDTH = 600;
int HEIGHT = 600;

// the draw area and the image should be the same size
Rectangle drawArea = new Rectangle(WIDTH, HEIGHT);
Image image = new BufferedImage(
  WIDTH, HEIGHT, BufferedImage.TYPE_INT_RGB
);

IMolecule benzene = MoleculeFactory.makeBenzene();
IMolecule triazole = MoleculeFactory.make123Triazole();
IReaction reaction = new Reaction();

StructureDiagramGenerator sdg = new StructureDiagramGenerator();
sdg.setMolecule(triazole);
sdg.generateCoordinates();
triazole = sdg.getMolecule();
sdg.setMolecule(benzene);
sdg.generateCoordinates();
benzene = sdg.getMolecule();
try {
  GeometryTools.translate2DCenterTo(benzene, new Point2d(-4,0));
  GeometryTools.translate2DCenterTo(triazole, new Point2d(4,0));
} catch (Exception e) {
  e.printStackTrace();
}

reaction.addReactant(benzene);
reaction.addProduct(triazole);

// generators make the image elements
List generators = new ArrayList();
generators.add(new BasicSceneGenerator());
generators.add(new BasicBondGenerator());
generators.add(new BasicAtomGenerator());

List reactiongenerators = new ArrayList();
reactiongenerators.add(new ReactionArrowGenerator());
reactiongenerators.add(new ReactionPlusGenerator());

// the renderer needs to have a toolkit-specific font manager
Renderer renderer = new Renderer(
  generators, reactiongenerators, new AWTFontManager()
);

// the call to 'setup' only needs to be done on the first paint
renderer.setup(reaction, drawArea);

// paint the background
Graphics2D g2 = (Graphics2D)image.getGraphics();
g2.setColor(Color.WHITE);
g2.fillRect(0, 0, WIDTH, HEIGHT);

// the paint method also needs a toolkit-specific renderer
renderer.paintReaction(reaction, new AWTDrawVisitor(g2));

ImageIO.write(
  (RenderedImage)image, "PNG", new File("reaction.png")
);
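
The reactant and product end up side by side because their centres are moved to (-4,0) and (4,0) before they are added to the reaction. As a rough, CDK-free illustration of that step, assuming the centre is simply the unweighted mean of the atom coordinates (the helper below is hypothetical and is not the GeometryTools code):

```java
final class CenterTranslationSketch {

    /** Shifts all points in place so that their mean lands on (targetX, targetY). */
    static void translateCenterTo(double[][] points, double targetX, double targetY) {
        if (points.length == 0) {
            throw new IllegalArgumentException("at least one point expected");
        }
        double centerX = 0.0;
        double centerY = 0.0;
        for (double[] p : points) {
            centerX += p[0];
            centerY += p[1];
        }
        centerX /= points.length;
        centerY /= points.length;
        // the same shift is applied to every point, so the shape is preserved
        double dx = targetX - centerX;
        double dy = targetY - centerY;
        for (double[] p : points) {
            p[0] += dx;
            p[1] += dy;
        }
    }

    public static void main(String[] args) {
        // a 2x2 square with its centre at (1,1)
        double[][] square = { {0, 0}, {2, 0}, {2, 2}, {0, 2} };
        translateCenterTo(square, -4, 0);
        System.out.println(square[0][0] + ", " + square[0][1]); // prints -5.0, -1.0
    }
}
```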

CDK-JChemPaint #1: rendering molecules

I reported earlier that the CDK-JChemPaint patch is now a clean add-on from the CDK releases. This means that you download cdk-1.3.4.jar and cdk-jchempaint-8.jar separately, put them in your class path, and get started with, for example, Groovy:
$ export CLASSPATH=cdk-1.3.4.jar:cdk-jchempaint-8.jar
$ groovy renderMol.groovy

I have tuned the code in this tutorial by Gilleain a bit, resulting in this code:
import java.util.List;

import java.awt.*;
import java.awt.image.*;

import javax.imageio.*;

import org.openscience.cdk.*;
import org.openscience.cdk.interfaces.*;
import org.openscience.cdk.layout.*;
import org.openscience.cdk.renderer.*;
import org.openscience.cdk.renderer.font.*;
import org.openscience.cdk.renderer.generators.*;
import org.openscience.cdk.renderer.visitor.*;
import org.openscience.cdk.templates.*;

int WIDTH = 600;
int HEIGHT = 600;

// the draw area and the image should be the same size
Rectangle drawArea = new Rectangle(WIDTH, HEIGHT);
Image image = new BufferedImage(
  WIDTH, HEIGHT, BufferedImage.TYPE_INT_RGB
);

IMolecule triazole = MoleculeFactory.make123Triazole();
StructureDiagramGenerator sdg = new StructureDiagramGenerator();
sdg.setMolecule(triazole);
sdg.generateCoordinates();
triazole = sdg.getMolecule();

// generators make the image elements
List generators = new ArrayList();
generators.add(new BasicSceneGenerator());
generators.add(new BasicBondGenerator());
generators.add(new BasicAtomGenerator());

// the renderer needs to have a toolkit-specific font manager
AtomContainerRenderer renderer =
  new AtomContainerRenderer(generators, new AWTFontManager());

// the call to 'setup' only needs to be done on the first paint
renderer.setup(triazole, drawArea);

// paint the background
Graphics2D g2 = (Graphics2D)image.getGraphics();
g2.setColor(Color.WHITE);
g2.fillRect(0, 0, WIDTH, HEIGHT);

// the paint method also needs a toolkit-specific renderer
renderer.paint(triazole, new AWTDrawVisitor(g2));

ImageIO.write((RenderedImage)image, "PNG", new File("triazole.png"));

Friday, April 02, 2010

New Blue Obelisk Exchange online at Shapado.com

StackOverflow has served us very well in the past couple of months with the Blue Obelisk Exchange. The BOx was taking advantage of a beta program of StackOverflow, and users of that program can switch to a paid plan once the beta phase is over. That moment is nearing, but the pricing model is just unrealistic for us. There were also some comments on the Blue Obelisk using proprietary software (see Stackoverflow not open source - not a problem?).

To address both issues, and with the help of the Shapado.com developers, I have moved the (CC0) data to the new host, http://blueobelisk.shapado.com/. I got a report today that some accounts were mixed up (affecting four accounts), but that seems to be resolved now. The Shapado software has an Open Source license (AGPL), and the code can be downloaded at Gitorious. The new questions page looks like:

Thursday, April 01, 2010

OpenTox Virtual Seminar, Thursday April 1, 11.00 CEST

Today I gave a presentation of my research, the CDK, Bioclipse, and recent RDF work:


Because the virtual talk required me to reboot into Windows, I could not give a live demo of the development version of Bioclipse, showing new stuff in action. The audience had to make do with just screenshots :( I'll try to make a screencast soon.

The Bioclipse OpenTox plugin currently has this functionality:
  1. download data sets (myexperiment:1008)
  2. list available algorithms and descriptors (myexperiment:1204)