Friday, May 30, 2014

International Conference on Chemical Structures 2014

This Sunday the International Conference on Chemical Structures starts. If you aren't joining, it is important you know how to keep track of things online. First, follow the #iccs2014 hashtag on Twitter, and use the hashtag on any social platform. For example, I will bookmark papers mentioned in presentations on CiteULike. And slides that speakers put online, as well as coverage of other kinds, I'll link on Lanyrd. If you want to know what to expect, read this abstract book. And, of course, if you are attending the meeting, you can still join the online discussion.

Wednesday, May 28, 2014

Pathway analysis for Malaria research

A recurrent theme in my blog is that an easy way to support Open Science is to just join the show. You do not have to contribute a lot to have some impact. Of course, sometimes what you do has more impact than other times. Sometimes something with initially little impact gets high impact later. This is hard to predict, perhaps about as hard as the stock exchange. In the past I have contributed effort to many Open projects, often in small bits, and some things never get noticed (like my Ant man page in Debian, which is more than 10 years old :).

One project I have long wanted to contribute to is the Open Source Malaria project, which is brilliantly led by Matt Todd. I had three principal ideas:

  1. use Bioclipse to run the Decision Support against the OSM compounds
  2. do pathway analysis on malaria data
  3. use the AMBIT-JS to put all the OSM compounds online as an HTML page
The first and third I still have not gotten around to finishing. The first is a very simple way for you to contribute. The key question here is just to see how the compounds can be made less toxic or given fewer side effects. And Bioclipse can visualize this easily, based on various toxicity models, among them all those from OpenTox. Really, a four-hour job.

PCA results for the four sample groups.
The other task is more difficult, and I am really happy that Patricia Zaandam started a ten-week internship with me to work on this task. She has been blogging her progress, and I strongly invite you to read her blog and comment (ask questions, post ideas, give criticism), as Open projects are driven by Open communication. Because WikiPathways has most pathways for human, Patricia is looking at human expression data. And in five weeks' time, she preprocessed the raw data and did the pathway analysis using PathVisio, resulting in this shortlist of pathways. And now the hard part starts: biological and methodological validation of her approach.

There is plenty of room for feedback. I am not at all a malaria expert, and I am learning a lot from her study. Some questions we welcome expert input on (as independent test set validation, so to say):
  • what key pathways and genes do we expect to see for treated-versus-ill malaria patients?
  • what transcriptomics/proteomics/metabolomics data would you like us to consider too?
Etc, etc...

Wednesday, May 14, 2014

Jean-Claude Bradley, Blue Obelisk award winner

Chemistry in Second Life. DOI:10.1186/1752-153X-3-14
There are nowadays a lot of people talking about Open: about open access, open data, open source. In fact, some discussion on Twitter resulted in the realization that it is highly unlikely that any scholar has not taken advantage of Open in some way in their research in the last few years. However, this is mostly thanks to the people who actually do Open, not to those who only talk about it or use it.

One of the few people in chemistry who both promoted Open and did Open was Jean-Claude Bradley. Yesterday, I heard the sad news that he passed away. This is a great loss to many of us and certainly to the open chemistry community. Jean-Claude received the Blue Obelisk award for his Open Notebook Science work back in 2007 (I handed him the obelisk at the ACS meeting in Chicago; thanx to Chris for taking the picture, and digging it up!) and he contributed much to the community, including his melting point and solubility data for organic compounds.

A proud me handing out the Blue Obelisk award to Jean-Claude in Chicago in 2007.
CC-BY 2007 Christoph Steinbeck.
Jean-Claude and I did some work together, including a book chapter, which I liked, being a trained organic chemist myself (well, just a 6-month minor during my M.Sc. on supramolecular chemistry). I was really pleased that he had agreed to become part of the eNanoMapper scientific advisory board, and I was very much looking forward to working with him again on the journal side of dissemination of nanosafety research, in his role as editor-in-chief of Chemistry Central Journal.

Few people leave a big impression on me, but he was certainly one of them. Let his extensive work not go unnoticed; there is still a lot to do in Open chemistry.

Other posts about this loss.

Sunday, May 04, 2014

Changes in CDK 1.6 #5: the SMILES generator

User:Fdardel and User: DMacks
CC-BY-SA at Wikipedia.
I won't say much about this, as John already did. It's much faster and more functional than what the CDK had before. Here are some things to keep in mind, which I ran into when proofreading my Groovy Cheminformatics with the CDK book. Importantly, make sure to read the SmilesGenerator documentation, as it has many cool new options. Like much of the new CDK code, it was written with performance as a goal and is therefore faster.

Canonical SMILES
Generating unique SMILES is done slightly differently, but elegantly:

generator = SmilesGenerator.unique();
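
As a minimal sketch of how this generator might be used (Groovy, assuming the CDK 1.6 jar is on the classpath; the ethanol input, the variable names, and the SmilesParser setup with SilentChemObjectBuilder are illustrative choices of mine): two differently ordered inputs for the same molecule should end up with the same unique SMILES.

```groovy
import org.openscience.cdk.silent.SilentChemObjectBuilder
import org.openscience.cdk.smiles.SmilesGenerator
import org.openscience.cdk.smiles.SmilesParser

// parse the same molecule (ethanol) from two differently ordered SMILES strings
parser = new SmilesParser(SilentChemObjectBuilder.getInstance())
mol1 = parser.parseSmiles("CCO")
mol2 = parser.parseSmiles("OCC")

// the unique generator should give one canonical string for both inputs
generator = SmilesGenerator.unique()
smiles1 = generator.createSMILES(mol1)
smiles2 = generator.createSMILES(mol2)
println "$smiles1 $smiles2"
```

The exact string printed depends on the canonical ordering the CDK picks, so compare the two results with each other rather than with a hard-coded value.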

"Aromatic" SMILES
Because SMILES with lower case element symbols reflecting aromaticity carries less explicit information, I do not recommend using it. Still, I know that some of you are keen on using it, for various, sometimes logical, reasons, so here goes. Previously, you would use the setUseAromaticityFlag(true) method for this, but you can now use instead:

generator = SmilesGenerator.generic().aromatic()
smiles = generator.createSMILES(mol)

Of course, you can combine things.
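
For instance, a sketch of combining the two options shown in this post (Groovy, assuming the CDK 1.6 jar is on the classpath; the benzene input and the parser setup are illustrative assumptions of mine):

```groovy
import org.openscience.cdk.silent.SilentChemObjectBuilder
import org.openscience.cdk.smiles.SmilesGenerator
import org.openscience.cdk.smiles.SmilesParser

// benzene, read from an aromatic SMILES so that the input should carry aromaticity flags
parser = new SmilesParser(SilentChemObjectBuilder.getInstance())
mol = parser.parseSmiles("c1ccccc1")

// unique() asks for canonical output, aromatic() for lower case aromatic symbols
generator = SmilesGenerator.unique().aromatic()
smiles = generator.createSMILES(mol)
println smiles
```

If the molecule comes from a source without aromaticity flags (a Kekulé-style input, say), you would likely need to perceive aromaticity first for aromatic() to have a visible effect.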

Saturday, May 03, 2014

Changes in CDK 1.6 #4: IsotopeFactory and Isotopes

A major CDK API change happened around the IsotopeFactory. Previously, this class was used to get isotope information, which it gets from a configurable XML file. This functionality is now available from the XMLIsotopeFactory class. However, to improve the speed of getting basic isotope information as well as to reduce the size of the core modules, CDK 1.6 introduces an Isotopes class, which contains information extracted from the XML file, but is available as a pure Java class. The API for getting isotope information is mostly the same, but the instantiation is much simpler, and also no longer requires an IChemObjectBuilder (in Groovy):
    import org.openscience.cdk.config.*;

    isofac = Isotopes.getInstance();
    uranium = 92;
    for (atomicNumber in 1..uranium) {
      element = isofac.getElement(atomicNumber)
      // assumed loop body: print the symbol of each element
      println element.symbol
    }
Previous posts: