Saturday, April 27, 2013

OpenTox Euro 2013: call for abstracts

Later this year OpenTox Euro 2013 will take place in Mainz:

I also created a page on Lanyrd where you can sign up, preferably with your Twitter account. BTW, the hashtag to use is #oteu13:

You can submit abstracts here no later than 30 June 2013. There will be sessions around these topics: Open Infrastructure and Application Development, Integrated Data Analysis, Visualisation, Cheminformatics, Bioinformatics, and Systems Biology.

Must-have-ORCID reason #314

Publisher writes:
    2. Author details: Author "" provided in the manuscript differs from the one reflected in the submission system which is "". We have proceeded and followed the manuscript. Please check and advise if action taken is appropriate.
Yes, both of those are me.

Saturday, April 20, 2013

#ACSNola talk: "An architecture for an Open Science molecular compound database"

About half a year ago I was fed up with the slow progress in Open Data in chemistry. Some initiatives and projects exist, but there is no clear central point of access, which is a problem particularly for smaller providers. Thus, I set out a project plan to change this. There were two other aspects that I wanted to include here:
  1. licensing must be explicit, to allow aggregators to know under what conditions they can redistribute that data (or not)
  2. compound databases must start being clear on whether entries are specific compounds and if listed properties are for a specific tautomer (or not)
This is of critical importance for reasoning over data in multiple data sets, as recently outlined in our Applications of the InChI paper, and for large data integration projects like Open PHACTS.

This presentation captures all the usual suspects, like the Panton Principles, lists some truly Open Data in chemistry (e.g. CrystalEye), and outlines the architecture I am working on. The primary purpose of this project is Linked Open Data for chemistry and to boost this field. Sadly, grant writing interfered with my agenda, and I did not manage to complete the full demo, but the slides contain this real-world screenshot that shows what it looks like (and I expect to put this publicly online in 1-2 months):

By no means is this architecture expected to be as functional as Open PHACTS or to replace large compound databases like ChemSpider or PubChem. Instead, it is meant as a simple architecture that does two things right and is simple enough to set up that any chemistry lab can do it. Goal: to increase the size of the chemical Linked Open Data network, which is way too small at this moment. I will list data sets with

Basically, you set up a SPARQL endpoint with the data you want to share and the Chemical Compound Box as PHP front end using ARC2. That's it.
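
To give a feel for what such a front end does with the endpoint, here is a minimal sketch in Python (not the PHP/ARC2 code of the Chemical Compound Box itself). It builds a query for one compound, encodes it as a GET request as described by the SPARQL Protocol, and reads predicate/object pairs out of a SPARQL JSON result. The endpoint URL, the compound URI, and the sample response are all made up for illustration.

```python
import json
from typing import List, Tuple
from urllib.parse import urlencode

# Hypothetical endpoint and compound URI; replace with your own.
ENDPOINT = "http://example.org/sparql"
COMPOUND = "http://example.org/compound/1"


def describe_query(compound_uri: str) -> str:
    """Build a SPARQL query listing every predicate/object pair of one compound."""
    return f"SELECT ?p ?o WHERE {{ <{compound_uri}> ?p ?o }}"


def request_url(endpoint: str, query: str) -> str:
    """Encode the query as a GET request with a 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


def parse_bindings(results_json: str) -> List[Tuple[str, str]]:
    """Extract (predicate, object) pairs from a SPARQL JSON results document."""
    data = json.loads(results_json)
    return [(b["p"]["value"], b["o"]["value"]) for b in data["results"]["bindings"]]


# Hand-written sample response, standing in for what an endpoint would return.
SAMPLE = """{"head": {"vars": ["p", "o"]},
 "results": {"bindings": [
  {"p": {"type": "uri", "value": "http://www.w3.org/2000/01/rdf-schema#label"},
   "o": {"type": "literal", "value": "methane"}}]}}"""

if __name__ == "__main__":
    print(request_url(ENDPOINT, describe_query(COMPOUND)))
    for predicate, obj in parse_bindings(SAMPLE):
        print(predicate, "->", obj)
```

A front end then only has to render those pairs as an HTML page per compound; the data itself stays in the triple store.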

And the slides of the #ACSNola presentation:

Friday, April 19, 2013

#ACSNola talk: "Bioclipse-OpenTox: interactive predictive toxicology"

My third #ACSNola talk (well, second chronologically):

John May is now release manager of CDK 1.5.x

Update: John May is now John Mayfield, see also his ORCID profile.

After years of me being release manager, John May (from Chris' group) has started as release manager for the 'master' branch, which will lead to the CDK 1.5.x versions (and later 1.6.x).

As part of that, John now has commit powers, to push reviewed patches into the main source tree (like Rajarshi and I have too):

This role of gatekeeper is very practical and comes with a large responsibility: ensure that whatever is pushed actually compiles for everyone. The gatekeeper will get up in the middle of the night when the official repository's branches do not compile! No, but really, really close to that. It has top priority, so we always double check with 'ant clean dist-all test-dist-all' before we push. No, we actually now and then forget to do this, e.g. after porting patches from another branch. But we fix that immediately! Promised!

And, John pushed his first two patches to master (ported from cdk-1.4.x):

Thursday, April 18, 2013

#ACSNola talk: "Open PHACTS: meaningful linking of preclinical drug discovery knowledge"

Half a year ago I submitted this abstract for the #ACSNola meeting last week (and as in the slides, I stress that this is a large community effort involving not only academic groups but also many pharma companies):

Open PHACTS: meaningful linking of preclinical
drug discovery knowledge

E. Willighagen, C. Brenninkmeijer, C. Evelo,
L. Harland, A. Gray, C. Goble, A. Waagmeester,
A. Williams

Recently, semantic web technologies have been
adopted by the life sciences community for this
purpose. However, while these new technologies
provide us with methods, they do not provide us
with an exact solution. Open PHACTS uses these
methods to solve problems in linking preclinical
knowledge from databases like Uniprot, ChEMBL,
and WikiPathways. Problems that are discussed
and for which our solutions will be presented
include: 1. approaches to map data between the
databases using the Vocabulary of Interlinked
Datasets, including identifier mapping with
BridgeDB, appropriate choices of mapping
predicates, and ontologies to cover provenance,
such as the Provenance Authoring and Versioning
ontology; 2. deal with different units for
experimental data using the Quantities, Units,
Dimensions and Data (QUDT) ontology for (on the
fly) quantity conversion; and 3. how all this
is linked to user-oriented graphical user

I have now uploaded the slides:

Also note the associate partnership program: it is not too late to join the 40 other associate partners and team up with Open PHACTS!

Tuesday, April 09, 2013

I'm a proud "RSC eScience hero"

A while ago Antony Williams, of ChemSpider, was awarded an eScience prize for his work in the field. He decided to put that money to use to award several people. He worked things out with the Royal Society of Chemistry, which sent out a press release a few days ago ("RSC eScience heroes rewarded through Microsoft prize"). And I am proud to be one of those.

A full list of winners is extracted from the press release:

Enormous thanks to Tony for this and congrats too to the other winners. I note that several of the winners are or have been involved in the Blue Obelisk movement, and seeing all the hard work rewarded by this prize indicates we are making an impact. Personally, I was awarded for my work on the CDK in particular (but I have worked on way more open source code for cheminformatics and statistics). I do stress that the CDK is a community project of which I happen to be the founder and a long-time contributor. But the success of the CDK is based on the effort from all authors and users!

If you are wondering about the impact of just the CDK, have a look at the growing list of papers citing the CDK papers, which I am tracking in CiteULike, which has good CiTO support (the detail):