Tuesday, October 16, 2007

Lunch at Nature HQ (with Euan, Joanna, Ian and Ålf)

On my way back from the Taverna workshop I visited Nature HQ, as Ian reported on Nascent. It was a (too) short meeting, but it was very nice to meet Euan (finally; he wrote the software I use for Chemical blogspace), Joanna (whom I had already met in Chicago, where she gave two presentations, and who is responsible for Second Nature), Ian (who works on Connotea, and commented on my blog post about tagging molecules), Ålf (who works on Scintilla), and, briefly, Timo (who rules them all). BTW, I had a simple but delicious pasta.

First, let me note that if I had to name a favorite molecule, it would be acetic acid, not ascorbic acid. The reason is that acetic acid was the first organic molecule I put in the Woordenboek Organische Chemie in 1995.

We discussed a number of things related to our work. One of these was tagging molecules. Ian used instead of The first was not yet picked up by but I fixed that.

We also discussed linking molecular structures with scientific literature. This week's discussions in blogspace show that publishers do not appreciate doing that with computer programs (see here, here, here, here, here, and here). (The publishers seem to prefer sending off a PDF to India or China.)

I proposed that InChIs be part of the publication, for all molecules mentioned in the article. If a journal can require exact bibliography and experimental section formats, it can certainly require InChIs too. There are few programs left that cannot autogenerate an InChI, and chemists draw the structures anyway. However, the software used in the editorial process does not support linking InChIs with a PDF (if only that software had been open source ...).

So, the best current option seems to be social tagging mechanisms, and this is what we talked about. Just use Connotea (or any other service) and tag your molecule with a DOI:


This tagging is done manually. No machines are involved in that. Nothing the publishers can do about this. No ChemRefer needed. But this will allow us to start building a database with links between papers and molecules, which we badly need. BTW, we will not have to start from scratch. The NMRShiftDB already contains many such links, and it is open data!

Now, you might notice the informal semantics of the doi: prefix. That's something I hereby propose, as it allows services to pick up the content more easily. You might also note the incorrect DOI in Connotea. The reason for that is that Connotea does not yet support a '/' in a tag. I reported that problem.
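
To make the doi: prefix idea a bit more concrete, here is a minimal sketch of how a service might pick up such tags and collect links between molecules and papers. It assumes bookmarks arrive as (molecule identifier, list of tags) pairs; the function names, the bookmark layout, and the two DOIs are hypothetical and only for illustration, and nothing here reflects how Connotea actually exposes its data.

```python
from collections import defaultdict

DOI_PREFIX = "doi:"  # the informal tag prefix proposed in this post


def extract_dois(tags):
    """Return the DOIs found among a bookmark's tags (tags starting with 'doi:')."""
    dois = []
    for tag in tags:
        if tag.lower().startswith(DOI_PREFIX):
            doi = tag[len(DOI_PREFIX):].strip()
            if doi:  # skip a bare 'doi:' tag with nothing after it
                dois.append(doi)
    return dois


def link_molecules_to_papers(bookmarks):
    """Map each molecule identifier to the set of DOIs it was tagged with.

    `bookmarks` is an iterable of (molecule_id, tags) pairs; this layout is
    an assumption made for the sketch.
    """
    links = defaultdict(set)
    for molecule, tags in bookmarks:
        for doi in extract_dois(tags):
            links[molecule].add(doi)
    return dict(links)


# Acetic acid as an InChI; the DOIs below are made-up placeholders.
ACETIC_ACID = "InChI=1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)"
bookmarks = [
    (ACETIC_ACID, ["chemistry", "doi:10.1000/example-1"]),
    (ACETIC_ACID, ["doi:10.1000/example-2", "nmr"]),
]
links = link_molecules_to_papers(bookmarks)
```

The sketch does not try to repair DOIs that were mangled because of the missing '/' support; it only takes whatever follows the prefix as-is.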


  1. Wow, lucky man! I would have loved to be at that lunch. I'm curious to hear how they reacted to your suggestions?

    2 questions
    - any suggestion on how to annotate the specific String within the document that refers to the chemical (or other) entity? Seems that NPG could easily facilitate that, if they wanted to.

    - want to set up E.D. with the specific set of tags you are after (e.g. InChI IDs)? Anything that you can get into an OWL document and post somewhere online can be added very easily and I'd love to do it. (We need more controlled tag sets in there.)

  2. Hi Benjamin!

    I have had your manuscript in my bag for at least two weeks now, but haven't found time to write up my comments yet! Sorry about that! It's almost like a true reviewing process :)

    re 1) you mean annotation in the PDF or HTML? To mark up chemical compounds there? Like with RSC's Project Prospect?

    re 2) yes, let's synchronize that. I have set up RDF statements regarding those things for Strigi, see:

    Is that what you mean?

  3. Looking forward to both your review and the real review (which, as far as I can tell based on the deadline they gave me for reviewing other papers for that edition, is more than 2 weeks late). Perhaps you are actually an official reviewer and they will be the same :)!

    I was thinking of the HTML version; I would guess it would be much easier. I'd personally like to move away from PDF for representation of online content if possible. Unless you are Adobe, PDFs are a pain in the butt to work with inside.

    Yes, like Project Prospect. I need to read about that. Is there any way that public taggers à la Connoteaites could add in-document tags?

    Regarding E.D. Right now, E.D. works with vocabularies that follow the representational structure in the OBO pre-alpha OWL versions. That is, it expects the semantic tags to correspond to OWL classes. It does not yet offer the capability to assign typed properties such as those in your RDF document, only adding the 'hasSemanticTag' relation to the currently implicit hasTag relation.

    I want to leave it that way for the next month or so, depending on what happens with the reviews, but I am eager to a) make it work for larger vocabs and b) add typed predicates. For immediate use, however, any tag you want in there will have to be an OWL class.