Friday, June 22, 2007

Archiving spectra: use InChI and CML

Ryan blogged in Archive This about some advice from ACD on how to store spectra in your electronic lab notebook.

Use InChI
This reminded me of a discussion about experimental sections that I had with Colin when he was at the CUBIC. I proposed that the InChI should have a prominent place in the experimental section. An important argument for this is that it allows well-defined atom numbering when writing down the NMR data in that section: the InChI gives a unique numbering, so the numbering used in the experimental section becomes author-neutral. Because the InChI puts the carbons up front, the 13C NMR details get numbers from 1 to 13, or whatever the carbon count is. Proton NMR is not difficult either: the hydrogens are simply numbered according to the heavy atom to which they are attached. Where two hydrogens attached to the same heavy atom have different shifts, a and b can still be used. The numbers are easily added to 2D diagrams anyway.
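To make the numbering idea concrete, here is a minimal Python sketch. It assumes a single-component InChI in which the heavy atoms are numbered in the order of the formula layer (carbons first, hydrogens skipped), which is what "the InChI puts the carbons up front" amounts to. The function names heavy_atom_numbering and format_13c are hypothetical, and the ethanol shift values are illustrative numbers, not measured data.

```python
import re


def heavy_atom_numbering(inchi):
    """Map InChI atom numbers to element symbols, using only the formula layer.

    Assumption: single-component InChI, heavy atoms numbered in formula order.
    """
    formula = inchi.split("/")[1]
    if "." in formula:
        raise ValueError("multi-component InChI is not handled in this sketch")
    numbering = {}
    next_number = 1
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if element == "H":
            continue  # hydrogens take the number of their heavy atom
        for _ in range(int(count) if count else 1):
            numbering[next_number] = element
            next_number += 1
    return numbering


def format_13c(inchi, shifts):
    """Write a 13C NMR line whose labels are InChI atom numbers."""
    numbering = heavy_atom_numbering(inchi)
    parts = []
    # List peaks from high to low shift, as is common in experimental sections.
    for number, shift in sorted(shifts.items(), key=lambda kv: -kv[1]):
        if numbering.get(number) != "C":
            raise ValueError(f"atom {number} is not a carbon in {inchi}")
        parts.append(f"{shift:.1f} (C{number})")
    return "13C NMR: δ " + ", ".join(parts)


ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
print(heavy_atom_numbering(ethanol))  # {1: 'C', 2: 'C', 3: 'O'}
# Illustrative shift values only, keyed by InChI atom number.
print(format_13c(ethanol, {1: 18.0, 2: 58.0}))
```

Because the labels come from the InChI rather than from the author's drawing, two people reporting the same compound would end up with the same C1, C2, ... assignments.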

If software vendors (e.g. ACD and Bioclipse) and publishers (e.g. ACS, RSC, Chemistry Central) adopted this proposal, experimental sections would immediately become more machine-parsable and ready for automatic processing, such as discussed in my blog item Chemical Archeology: OSCAR3 to NMRShiftDB.org and by Christoph at the ACS meeting (available as PDF and as this 18MB MP3).

Use CML
Even better is to use CML for this, or CMLSpect to be precise (the paper has been accepted and should appear soon). This XML-based language allows full semantic markup of all the experimental details and all the interesting assignments you want to archive. I would like to challenge ACD to follow Bioclipse's lead and provide export as CMLSpect for spectral assignments and markup of experimental details, in addition to PDF or whatever format they prefer. Cheers for the work by Tobias and Stefan on spectrum support in Bioclipse!
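As a rough idea of what such an export could look like, the following Python sketch writes a molecule plus assigned peaks as CML-style XML using only the standard library. The element and attribute names (molecule, identifier, atomArray, atom, spectrum, peakList, peak, xValue, xUnits, atomRefs) and the namespace are written from my recollection of CML and should be checked against the schema and the CMLSpect paper before use. The function name build_cml_spectrum, the ids, and the shift values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed CML namespace; verify against the schema.
CML_NS = "http://www.xml-cml.org/schema"


def build_cml_spectrum(inchi, elements, shifts):
    """Return CML-style XML linking peaks to atoms.

    elements: element symbols in InChI atom order, e.g. ["C", "C", "O"].
    shifts:   dict mapping InChI atom number to chemical shift in ppm.
    """
    cml = ET.Element("cml", {"xmlns": CML_NS})

    molecule = ET.SubElement(cml, "molecule", {"id": "m1"})
    ET.SubElement(molecule, "identifier",
                  {"convention": "iupac:inchi", "value": inchi})
    atom_array = ET.SubElement(molecule, "atomArray")
    for number, element in enumerate(elements, start=1):
        # Atom ids reuse the InChI numbering: a1, a2, ...
        ET.SubElement(atom_array, "atom",
                      {"id": f"a{number}", "elementType": element})

    spectrum = ET.SubElement(cml, "spectrum",
                             {"id": "s1", "moleculeRef": "m1", "type": "NMR"})
    peak_list = ET.SubElement(spectrum, "peakList")
    for index, (number, shift) in enumerate(sorted(shifts.items()), start=1):
        ET.SubElement(peak_list, "peak", {
            "id": f"p{index}",
            "xValue": f"{shift:.1f}",
            "xUnits": "units:ppm",
            "atomRefs": f"a{number}",  # the assignment: peak -> atom
        })
    return ET.tostring(cml, encoding="unicode")


# Illustrative values only, not measured data.
xml_text = build_cml_spectrum(
    "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3", ["C", "C", "O"], {1: 18.0, 2: 58.0}
)
print(xml_text)
```

The point of the sketch is the atomRefs link: each peak points at an atom id, so the assignment survives archiving in a form that software can read back without parsing free text.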