Tuesday, March 08, 2011

ToxBank: a data warehouse for (computational) toxicology

Last week I was in sunny Cascais, and in three days experienced -23 °C and +18 °C. The reason I was there was the kick-off meeting of the EU FP7 cluster SEURAT, which includes 'our' ToxBank project.

We will host many different data types, and I hope to squeeze in my favorite, metabolomics. Don't ask me what this will mean in practice, but some keywords we already know include RDF, OpenTox, and ToxML.

And that data warehousing for metabolomics is important was only recently shown by the retraction (via RetractionWatch) of this Nature paper (doi:10.1038/nature03356). The reason was that it critically depended on conclusions from another retracted paper (doi:10.1021/jf021166h), published in J. Agric. Food Chem. in 2003.

In this paper, they identified ten chemicals from Arabidopsis: butanoic acid; trans-cinnamic acid; o-coumaric acid; p-coumaric acid; ferulic acid; p-hydroxybenzamide; methyl p-hydroxybenzoate; 3-indolepropanoic acid; syringic acid; and vanillic acid. I hope I have the links to Wikipedia correct, as they are based on names only: the paper does not seem to list InChIs or even SMILESes. The ten chemicals were identified with HPLC and NMR, but no experimental data seems to be given. What NMR data did they base the identification on? I have seen pretty interesting assignments of chemical identity in GC/MS and LC/MS, so I was quite disappointed not to see the gory details here.

But fortunately, I could look at the raw data. Yeah, sure! Dream on.

In fact, it seems the characterization of the ten chemicals was challenged, causing the authors to look again at their data. Unfortunately, they could no longer find the experimental data. The authors write in the retraction:
    We have been unable to find experimental data that document the actual isolation of butanoic acid, trans-cinnamic acid, ...
Now, readers of my blog know I care about raw data (see McPrinciple #1). For example, it was a key feature of our MetWare project. It is not entirely clear to me whether they could no longer find the raw data, or whether they were no longer able to correlate their extracted characteristics with the known NMR spectra for those ten compounds. This only strengthens the importance of NMR databases in metabolite identification, something Christoph would only agree with.

I am not sure we will get to the bottom of this, or learn whether the authors could have prevented this retraction. However, I do believe the paper was flawed in the first place: it did not give the experimental detail that would allow the referees to judge the metabolite identification. The referees failed, as they apparently did not find this aspect important enough to require this data in the paper. And the journal clearly failed, by not having a good editorial requirement in place around availability of data. This is not specific to this retracted paper, nor to this journal. It's pretty much the community standard, despite many calling for years for better standards, e.g. via minimal reporting standards.

Well, maybe journal editors will soon wake up, and make availability of experimental data in papers of this kind (and any type, IMHO) a community standard, and a strong one: so strong that referees can reject papers if they do not provide this minimal information.

Why? It would have saved a lot of people from doing the wrong thing. The original paper was cited 54 times (according to WoS) and the Nature paper 52 times (up one since the RetractionWatch post). We're bound to see a few more retractions as a result of this, I guess.

So, where I failed to get MetWare going within the Netherlands Metabolomics Center, let's hope ToxBank does better. Given the list of ToxBank partners, I have no doubt that it will.

Bais, H., Prithiviraj, B., Jha, A., Ausubel, F., & Vivanco, J. (2005). Mediation of pathogen resistance by exudation of antimicrobials from roots. Nature, 434 (7030), 217-221. DOI: 10.1038/nature03356

Walker, T., Bais, H., Halligan, K., Stermitz, F., & Vivanco, J. (2003). Metabolic Profiling of Root Exudates of Arabidopsis thaliana. Journal of Agricultural and Food Chemistry, 51 (9), 2548-2554. DOI: 10.1021/jf021166h

1 comment:

  1. Egon, Good and important commentary on organisation and availability of data including raw data to ensure scientific evaluation and reproducibility. We will indeed give close attention to this issue on ToxBank.