Saturday, July 26, 2008

CDK Literature #5

Time flies. Another CDK Literature (see also #1, #2, #3, #4). Quite a few papers have been published again, and I'll briefly discuss a few of them.

Detection of IUPAC names
Klinger et al. have written a paper on the detection of IUPAC names. As long as semantic markup languages are not the default, this remains important. Remaining problems include correctly finding the boundaries of names in summaries of chemicals. The CDK has been used to create SMILES.
Roman Klinger, Corinna Kolárik, Juliane Fluck, Martin Hofmann-Apitius, Christoph M. Friedrich, Detection of IUPAC and IUPAC-like chemical names, Bioinformatics 2008 24(13):i268-i276; doi:10.1093/bioinformatics/btn181

Structure elucidation
Elyashberg, Williams and Martin wrote a review on structure elucidation that discusses Steinbeck's Seneca software, which uses components of the CDK, though the CDK is not directly mentioned.
M.E. Elyashberg, A.J. Williams, G.E. Martin, Computer-assisted structure verification and elucidation tools in NMR-based structure elucidation, Progress in Nuclear Magnetic Resonance Spectroscopy, 2008, 53(1-2):1-104, doi:10.1016/j.pnmrs.2007.04.003

Opensource Distributed Chemical Computing
Karthikeyan et al. have published ChemStar, an opensource distributed chemical computing system, built on top of the Java Remote Method Invocation architecture, which was used by the original Seneca too. The CDK paper and a CDK News paper by Fechner and Guha are cited in relation to a ChemStar application of benchmarking QSAR descriptors. The article does not seem to mention the opensource license, nor have I yet found a source package download.
M. Karthikeyan, S. Krishnan, A.K. Pandey, A. Bender, A. Tropsha, Distributed Chemical Computing Using ChemStar: An Open Source Java Remote Method Invocation Architecture Applied to Large Scale Molecular Data from PubChem, J. Chem. Inf. Model., 2008, 48(4):691-703, doi:10.1021/ci700334f

Taverna's APIConsumer
Taverna has several means of making functionality available to the workflow engine. SOAP and BioMoby are two prominent ones. The APIConsumer is another one, and is described in this paper. The CDK-Taverna project, led by Thomas Kuhn, is mentioned as another project that uses this approach.
Peter Li, Tom Oinn, Stian Soiland, Douglas B. Kell, Automated manipulation of systems biology models using libSBML within Taverna workflows, Bioinformatics 2008 24(2):287-289, doi:10.1093/bioinformatics/btm578

Docking for Substrate Identification
Favia et al. use docking to recognize interesting substrates for short-chain dehydrogenases/reductases. The CDK's fingerprinter is used to describe intermolecular similarity, by calculating the Tanimoto distances between the bit strings.
Angelo D. Favia, Irene Nobeli, Fabian Glaser, Janet M. Thornton, Molecular Docking for Substrate Identification: The Short-Chain Dehydrogenases/Reductases, Journal of Molecular Biology, 2008, 375(3):855-874, doi:10.1016/j.jmb.2007.10.065
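
For readers unfamiliar with the measure, here is a minimal sketch of how a Tanimoto similarity, and the distance derived from it, can be computed between two fingerprint bit strings. It is plain Python and not the CDK's own code or the paper's exact procedure. The function names, the toy fingerprints, the "distance = 1 - similarity" convention and the handling of two empty fingerprints are all assumptions made for illustration.

```python
def tanimoto_similarity(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints.

    Each fingerprint is given as a set holding the positions of its
    'on' bits (an assumed representation, chosen for readability).
    """
    union = len(fp_a | fp_b)
    if union == 0:
        # Assumed convention: two empty fingerprints count as identical.
        return 1.0
    # Shared 'on' bits divided by 'on' bits set in either fingerprint.
    return len(fp_a & fp_b) / union


def tanimoto_distance(fp_a, fp_b):
    """Distance taken here as 1 - similarity (an assumed convention)."""
    return 1.0 - tanimoto_similarity(fp_a, fp_b)


# Toy fingerprints, invented for illustration only.
fp_1 = {1, 4, 7, 9}
fp_2 = {1, 4, 8, 9, 12}

similarity = tanimoto_similarity(fp_1, fp_2)  # 3 shared bits / 6 bits in total
distance = tanimoto_distance(fp_1, fp_2)
print(similarity, distance)
```

Identical fingerprints give a distance of 0, and fingerprints with no bits in common give a distance of 1.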


  1. Rajarshi, thanx. Some of these papers I have not been able to access yet.

    BTW, would you like to guest-edit a CDK Literature? I plan to collate #3-#5, maybe a #6 as a CDK Literature submission for CDK News... maybe you'd like to co-author that?

  2. Yes, I haven't been able to get the tox paper (though Todd should be able to provide a copy)

    Wrt guest editing, sure

  3. If anyone wants a copy of the CASE article you refer to, visit: and let me know.