Normalization is an important step in many cheminformatics workflows: picking the right representation for a nitro group, for example.
Are there best practices here? Should we initiate an Open Specification for the normalization steps that should be performed? This would greatly increase reproducibility in cheminformatics…
I'm not sure that would help, since the kind of normalization you want depends on the user and on the software used. If your software can't handle salt data in the molfile (like the layout or the qsar.descriptor modules of the CDK), you want the salt data in a separate field. If your software can handle it, you probably want it in the CT (connection table) for duplicate-checking reasons. The same goes for tautomers, stereochemistry and so on.
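To make the salt example concrete, here is a minimal Python sketch, assuming structures are exchanged as SMILES strings whose components are separated by ".". The function name `split_salt` and the "longest fragment is the parent" heuristic are assumptions made up for illustration; a real toolkit would parse the structure and compare the fragments on the molecular graph.

```python
from typing import List, Tuple


def split_salt(smiles: str) -> Tuple[str, List[str]]:
    """Split a dot-separated SMILES into (parent, counterions).

    Toy heuristic (an assumption, not a toolkit rule): the fragment
    with the longest SMILES string is treated as the parent structure.
    """
    fragments = [f for f in smiles.split(".") if f]
    if not fragments:
        raise ValueError("empty SMILES")
    parent = max(fragments, key=len)
    counterions = list(fragments)
    counterions.remove(parent)  # drops only the first occurrence of the parent
    return parent, counterions


# Sodium acetate: keep the acetate in the structure field and the
# counterion in a separate field.
parent, salts = split_salt("CC(=O)[O-].[Na+]")
print(parent)
print(salts)
```

Software that cannot handle salts would be given only `parent`, while a duplicate check that should tell salt forms apart would keep using the full input string.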
I know Paul Dobson showed his Pipeline Pilot standardization workflow. The workflow:
http://www.myexperiment.org/workflows/636.html
It was used in his metabolite-likeness publication: http://www.ncbi.nlm.nih.gov/pubmed/19049901
I do not know exactly what you mean by structure normalization, but there are IUPAC guidelines on how to draw organic compounds:
DOI:10.1351/pac200880020277
DOI:10.1351/pac200678101897
Anonymous, I am thinking more of things like how to represent a nitro group, whether acid groups should be charged or neutral, whether bond orders should be localized or delocalized, etc. This differs from one application to another, but it could be useful if a standard were set for each application domain, which would make computational results easier to compare.
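As a rough illustration of a per-domain standard, the Python sketch below rewrites a nitro group between its two common SMILES spellings, the pentavalent form `N(=O)=O` and the charge-separated form `[N+](=O)[O-]`. Everything apart from those two spellings is an assumption: the domain names, the `DOMAIN_NITRO_STYLE` table and the function names are hypothetical. Plain string replacement only catches these exact spellings, so a real implementation would work on the parsed structure.

```python
# Two common SMILES spellings of the nitro group.
NITRO_PENTAVALENT = "N(=O)=O"
NITRO_CHARGED = "[N+](=O)[O-]"

# Hypothetical per-domain conventions; the domain names are invented.
DOMAIN_NITRO_STYLE = {
    "registration": "charged",
    "legacy_descriptors": "pentavalent",
}


def normalize_nitro(smiles: str, style: str) -> str:
    """Rewrite nitro groups to the requested style (string-level toy)."""
    if style == "charged":
        return smiles.replace(NITRO_PENTAVALENT, NITRO_CHARGED)
    if style == "pentavalent":
        return smiles.replace(NITRO_CHARGED, NITRO_PENTAVALENT)
    raise ValueError(f"unknown nitro style: {style!r}")


def normalize_for_domain(smiles: str, domain: str) -> str:
    """Apply the nitro convention agreed on for one application domain."""
    return normalize_nitro(smiles, DOMAIN_NITRO_STYLE[domain])


# Both spellings of nitrobenzene end up identical within one domain.
a = normalize_for_domain("c1ccccc1N(=O)=O", "registration")
b = normalize_for_domain("c1ccccc1[N+](=O)[O-]", "registration")
print(a == b)
```

The point is only that once a domain fixes its convention, two inputs that differ in nitro representation become comparable.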
You mean normalization like in "The IUPAC Chemical Identifier – Technical Manual", Chapter IVb?
What people often do not understand is that a connection table that is useful for cheminformatics can be rendered as various depictions. Therefore the InChI approach seems to me a good basis (although I know it is a "dictionary" approach, not a fundamental algorithmic solution). And it does not contradict the depiction in doi:10.1351/pac200880020277, GR8.1.