Dec 27
The quality of SMILES strings in Wikidata
One thing that machine readability enables is all sorts of machine processing. Validation of data consistency is one example. For SMILES strings, one of the things you can do is test whether the string parses at all. Wikidata is machine readable and, in fact, easier to parse than Wikipedia, for which the SMILES strings were validated recently in a Journal of Cheminformatics paper by Ertl et al. (doi:10.1186/s13321-015-0061-y).
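To give a feel for what "parses at all" can mean, here is a minimal Python sketch of a purely structural check. It is not the CDK SMILES parser and not a full SMILES grammar: it only looks at bracket atoms, branch parentheses, and ring-closure labels, and the function name smiles_structure_problems is made up for this illustration. A string that passes can still be chemically invalid, because valence, aromaticity, and atom symbols are not checked.

```python
DIGITS = "0123456789"


def smiles_structure_problems(smiles):
    """Return a list of structural problems; an empty list means none found.

    Assumption: this is only a rough pre-check, not a real SMILES parser.
    """
    problems = []
    depth = 0           # nesting depth of branch parentheses
    open_rings = set()  # ring-closure labels seen an odd number of times
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            # Skip the whole bracket atom; digits inside are isotopes,
            # hydrogen counts, or charges, not ring closures.
            end = smiles.find("]", i)
            if end == -1:
                problems.append("unclosed '[' at index %d" % i)
                break
            i = end + 1
            continue
        if ch == "]":
            problems.append("unmatched ']' at index %d" % i)
        elif ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                problems.append("unmatched ')' at index %d" % i)
            else:
                depth -= 1
        elif ch == "%":
            # Two-digit ring-closure label such as %12.
            label = smiles[i + 1:i + 3]
            if len(label) == 2 and all(c in DIGITS for c in label):
                open_rings ^= {int(label)}
                i += 3
                continue
            problems.append("malformed '%%' ring label at index %d" % i)
        elif ch in DIGITS:
            # Single-digit ring closure: first sight opens, second closes.
            open_rings ^= {int(ch)}
        i += 1
    if depth > 0:
        problems.append("%d unclosed '('" % depth)
    if open_rings:
        problems.append("unclosed ring closure(s): %s" % sorted(open_rings))
    return problems


if __name__ == "__main__":
    for s in ["c1ccccc1", "CC(=O)O", "C1CCCC", "CC(C", "C[N+"]:
        print(s, smiles_structure_problems(s))
```

A real validation run should still use a full parser, since only that catches problems beyond unbalanced delimiters.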
Because I was wondering about the quality of the SMILES strings (and because people ask me about these things), I made some time today to run a test:
1. SPARQL for all SMILES strings
2. process each one of them with the CDK SMILES parser

I can do both easily in Bioclipse with an integrated script:
identifier = "P233" // SMILES
type = "smiles"
sparql = """
PREFIX wdt: <http://www.wikidata.