One thing that machine readability adds is all sorts of machine processing. Validation of data consistency is one example. For SMILES strings, one of the things you can do is test if the string parses at all. Wikidata is machine readable and, in fact, easier to parse than Wikipedia, for which the SMILES strings were validated recently in a J. Cheminformatics paper by Ertl et al. (doi:10.1186/s13321-015-0061-y).

Because I was wondering about the quality of the SMILES strings (and because people ask me about these things), I made some time today to run a test:

The test has two steps: run a SPARQL query for all SMILES strings, and process each one of them with the CDK SMILES parser. I can do both easily in Bioclipse with an integrated script:

identifier = "P233" // SMILES

type = "smiles"

sparql = """

PREFIX wdt: <http://www.wikidata.
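For readers without Bioclipse at hand, the same two steps (query, then check each string) can be roughly sketched in plain Python. This is not the script above and it does not use the CDK: the endpoint URL, the `wdt:` prefix IRI, and the function names `build_query`, `query_url`, and `looks_parseable` are my assumptions for illustration, and the check is only a cheap syntactic stand-in (balanced parentheses, balanced brackets, paired ring-closure labels) for a real SMILES parser.

```python
from urllib.parse import urlencode

# Assumption: the public Wikidata query service endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"
DIGITS = "0123456789"


def build_query(prop="P233"):
    """Return a SPARQL query for every item carrying the given property.

    P233 (SMILES) is the identifier used in the post; the prefix IRI is an
    assumption, not copied from the original script.
    """
    return (
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
        "SELECT ?compound ?smiles WHERE {\n"
        f"  ?compound wdt:{prop} ?smiles .\n"
        "}"
    )


def query_url(query):
    """Build a GET URL that asks the endpoint for JSON results."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


def looks_parseable(smiles):
    """Cheap syntactic check, NOT a full SMILES parser.

    Verifies that parentheses and square brackets are balanced and that
    every ring-closure label (a digit, or % followed by two digits) occurs
    an even number of times.
    """
    depth = 0
    in_bracket = False
    open_rings = set()
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if in_bracket:
            # Digits inside [...] are isotopes, charges or H counts.
            if ch == "]":
                in_bracket = False
            elif ch == "[":
                return False
        elif ch == "[":
            in_bracket = True
        elif ch == "]":
            return False
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        elif ch == "%":
            label = smiles[i + 1:i + 3]
            if len(label) != 2 or any(c not in DIGITS for c in label):
                return False
            open_rings ^= {int(label)}  # toggle: open or close the ring
            i += 2
        elif ch in DIGITS:
            open_rings ^= {int(ch)}
        i += 1
    return bool(smiles) and depth == 0 and not in_bracket and not open_rings
```

A string that fails `looks_parseable` is certainly broken, but a string that passes may still be rejected by the CDK parser, for example because of an unknown element symbol or an impossible valence.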

doi:10.15200/winn.145228.82018

In April this year I blogged about an important SPARQL query for many chemists: getting CAS registry numbers from Wikidata. This is relevant for two reasons:

First, CAS works together with Wikimedia on a large, free CAS-to-structure database. Second, Wikidata is CCZero.

The original effort validated about eight thousand registry numbers, made available via Wikipedia and the Common Chemistry website.
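As a small illustration of what machine-readable registry numbers allow, here is a minimal Python sketch. It is not the query from the earlier post: the query text, the prefix IRI, and the use of P231 as the Wikidata property for CAS registry numbers are my assumptions, as are the names `CAS_QUERY`, `CAS_PATTERN`, and `cas_checksum_ok`. The check only tests the published check-digit rule (digits weighted 1, 2, 3, ... from the right, summed, modulo 10), which is a much weaker test than the curation effort mentioned above.

```python
import re

# Assumption: P231 is the Wikidata property for CAS registry numbers.
CAS_QUERY = """\
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?compound ?cas WHERE {
  ?compound wdt:P231 ?cas .
}
"""

# Two to seven digits, two digits, one check digit, separated by hyphens.
CAS_PATTERN = re.compile(r"([0-9]{2,7})-([0-9]{2})-([0-9])")


def cas_checksum_ok(cas):
    """Return True if the string has CAS layout and a matching check digit."""
    match = CAS_PATTERN.fullmatch(cas)
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the rightmost digit by 1, the next by 2, and so on.
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(digits), start=1)
    )
    return total % 10 == int(match.group(3))
```

For example, `cas_checksum_ok("7732-18-5")` accepts the registry number of water, while a mistyped final digit is rejected. A number that passes is only well formed; whether it belongs to the right structure is a separate question.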

Earlier this week there was a question on the WikiPathways mailing list about the webservices. There are older SOAP webservices and newer REST-like webservices, which come with this nice Swagger webfront set up by Nuno. Of course, both approaches are pretty standard and you can use them from basically any environment. Still, some personas prefer not to see technical issues: "why should I know how a car engine works". I do not think any scholar is allowed to use this argument, but alas...

Last week the BiGCaT team was present with three people (Linda, Ryan, and me) at the Semantic Web Applications and Tools 4 Life Sciences meeting in Cambridge (#swat4ls). It's a great meeting, particularly because of the workshops and hackathon. Previously, I attended the meeting in Amsterdam (gave this presentation) and Paris (which I apparently did not blog about).
This blog deals with chemblaics in the broader sense. Chemblaics (pronounced chem-bla-ics) is the science that uses computers to solve problems in chemistry, biochemistry and related fields. The big difference between chemblaics and areas such as chem(o)?informatics, chemometrics, computational chemistry, etc, is that chemblaics only uses open source software, open data, and open standards, making experimental results reproducible and validatable. And this is a big difference!