- licensing must be explicit, to allow aggregators to know under what conditions they can redistribute that data (or not)
- compound databases must start being clear on whether entries are specific compounds and if listed properties are for a specific tautomer (or not)
This is critically important for reasoning over data from multiple data sets, as recently outlined in our Applications of the InChI paper, and for large data integration projects like Open PHACTS.
This presentation covers all the usual suspects, like the Panton Principles, lists some truly Open Data in chemistry (e.g. CrystalEye), and outlines the architecture I am working on. The primary purpose of this project is to provide Linked Open Data for chemistry and to boost this field. Sadly, grant writing interfered with my agenda, and I did not manage to complete the full demo, but the slides contain this real-world screenshot that shows what it looks like (and I expect to put this publicly online in 1-2 months):
By no means is this architecture expected to be as functional as Open PHACTS or to replace large compound databases like ChemSpider or PubChem. Instead, it is meant as a simple architecture that does two things right and is simple enough to set up that any chemistry lab can do it. The goal: to increase the size of the chemical Linked Open Data network, which is way too small at this moment. I will list LinkedChemistry.info data sets with DataHub.io.
Basically, you set up a SPARQL endpoint with the data you want to share, and the Chemical Compound Box as a PHP front end using ARC2. That's it.
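To give an idea of what a front end like this asks of the SPARQL endpoint, here is a minimal sketch in Python (not the actual PHP/ARC2 code of the Chemical Compound Box). It builds a query for everything the endpoint knows about one compound and wraps it in a GET request URL, following the standard SPARQL protocol `query` parameter. The endpoint URL, the `ex:` vocabulary, the `ex:inchikey` predicate, and both function names are assumptions made up for illustration; substitute whatever your own data uses.

```python
import re
from urllib.parse import urlencode

# Hypothetical endpoint URL; replace with your lab's own SPARQL endpoint.
ENDPOINT = "http://example.org/sparql"

# Shape of a standard InChIKey: 14 letters, 10 letters, 1 letter.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def compound_query(inchikey: str) -> str:
    """Build a SPARQL query listing all properties of one compound."""
    # Only accept well-formed InChIKeys, so user input cannot alter the query.
    if not INCHIKEY_PATTERN.fullmatch(inchikey):
        raise ValueError(f"not a valid InChIKey: {inchikey!r}")
    return (
        # The ex: vocabulary and ex:inchikey predicate are hypothetical.
        "PREFIX ex: <http://example.org/vocab#>\n"
        "SELECT ?property ?value WHERE {\n"
        f'  ?compound ex:inchikey "{inchikey}" ;\n'
        "            ?property ?value .\n"
        "}"
    )


def request_url(endpoint: str, query: str) -> str:
    """Encode the query as a SPARQL protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    # InChIKey of water, used here only as an example identifier.
    query = compound_query("XLYOFNOQVPJJNP-UHFFFAOYSA-N")
    print(query)
    print(request_url(ENDPOINT, query))
```

Fetching that URL from a real endpoint returns the property/value pairs that a compound box could then render; the sketch stops before the network call so it runs without any server.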
And the slides of the #ACSNola presentation: