Monday, August 01, 2011

ChEMBL-RDF as part of the Linked Open Data cloud?

This page nicely writes up what you need to do to make your RDF resource part of the Linked Open Data network. CKAN is used to aggregate facts about the resources, and I am finally getting around to adding the metadata describing how the ChEMBL data (CC-BY-SA) is linked to other LOD resources. This process is conveniently supported by a validator (see the screenshot on the right side).

The links out are mostly to various data sets of Bio2RDF. SPARQL helps me count the number of links to other LOD nodes. A typical query looks like:

SELECT (count(DISTINCT ?value) AS ?links) WHERE {
  ?resource ?p ?value .
  FILTER (regex(str(?value), ""))
}

The regex() function only operates on string literals, so str() is used to convert each URI to its string form before matching.
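To make it clearer what the query computes, here is a minimal sketch in plain Python that mimics it on an in-memory list of triples. The triples and the example.org URIs are made-up placeholders, not real ChEMBL-RDF data, and the count_links helper is a hypothetical name introduced only for this illustration.

```python
import re

# Hypothetical (subject, predicate, object) triples; the example.org URIs
# are placeholders and do not come from the ChEMBL-RDF data.
TRIPLES = [
    ("http://example.org/chembl/m1", "http://www.w3.org/2002/07/owl#sameAs", "http://example.org/bio2rdf/a:1"),
    ("http://example.org/chembl/m2", "http://www.w3.org/2002/07/owl#sameAs", "http://example.org/bio2rdf/a:1"),
    ("http://example.org/chembl/m2", "http://www.w3.org/2002/07/owl#sameAs", "http://example.org/bio2rdf/b:2"),
    ("http://example.org/chembl/m3", "http://www.w3.org/2000/01/rdf-schema#label", "aspirin"),
    ("http://example.org/chembl/m3", "http://www.w3.org/2002/07/owl#sameAs", "http://example.org/other/c:3"),
]


def count_links(triples, pattern):
    """Count distinct object values whose string form matches the pattern.

    Mirrors count(DISTINCT ?value) combined with FILTER(regex(str(?value), pattern)).
    Like SPARQL's regex(), re.search matches anywhere in the string unless
    the pattern is anchored with ^.
    """
    regex = re.compile(pattern)
    # A set keeps each matching value once, which gives the DISTINCT behaviour.
    return len({obj for (_subj, _pred, obj) in triples if regex.search(str(obj))})


if __name__ == "__main__":
    print(count_links(TRIPLES, r"^http://example\.org/bio2rdf/"))
```

Running the same helper once per target namespace gives one link count per linked data set, which is the kind of number the CKAN metadata asks for.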

Right now, the data links out to four data sets, all via Bio2RDF:
Now it is a matter of waiting to see whether this is enough to make it into the next LOD cloud.