Wednesday, July 06, 2011

ChemPedia-RDF #3: Uploading data to Kasabi

OK, now that you have seen the outcome, I'll give a short walk-through of how the data ended up there.

First, I registered. Easy. No OpenID yet, and I do hope they will add that. But you already have an account, because you were so keen to test the SPARQL endpoint for the ChemPedia data, right?

Next, I added a data set. Or better, I added an entry for the data set, as the data itself is only added later. I added a name and a description, then selected the Science category and the license, and left the rest empty.

The next step is to subscribe to all five APIs yourself, which you can do with the button on the right.

I skipped the Upload Data button and used curl instead. The actual command I used (except that I used my real API key) looks like:
curl -S -v -H Content-Type:application/rdf+xml \
  -d @substances.xml \
This command uses HTTP POST to send the content of the substances.xml file to the given address, using the -H option to set the MIME type of the content.
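
The same kind of request can be sketched in Python. This is a minimal illustration, not the actual upload call: the address is a made-up placeholder (the real store address and API key are not shown here), and the request is only built, not sent.

```python
import urllib.request


def build_upload_request(url: str, rdf_xml: bytes) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST that carries RDF/XML."""
    return urllib.request.Request(
        url,
        data=rdf_xml,  # request body, like curl's -d @substances.xml
        headers={"Content-Type": "application/rdf+xml"},  # like curl's -H
        method="POST",
    )


# Hypothetical placeholder address; substitute the real one and your API key.
UPLOAD_URL = "https://example.org/store?apikey=YOUR_API_KEY"

# A tiny stand-in for the content of substances.xml.
body = b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>'

request = build_upload_request(UPLOAD_URL, body)
# urllib stores header names capitalized as "Content-type".
print(request.get_method(), request.get_header("Content-type"))

# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the method, headers, and body before anything goes over the wire.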

I created the substances.xml with the same script as before, with the important differences that:
  1. the resource URIs must have the domain and be complemented with dataset/chempedia-rdf; Kasabi will then pick this up and, without further work, make the data available as Linked Open Data
  2. the RDF should not use anonymous resources (aka blank nodes)
I updated my Groovy script accordingly, and uploaded the RDF/XML with the above curl call.
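
Both requirements can be checked before uploading. The sketch below is an illustration in Python, not the Groovy script itself, and it is deliberately simplified: it only inspects the top-level node elements of an RDF/XML document plus any rdf:nodeID attribute. The base URI uses a placeholder domain (example.org), and the substance/1 path is invented for the example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical base: only the dataset/chempedia-rdf part comes from the text.
BASE = "http://example.org/dataset/chempedia-rdf/"


def check_rdf_xml(rdf_xml: str, base: str = BASE) -> list:
    """Return a list of problems found in a (simple) RDF/XML document."""
    problems = []
    root = ET.fromstring(rdf_xml)
    # Top-level children of rdf:RDF are node elements; without rdf:about
    # they describe anonymous resources (blank nodes).
    for node in root:
        about = node.get(f"{{{RDF_NS}}}about")
        if about is None:
            problems.append(f"blank node: <{node.tag}> has no rdf:about")
        elif not about.startswith(base):
            problems.append(f"URI outside data set: {about}")
    # rdf:nodeID anywhere marks an explicitly labelled blank node.
    for element in root.iter():
        node_id = element.get(f"{{{RDF_NS}}}nodeID")
        if node_id is not None:
            problems.append(f"blank node: rdf:nodeID={node_id}")
    return problems


sample = f"""<rdf:RDF xmlns:rdf="{RDF_NS}"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="{BASE}substance/1">
    <rdfs:label>named resource</rdfs:label>
  </rdf:Description>
  <rdf:Description>
    <rdfs:label>anonymous resource</rdfs:label>
  </rdf:Description>
</rdf:RDF>"""

for problem in check_rdf_xml(sample):
    print(problem)
```

A full check would need a real RDF parser, since nested node elements can also be blank nodes; this version only catches the common top-level cases.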

In fact, the first RDF/XML I uploaded did not have those two changes, but Leigh Dodds explained to me what I had to do to get the Linked Data feature going. So, I had to delete the original data, which requires you to reset your data set. You can also do that from the command line, with:
curl -S -v -H Content-Type:application/rdf+xml \
  -d @reset.json \
Where the content of the reset.json file looked like:
{
  "jobType": "reset",
  "startTime": "2011-07-06T12:08:00Z"
}
After I re-uploaded the new RDF/XML, the resources could be dereferenced and everything was working nicely. It's not perfect yet, and I think I will tune things a bit more, and start using CHEMINF too. But, if I am not mistaken, I have now qualified for this badge :)

five star open Web data