Friday, September 10, 2010

Pulling out data as JSON from XHTML+RDFa

I am keen on RDFa and RDF in general; that should not be a surprise. RDFa is a serialization of RDF triples embedded in (X)HTML. I recently posted about chemical examples of XHTML+RDFa. Now, the reason for putting data in HTML as RDFa is that we can easily pull it out again, e.g. with this distiller. But the fun goes on, and we can actually also run SPARQL directly on it, for example with RDFaDev which I recently blogged about.

Now, consider that we have all these nice visualization tools written in JavaScript which can visualize data from JSON sources. A mashup then requires a JSON serialization of the data embedded in HTML pages. I have no experience with the cool JavaScript tools, and hope someone can help me out here, but for the JSON bit I already got help before on SemanticOverflow (thanks to Comment Bot!). The service mentioned there no longer works, but there are plenty of alternatives.

Now, Peter is creating this nice data set about green solvents from patents, and it would be great if that data ended up online as RDFa, so that we can easily visualize the trends in solvent use over the years. But as I do not have this data as XHTML+RDFa yet, you will have to make do with another example: boiling points.

So, let's consider the data on this page, relating paraffin molecules to boiling points, and we'll take a complexity descriptor (w0, Wiener descriptor) and the boiling point (t0). So we get this SPARQL query:
PREFIX cc: <http://github.com/egonw/cheminformatics.classics/1/#>

SELECT * {
    ?mol cc:w0 ?w ;
         cc:p0 ?p .
}
Now, we want to run this query on the aforementioned page, so we add a FROM clause:
PREFIX cc: <http://github.com/egonw/cheminformatics.classics/1/#>

SELECT *
FROM <http://www.w3.org/2007/08/pyRdfa/extract?uri=http%3A%2F%2Fegonw.github.com%2Fcheminformatics.classics%2Fclassic1.html&format=pretty-xml&warnings=false&parser=lax&space-preserve=true>
{
    ?mol cc:w0 ?w ;
         cc:p0 ?p .
}
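The long FROM URL is just the page address wrapped in a distiller request. As a minimal sketch, and assuming Python rather than JavaScript, it could be assembled like this. The function names `distiller_url` and `build_query` are hypothetical helpers for illustration; the page address is simply the decoded `uri` parameter from the query above.

```python
from urllib.parse import urlencode

# Distiller endpoint and parameters are copied from the FROM clause above.
DISTILLER = "http://www.w3.org/2007/08/pyRdfa/extract"
# Page address, obtained by decoding the "uri" parameter of that FROM clause.
PAGE = "http://egonw.github.com/cheminformatics.classics/classic1.html"


def distiller_url(page_url):
    """Wrap an XHTML+RDFa page URL in a distiller request (hypothetical helper)."""
    params = {
        "uri": page_url,
        "format": "pretty-xml",
        "warnings": "false",
        "parser": "lax",
        "space-preserve": "true",
    }
    # urlencode percent-encodes ':' and '/' in the page URL.
    return DISTILLER + "?" + urlencode(params)


def build_query(graph_url):
    """Return the SPARQL query from the post with graph_url in the FROM clause."""
    return (
        "PREFIX cc: <http://github.com/egonw/cheminformatics.classics/1/#>\n"
        "\n"
        "SELECT *\n"
        f"FROM <{graph_url}>\n"
        "{\n"
        "    ?mol cc:w0 ?w ;\n"
        "         cc:p0 ?p .\n"
        "}\n"
    )


graph_url = distiller_url(PAGE)
query = build_query(graph_url)
print(query)
```

Swapping `PAGE` for another XHTML+RDFa page gives the matching FROM URL without encoding it by hand.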
Notice the use of the distiller in the FROM clause: it turns the RDFa page into RDF/XML that the SPARQL engine can load. This way, with a service like the one on sparql.org, we can get JSON returned. The result is a bit verbose, but that can perhaps be tuned:
{
  "head": {
    "vars": [ "w" , "p" ]
  } ,
  "results": {
    "bindings": [
      {
        "w": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "56" } ,
        "p": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "4" }
      } ,
      {
        "w": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "35" } ,
        "p": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "3" }
      }
    ]
  }
}
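The verbosity is easy to strip away once the JSON is loaded. The sketch below assumes Python and uses a hypothetical helper, `flatten_bindings`, to reduce each binding to plain values, converting `xsd:integer` literals to numbers. The input is the two-binding result shown above, pasted in as a string.

```python
import json

# The SPARQL JSON result shown above, copied verbatim.
SPARQL_JSON = """
{
  "head": {
    "vars": [ "w" , "p" ]
  } ,
  "results": {
    "bindings": [
      {
        "w": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "56" } ,
        "p": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "4" }
      } ,
      {
        "w": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "35" } ,
        "p": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "3" }
      }
    ]
  }
}
"""

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def flatten_bindings(doc):
    """Turn SPARQL JSON results into a list of {variable: value} dicts."""
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        row = {}
        for var in variables:
            cell = binding.get(var)
            if cell is None:
                # Variable is unbound in this solution.
                row[var] = None
            elif cell.get("datatype") == XSD_INTEGER:
                row[var] = int(cell["value"])
            else:
                row[var] = cell["value"]
        rows.append(row)
    return rows


rows = flatten_bindings(json.loads(SPARQL_JSON))
print(rows)
```

A list of small dicts like this is close to what most charting libraries accept as input.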
The point is, I am sure at least one of my readers knows how to visualize the data in this JSON with, for example, Google Chart, particularly because all the mashing up is embedded in the just linked-to, though obscure, URL. And, if it helps, you can otherwise use the CSV or TSV output, which is even simpler (CSV):
w,p
56,4
286,9
35,3
220,8
20,2
84,5
10,1
165,7
120,6
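The CSV output needs even less work before plotting. As a rough sketch in Python, with `read_points` as a hypothetical helper name, the rows above can be read into (w, p) pairs and sorted by w, ready to hand to whatever charting tool you prefer:

```python
import csv
import io

# The CSV output shown above, copied verbatim.
CSV_TEXT = """w,p
56,4
286,9
35,3
220,8
20,2
84,5
10,1
165,7
120,6
"""


def read_points(text):
    """Parse the CSV into (w, p) integer pairs, sorted by w."""
    reader = csv.DictReader(io.StringIO(text))
    return sorted((int(row["w"]), int(row["p"])) for row in reader)


points = read_points(CSV_TEXT)
for w, p in points:
    print(w, p)
```

Sorting is not required for a scatter plot, but it makes the relation between the two columns easier to check by eye.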
The first one who can use one of the above URLs to extract the data from that XHTML+RDFa page and create a scatter plot in an HTML page with some JavaScript library wins a free mention in my blog! ;)