Sunday, September 25, 2011

rrdf 1.5: Accessing SMW SPARQL end points behind LDAP authentication

We are using a Semantic MediaWiki (SMW) for the Gold Compound selection task by ToxBank in the SEURAT-1 cluster, funded by Colipa and the EC. I do stress that despite being funded by Colipa, they have no control over my research; they just co-fund it. This Gold Compound wiki is hidden behind a cluster agreement-wall (at some point this data will become Open), which is implemented with HTTP Basic Auth on the front and LDAP authentication in the background. That actually combines nicely with (S)MW, and automatically logs people into their (linked) wiki account.

Now, the great thing about SMW is that it is machine readable. It basically gives you a custom DBpedia, and I am already using one to capture knowledge from the NanoQSAR literature, as blogged in Importing Nanotoxicity Data with SPARQL into R for analysis. It turned out that the Gold Compound wiki simply uses 'basic HTTP authentication' for the part between web server and web client (thanx to chats with Nina), and LDAP between web server and authentication server. That meant that doing the authentication in Jena was trivial too, and I could simply use QueryEngineHTTP.setBasicAuthentication().
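For those wondering what 'basic HTTP authentication' amounts to on the wire: the client sends an Authorization header holding the scheme name followed by the base64 encoding of "user:password". The snippet below is only a plain-Java sketch of that header, not Jena's or rrdf's actual code; the class and method names (BasicAuthSketch, basicAuthHeader) are made up for illustration, and the credentials are the same placeholders used in this post.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of Jena or rrdf: shows what the
// Basic scheme puts into the HTTP "Authorization" header.
public class BasicAuthSketch {

    // Returns "Basic " followed by base64("user:password").
    public static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        byte[] raw = credentials.getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) {
        // Placeholder credentials, as in the R call in this post.
        System.out.println("Authorization: " + basicAuthHeader("user", "password"));
    }
}
```

Because the web server only ever sees this header, the LDAP lookup behind it stays invisible to the client, which is why a SPARQL client needs nothing more than a user name and password.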

I updated rrdf to version 1.5 to support this too (see this patch; and thanx to Kurt Hornik for taking care of the CRAN incoming/). This means that I can now extract Gold Compound data directly into my favorite statistics software R (but it would equally work with other tools that have SPARQL support, like Bioclipse), and do all sorts of fun stuff with the data, like validation, consistency checking, data mining, you name it. Like plotting pKa values (with rather uninformative segments :):

The wiki uses the RDFIO extension for SMW written by my former M.Sc. student at Uppsala University, Samuel, who presented this module at SMWCon last week.

SEURAT-1 GCWG and ToxBank members can email me for details on how to link the Gold Compound wiki to R (or other software). But it basically comes down to running a command like this:

library(rrdf)
predicates = sparql.remote(
  endpoint,  # URL of the wiki's SPARQL end point (placeholder, not given here)
  "SELECT DISTINCT ?predicate WHERE { [] ?predicate [] }",
  user="user", password="password"
)
