Monday, July 09, 2018

Converting any SPARQL endpoint to an OpenAPI

Logo of the grlc project.
Sometimes you run into something awesome. I had that one or two months ago, when I found out about a cool project that can convert a random SPARQL endpoint into an OpenAPI endpoint: grlc. Now, May/June was really busy (well, the least few weeks before summer not much less so), but at the WikiProject Wikidata for research meeting in Berlin last month, I just had to give it a go.

There is a convenient Docker image, so setting it up was a breeze (see their GitHub repo):

git clone
cd grlc
docker pull clariah/grlc
docker-compose -f docker-compose.default.yml up

What the software does, is take a number of configuration files that define what the OpenAPI REST call should look, and what the underlying SPARQL is. For example, to get all projects in Wikidata with a CORDIS project identifier, we have this configration file:

#+ summary: Lists grants with a CORDIS identifier
#+ endpoint_in_url: False
#+ endpoint:
#+ tags:
#+   - Grants

PREFIX bd: <>
PREFIX wikibase: <>
PREFIX wdt: <>

SELECT ?grant ?grantLabel ?cordis WHERE {
  ?grant wdt:P3400 ?cordis .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".

The full set of configuration files I hacked up (including one with a parameter) can be found here. The OpenAPI then looks something like this:

I haven't played enough with it yet, and I hope we can later use this in OpenRiskNet.

Sunday, July 01, 2018

European Union Observatory for Nanomaterials now includes eNanoMapper

The European Union Observatory for Nanomaterials (EUON) reported about two weeks ago that their observatory added two new data sets, one of which is the eNanoMapper database which includes NanoWiki and the NANoREG data. And both are exposed using the eNanoMapper database software (see this paper). It's very rewarding to see your work picked up like this and motivates me very much for the new NanoCommons project!

The European Union Observatory for Nanomaterials.

LIPID MAPS identifiers and endocannabinoids

Maybe I will find some more time later, but for now just a quick notice of a open notebook I kept yesterday for adding more LIPID MAPS identifiers in Wikidata. It started with a node in a WikiPathways which did not have an identifier: endocannabinoids:
This is why I am interested in Wikidata, as I can mint entries there myself (see this ICCS 2018 poster). And so I did, but when adding a chemical class, you want to specific compounds from that class too. That's where LIPID MAPS comes in, because that had info on specific compounds in that class.

Some time ago I asked about adding more LIPID MAPS identifiers to Wikidata, which has a lot of benefits for the community and LIPID MAPS. I was informed I could use their REST API to get mappings between InChIKey and their identifiers, and that is enough for me to add more of their identifiers to Wikidata (similar approach I used for the EPA CompTox Dashboard and SPLASHes). The advantages include that LIPID MAPS now can easily get data to add links to the PDB and MassBank to their lipid database (and much more).

My advantage is that I can easily query if a particular compound is a specific endocannabinoids. I created two Bioclipse scripts, and one looks like:

// ask permission to use data from their REST API (I did and got it)

restAPI = ""
propID = "P2063"

allData = bioclipse.downloadAsFile(
  restAPI, "/LipidMaps/lipidmaps.txt"

sparql = """
PREFIX wdt: 
SELECT (substr(str(?compound),32) as ?wd) ?key ?lmid WHERE {
  ?compound wdt:P235 ?key .
  MINUS { ?compound wdt:${propID} ?lmid . }

if (bioclipse.isOnline()) {
  results = rdf.sparqlRemote(
    "", sparql

def renewFile(file) {
  if (ui.fileExists(file)) ui.remove(file)
  return file

mappingsFile = "/LipidMaps/mappings.txt"
missingCompoundFile = "/LipidMaps/missing.txt"

// ignore certain Wikidata items, where I don't want the LIPID MAPS ID added
ignores = new java.util.HashSet();
// ignores.add("Q37111097")

// make a map
map = new HashMap()
for (i=1;i<=results.rowCount;i++) {
  rowVals = results.getRow(i)
  map.put(rowVals[1], rowVals[0])  

batchSize = 500
batchCounter = 0
mappingContent = ""
missingContent = ""
print "Saved a batch"
new File(bioclipse.fullPath("/LipidMaps/lipidmaps.txt")).eachLine{ line ->
  fields = line.split("\t")
  if (fields.length > 15) {
    lmid = fields[1]
    inchikey = fields[15]
    if (inchikey != null && inchikey.length() > 10) {
      if (map.containsKey(inchikey)) {
        wdid = map.get(inchikey)
        if (!ignores.contains(wdid)) {
          mappingContent += "${wdid}\t${propID}\t\"${lmid}\"\tS143\tQ20968889\tS854\t\"\"\tS813\t+2018-06-30T00:00:00Z/11\n"
      } else {
        missingContent += "${inchikey}\n"
  if (batchCounter >= batchSize) {
    ui.append(mappingsFile, mappingContent)
    ui.append(missingCompoundFile, missingContent)
    batchCounter = 0
    mappingContent = ""
    missingContent = ""
    print "."
println "\n"

With that, I managed to increase the number of LIPID MAPS identifiers from 2333 to 6099, but there are an additional 38 thousand lipids not yet in Wikidata.

Many more details can be found in my notebook, but in the end I ended up with a nice Scholia page for endocannabinoids :)