Pages

Saturday, September 15, 2018

Wikidata Query Service recipe: qualifiers and the Greek alphabet

Just because I need to look this up each time myself, I wrote up this quick recipe for how to get information from statement qualifiers from Wikidata. Let's say, I want to list all Greek letters, with in one column the lower case and in the other the upper case letter. This is what our data looks like:


So, let start with a simple query that lists all letters in the Greek alphabet:

SELECT ?letter WHERE {
  ?letter wdt:P361 wd:Q8216 .
}

Of course, that only gives me the Wikidata entries, and not the Unicode characters we are after. So, let's add that Unicode character property:

SELECT ?letter ?unicode WHERE {
  ?letter wdt:P361 wd:Q8216 ;
          wdt:P487 ?unicode .
}

Ah, that gets us somewhere:



But you see that the upper and lower case are still in separate rows, rather than columns. To fix that, we need access to those qualifiers. It's all in there in the Wikidata RDF, but the model is giving people a headache (so do many things, like math, but that does not mean we should stop doing it!). It all comes down to keeping notebooks, write down your tricks, etc. It's called the scientific method (there is more to that, than just keeping notebooks, tho).

Qualifiers
So, a lot of important information is put in qualifiers, and not just the statements. Let's first get all statements for a Greek letter. We would do that with:

?letter ?pprop ?statement .

One thing we want to know about the property we're looking at, is the entity linked to that. We do that by adding this bit:

?property wikibase:claim ?propp .

Of course, the property we are interested in is the Unicode character, so can put that directly in:

wd:P487 wikibase:claim ?propp .

Next, the qualifiers for the statement. We want them all:

?statement ?qualifier ?qualifierVal .
?qualifierProp wikibase:qualifier ?qualifier .

And because we do not want any qualifier but the applies to part, we can put that in too:

?statement ?qualifier ?qualifierVal .
wd:P518 wikibase:qualifier ?qualifier .

Furthermore, we are only interested in lower case and upper case, and we can put that in as well (for upper case):

?statement ?qualifier wd:Q98912 .
wd:P518 wikibase:qualifier ?qualifier .

So, if we want both upper and lower case, we now get this full query:

SELECT DISTINCT ?letter ?unicode WHERE {
  ?letter wdt:P361 wd:Q8216 ;
          wdt:P487 ?unicode .
  ?letter ?pprop ?statement .
  wd:P487 wikibase:claim ?propp .
  ?statement ?qualifier wd:Q8185162 .
  wd:P518 wikibase:qualifier ?qualifier .
}

We are not done yet, because you can see in the above example that we get the unicode character differently from the statement. This needs to be integrated, and we need the wikibase:statementProperty for that:

wd:P487 wikibase:statementProperty ?statementProp .
?statement ?statementProp ?unicode .

If we integrate that, we get this query, which is indeed getting complex:

SELECT DISTINCT ?letter ?unicode WHERE {
  ?letter wdt:P361 wd:Q8216 .
  ?letter ?pprop ?statement .
  wd:P487 wikibase:claim ?propp ;
          wikibase:statementProperty ?statementProp .
  ?statement ?qualifier wd:Q8185162 ;
             ?statementProp ?unicode .  
  wd:P518 wikibase:qualifier ?qualifier .
}

But basically we have our template here, with three parameters:
  1. the property of the statement (here P487: Unicode character)
  2. the property of the qualifier (here P518: applies to part)
  3. the object value of the qualifier (here Q98912: upper case)
If we use the SPARQL VALUES approach, we get the following template. Notice that I renamed the variables of ?letter and ?unicode. But I left the wdt:P361 wd:Q8216 (='part of' 'Greek alphabet') in, so that this query does not time out:

SELECT DISTINCT ?entityOfInterest ?statementDataValue WHERE {
  ?entityOfInterest wdt:P361 wd:Q8216 . # 'part of' 'Greek alphabet'
  VALUES ?qualifierObject { wd:Q8185162 }
  VALUES ?qualifierProperty { wd:P518 }
  VALUES ?statementProperty { wd:P487 }

  # template
  ?entityOfInterest ?pprop ?statement .
  ?statementProperty wikibase:claim ?propp ;
          wikibase:statementProperty ?statementProp .
  ?statement ?qualifier ?qualifierObject ;
             ?statementProp ?statementDataValue .  
  ?qualifierProperty wikibase:qualifier ?qualifier .
}

So, there is our recipe, for everyone to copy/paste.

Completing the Greek alphabet example
OK, now since I actually started with the upper and lower case Unicode character for Greek letters, let's finish that query too. Since we need both, we need to use the template twice:

SELECT DISTINCT ?entityOfInterest ?lowerCase ?upperCase WHERE {
  ?entityOfInterest wdt:P361 wd:Q8216 .

  { # lower case
    ?entityOfInterest ?pprop ?statement .
    wd:P487 wikibase:claim ?propp ;
            wikibase:statementProperty ?statementProp .
    ?statement ?qualifier wd:Q8185162 ;
               ?statementProp ?lowerCase .  
    wd:P518 wikibase:qualifier ?qualifier .
  }

  { # upper case
    ?entityOfInterest ?pprop2 ?statement2 .
    wd:P487 wikibase:claim ?propp2 ;
            wikibase:statementProperty ?statementProp2 .
    ?statement2 ?qualifier2 wd:Q98912 ;
               ?statementProp2 ?upperCase .  
    wd:P518 wikibase:qualifier ?qualifier2 .
  }
}

Still one issue left to fix. Some greek letters have more than one upper case Unicode character. We need to concatenate those. That requires a GROUP BY and the GROUP_CONCAT function, and get this query:

SELECT DISTINCT ?entityOfInterest
  (GROUP_CONCAT(DISTINCT ?lowerCase; separator=", ") AS ?lowerCases)
  (GROUP_CONCAT(DISTINCT ?upperCase; separator=", ") AS ?upperCases)
WHERE {
  ?entityOfInterest wdt:P361 wd:Q8216 .

  { # lower case
    ?entityOfInterest ?pprop ?statement .
    wd:P487 wikibase:claim ?propp ;
            wikibase:statementProperty ?statementProp .
    ?statement ?qualifier wd:Q8185162 ;
               ?statementProp ?lowerCase .  
    wd:P518 wikibase:qualifier ?qualifier .
  }

  { # upper case
    ?entityOfInterest ?pprop2 ?statement2 .
    wd:P487 wikibase:claim ?propp2 ;
            wikibase:statementProperty ?statementProp2 .
    ?statement2 ?qualifier2 wd:Q98912 ;
               ?statementProp2 ?upperCase .  
    wd:P518 wikibase:qualifier ?qualifier2 .
  }
} GROUP BY ?entityOfInterest

Now, since most of my blog posts are not just fun, but typically also have a use case, allow me to shed light on the context. Since you are still reading, your officially part of the secret society of brave followers of my blog. Tweet to my egonwillighagen account a message consisting of a series of letters followed by two numbers (no spaces) and another series of letters, where the two numbers indicate the number of letters at the start and the end, for example, abc32yz or adasgfshjdg111x, and I will you add you to my secret list of brave followers (and I will like the tweet; if you disguise the string to suggest it has some meaning, I will also retweet it). Only that string is allowed and don't tell anyone what it is about, or I will remove you from the list again :) Anyway, my ambition is to make a Wikidata-based BINAS replacement.

So, we only have a human readable name. The frequently used SERVICE wikibase:label does a pretty decent job and we end up with this table: