Figure 4 from this editorial, which shows how data embedded in webpages can be extracted and visualized automatically with the right tools, as developed by Jankowski (figure license: CC-BY).
As an example, the ChEBI and ChEMBL databases from the EMBL-EBI provide both: they have a human-oriented website with webpages for all data. The data is sorted, and for both at least sorted by chemical compound. But they also have boxes: their FTP sites. Here you can download all data in a box, and they leave it to you to unbox the data. Of course, many cheminformaticians just love to unbox the data, sort it, and put it both in other boxes and onto other websites.
Diversion #1: The mainstream publishers actually like boxes a lot. Fifteen years ago I was hopeful that, with all the Open Access vibes around, we would jointly make data and facts readily available, also to machines. Sadly, last year, after trying to work with one mainstream publisher, I accepted my defeat. They are mostly interested in boxes. Worse, taped boxes that you cannot open. That's what they proudly presented (ReadCube).
So, many people, including me, were interested in actually making the human-oriented display of the data and facts a machine-readable box itself. For example, we wrote the chapter Beautifying Data in the Real World (use that link for the OA version) for the book Beautiful Data.
My personal interest went into HTML+RDFa. I probably blogged about it for the first time in 2008, because of browser plugins that could extract it. Yes, indeed, a website as a machine-readable data source. For example, a long time ago I played with the idea that one day all research dissemination would be interactive figures and tables (you can find this idea returning in my research many times, as is clear from this blog). Only very few publishers want to make this a reality.
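To make "a website as a data source" a bit more concrete, here is a minimal sketch in Python of what such an extractor does. It is not one of the browser plugins mentioned above and not a full RDFa processor: it only understands `about`, `property`, and `content` attributes, it does no scoping or prefix resolution, and the HTML snippet, the `ex:` prefix, and the `example.org` URL are made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: human-readable text annotated with RDFa attributes.
SAMPLE_HTML = """
<div about="http://example.org/compound/1" typeof="ex:Compound">
  <span property="rdfs:label">methane</span>
  <meta property="ex:formula" content="CH4" />
</div>
"""


class SimpleRDFaExtractor(HTMLParser):
    """Collects (subject, property, value) triples from a small RDFa subset."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._subject = None   # most recent @about value (simplification: no scoping)
        self._pending = None   # property that still waits for its text content
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "about" in attrs:
            self._subject = attrs["about"]
        prop = attrs.get("property")
        if prop is None:
            return
        if "content" in attrs:
            # The value is given explicitly in the @content attribute.
            self.triples.append((self._subject, prop, attrs["content"]))
        else:
            # The value is the text inside the element; collect it until the next end tag.
            self._pending = prop
            self._text = []

    def handle_data(self, data):
        if self._pending is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Simplification: any end tag closes the pending property.
        if self._pending is not None:
            value = "".join(self._text).strip()
            self.triples.append((self._subject, self._pending, value))
            self._pending = None


extractor = SimpleRDFaExtractor()
extractor.feed(SAMPLE_HTML)
for triple in extractor.triples:
    print(triple)
```

The same page serves the human (who reads "methane") and the machine (which gets the triples). For real pages, a proper RDFa parser is the better choice, because it also handles nesting, prefixes, and the other RDFa attributes.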
So, why am I blogging about this? These technologies have long been solved, are here, are used, and are ready to be taken up. And they are. From the big search engines that use schema.org for SEO to ELIXIR, which uses it to make their projects interoperable.
Diversion #2: You may wonder where this puts SPARQL endpoints. Aren't they boxes too? If you ask me, I would say yes, they are. But they are somewhere in between the story-around-data and the box-with-data: they are self-documenting and interactive. Many FTP sites are documented, like those from the EMBL-EBI, but they are not interactive. Better, SPARQL can easily be wrapped in stories, as done here for SARS-CoV-2 and as worked out by Finn with Scholia (doi:10.1007/978-3-319-70407-4_36).
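What "interactive" means here can be sketched in a few lines of Python: you send a question as the `query` parameter of an HTTP request, and you get back a table in the standard SPARQL JSON results format. The sketch below does not contact any server; the endpoint URL, the query, and the sample response are all hypothetical and only show the shape of the exchange.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint and query, for illustration only.
ENDPOINT = "https://example.org/sparql"
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?compound ?label WHERE { ?compound rdfs:label ?label } LIMIT 2
"""


def build_query_url(endpoint, query):
    """Return a GET URL that passes the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


def rows_from_results(results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    variables = results["head"]["vars"]
    return [
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in results["results"]["bindings"]
    ]


# Made-up response in the SPARQL JSON results layout (head/vars, results/bindings).
SAMPLE_RESPONSE = json.loads("""
{
  "head": {"vars": ["compound", "label"]},
  "results": {"bindings": [
    {"compound": {"type": "uri", "value": "http://example.org/compound/1"},
     "label": {"type": "literal", "value": "methane"}},
    {"compound": {"type": "uri", "value": "http://example.org/compound/2"},
     "label": {"type": "literal", "value": "ethane"}}
  ]}
}
""")

url = build_query_url(ENDPOINT, QUERY)
rows = rows_from_results(SAMPLE_RESPONSE)
print(url)
for row in rows:
    print(row["label"], row["compound"])
```

Change the query and you get a different table back, without having to unbox anything first. That is the part an FTP site cannot do.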
And here are some examples where I still use HTML+RDFa (probably some more):
- my homepage (which needs some updates, and a more modern layout, now that you mention it)
- the cheminformatics classics described in this editorial (which are now so old, they are becoming a classic themselves)
- the WikiPathways vocabularies