Self-portrait by Gustave Courbet.
Source: Wikimedia Commons.
One blog that I have been (occasionally but repeatedly) reading for a long time is the What You're Doing Is Rather Desperate blog by Neil Saunders. HT to WoW!ter for pointing me to this nice post where Saunders shows how to calculate the number of entries in PubMed marked as retracted (in two separate ways). The spoiler (you should check his R scripts anyway!): about 3900 retractions.

I have been quite interested in this idea of retractions, and in recent discussions with Chris (Christopher; mostly offline this time, and some even over a beer :) we talked about whether retractions are good or bad (BTW, check his inaugural speech on YouTube). He points out that retractions are not always made for the right reasons and, probably worse, have unintended consequences. An example he gave is a senior researcher with two positions: in one lab someone misbehaved, and that lab found no misconduct by the senior researcher; his other affiliation, however, did not agree and fired him.

Five years ago I would have said that any senior researcher on a paper should still understand things in detail, and that if misconduct was found, the senior authors are to blame too. I still believe this is the case, because that is what your co-researchers are for, but feeling the pressure of publishing enough and simply not having enough time, I do realize I cannot personally reproduce all results my post-docs and collaborators produce. But we all know that publishing has taken a wrong turn and, yes, I am trying to help it return to a better default.

Why I am interested in retractions
But that is not why I wanted to blog about and discuss Saunders' post. Instead, I want to explain why I am interested in retractions and in another blog you should check out, Retraction Watch. In fact, I am very much looking forward to their database! However, this is not because of the blame game. Not at all.

Instead, I am interested in noise in knowledge. Obviously, because this greatly affects my chemblaics research. In particular, I like to reduce noise, or at the very least take appropriate measures when doing statistics, as we have plenty of means to deal with noise (like cancelling it out). Now, are retractions an appropriate means to find incorrect, incomplete, or just outright false knowledge? No. But there is nothing better.
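To illustrate what I mean by cancelling noise out, here is a toy sketch of my own (not tied to any particular dataset): averaging replicate measurements reduces random noise roughly by the square root of the number of replicates.

    import numpy as np

    # Toy illustration: random noise cancels out when replicate measurements are averaged.
    rng = np.random.default_rng(42)
    true_value = 10.0
    # 1000 simulated experiments, each with 25 noisy replicate measurements.
    measurements = true_value + rng.normal(0.0, 1.0, size=(1000, 25))

    single = measurements[:, 0]            # one noisy measurement per experiment
    averaged = measurements.mean(axis=1)   # mean over 25 replicates per experiment

    print(f"spread of single measurements:   {single.std():.2f}")    # roughly 1.0
    print(f"spread of averaged measurements: {averaged.std():.2f}")  # roughly 1.0 / sqrt(25) = 0.2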

There are better approaches: I have long advocated, and still advocate, the Citation Typing Ontology (CiTO), though I have to admit I am not up to date with David Shotton's work. CiTO allows one to annotate whether a citing paper disagrees or agrees with the paper it cites, and perhaps even uses its knowledge. It can also annotate a citation as merely being included because the cited paper has some authority (expect many of those to Nature and Science papers).

But we have a long way to go before using CiTO becomes a reality. If interested, please check out the CiteULike support and Shotton's OpenCitations.
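To give a flavour of what a CiTO-typed citation looks like, here is a minimal sketch using Python's rdflib (my own illustration; the DOIs are placeholders, not real papers):

    from rdflib import Graph, Namespace, URIRef

    CITO = Namespace("http://purl.org/spar/cito/")

    g = Graph()
    g.bind("cito", CITO)

    # Placeholder DOIs, purely for illustration.
    citing = URIRef("https://doi.org/10.1234/example.citing")
    cited = URIRef("https://doi.org/10.1234/example.cited")
    famous = URIRef("https://doi.org/10.1234/example.authority")

    # The citing paper disputes the findings of the cited paper ...
    g.add((citing, CITO.disagreesWith, cited))
    # ... and cites another paper merely because it carries authority.
    g.add((citing, CITO.citesAsAuthority, famous))

    print(g.serialize(format="turtle"))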

What does this mean for databases?
Research depends on data: some you measure, some you get from literature, and increasingly from databases. The latter, for example, to compare your own results with other findings. It is helpful when databases provide these two functions:
  1. provide means to find similar experiments
  2. provide a gold standard of true knowledge
These databases take two different approaches: the first presents the data as reported, or even better as raw data (as it came from the machine, unprocessed, though increasingly the machines already do processing to the best of their abilities); the second filters out true facts, possibly normalizing data along the way, e.g. by correcting obvious typing and drawing errors.

Indeed, databases can combine these features, just like PubChem and ChemSpider do for small compounds. PubChem has the explicit approach of providing both the raw input from sources (the substances, with SIDs) and the normalized result (the compounds, with CIDs).
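As a small illustration of that split (my own sketch, using PubChem's PUG REST interface; the CID is just an example), one can ask which substance records were aggregated into a given compound:

    import requests

    # List the substance records (SIDs) that PubChem normalized into one compound (CID).
    # CID 2244 (aspirin) is used purely as an example.
    url = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/sids/JSON"
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    info = response.json()["InformationList"]["Information"][0]
    print(f"CID {info['CID']} aggregates {len(info['SID'])} substance records")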


But what if the original paper turns out to be wrong? There are (at least) two questions:
  1. can we detect when a paper turns out wrong?
  2. can we propagate this knowledge into databases?
The first clearly reflects my interest in CiTO and retractions. We must develop means to filter out all the reported facts that turn out to be incorrect. And can we efficiently keep our thousands of databases clean (many valid approaches exist!)? Do retractions matter here? Yes, because research in so-called higher-impact journals is also seeing more retractions (whatever the reason is for that correlation); see this post by Bjoern Brembs.


Where we must be heading
What the community needs to develop in the next few years are approaches for propagating knowledge about which knowledge is correct and which is not. That is, high-impact knowledge must enter databases quickly, e.g. the exercise myokine irisin, but also the fact that it was recently shown that it very likely does not exist, or at least that the original paper most likely measured something else (doi:10.1038/srep08889). Now, this is clearly a high-profile "retraction" of facts that few of us will have missed. Right? And that is where the problem is: the tail of disproven knowledge is very long, and we cannot rely on such facts to propagate quickly if we do not use tools to help us. This is one reason why my attention turned to semantic technologies, so that contradictions can be found more easily.
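As a sketch of what finding contradictions could look like on top of CiTO-typed citations (again my own illustration; citations.ttl is a hypothetical file of such annotations), a simple SPARQL query can surface papers that are both supported and disputed:

    from rdflib import Graph

    g = Graph()
    g.parse("citations.ttl", format="turtle")  # hypothetical file of CiTO-typed citations

    # Papers that some later work agrees with and other later work disagrees with:
    # a crude proxy for knowledge that databases may need to re-check.
    query = """
    PREFIX cito: <http://purl.org/spar/cito/>
    SELECT DISTINCT ?cited WHERE {
        ?supporter cito:agreesWith    ?cited .
        ?critic    cito:disagreesWith ?cited .
    }
    """
    for row in g.query(query):
        print(row.cited)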

But I am getting rather desperate about all the badly annotated knowledge in databases, and I am also getting desperate about being able to make a change. The research I'm doing may turn out rather desperate.

Hi all, as I posted about a year ago, I moved this blog to a different domain and a different platform. I note that I still have many followers on this domain (and not on my new domain), including over 300 on Feedly.com alone.

This is my last post on blogger.com. At least, that is the plan. It has been a great 18 years. I would like to thank the owners of blogger.com, and later Google, for providing this service. I am continuing chem-bla-ics on a new domain: https://chem-bla-ics.linkedchemistry.info/

I, like so many others, struggle with choosing open infrastructure versus the freebie model. Of course, we know these things come and go: Google Reader, FriendFeed, Twitter/X (see doi:10.1038/d41586-023-02554-0).

Some days ago, I started adding boiling points to Wikidata, referenced from Basic Laboratory and Industrial Chemicals (wikidata:Q22236188), David R. Lide's 'CRC quick reference handbook' from 1993 (well, the edition I have). But Wikidata wants the pressure (wikidata:P2077) at which the boiling point (wikidata:P2102) was measured. Rightfully so. But I had not added those yet, because it slows me down and it can be automated with QuickStatements.
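Something along these lines could generate the QuickStatements batch (a rough sketch of my own; the compound QID and the unit QIDs are placeholders and assumptions that should be verified on Wikidata before running anything):

    # Rough sketch: emit QuickStatements (v1, tab-separated) lines that add a boiling
    # point (P2102) with a pressure qualifier (P2077) and a 'stated in' reference
    # (S248) to the handbook (Q22236188). The unit QIDs below are assumptions.
    DEGREE_CELSIUS = "U25267"        # assumed QID for degree Celsius
    STANDARD_ATMOSPHERE = "U177974"  # assumed QID for standard atmosphere

    def boiling_point_line(compound_qid, bp_celsius, pressure_atm=1.0):
        fields = [
            compound_qid,
            "P2102", f"{bp_celsius}{DEGREE_CELSIUS}",
            "P2077", f"{pressure_atm}{STANDARD_ATMOSPHERE}",
            "S248", "Q22236188",
        ]
        return "\t".join(fields)

    # Placeholder compound QID and value, just to show the output format.
    print(boiling_point_line("Q00000", 78.4))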

Just a quick note: I just love the level of detail Wikidata allows us to use. One of the marvels is the practice of 'named as' qualifiers, which can be used in statements for subjects and objects. The notion, and its importance, is that things are referred to in different ways, and these properties allow us to link the interpretation with the source.

I am still an avid user of RSS/Atom feeds. I use Feedly daily, partly because of their easy-to-use app. My blog is part of Planet RDF, a blog planet. Blog planets aggregate blogs from many people around a certain topic. It's like a forum, but open, free, and community driven. It's exactly what the web should be.

This blog is almost 18 years old now. I have long wanted to migrate it to a version control system and at the same time have more control over things. Markdown would be awesome. In the past year, I learned a lot about the power of Jekyll and needed to get more experience with it to use it for more databases, like we now do for WikiPathways.

So, time to migrate this blog :) This is probably a multi-year project, so feel free to continue reading it here.

The role of a university is manifold. Being a place where people can find knowledge, and the track record of how that knowledge was reached, is often seen as part of that. Over the past decades universities outsourced this role, for example to publishers. This is seeing a lot of discussion, and I am happy to see that the Dutch universities are now taking back control fast.

I am pleased to learn that the Dutch universities are starting to look at rankings in a more scientific way. It is long overdue that we take scientific peer review of the indicators used in those rankings seriously, instead of hiding behind FUD about the decline of the quality of research.

So, what defines the quality of a journal? Or better, of any scholarly dissemination channel? After all, some databases do better peer review than some journals.

A bit over a year ago I was introduced to Qeios when I was asked to review an article by Michie, West, and Hastings: "Creating ontological definitions for use in science" (doi:10.32388/YGIF9B.2). I wrote up my thoughts after reading the paper, and the review was posted openly online and got a DOI. It is not the first platform to do this (think F1000), but it is always nice to see publishers taking publishing seriously. Since then, I have reviewed two more papers.