Research output comes in many forms. Journal articles, books, and book chapters have benefited from the Matthew effect: being the most abundant formats, they reinforce themselves. So much so, in fact, that some scholars happily assume that research quality depends on these forms. Particularly journal articles. With impact factors. Evidence for this is missing, the claims are baseless, or they outright contradict the evidence people did collect.
Anyway, the effect of this dominance is that other forms of research output no longer get sufficient attention. Indeed, journals and books are what we built infrastructure for: libraries, which index them. Of course, a lot is changing, and libraries now hold databases too, but indexing of these is still mostly absent. Even for books, we typically do not find the "index" at the end of the book integrated into the library index. Copyright is likely to blame, because technically this has never been an issue.
But I think there is another reason for the dominance of journals in particular: the journal impact factor. In the fifties it was realized that citation data had more applications than just finding things: the data could be reused and repurposed. The aim of the original paper was to prioritize which journals to index.
But infrastructures for other research outputs are in their infancy. A database? It must have an accompanying article for its reuse to be tracked. This is what the NAR Database issue is about: making databases citable. An example for me: the WikiPathways papers. The same holds for software: it needs a journal article for the infrastructure to track its reuse. If I remember correctly, this was one of the reasons to start the Journal of Cheminformatics: to make open source cheminformatics citable. But also think of the Journal of Open Source Software, which very clearly plays this role too. In the latter case, the peer review is strongly aimed at the source code, which I really like. The Bacting paper was published there. An example for me in open source cheminformatics where citing via a wrapping narrative was needed: the Chemistry Development Kit papers.
My point: the infrastructure has had an immense influence on what we consider quality research. Silly, but reality. I really cannot stress enough how important it is to have this citation data in an open infrastructure. It drives what research is done, and with that, the order in which societal problems are solved. When it comes to scholarly research into a better future, it is money and citations that count. There is a reason that for many scholars journal articles are the currency. Silly, but reality.
But there are also many people working on open infrastructures to solve this issue. Think DataCite, software citations, CITATION.cff, and OpenCitations. OpenCitations recently reached a massive milestone, passing 1 billion CCZero citations, thanks to the important I4OC project. Elsevier and the American Chemical Society took their time before they joined, and I am hoping our request contributed to them considering it. This week OpenCitations made a new release, bringing the total close to 1.2 billion open citations, including the first citation data from the American Chemical Society.
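To illustrate the kind of machine-readable metadata these initiatives build on: a CITATION.cff file is a small YAML file in a software repository that tells others (and tools like Zenodo and GitHub) how to cite the software directly, without a wrapping journal article. A minimal sketch, with entirely placeholder project details:

```yaml
# CITATION.cff — placeholder example, not a real project
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "ExampleTool"
version: "0.1.0"
date-released: "2021-01-30"
authors:
  - family-names: "Doe"
    given-names: "Jane"
```

The format also supports a `doi` field, so a DataCite DOI for a software release can be carried along with the code itself.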
I think there is an important task here for the European Open Science Cloud: we must quickly build an open (licensed) infrastructure with open (licensed) citation data for research output, not limited to journal articles, but covering all research output. Most of the technologies are around. I am excited to have been involved in or contributed to projects that developed open infrastructures, like WikiCite and Scholia (see the screenshot below) or the adoption of the Citation Typing Ontology by a Springer Nature journal.
My contribution this weekend: two small scripts (using Bacting) that use the COCI API to fetch OpenCitations data and insert it with QuickStatements into Wikidata (only for papers already in Wikidata: the scripts do not create new items):
- one adds citations from the focus paper to other papers (green/yellow in the figure below)
- one adds citations to the focus paper by other papers (red/orange in the figure below)
*Figure: Two different ways to visualize citation data, as used in Scholia for a single article: a network (top) and a histogram (bottom), with references (yellow/green) and articles citing the focus article (red/orange).*
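For readers curious how the first script works conceptually, here is a minimal sketch in Python (the actual scripts use Bacting, and the DOI-to-Wikidata-item lookup is assumed rather than shown): ask the COCI REST API which papers the focus paper references, then emit tab-separated QuickStatements lines that add "cites work" (P2860) claims.

```python
"""Sketch only: COCI references -> QuickStatements 'cites work' lines.
The real scripts use Bacting and first look up Wikidata QIDs by DOI,
skipping any paper that has no Wikidata item yet."""
import json
import urllib.request

COCI_REFERENCES = "https://opencitations.net/index/coci/api/v1/references/{doi}"

def fetch_cited_dois(doi: str) -> list[str]:
    """Return the DOIs the focus paper cites, according to COCI."""
    with urllib.request.urlopen(COCI_REFERENCES.format(doi=doi)) as response:
        return [record["cited"] for record in json.load(response)]

def to_quickstatements(focus_qid: str, cited_qids: list[str]) -> list[str]:
    """One tab-separated QuickStatements line per citation: focus P2860 cited."""
    return [f"{focus_qid}\tP2860\t{qid}" for qid in cited_qids]

# Offline example with placeholder QIDs (a real run would map the DOIs
# returned by fetch_cited_dois() to QIDs via a Wikidata query first):
lines = to_quickstatements("Q100000001", ["Q100000002", "Q100000003"])
```

The second script is the mirror image: it calls the COCI `citations` endpoint instead of `references`, and emits the P2860 statement on the citing paper's item rather than on the focus paper's item.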
It is a pity that the Initiative for Open Citations only catalogues DOI-to-DOI connections. In climatology there are many reports, especially older ones, without a DOI. Many of them are at least as reputable as articles, such as guidance documents of the World Meteorological Organization.
Also, older articles still often lack a DOI, and when you study the recent past you often have to cite articles from before the digital age, published in journals that no longer exist.
Totally agreed.