Sunday, March 27, 2016

Re: How should we add citations inside software?

Practice is that many cite webpages for the software, sometimes even just list the name. I do not understand why scholars do not en masse look up the research papers that are associated with the software. As a reviewer of research papers I often have to advice authors to revise their manuscript accordingly, but I think this is something that should be caught by the journal itself. Fact is, not all reviewers seem to check this.

In some future, if publishers would also take this serious, we will citation metrics for software like we have to research papers and increasingly for data (see also this brief idea). You can support this by assigning DOIs to software releases, e.g. using ZENODO. This list on our research group's webpage shows some of the software releases:

My advice for citation software thus goes a bit beyond what traditionally request for authors:

  1. cite the journal article(s) for the software that you use
  2. cite the specific software release version using ZENODO (or compatible) DOIs

 This tweet gives some advice about citing software, triggering this blog post:
Citations inside software
Daniel Katz takes a step further and asked how we should add citations inside software. After all, software reuses knowledge too, stands on algorithmic shoulders, and this can be a lot. This is something I can relate to a lot: if you write a cheminformatics software library, you use a ton of algorithms, all that are written up somewhere. Joerg Wegner did this too in his JOELib, and we adopted this idea for the Chemistry Development Kit.

So, the output looks something like:

(Yes, I spot the missing page information. But rather than missing information, it's more that this was an online only journal, and the renderer cannot handle it well. BTW, here you can find this paper; it was my first first author paper.)

However, at a Java source code level it looks quite different:

The build process is taking advantage of the JavaDoc taglet API and uses a BibTeXML file with the literature details. The taglet renders it to full HTML as we saw above.

Bioclipse does not use this in the source code, but does have the equivalent of a CITATION file: the managers, that extend the Python, JavaScript, and Groovy scripting environments with domain specific functionality (well, read the paper!). You can ask in any of these scripting languages about citation information:

    > doi bridgedb

This will open the webpage of the cited article (which sometimes opens in Bioclipse, sometimes in an external browser, depending on how it is configured).

At a source code level, this looks like:

So, here are my few cents. Software citation is important!