Thursday, October 18, 2007

Open Data Misconception #1: you do not get cited for your contributions

The Open Data/ChemSpider debate is continuing, and Noel wondered in the ChemSpider Blog item on the Open Data spectra in ChemSpider. The spectra in ChemSpider come from four persons, two of which released their data as Open Data (Robert and Jean-Claude) and two as proprietary data.

One of the two is Gary who expressed his concerns in the ChemSpider blog that people would not cite his contributions if he would release the data as Open Data:
    In principle, someone could download an assortment of spectra for a given molecule, calculate some other spectra, and then write a paper without ever recording a single NMR spectrum of their own. Would they then include the individual who deposited the spectra as a co-author or even acknowledge the source of the spectra that they used? Who knows.

It is a misconception that releasing your Open Data will cause a situation that your scientific work is not acknowledged (citing statistics is the crude mechanism we use for that). First of all, using results without acknowledgment is called plagiarism (which is ethically wrong by any standard). But this is not a feature of Open Data, it is found in any form of science. Recall Herr Schön.

Some months back I advised an other chemical database who had similar concerns, and I pointed the owners, like I commented to Gary, to the CC-BY license which has an explicit Attribution (BY) clause:
    Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).

Using this license, plagiarism would not even just be (scientifically) unethical, it would be illegal too, because it would brake the license agreement. This even allows one to bring the case to court, if you like. (BTW, I was recently informed that the database had switched to the CC-BY license!)