Thursday, October 18, 2007

Open Data Misconception #1: you do not get cited for your contributions

The Open Data/ChemSpider debate is continuing, and Noel raised questions in the ChemSpider Blog item on the Open Data spectra in ChemSpider. The spectra in ChemSpider come from four people, two of whom released their data as Open Data (Robert and Jean-Claude) and two as proprietary data.

One of the latter two is Gary, who expressed his concern on the ChemSpider blog that people would not cite his contributions if he released the data as Open Data:
    In principle, someone could download an assortment of spectra for a given molecule, calculate some other spectra, and then write a paper without ever recording a single NMR spectrum of their own. Would they then include the individual who deposited the spectra as a co-author or even acknowledge the source of the spectra that they used? Who knows.

It is a misconception that releasing your data as Open Data means your scientific work will go unacknowledged (citation statistics are the crude mechanism we use for that). First of all, using results without acknowledgment is called plagiarism, which is ethically wrong by any standard. But plagiarism is not a feature of Open Data; it is found in any form of science. Recall Herr Schön.

Some months back I advised another chemical database that had similar concerns, and I pointed the owners, as I did in my comment to Gary, to the CC-BY license, which has an explicit Attribution (BY) clause:
    Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).

Under this license, plagiarism would not just be (scientifically) unethical, it would be illegal too, because it would break the license agreement. This even allows you to bring the case to court, if you like. (BTW, I was recently informed that the database had switched to the CC-BY license!)


  1. Egon,
    I'm glad you brought this up. If someone uses spectra from ChemSpider, they need to cite the source. I WANT people to use our spectra - that is the point of uploading them. If someone is going to plagiarize from ChemSpider, NMRshiftDB, Aldrich, etc. it will get found out and will bite them back professionally.
    I still maintain that data and ideas are safer when public vs. a private submitted research proposal or paper draft.

  2. You've seen that we have gone with the Open Data declaration used on Crystal Eye just for ease of getting something done in a short time. I have the feeling that overall Creative Commons licensing (or maybe Science Commons licenses if they are available yet) would be more appropriate. Thoughts?

    Whatever the license mechanism, I will still take the stance that it is NOT ChemSpider's decision to force Open Data; it is a choice for us to offer. However, we/you will continue to educate on the rights associated with Open Data. My conversation with Peter Suber was very useful for helping to educate me.

  3. CSM, the CC-BY was just an example to show that Open Data does not conflict with citation worries... ChemSpider will likely be a mix of licenses, which is OK, in my opinion.