Thursday, September 26, 2013

Why do databases make sharing Open Data difficult?

I tend to feel quite isolated in these matters, but they matter to me: licenses, agreements, etc. That is because I try to be a friendly guy and respect the wishes expressed by others.

However, this puts me in a situation where I cannot join many otherwise interesting initiatives. There are many examples, but I will isolate one, for no particular reason other than that they just published an interesting paper about DMSO solubility modeling (doi:10.1021/ci400213d): the Online Chemical Database.

The training data from this solubility study is available from this website, and is listed in the abstract as freely downloadable. Well, free as in free beer. I cannot even look at the data set metadata without signing a license. So, I started reading the license, and clauses like this worry me:
    4.1 The User grants to Helmholtz Zentrum Muenchen by submitting information, data, models and structures to the Online Chemical Environment a world-wide, non-exclusive, transferable and sub licensable right to use all information data, structures and models submitted, for research, teaching and any other (including commercial) purposes.
Originating from an open, academic culture of collaboration, I am rarely the sole copyright owner of a data set. And with my busy agenda I am really not going to chase down all owners and ask them if they are willing to assign these rights to the Helmholtz Zentrum Muenchen. Do you seriously think I have nothing better to do? So, I cannot contribute data to this database. Worse, this clause is probably not compatible with Open Data licenses in general. I fully understand the intention, but you are probably paying your legal experts a lot of money, so let them do their work and explicitly allow Open Data licenses, indicating that any such clauses do not apply to such data.

BTW, comparing this clause to 4.2 is awkward too. Not giving downloaders of data sets uploaded to the database the same rights as the uploader has given you doesn't sound like being a good citizen.

Now, in no way is this database unique. Many databases I encounter, all with the best of intentions, come up with legal obstacles. Is that really what you wanted to do?