Thursday, December 08, 2011

Open Science and Non-Commercial licenses (a personal reflection on the Oscar/RSC controversy)

Peter has started a new line of discussion in his blog, referring to a correspondence with representatives from RSC last year, about an annotated literature corpus to (re)train the Oscar3/4 text miner. There are very many sides, and after I reread this post for a second time, I was still not 100% happy about all words: I can only try to express the complexity of the matter and how it started, but do hope to be clear that non-commercial licenses are not useful in Open Science.

I took part in parts of the correspondence Peter refers to, and I would not have written things up the way Peter wrote up his impression of the outcome of that discussion. At some point I seem to have been dropped from the email correspondence, as I at least did not know the final outcome (see below), so I cannot fully comment on the accuracy of Peter's coverage. But my impression of the outcome, as limited as it was, is not that far from what Peter wrote up: Oscar4 needs training (doi:10.1186/1758-2946-3-41), and the RSC was unwilling to contribute the full text training corpus to the project without a non-commercial (NC) clause (and I explain below why I think this is bad). Oscar without a training corpus is useless; Oscar with an NC-licensed training corpus is not Open Source (see below). As detailed below, the corpus at sentence level is NC-free licensed, and a lot of training can be done that way. Sufficient?

Peter wrote:

"I pointed out very clearly that CC-NC would mean we couldn’t redistribute the corpus as a training resource (and that this was essential since others would wish to recalibrate OSCAR). Yes, they understood the implications. No they wouldn’t change. They realised the problems it would cause downstream. So we cannot redistribute the corpus with OSCAR3. The science of textmining suffers again."
I do not know if it is factually correct that the RSC would not change (below we read they attempted), or whether the organisation really understood the problems. But, it certainly is a fact that we cannot redistribute Oscar4 as an Open Science project with a NC-licensed clause.

And, I want to add and stress here, that blog posts sometimes are just like press releases: things have the highest impact if written down in a black-and-white fashion; and getting things factually wrong happens to all of us now and then.

One of the outcomes I learned about this week is that the RSC released the corpus in some form without the NC clause. The full text paper corpus retained the NC clause of the CC license, but there is also a version where all sentences are released, and this has a CC license without the NC clause. I think this is not optimal, but I still very much appreciate the gesture the RSC is making here, and would like to kindly thank them for that! And I do want to make this clear too (thanx to Cameron for phrasing it so well in his comment): the RSC is, in principle, free to decide what they want to do, and I fully respect that.

Well, with that out of the way (I wanted to say something about it, having been involved in the discussion, and feeling a bit in between Peter and the RSC here, appreciating both their viewpoints, and having a third one myself), let's focus on this non-commercial clause a bit more.

If we enlarge our scope a bit, away from written material, to Open Science, it is clear that the non-commercial clause is bad. In the Open Source world, organisations like the Debian project clearly state that non-commercial clauses violate basic freedoms. From an Open Standard perspective, this is pretty much the same. The reason: whether you like it or not, we live in a commercial world. Society expects us to be commercial, and any serious business is legally required to make profit a company goal. Now, this effectively means that any science made available as non-commercial is not Open: you are effectively not giving people the freedom they need to advance science.

In short, a CC license with the NC clause is in fact quite like "yes, we love to be Open, but we are too scared". Now really, I understand this scare. I am a scientist, post-hopping around Europe, not tenured, and not being an experimental scientist, unlikely to become one. Don't tell me about risk and scare of making things Open. Yet, I did, and it paid off (not enough yet; still looking for a fixed academic position, as I already indicated). But in the more than 15 years I have now been working in Open Science, I have yet to find a compelling (or any) argument to back up this fear: the risk the NC clause is supposed to guard against has so far proved no different from a fear of ghosts.

On the other hand, if I had not been involved in Open Science, I would not have worked at the top European institutes where I have been working for the past ten years.

So, what are the arguments for using the NC clause? The fear I understand, but I do not see arguments that show that an NC clause is useful in an Open Science setting.

Further reading:


  1. Copyright or Controlright?
    Copyright is funny business. Even in science. Even in Open Access. You would be forgiven for thinking that copyright is all about protecting economic interests. But you would be wrong (though sometimes it is about protecting economic interests, honest).


  2. I find it a bit humorous that pmr writes so strongly about the need for choosing a specific CC license and for allowing derivative works when a couple of years ago I pointed out that the CML spec license doesn't state the specific CC license nor does it allow derivative works. I just now brought that up as a comment to one of his recent essays, and don't mean to go into that further here.

    With a different sense of humor, I read your statement "serious business is legally required to make making profit a company goal." As someone doing business through his own consulting company I can say that yes, that's a goal. Just like how you also want to make some profit as an employee.

    The problem is when profit becomes the overriding goal, even to the detriment of long-term social and cultural benefit.

    One downside of being self-employed is that I do not get educational discounts for anything. A conference which costs 200 EUR for you might cost 500 EUR for me, and that extra 300 EUR comes from my profit, which is directly tied to my income. While I might find new work out of the conference, you might find a new job, or help land a new grant contract.

    So, what are the arguments for using the NC clause? You mention fear as one.

    "NC" fits with the tradition that educational organizations get a discount, or no-cost access to software, on the assumption that they have little money. (Frankly, a professor at a well-endowed university shouldn't be paying less for a conference than I have to pay.) It continues the idea that "since I'm non-commercial and I get a discount then others who are also non-commercial should also get a discount."

    Perhaps they might know that there's a university department which handles commercial licensing agreements, and they know the journal has the same. So it seems to fit in with the status quo.

    A second argument is greed. "Why should someone else make money off of my work?" It doesn't cost the author much of anything to say "non-commercial", and there's a potential for some money in the future. Why not take that chance?

  3. Egon

    1st link is to PMR's blog post about OA, which is not relevant to the OSCAR set. Let's not confuse the two.

    2nd you have the final email; if I can get agreement from the others involved I'm happy to post it.

    3rd we really need to be precise. You asked for the full set of articles, but OSCAR only needs sentences. So we released the sentences, and OSCAR has its open and redistributable training resource.

    So if "But, it certainly is a fact that we cannot redistribute Oscar4 as an Open Science project with a NC-licensed clause." was the case, it would be true. But it isn't, as we've released what OSCAR needed. Argh!

    I appreciate you've tried to give your point of view, but you heard it all, and I hope it's clear that we came up with something that allows OSCAR to carry on.

  4. Richard, thanx for the further information! Really, very much appreciated.

    I also asked Peter for further information about sentences versus full text. Oscar4 is magical in ways, and Corbett's code is sometimes quite hard to follow; I'm afraid I do not know all the details. The rewriting last year was really needed :)

    Peter replied to me yesterday [0] that:

    "no. cannot deduce zoning in document. NPG tried this - failed"

    This is a crucial bit of information for that part of the discussion.


  5. @Andrew: neither do I understand why academics need to be singled out. This is a kind of discrimination unaccepted in the Open Source community.

    I only ever guessed it to be a kind of 'advertisement' to keep important future clients close. Not unlike Microsoft not coming down too hard on people copying Windows at home; just to make sure everyone got used to the environment.

  6. Peter's comment would be crucial if relevant, but my colleague Colin Batchelor actually understands how OSCAR4 works (and I'm indebted for his explanation below). So:

    Oscar4 does not do zoning of documents. This is entirely separate work done by Simone Teufel at the CL and Advaith Siddharthan, now at Aberdeen. What Oscar4 does do is sequence tagging---assigning probabilities to words being part of a named chemical entity based on a small number of words preceding and succeeding them. Any contribution from adjacent sentences within a paragraph is likely to be extremely small.

    This corresponds exactly to what was agreed as an acceptable compromise, which is reflected in the email trail you have, and which I would be delighted to release when I get agreement from the other individuals involved in Peter's group.

  7. Science and Technology Content for chem-bla-ics
    Hi Egon Willighagen,

    My name is Ben Chasteen and I am the Science/Technology editor at Before It's News, a people-powered news site serving over 4 million people a month. We publish over 4,000 user-generated posts each day at

    I contacted you months ago to see if you wanted to syndicate your RSS feed, but didn't hear anything back. This time I am contacting you because I was wondering if you would be interested in receiving a short email of our top 5 Science/Technology stories each week? We have a lot of stories that the mainstream media don't cover. I think you'd find it a great source of unique information. If it's ok, please just email me back with a YES. You have my iron-clad promise that your email address will not be used for any other purpose or be added to any mailing lists.

    I would also be your personal contact at Before It's News, should you ever have questions or need anything.

    By the way, we also offer free WordPress blog hosting, and we can syndicate your RSS feed, if you're interested. Just let me know.

    Thank you,

    Ben Chasteen
    Science/ Technology Editor
    Before It's News
    775 East Blithedale Ave. #362
    Mill Valley, CA 94941

  8. Dear Ben Chasteen,

    I am sorry I did not reply. I have not found any record of your email, and guess it was caught by the spam filter.

    To answer your first request: thank you for your interest, but I do not feel my feed is particularly suited for a large audience, as my posts are short conversations rather than nicely written articles.

    Also, I have no interest in a daily mail, which, in fact, contradicts the purpose of feed syndication.

    Finally, I am a bit worried about your service, where the first post I got in your Science and Technology section is a post bashing evolution science by a creationist [0]. I am not sure I want to be in a feed among such posts. I appreciate the Fiction/Fact buttons, but that way facts are settled by popularity. I strongly suggest removing that feed at the first possible opportunity.

    Good luck with your service,