In reply to Peter's news that the NIH's PubMed Central (PMC) does not allow machine retrieval of content, I was wondering about this section in the CC license of much of the PMC content, such as our paper on userscripts (section 4a of the CC-BY 2.0):
- You may not distribute, publicly display, publicly perform, or publicly digitally perform the Work with any technological measures that control access or use of the Work in a manner inconsistent with the terms of this License Agreement.
Let me make clear that I value machine-readable publications much more than free (gratis, as-in-free-beer) publications. The NIH initiative, as it stands, is just 'Free Access'. An interesting step, but not one I care much about; not in relation to science anyway.
Now, Peter indicates that the NIH has put in place 'technological measures to control access' to the distribution of our work on userscripts (the PMC entry). That is in clear violation of the CC license.
I know that other NIH initiatives do allow machine retrieval, such as PMC OAI, but that's just an 'auxiliary service'. It may come down to technical details, but some text on the PMC website is at least inaccurate:
- Crawlers and other automated processes may NOT be used to systematically retrieve batches of articles from the PMC web site. Bulk downloading of articles from the main PMC web site, in any way, is prohibited because of copyright restrictions.
What the PMC website should indicate instead is that text mining is allowed for the PMC OAI subset, but that they would highly prefer people to use the PMC OAI or PMC FTP routes for it. This is the least they have to do.
No matter what, I still have the feeling that any technical obstacles are disallowed by the CC license. Is there any legal expert here who can explain to me whether the CC license allows controlling how people access my material?

