Saturday, February 27, 2016

Sci-Hub: a sign on the wall, but not a new sign

Logo of Sci-Hub. Fair use, via Wikipedia.
It is hard to have missed Sci-Hub, as it even showed up in the Dutch Volkskrant. It now shares PDFs of many closed and open access papers (yes, plenty of OA PDFs are shared too). Opinions about it vary, but it is important to realize that, for the closed access papers, it violates social agreements made between scientists and publishers. There are valid arguments, but also FUD, e.g. from the Association of American Publishers (AAP; some publishers, surely), which writes that "publishers work to ensure that their publications create an accurate and correct scientific record, (i.e., publishing revisions to correct or update data)". Yes, we all know how well that works out :/

As always, Björn Brembs has a great overview, a must read, though I think I disagree with the Sci-Hub As Necessary, Effective Civil Disobedience title. The argument from the developer of Sci-Hub convinces me more. Wikipedia writes: Alexandra Elbakyan has cited Article 27 of the Universal Declaration of Human Rights, "to share in scientific advancement and its benefits".

I am not a lawyer and have no idea if this really applies, but as a scholar, it does reflect exactly what I care about: research should benefit society, directly or indirectly. We have established many routes, often involving some sustainability model (whatever that is). But at a very basic level, there is nothing sustainable about knowledge that is hard to get. Scarcity improves value, but knowledge must be cheap, and contrary to what the AAP suggests, cheap is not the same as low quality (if only they had gotten that press release peer-reviewed...). If it were, a simple smartphone would be of lower quality than a fax.

But civil disobedience is not the solution. However, the point I want to make here is that the Sci-Hub practices are not really new: scholars have been sharing papers for free for many, many years. As a student I learned to ask collaborators for a copy of a paper important to the research you were doing, and to ask the author of the paper for a copy. This civil disobedience was and is common sense. And publishers were happy about it for many, many years. That is, I have never heard of any of them going after universities for these practices of their scholars.

Sci-Hub just makes this common activity easier: it only makes it easier to look up scientific knowledge. It is just that publishers do not seem to want that. It has to take effort (which is a sign of high quality research, right?).

So, here is my howto on how to get access to research output in a way that publishers have been happy with for at least twenty years:
  1. do not use ILL or go to a nearby university that has access to the paper (do not make a list of papers you can get there and do not go there once a month; while there, do not meet up with other scholars to discuss science)
  2. do not ask collaborators or friends at other institutes if they have access
  3. email the corresponding author of the paper (after all, the author decided on a closed access license)
  4. try to ask for a reprint via their publication list provider, such as ResearchGate (the author will love your request via such systems!)
  5. email them again if you have not received an answer in about a week
  6. repeat this step a few times; they are busy people and may have missed your earlier communication
  7. email other authors of the paper, and possibly ask the dean of the faculty to ask the author to send you a reprint
  8. once you have a copy of the paper, email the authors to say you have a copy of their paper (they may not even have access themselves, which may be the reason why they did not provide a reprint)
Really, this "reprint" concept has been quite formalized, and some of you may still remember the term. Several publishers used to have specific URLs that allowed authors to share up to X free downloads of their paper (normally quite sufficient for most papers), which they could send back in reply to one of your above emails. Of course, it would be silly to just make the first X PDF downloads free anyway.

Oh, and some have asked the Open Access community to state its position on the Sci-Hub practices. Well, as is clear from the above, the Open Access community responded long ago with its solution to the problem of access to literature: by creating the gold Open Access movement.

So, that's why I try to ensure my important work is Open Access: then I don't get bothered with tens of reprint requests, which gives me time to write this personal </rant>. In no way does this post reflect the position of my employer.

Sunday, February 14, 2016

Aggregating data on nanomaterials: eNanoMapper is getting closer to critical mass

Nanosafety data for silver nanoparticles visualized with ambit.js and d3.js.
The last three weeks featured two meetings around data infrastructures for the NanoSafety Cluster. The first meeting was on January 25-26 in Brussels, and last week the eNanoMapper project held its second year meeting with a subsequent workshop in Basel (see the program with links to course material). Here are some personal reflections on these meetings, and some source code updates based particularly on the latter workshop.

For the workshop in Basel I extended previous work on JavaScript and R client code for the eNanoMapper API (which I previously wrote about and see doi:10.3762/bjnano.6.165).

Not much changed for ambit.js (see these two posts); I only added a method to search nanomaterials based on chemistry rather than names (release 0.0.3 is pending). That is, given a compound URI, you can now list all substances with this compound, using the listForCompound() function:

// the searcher wraps an eNanoMapper data instance (example service root)
var searcher = new Ambit.Substance("https://apps.ideaconsult.net/enanomapper");

// example compound URI; look this up in the instance rather than hardcoding it
var compound = "https://apps.ideaconsult.net/enanomapper/compound/71";
searcher.listForCompound(compound, processList);
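
The processList argument is a user-supplied callback, not part of ambit.js itself. A minimal sketch of what it could look like, assuming the callback receives the API's substance list as JSON (the substance and name fields are assumptions to verify against a live instance):

```javascript
// Hypothetical callback for listForCompound(): collect the names of the
// substances in an (assumed) eNanoMapper /substance JSON response.
function processList(response) {
  var data = (typeof response === "string") ? JSON.parse(response) : response;
  var names = [];
  (data.substance || []).forEach(function(s) {
    names.push(s.name); // each entry is assumed to carry a name field
  });
  return names;
}

// Demonstration with a mocked response of the assumed shape:
var mocked = { substance: [{ name: "NM-300K" }, { name: "JRCNM01000a" }] };
console.log(processList(mocked)); // [ 'NM-300K', 'JRCNM01000a' ]
```

In real use the callback would update the page (e.g. feed d3.js) instead of returning a value.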

You may wonder how to get the URI used in this code. Indeed, one should look it up rather than hardcode it, as it may be different in other eNanoMapper data warehouse instances. This is where another corner of the eNanoMapper API comes in, which is also wrapped by a client method. However, I have to play more with this method before I encourage you to use it. For example, it returns a list of compounds matching the search. So, how do we search for fullerene particles?
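
To make that last question concrete: because the search returns a list, client code still needs to select the compound of interest. A sketch, where the result shape (a compound array with name and URI fields) is an assumption rather than the documented API:

```javascript
// Hypothetical: from an assumed compound-search result, return the URIs of
// the entries whose name matches exactly.
function pickCompoundURIs(result, name) {
  return (result.compound || [])
    .filter(function(c) { return c.name === name; })
    .map(function(c) { return c.URI; });
}

// An exact-name filter would separate "fullerene" from e.g. "fullerenol":
var mocked = {
  compound: [
    { name: "fullerene",  URI: "https://example.org/compound/1" },
    { name: "fullerenol", URI: "https://example.org/compound/2" }
  ]
};
console.log(pickCompoundURIs(mocked, "fullerene"));
// [ 'https://example.org/compound/1' ]
```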

R package
The renm package for R was demonstrated in Basel too; about 25% of the audience uses R in their research. The package documentation has some updated examples, including this one to list all nanomaterials from a PNAS paper (doi:10.1073/pnas.0802878105):

substances <- listSubstances(
    search="10.1073/pnas.0802878105", type="citation"
)

The 0.0.3 release, made just in time for the workshop, fixed a few minor issues. The above JavaScript example cannot yet be repeated in R, but this is scheduled for the next release.
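
Until the R client catches up, the underlying REST call can be sketched directly; the substance path and the search/type parameter names below mirror the R example above, but are assumptions to check against your instance:

```javascript
// Hypothetical helper: build the substances-by-citation query URL for an
// eNanoMapper data instance (endpoint and parameter names are assumptions).
function substancesByCitationUrl(serviceRoot, doi) {
  return serviceRoot.replace(/\/+$/, "") +
    "/substance?type=citation&search=" + encodeURIComponent(doi);
}

console.log(substancesByCitationUrl(
  "https://example.org/enanomapper/", "10.1073/pnas.0802878105"
));
// https://example.org/enanomapper/substance?type=citation&search=10.1073%2Fpnas.0802878105
```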

Data quality
For a few materials I have now created summary pages. These should really be considered demonstrations of what a database with an API has to offer, but it seems that for some materials we are slowly moving towards critical mass. Better, it shows nicely what advantages data integration has: the data for silver materials comes from three different sources, aggregated in a single instance. However, if you look at the above code, it is easy to see how it could pull in data from multiple instances. For example, here are LDH release assay results for two of the JRC Representative Materials:

This, of course, takes advantage of the common language that the eNanoMapper ontology provides (doi:10.1186/s13326-015-0005-5). This ontology is now available from BioPortal, Aber-OWL, and the Ontology Lookup Service (via their great beta). Huge thanks to these projects for their work on making ontologies accessible!
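
Pulling in data from multiple instances, as suggested above, could be sketched like this; the response shape (a substance array) is the same assumption as before:

```javascript
// Sketch: merge substance lists retrieved from several eNanoMapper data
// instances into one aggregated list (response shape is an assumption).
function mergeSubstanceLists(lists) {
  var merged = [];
  lists.forEach(function(list) {
    (list.substance || []).forEach(function(s) { merged.push(s); });
  });
  return { substance: merged };
}

// Demonstration with mocked responses from two hypothetical instances:
var instanceA = { substance: [{ name: "NM-300K" }] };
var instanceB = { substance: [{ name: "JRCNM01000a" }, { name: "NM-105" }] };
console.log(mergeSubstanceLists([instanceA, instanceB]).substance.length); // 3
```

The real work, of course, is not the merging but mapping the entries onto the same ontology terms so that merged results mean the same thing.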

But there is a long way to go. Many people in Europe and the U.S.A. are working on the many aspects of data quality. I would not say that all data we aggregated so far is of high quality; that is, it somewhat depends on the use case. The NanoWiki data that I have been aggregating (see release 2 on Figshare, doi:10.6084/m9.figshare.2075347.v1) has several goals, and its quality varies depending on the goal. For example, one goal is to index nanosafety research (e.g. give me all bioassays for TiO2), in which case it is left to the user to read the discovered literature. Another goal is to serve nanoQSAR work, where I focused on accurately describing the chemistry but have varying amounts of information on the bioassays (e.g. is there a size dependency for cytotoxicity?).

There is a lot of discussion on data quality, as there was two years ago. I am personally of the opinion that eNanoMapper cannot solve the question of data quality; that ultimately depends on the projects recording and disseminating the data. Instead, eNanoMapper (like any other database) is just the messenger. In fact, the more people complain about the data quality, the better the system has managed to communicate the lack of detail. Of course, it is critical to compare this to the current situation (publications in journals), and it seems to me we are well on our way to improving over dissemination of data via journal articles.

Oh, and the view from my room in the Merian Hotel was brilliant!