
Sunday, March 16, 2014

Publishers #fail to innovate knowledge dissemination

Source: Wikipedia, public domain.
I have ranted often enough about publishing. I have also often enough indicated how publishers (or journals) could improve their act; there is plenty of that in the archives of this blog. Even the more innovative publishers have a long way to go. I blog about this to explain why I can be happy with something like the rrdf package (doi:10.7287/peerj.preprints.185v3). Seriously, it is far away from where my heart is: understanding the underlying chemistry of biology. Really, I would rather study how phosphorylation actually causes signaling. At some level this is just a protein interacting with another protein, a small molecule, or something else. But what? Still, the package makes me happy. No one else is doing it, and I need it. We all need it to make science more reproducible. We need good tools, and we do not need excuses for not doing it right (tm).

And just to make the point: we do need tools like this. We needed them 20 years ago, too. And publishers have done way too little. I really understand that innovation is slow and expensive. But, come on, use your imagination. I cannot solve everything in the world, and I rely on others to implement stuff too. And here is an idea.

What if publishers actually solved this problem? I know plenty of people are talking about it and giving it funny names, like nanopublications. That idea, too, has existed for more than 20 years now. In fact, CMLRSS is not far from the nanopublication (doi:10.1021/ci034244p). And it was functional. Really, the implementation and the standard are not even the issue. The key is adoption. Adoption may be slow, but it must happen. And for adoption to happen, you need commitment. For example, a promise that the time and resources invested in adoption will give a return on investment, or a guarantee that your solution won't go commercial at some point (causing a vendor lock-in!).

But that something must happen is clear if you return to the science. Have you ever tried to do a theoretical study of some phenomenon? Then you know that data availability is a problem. And this data scarcity is exactly why data has become valuable, causing people to sit on top of it like a hen on her egg(s). If you have ever been involved in getting good-quality data together (ever noticed that many commercial data sets do not contain the data you really need?), you know how expensive data is. Recovering it costs more after the publishing process than before. Really, the original notebook holds more information, and is likely more informative, than the formal publication.

Not only has the publishing model itself become more expensive than needed (just compare the article processing charges, or APCs, of newer publishers like PeerJ!), but publishers also make access to the data more expensive than really needed.

This is a huge fail in the Western approach to science: we enormously disrespect data.

If you are not convinced, please give me answers to these questions (read active ingredient for "drug"):

  1. how were the CYP experiments performed for the top ten selling drugs, and what are the main human transformations?
  2. what are the experimental errors on the pKa measurements of the top ten selling drugs (uncharged and singly charged, positive and negative)?
  3. how were the logP values measured for the top ten selling drugs, and at what pH?
  4. what are the size distributions of the nanomaterial samples reported in the literature?
  5. what are the different forms (not shape, but in terms of structure: phosphorylation states, exact positions, relevant SNPs, etc.) of the top ten proteins relevant to pancreatic cancer?
If you can answer any of these questions in less than one hour with provenance (a list of DOIs and/or PubMed IDs), then I would love to hear about it. It would give an estimate of the size of the problem. However, my current estimate is that you cannot fully answer these questions, and most certainly not within one day. Had publishers taken their goal of knowledge dissemination seriously in the past 20 years, it would have been a lot simpler. But they failed. Why should I trust them to do better in the next 20 years? Meanwhile, with the limited funding I get, I will keep being happy with the things I can contribute.

Now, if you do not understand why those details matter, go take a multivariate statistics course. </rant>