Saturday, February 09, 2019

Comparing Research Journals Quality #1: FAIRness of journal articles

What a traditional research article looks like. Nice layout, hard to reuse the knowledge from. Image: CC BY-SA 4.0.
After Plan S was proposed, there finally was a community-wide discussion on the future of publishing. Not everyone is clearly saying whether they want open access or not, but it is a start. Plan S aims to reform the current model. (Interestingly, the argument that not a lot of journals are currently "compliant" is sort of the point of the Plan.) One thing it does not aim to reform is the quality of the good journals (at least, I have not seen that among the principles). There are many aspects to the quality of a research journal, and there are also many things that disguise themselves as aspects of quality but are not. This series discusses the quality of a journal. We skip the trivial aspects, like peer review, for now, because I honestly do not believe that the cOAlition S funders want worse peer review.

We start with FAIRness (doi:10.1038/sdata.2016.18). This falls, if you like, under the category of added value. FAIRness does not change the validity of the conclusions of an article; it just improves the rigor of the knowledge dissemination. To me, a quality journal is one that takes knowledge dissemination seriously. All journals have a heritage of being printed on paper, and most journals have been very slow in adopting innovative approaches. So, let's put down some requirements for the journal of 2020.

First, about the article itself:

About findable

  • uses identifiers (DOI) at least at article level, but possibly also for figures and supplementary information
  • provides data of an article (including citations)
  • data is actively distributed (PubMed, Scopus, OpenCitations, etc)
  • maximizes findability by supporting more than one open standard
About accessible
  • data can be accessed using open standards (HTTP, etc)
  • data is archived (possibly replicated by others, like libraries)
About interoperable
  • data uses open standards (RDF, XML, etc)
  • data uses open ontologies (many open standards exist, see this preprint)
  • uses linked data approaches (e.g. for citations)
About reusable
  • data is as complete as possible
  • data is available under an Open Science compliant license
  • data uses modern, widely used community standards
Pretty straightforward. For author, title, journal name, year, etc., most journals already do this. Of course, bigger publishers that invested in these aspects many moons ago can comply much more easily, because they already were compliant.
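To make this concrete, here is a minimal sketch of what "accessible using open standards (HTTP)" can look like for article-level metadata: DOI content negotiation, which Crossref- and DataCite-registered DOIs support. It is an illustration only; which CSL JSON fields (such as the deposited references) are actually present depends on the publisher.

    # Minimal sketch: fetch machine-readable article metadata via DOI content
    # negotiation (assumes the `requests` library; fields like "reference"
    # are only present if the publisher deposited them).
    import requests

    doi = "10.1038/sdata.2016.18"  # the FAIR principles paper cited above
    response = requests.get(
        "https://doi.org/" + doi,
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
        timeout=30,
    )
    response.raise_for_status()
    metadata = response.json()

    print(metadata["title"])
    print(metadata.get("container-title"))             # journal name
    print(len(metadata.get("reference", [])), "references listed")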

Second, what about the content of the article? There we start seeing huge differences.

About findable
  • important concepts in the article are easily identified (e.g. with markup)
  • important concepts use (compact) identifiers
Here, the important concepts are entities like cities, genes, metabolites, species, etc, etc. But also reference data sets, software, cited articles, etc. Some journals only use keywords, some journals have policies about use of identifiers for genes and proteins. Using identifiers for data and software is rare, sadly.

About accessible
  • articles can be retrieved by concept identifiers (via open, free standards)
  • article-concept identifier links are archived
  • table and figure data is annotated with concept identifiers
  • table and figure data can be accessed in an automated way
Here we see a clear problem. Publishers have been actively fighting this for years, even today. Text miners and projects like Europe PMC are stepping in, but they are severely hampered by copyright law and by publishers not wishing to make exceptions.
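As a hedged illustration of what "retrieved by concept identifiers via open, free standards" could look like, here is a sketch against the public Europe PMC REST search service; the endpoint, parameters, and response fields are assumptions to be checked against its documentation, and the ChEBI identifier for acetylsalicylic acid is used as an example concept.

    # Minimal sketch: find articles mentioning a concept identifier
    # through the Europe PMC REST search service.
    import requests

    params = {
        "query": "CHEBI:15365",   # example concept identifier (acetylsalicylic acid)
        "format": "json",
        "pageSize": 5,
    }
    response = requests.get(
        "https://www.ebi.ac.uk/europepmc/webservices/rest/search",
        params=params,
        timeout=30,
    )
    response.raise_for_status()
    for hit in response.json()["resultList"]["result"]:
        print(hit.get("doi"), "-", hit.get("title"))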

About interoperable
  • concepts are described using common standards (many are available)
  • table and figure data is available as something like CSV, RDF
Currently, the only serious standards used by the majority of (STM?) journals are MeSH terms for keywords and perhaps CrossRef XML for citations. Tables and figures are more than just graphical representations. Some journals are experimenting with this.

About reusable
  • the content of the article has a clear, Open Science-compliant license
  • the content is available in the relevant community standards of today
This is hard. These community standards are a moving target. For example, how we name concepts changes over time, and identifiers themselves change over time too. But a journal can be specific and accurate, which ensures that even 50 years from now the context of the content can be determined. Of course, with proper Open Science approaches, translation to the community standards of that time is simplified.

There are tons of references I can give here. If you really like these ideas, I recommend:
  1. continue reading my blog with many, many pointers
  2. read (and maybe sign) our Open Science Feedback to the Guidance on the Implementation of Plan S (doi:10.5281/zenodo.2560200), which incorporates many of these ideas


Tuesday, February 05, 2019

Plan S: Less publications, but more quality, more reusable? Yes, please.

If you look at opinions published in scholarly journals (RSS feed, if you like to keep up), then Plan S is all 'bout the money (as Meja already tried to warn us):


No one wants puppies to die. Similarly, no one wants journals to die. But maybe we should. Well, the journals, not the puppies. I don't know, but it does make sense to me (at this very moment):

The past few decades have seen a significant growth in the number of journals. And before hybrid journals were introduced, publishers tended to start new journals rather than make existing journals Open Access. At the same time, the number of articles has also gone up significantly. In fact, the flood of literature is drowning researchers, and this problem has been discussed for years. But if we have too much literature, should we not aim for less literature, and do it better instead?

Over the past 13 years I have blogged on many occasions about how we can make journals more reusable. And many open scientists can quote you Linus: "given enough eyeballs, all bugs are shallow". In fact, worded differently, any researcher will tell you exactly the same, which is why we do peer review.
But the problem here is the first two words: given enough.

What if we just started publishing half of what we do now? With an APC business model, we would immediately halve(!) the publishing cost. We would also save ourselves a lot of peer-review work and the reading of marginal articles.

And what if we used the time we freed up to actually make knowledge dissemination better? Make journal articles actually machine readable, put some RDF in them? What if we could reuse supplementary information? What if we could ask our smartphone to compare the claims of one article with those of another, just like we compare two smartphones: oh, they have more data, but theirs has a smaller error margin; oh, they tried it at that temperature, which seems to work better than in that other paper.

I have blogged about this topic for more than a decade now. I don't want to wait another 15 years for journal publications to evolve. I want some serious activity. I want Open Science in our Open Access.

This is one of my personal motives behind our Open Science Feedback to cOAlition S, and I am happy that 40 people joined in the past 36 hours, from 12 countries. Please have a read, and please share it with others. Let your social network know why the current publishing system needs serious improvement and that Open Science has had the answer for years.

Help our push and show your support to cOAlition S to trigger exactly this push for better scholarly publishing: https://docs.google.com/document/d/14GycQnHwjIQBQrtt6pyN-ZnRlX1n8chAtV72f0dLauU/edit?usp=sharing

Sunday, February 03, 2019

Plan S and the Open Science Community

Plan S is about Open Access. But Open Science is so much more and includes other aspects, like Open Data, Open Source, and Open Standards. But just as publications have hijacked knowledge dissemination (think research assessment), we risk Open Access hijacking the Open Science ambition. If you find Open Science more important than Open Access, then this is for you.

cOAlition S is asking for feedback, and because I think Open Science is so much more, I want the Guidance on the Implementation of Plan S to pay more attention to Open Science. On Wednesday I am submitting this Open Science Feedback on the Guidance on the Implementation of Plan S, outlining 10 points on how it can be improved to support Open Science better.

Please read the feedback document and if you agree, please join Jon Tennant and co-sign it using this form:


Wednesday, January 30, 2019

Plan S and the Preprint Servers

In no way did I mean to compare Plan S to the hero Harry P....

Oh wait, but I am, and it's quite appropriate too. Harry was not a hero by himself; Harry was inevitable, he existed because of evil. Furthermore, Harry did not solve evil by himself. He needed Hermione (the scholars), he needed Ron (umm....), he needed Marcel (ummm....). Likewise, evil has Voldemort (the impact factors) and Death Eaters (ummm....)... Okay, okay, let's stop pushing the parallel before it gets embarrassing. The point is, Harry was insensitive, clumsy, and in many ways naive. And so is Plan S. Harry did not want to have to fight Voldemort. But evil demanded Plan S, ummm, Harry, to exist.

So, with the big Plan S event tomorrow in The Netherlands, I am trying to organize my thoughts. We've seen wonderful discussions over the past month, which have highlighted the full setting in a lot of detail. Just this week, a nice overview appeared of how learned societies do not profit from their publishing profits but spend them on important things (doi:10.1073/pnas.1900359116 and a draft blog analysis). Neither provides all the details, partly because this publishing world is not fully transparent.

Another wonderful effect of Plan S is that people seriously talk about Open Science. Many who are against the current Plan S still find Open Science important, and the details of the arguments are exciting and complex. I understand most of the concerns, though I do not believe all are realistic. For example, I honestly do not believe that researchers would turn their (financial) back on their learned societies if those moved to a full OA model (an actual argument I have heard). But then again, I'm naive myself.
Preprint servers
And people come with suggestions. Sadly, we have not seen enough of them since we started discussing Open Access, now almost 20 years ago. Fair enough, better late than never, but I wish people realized that Harry was desperate in his last year at Hogwarts and someone had to do it. All other students at Hogwarts kept quiet (in movie 7, you can hear some students suggest alternatives to the final battle...).

Now, last week Plan U suggested that preprint servers provide a better solution. I disagree. The current Plan U is too risky. I tweeted some considerations yesterday, which I'll put below. Let me say up front that I like preprint servers; see the 21st tweet.
  1. scholars discussing #Plan_S would do well to study the history of source code... that too started with free software ("shareware") but people quickly realized that did not work, and the community moved to #opensource 1/n
  2. Do not ignore that free access is not enough and that you need open licenses: learn from history, don't make the same mistakes again. CC-BY-ND is not a proper open license; no license at all is even worse. 2/n
  3. now, think about the role of preprints. First, the name "preprint" already makes clear it is not the same as the print. I don't care about the journal formatting, but I do care about the last edits. 3/n
  4. @jbrittholbrook used that argument in favor of ND clauses: yes, it *is* essential that we know that the version we read is accurate. Versioning is essential, changelogs even more so. Is the latest preprint identical (except for formatting)? With or without ND clauses, this is critical. 4/n
  5. currently, I cannot, without much effort and therefore at high cost, reliably determine whether a preprint version is identical (except for formatting) to the published version. Those last changes are essential: that's the added value of the journal's editorial role. 5/n
  6. but let's assume this gets solved (repeated errors by commercial publishers do not bode well). How about the #openscience rights (reuse, modify, reshare)? Many preprints do not require an open license. Without an open license it's merely shareware. 6/n
  7. Free reads (including temporarily free access from journals) are nice, except it's only thinking about now, not tomorrow. It's thinking only about yourself, not others. 7/n
  8. With a shareware article you are not allowed to share it with your students. They need to download it themselves (fair, doable). You cannot include it in your coursepack. This is what reuse is about. 8/n
  9. With a shareware article you are not allowed to change it. No change of format (to match your coursepack), no text mining, no data extraction, etc. This is what the right to modify is about. 9/n
  10. With a shareware article you are not allowed to redistribute it. I already mentioned coursepacks, but libraries are affected by this too. And resharing is also about sharing your improved version. 10/n
  11. Resharing after removal of that glaring typo. After rewriting German into English, or Old English into modern English. After fixing that number typo in the table that cost you time to figure out what the hell the authors were thinking (true story). 11/n
  12. These three core #openscience rights (reuse, modify, reshare) are essential to science. Just think what would happen if you could not use a newly published theory or method. Would you accept that? 12/n
  13. My guess is: no, but you do accept it when it comes to articles. Why? Is money more important than the essence of doing science? Are society activities more important than these basic things? I hope not. 13/n
  14. this is not a discussion of the now. This was 1947 14/n:
  15. of course, one can argue that if you can read the paper, you have all the access you need. But we know this is false. Text mining is essential. Reformatting and data extraction are essential. 15/n
  16. We now spend millions on extracting knowledge from articles, because we decided a PDF was the proper way to share knowledge. Disallowing that makes it even more expensive. Money that could be spent on actual research. 16/n
  17. now back to preprints, and preprints as a replacement for open access articles. I think you see where I am going. 17/n
  18.  
    1. a preprint without a proper license is not #openscience and does not optimally help raise the level of science.
    2. a preprint that is not identical (except for formatting) to the published version is not a replacement for the published version 18/n
  19. unfortunately, many journal-preprint server combinations simply do not guarantee a way forward. That must be solved, and currently this is not an alternative to #Plan_S. Current preprint servers have a different purpose: release soon. 19/n
  20. preprints have many (better) alternatives (open notebook science, #opensource, etc., etc.), and they can be a step forward towards #openscience, but if, and only if, they follow the basics of doing (open) science 20/n
  21. I'll wrap up this thread by linking to my first preprint, from Aug 2000: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2969350 It's part of the Chemistry Preprint Server (CPS) archive, hosted by Elsevier. More about CPS in #Scholia: https://tools.wmflabs.org/scholia/topic/Q50123525 21/21


Sunday, January 20, 2019

Updated HMDB identifier scheme #2: Wikidata updated

About a year ago the HMDB changed their identifier scheme: they added two digits to accommodate more metabolites, as they had basically run out of identifiers. This weekend I updated the HMDB identifiers in Wikidata, so that they are all in the new format, removing a lot of secondary (old) identifiers. The process was simple, combining Bioclipse, the Wikidata Query Service, and QuickStatements (a sketch follows after the steps below):

  1. use SPARQL to find all HMDB identifiers of length 9
  2. make QuickStatements to remove the old identifier and add the new identifier
  3. run the QuickStatements
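The original workflow used a Bioclipse script (linked below); as an illustration only, here is a rough Python equivalent of steps 1 and 2, assuming P2057 is the Wikidata property for HMDB identifiers and that old-format identifiers are "HMDB" plus five digits (nine characters in total).

    # Rough sketch: find Wikidata compounds with old-format HMDB identifiers
    # and print QuickStatements (v1) lines that remove the old claim and add
    # the new, two-digits-longer identifier.
    import requests

    SPARQL = """
    SELECT ?compound ?hmdb WHERE {
      ?compound wdt:P2057 ?hmdb .
      FILTER(STRLEN(?hmdb) = 9)
    }
    """

    response = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": SPARQL, "format": "json"},
        timeout=60,
    )
    response.raise_for_status()

    for row in response.json()["results"]["bindings"]:
        qid = row["compound"]["value"].rsplit("/", 1)[-1]  # e.g. Q18216
        old_id = row["hmdb"]["value"]                      # e.g. HMDB00001
        new_id = "HMDB00" + old_id[4:]                     # e.g. HMDB0000001
        print(f"-{qid}\tP2057\t\"{old_id}\"")   # remove the old identifier
        print(f"{qid}\tP2057\t\"{new_id}\"")    # add the new identifier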
Screenshot: the QuickStatements website with the first 10 statements to update the HMDB identifiers for 5 Wikidata compounds.
I ran the statements in batches, allowing me to keep track of the progress. Some reflection: quite a few references on the statements got lost. The previous HMDB identifiers were often sourced from ChEBI, but the new identifiers do not come from there; they are sourced from Wikidata itself, and adding "stated in" "Wikidata" did not make sense to me. Another thought is that it would have been nice to combine the removal and the addition in one edit, but since they are executed right after each other, the version history will keep them together anyway.

The Bioclipse script can be found here.

Sunday, January 13, 2019

cOAlition S is requesting feedback

Plan S (Wikipedia, Scholia) is here to stay. The pros and cons are still being explored, but with the list of participants and endorsements still growing (the latest endorsement comes from the African Academy of Sciences), it seems unlikely the ten principles will be disregarded. The implementation, however, may still change. I'm still looking forward to hearing more about the alleged improper lobbying and CoI of Frontiers. I have some thoughts and observations about Frontiers myself.

Meanwhile, one aspect Plan S highlights is the huge differences in politics between European countries. An ongoing debate in Germany about whether scholars have a legally unlimited freedom to publish where they want (ethical or not) is still unsettled, I think. And Norway did not seem to have discussed Open Access publishing much yet, and it has been suggested that the sudden introduction of Plan S there may be unlawful.

The situation in The Netherlands regarding the latter is different, in that Plan S naturally follows from a direction chosen by the Dutch government some years ago. This resulted in formal advice during the Dutch EU Presidency, the Amsterdam Call for Action on Open Science (2016), and governmental policies based on that. During the Presidency a formal meeting was organized in, no surprise, Amsterdam, where a draft was presented and formally discussed with stakeholders. Individual researchers had been invited, and with some luck my reply to the invitation was accepted and I joined the meeting.

My main comment on the draft Call.
I cannot say the meeting left a lot of room for improvement: the draft was shared with participants only very shortly before the meeting. I stressed the importance of the three core (user) rights of Open Science: reuse, modify, redistribute. While some other points were picked up (I don't think the organizers had a lot of room, as the Call would be signed on the spot and presented to the European Union), this one was not.

Now, Plan S also fails to mention these rights, which I consider a serious flaw. Instead, it chooses to focus on a specific implementation of those three rights. This runs counter to normal procedures in European politics, or at least in Dutch European politics, where things are generally kept vague to be refined later.

Of course, the discussions on Open Science did not start in early 2016; Dutch politics is not that fast. One aspect of the Dutch discussion has been that the focus has been too much on the cost of open access publishing, and my impression is that this leaks into Plan S. But Dutch research institutes (particularly via their libraries) have been bringing up the unsustainable situation of journal subscriptions for quite a bit longer. I seem to remember discussions of big package deals when I was a student in the nineties, a time when individual researchers still had "personal" subscriptions. I used to read JCIM (JCICS at the time) at the CAOS/CAMM (see doi:10.1007/978-3-642-74373-3_51). Yes, the need for this reform has been discussed for at least 20 years in The Netherlands.

For me, as a Dutch researcher, Plan S is neither radical nor a surprise (*): it is a natural consequence of publishers resisting the needed reforms that were started years ago, and upon which subscription deals between the Dutch universities and publishers have been based. With less than two years to go, the ambition set out by the Dutch government to be 100% Open Access by 2020 was and is far away (unless there is some radical change to an exponential increase in these last months). So, if the Dutch government wants to keep its political promise, a radical change was needed. The only surprise (hence the *) is, perhaps, that they wanted to keep their promise.

Since the Amsterdam Call, the Dutch government has further involved Dutch researchers via the National Platform Open Science (NPOS), in which various researcher organizations actively participate (postdoc network, VSNU, etc., etc.). NPOS has been underfunded and the involvement of researchers could have been a lot better.

It must also be mentioned that Plan S, as far as I know, has not been discussed at this level of NPOS. I expect it did get discussed by the participating NPOS partners (which included NWO). This is not surprising to me either, though I would very much appreciate a flatter hierarchy. But that hierarchy is very Dutch, and even researchers indicate they do not have time for all those discussions, so things are self-organized via representing organizations: it seems the general Dutch consensus was (mind you, I'm an ECR, I did not design this, and this approach is not uncontroversial, as #WOinActie makes clear) that representation is the best way forward. And as far as I can see, this is how Plan S came about. But the discussions around Plan S make clear that a lot of researchers feel left out. Understandably, but that is the Dutch academic culture to blame, not the Dutch funders: individual researchers rarely get asked for feedback on national guidance or policy documents. At the same time, that does not invalidate their concerns either, of course.

So, while the above may not have said it, I like what cOAlition S is attempting to do: the publishing system is breaking down and must be fixed, and only a few publishers are making a serious effort (at a time when some publishers make huge profits). The discussion has been nasty on both sides. Insinuations that gold Open Access journals are not interested in quality are hurtful (remember, I'm editor-in-chief of such a journal, and I work overtime to ensure the highest standards for our articles, but I know the limitations of publication platforms (see this post) and of peer review, despite the journal having access to a very qualified researcher pool).

Being an academic, you need holidays to sit down and do something you care about but that is not paid for. For me, commenting on Plan S, or contributing as a critical observer to the NPOS, is one of those things. Even finding time for that interview about Plan S in ScienceGuide has been hard.

But now that the deadline for the call by cOAlition S for feedback is nearing, it's time to get my points written up. I decided to use Hypothes.is for this (yes, I have not used it enough for someone who joined it in 2012...):


I started with the Too Risky? Open Letter and followed with commenting on Plan S itself. The fact that I do not like at all how the Open Letter was formulated (see my comments; the letter has a number of fallacies) does not mean I like Plan S as it is (which some seem to assume). These annotations look like this and can be read with or without a browser plugin:


There is plenty more to annotate, including the letter of support of the principles (see doi:10.1038/d41586-018-07632-2). I signed this letter: while I do not agree with the wording of all principles, put into the context of the Dutch situation their intentions make a lot of sense. But I have to say, I may have been rushed into signing it, with the Too Risky? letter using various fallacies and suggesting it represents researchers in general (the letter does not say that literally, but neither does it make clear that it speaks only on behalf of its signers, causing a lot of press to misrepresent the letter). The letter sketches a doom scenario and is worded in what the open source community would refer to as FUD: fear, uncertainty, and doubt.

Effectively, what this Too Risky? letter has made clear to me is how hard it will be, and how much serious effort is needed, to make the required changes. Every change has consequences. One team focuses on the positive consequences, another team focuses on the negative consequences.

I do not know what Plan S will bring us. My name is not Nostradamus (or this Dutch reference). I do know the risks of the current system. Those are easily named and have been apparent for at least two decades. Too risky? The current system has already done a lot of harm and is proven risky. I am happy that Dutch funders dare to invest in the future. Plan S may not be the right option, and I am looking forward to alternative solutions that ensure the three rights of Open Science (reuse, modify, redistribute). People have the freedom to choose whether they want to practice Open Science or not. That freedom is, in The Netherlands in recent times, limited by several things, but the limitation that the Dutch national funders want research to benefit Dutch society is in line with the Dutch political climate of the past decade (think Nationale Wetenschapsagenda).