Tuesday, July 22, 2014

Open Notebook Science ONSSP #1:

As promised, I slowly set out to explore ONSSPs (Open Notebook Science Service Providers). I do not have a full overview of solutions yet, but found LabTrove and Open Notebook Science Network. The latter is more clearly an ONSSP, while the former seems to be the software.

So, my first experiment is with Open Notebook Science Network (ONSN). The platform uses WordPress, a proven technology. I am not a huge fan of the setup, which has so many features that it is sometimes hard to find what you need. Indeed, my first write-up ended up as a Page rather than a Post. On the upside, there is a huge community around it, with experts in every city (literally!). But my ONS is now online and you can monitor my Open research with this RSS feed.
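
For readers who want to monitor a notebook feed programmatically, here is a minimal sketch in Python, assuming a plain RSS 2.0 feed such as WordPress typically produces. The sample feed, its titles and URLs, and the function name `latest_entries` are all made up for illustration; they are not the actual notebook feed.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical sample feed: titles and URLs are placeholders,
# not entries from the real notebook.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example notebook</title>
    <link>http://example.org/notebook</link>
    <item>
      <title>First entry</title>
      <link>http://example.org/notebook/1</link>
      <pubDate>Mon, 21 Jul 2014 09:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Second entry</title>
      <link>http://example.org/notebook/2</link>
      <pubDate>Tue, 22 Jul 2014 10:00:00 +0000</pubDate>
    </item>
  </channel>
</rss>
"""


def latest_entries(rss_xml, limit=5):
    """Return up to `limit` feed items as dicts, newest first.

    Assumes every <item> has <title>, <link>, and an RFC 822 <pubDate>.
    """
    root = ET.fromstring(rss_xml)
    entries = []
    for item in root.iter("item"):
        entries.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        })
    # Sort by publication date, most recent entry first.
    entries.sort(key=lambda entry: entry["published"], reverse=True)
    return entries[:limit]


if __name__ == "__main__":
    for entry in latest_entries(SAMPLE_RSS):
        print(entry["published"].date(), entry["title"], entry["link"])
```

To follow a live feed, the XML could be fetched with `urllib.request.urlopen(feed_url).read()` and passed to the same function, where `feed_url` is whatever address the notebook exposes.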

One of the downsides is that the editor is not oriented toward structured data, though there is a feature for Forms which I may need to explore later. My first experiment was a quick, small hack: upgrade Bioclipse with OPSIN 1.6. As discussed in my #jcbms talk, I think it may be good for cheminformatics if we really start writing up step-by-step descriptions of common tasks.

My first observations are that it is an easy platform to work with. Embedding images is easy, and there should be options for chemistry extensions. For example, there is a Jmol plugin for WordPress, there are plugins for Semantic Web support (no clue which one I would recommend), and extensions for bibliographies are available too, if I am not mistaken. Also, my ORCID is already prominently listed, and I am not sure whether I did this or whether the ONSN people added it as a default feature.

Even better is the GitHub support @ONScience made me aware of, by @benbalter. The instructions were not crystal clear to me (see issues #25 and #26), but after some suggested fixes (pull request #27) it started working, and I now have a backup of my ONS on GitHub!

So, it looks like I am going to play with this ONSSP a lot more.

Friday, July 18, 2014

Open Notebook Science: also for cheminformatics

Last Monday the Jean-Claude Bradley Memorial Symposium was held in Cambridge (slide decks). Jean-Claude was a remarkable man, and I spoke at the meeting about several things, including how he made me jealous with his Open Notebook Science work. I had the pleasure of working with him on an RDF representation of solubility data.

It took me a long time to group my thoughts and write the abstract I submitted to the meeting:
    I always believed that with Open Data, Open Source, and Open Standards I was doing the right thing; that it was enough for a better science. However, I have come to the realization that these features are not enough. Surely, they aid Open collaborations, though not even sufficient there, but they fail horribly in the "scientific method." Because while ODOSOS makes work reproducible, it lacks the context needed by scholars to understand what it solved. That is, it details out in much detail how some scientific question is answered, but not what question that was. As such, it fails to follow the established practices in scholarly research. In this presentation I will show how I should have done some of my research, and ponder on reasons why I had not done so.
And it also took me a long time and a lot of stress to get together some slides, but I managed in the end.

During the talk I promised to start doing Open Notebook Science (ONS) for my research, and I am currently exploring ONS platforms.

The meeting itself was great. There was a group of about 40 people in Cambridge and another 15 online, most of them into Open Science or at least wanting to learn what it is about. I met old friends and new people, including a just-graduated Maastricht Science Programme student (one that I did not have in my class last year). Coverage on Twitter was pretty good (using the #jcbms hashtag; there is an archive), with some 90 people using the hashtag.
Several initiatives seem to be evolving, including an ONS initiative and a memorial special issue. All of these will need help from the community. The time is right.

Sunday, July 06, 2014

#JChemInf Volume 5 as PDF on @FigShare

One of the things I do to prepare for holiday is to get some reading material together. I haven't finished Gödel, Escher, Bach yet (a suggestion from the blogosphere), with a bit of luck there are new chapters of HPMOR, and I normally try to catch up with literature. One advantage of Open Access is that you can remix. So, I created a single PDF of all JChemInf Vol. 5 articles (last year I did volumes 1, 2, 3, and 4). This PDF is about 75 MB in size, and therefore fits on most smartphones. The PDF has an index, though without entries for each paper; still, jumping from abstract to abstract works fine. It has a bit over fifty peer-reviewed papers.

Another advantage of Open Access is that you can reshare. And so I did, and the volumes are available from FigShare:
  1. JChemInf Vol.1
  2. JChemInf Vol.2
  3. JChemInf Vol.3
  4. JChemInf Vol.4
  5. JChemInf Vol.5
Of course, a clear downside is that it interferes with #altmetrics. And, I am wondering if a similar thing can be done with ePubs.

Saturday, July 05, 2014

Journal Open Data Guidelines: plenty of room for clarifications

J. Gray, Wikipedia. CCZero.
Several journals are playing with statements about Open Data, and, for example, F1000Research requires Open Data. Just as publishers are judged on their implementation of Open Access, so should we critically analyze journals that claim to be Open Data journals. Well, such claims I have not seen, but some journals have promising statements, like:
BioMed Central
    Data associated with the article are available under the terms of the CCZero.
However, this claim is vague, or, at least, too vague for a paper I am currently reviewing. The fuzziness lies in the word "associated". What defines associated data? How does this relate to reproducibility? If the purpose of Open Data is that the results of the paper can be reproduced, does that mean all data? And what happens if some of the data is from a previous paper? Or from a proprietary database? Is a paper that relies on data from a proprietary database for key steps in its argumentation acceptable to a journal that demands Open associated data? What if the authors do not have control over the license? Or is it limited to new data? But what defines new data here? It is a really hard question in an era where data has very limited provenance (versioning, author attribution, etc.).