Tuesday, December 29, 2020

Journal of Chemical Information and Modeling introduces new editorial guidelines around Open Science

Hybrid OA logo.

This week, the Journal of Chemical Information and Modeling introduced new editorial guidelines around data and source code sharing. They do not fundamentally change the journal's open science policies: it continues to want to support closed science as well. Well, there has to be a journal for that too, perhaps. So, what changed?

The editorial is aimed at the needs of the reviewer. With that, it puts additional stress on the review process as the sole gatekeeper of the publication process. The two or three reviewers now have the new responsibility of assessing the potentially temporary access to the data and source code. As a reader, you have to trust that those reviewers actually reviewed the data and code sufficiently.

One very visible change is that articles will have a Data and Software Availability section from January 1 onward (when the new editorial policy kicks in). This is a section that BMC journals have had for a very long time. In fact, I am pondering proposing an update for the Journal of Cheminformatics to change this: we need to move to proper data and software citations. Think DataCite.

Any step towards more Open Science is a good step, and this editorial is a good step. It is recognition for the people who have been supporting Open Science in chemistry for the past twenty-plus years. My once favorite journal now saying Open Science is to be encouraged is just awesome!

But we should be aware that it is not an Open Science policy. It is quite different from the editorial standards of the Journal of Cheminformatics: while JCIM encourages open science, JCheminform expects it. The new editorial fits the hybrid open access nature of the journal.

Nevertheless, congratulations to the editorial team for this step towards Open Science!

Monday, December 28, 2020

21 Tips on how to sound #openscience

Jon, RIP.

One of the things around Open Science is how some think they can use the term. When I was introduced to the term, back in 1999, it came from a USA-centric view that originated from, and was based on, the ideas of open source software; you can find this back in the literature. There is even earlier literature that uses the term in a more economic context, though.

And this USA community defined Open Science as something that provided rights to users: the right to use it (normally with some minimal restrictions), the right to modify it (for example, to curate the output), and the right to redistribute the result. Sounds pretty useful to me. In fact, I think this is the core of just doing science. This is where the slogan "Open Science is just science done right" comes from, I guess.

However, like any buzzword, it quickly got picked up by, ummm, creative people who like to benefit from the popularity of the term. Creative enough to brand themselves as Open Science. Well, to be fair, they have been openly fighting against Open Science. That's open science too, right? </not>

So, to raise a bit of awareness of what is important to keep in mind when doing Open Science, and to encourage equity among users, I wanted to highlight some of the creative uses of the term I have seen. Each of them is not really open science, but sounds like it. I hope it makes you wonder next time: "Is that really Open Science? Am I indeed personally and actively included in the dissemination of this research output? Can I use this in my own work and share those results with others?"

Making this series was actually harder than I imagined. The misuse turns out to have some common patterns, and I quickly ran into the notion that an earlier tweet already covered the essence. Anyway, please enjoy the tweets. You can jump to the opening tweet or use this conveniently unrolled thread by Thread Reader App.

Saturday, December 26, 2020

Bacting: Code coverage, JaCoCo, and API coverage

          A good bit of work to do.      

I have been a fan of code coverage. When combined with (unit) testing, it indicates which code of your software has been run and therefore tested. Some 15 years ago, when I worked on making the Chemistry Development Kit code base more stable, I worked on various things: modularization, documentation, (unit) testing. I explored the options in Java. I even extended PMD with CDK-specific unit tests. And my StackOverflow question on JUnit test dependencies still gives me karma points :)

Fast forward to now. Routinely building software has become quite commonplace, as has unit testing. The tools to support this have changed the field. And tools come and go: Travis-CI has become rare for open science projects, and where GitHub replaced SourceForge, GitHub Actions stepped in.
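For a Maven-based Java project like Bacting, a minimal GitHub Actions workflow that builds and tests on every commit might look like the sketch below. The file name, Java version, and action versions are illustrative assumptions, not the project's actual configuration:

```yaml
# .github/workflows/build.yml (hypothetical example)
name: build
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-java@v1
        with:
          java-version: 11
      # Runs the full unit test suite on every push and pull request.
      - run: mvn test
```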

But I submitted a manuscript to the Journal of Open Source Software, to learn from their submission system (which is open and just plain awesome). One reviewer urged me to measure the test coverage of my code and gave me a pointer to JaCoCo. I am not sure if the CDK used JaCoCo in the past too, but getting all the coverage info on a website was not trivial back then, though we got that done; Rajarshi may remember that. Now, with continuous building, the coverage report is automatically available on a website. With every commit. Cool!
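Wiring JaCoCo into a Maven build takes only a plugin declaration in the pom.xml. The snippet below is a sketch of the standard jacoco-maven-plugin setup (the version number is an example); the prepare-agent goal instruments the JVM during the test run, and the report goal writes an HTML/XML coverage report under target/site/jacoco:

```xml
<!-- Sketch: enable JaCoCo coverage reporting in a Maven build. -->
<plugin>
  <groupId>org.jacoco</groupId>
  <artifactId>jacoco-maven-plugin</artifactId>
  <version>0.8.6</version>
  <executions>
    <execution>
      <!-- Attach the JaCoCo agent before tests run. -->
      <goals><goal>prepare-agent</goal></goals>
    </execution>
    <execution>
      <!-- Generate the coverage report after the test phase. -->
      <id>report</id>
      <phase>test</phase>
      <goals><goal>report</goal></goals>
    </execution>
  </executions>
</plugin>
```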

However, autumn had already started and I had plenty of project work to finish. But it is holiday now, and I could start working on the reviewer comments. It turned out the pointers were enough, and I got it working for Bacting. Not being tested with a test suite does not mean it is not tested at all: I use Bacting daily, and this use will only grow in the coming year.

That brings me to another reviewer question: how much of the Bioclipse 2 API does Bacting support? Now, that question is a bit tricky. There is the Bioclipse 2.6 release (doi:10.1186/1471-2105-10-397), but there were a few dozen plugins with many more Bioclipse managers. So, I checked which managers I had locally checked out and created a GitHub Project for this with various columns. And for each manager I have (or want) in Bacting, I created an issue with checkboxes, one for each method to implement. And that looks like this:
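GitHub renders such checkboxes from Markdown task-list syntax in the issue body. A sketch of what one of those manager issues could contain (the method names here are hypothetical, not the actual Bacting API):

```markdown
<!-- Hypothetical issue body for a Bioclipse manager port -->
- [x] fromSMILES(String)
- [x] calculateMass(IMolecule)
- [ ] generate3dCoordinates(IMolecule)
```

Checked boxes mark methods already implemented in Bacting; unchecked ones are still to do, giving a per-manager progress view at a glance.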

I really hope the Maastricht University GitLab will become more user visible in the next year.

Tuesday, December 15, 2020

new: "Can an InChI for Nano Address the Need for a Simplified Representation of Complex Nanomaterials across Experimental and Nanoinformatics Studies?"

Table of Contents graphics.

I do not like questions as titles of articles: you either found an answer or not. In this case, the answer is likely a maybe. The paper marks a milestone in a discussion that started some years ago with the aim to create a standardized identifier for nanomaterials. Quite ambitious. I am happy to have been invited to contribute to these discussions and the paper, but can hardly take much credit. The team did an awesome job in capturing the complexities of representing the chemistry of nanomaterials. Like for polymers and mixtures, the chemistry is not as straightforward as that of organic chemicals (well, most of them anyway; hello five-coordinated carbon).

As visible in the table of contents graphics at the top right of this blog post, and following the habits of the InChI family, the NInChI draft has layers. That allows grouping of nanomaterials at different levels of detail, pretty much like the InChI itself. Please do check out this article:

Lynch, I. et al. Can an InChI for Nano Address the Need for a Simplified Representation of Complex Nanomaterials across Experimental and Nanoinformatics Studies? Nanomaterials 2020, 10, 2493.

So, the next step is with the InChI Trust to respond to the core team. Here, I welcome comments too. For me, I will see if I can do something with the Chemistry Development Kit, if I can find the time :/

Sunday, December 06, 2020

Open Standards mean independence, freedom

   Source: Wikimedia.

Open Standards, or Open Specifications as I personally prefer, remove another hurdle in locked-in science. They allow others to understand your language and the nuances of your message. They mean independence and freedom. One important international standard is the Open Document Format (ODF). It is supported by all major editors (yes, MS Word too). Still, scholars have been quite persistent in insisting on closed source and semi-closed solutions like .docx files. Often this is because of the track changes feature. We have plenty of alternatives for that too, but old habits die hard.

Anyway, as I am trying to use open formats as much as possible (LaTeX, Markdown), but still have collaborators who do not have computers with open solutions (correlated with some vendor lock-in), I also end up sending around Word files, or using Google Docs, for tracking changes. If you search my GitHub repositories, you will undoubtedly find LaTeX sources of journal articles, with changes tracked with git.

So, yesterday I was wondering if I couldn't mix the two worlds. I've done this before: Markdown converts to quite reasonable .docx files. It's just that applying the changes back to the original takes a bit more effort. Not that much, really, as I have to check them one by one anyway.

But what if I could automate this? The Word files are semi-closed, but that also means they are semi-open: a Word file, in fact, is just a ZIP archive. This also works great for extracting the images from Word files. Did you know that? Well, now you do.
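Because a .docx is a ZIP archive, embedded images sit under word/media/ inside it and can be pulled out with nothing more than the standard library. A minimal sketch (the function name is mine, not from any library):

```python
import zipfile
from pathlib import Path

def extract_docx_images(docx_path, out_dir):
    """Extract embedded images from a .docx file.

    A .docx is a ZIP archive; embedded media live under word/media/.
    Returns the list of extracted file paths.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(docx_path) as zf:
        for name in zf.namelist():
            if name.startswith("word/media/"):
                # Keep only the file name, drop the archive-internal path.
                target = out / Path(name).name
                target.write_bytes(zf.read(name))
                extracted.append(target)
    return extracted
```

The same zipfile trick is the starting point for reading word/document.xml, which is where a Word-to-Markdown/patch converter would have to look for the tracked changes.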

I asked on Twitter and got replies in seconds. I have yet to explore them, but thanks to Simon and Chris, I now have these two leads to explore whether I can convert a Word file into a Markdown/Git patch:

I thought I'd just drop them here. Think of it as open notebook science (at the very least, for my future me).