
Sunday, January 17, 2021

new: "A catalogue of 863 Rett-syndrome-causing MECP2 mutations and lessons learned from data integration"

Figure 2 from the paper.

Some things that affect our lives are too complex to be called a disease and are instead called a syndrome. Or as this paper explains: "A disease may initially be called a syndrome to describe a collection of symptoms." Understanding the problem in enough detail that the collection of symptoms can be explained is often not trivial. Rett syndrome is one such situation, and mapping out all its aspects is needed before a solution (a cure, or just something to address the resulting symptoms) can be found.

Rett syndrome has one strongly associated gene, MECP2, but not every change in this gene is a problem. And there are thousands of changes. Friederike Ehrhart just published work on creating a knowledge base of 10,968 MECP2 variants, of which 863 are Rett-causing (doi:10.1038/s41597-020-00794-7). That is a lot of biology.

However, because science is currently neither FAIR (enough) nor Open (enough), getting the knowledge together on these thousands of variants is a significant amount of work. That did not stop Ehrhart and co-authors. My contributions to this work are only minor: I know almost nothing about Rett syndrome. Instead, I contributed only to the data structure and metadata standards, but if that helps others get better insight into a disease or syndrome, I gladly do so.

Tuesday, December 29, 2020

Journal of Chemical Information and Modeling introduces new editorial guidelines around Open Science

Hybrid OA logo.

This week, the Journal of Chemical Information and Modeling introduced new editorial guidelines around data and source code sharing. They basically do not change much about the journal's open science policies: they continue to support closed science. Well, there has to be a journal for that too, perhaps. So, what changed?

The editorial is aimed at the needs of the reviewer. With that, it puts additional stress on the review process as the sole gatekeeper of the publication process. The two or three reviewers now have the new responsibility to assess the potentially temporary access to the data and source code. As a reader, you have to trust that those reviewers actually reviewed the data and code sufficiently.

One very visible change is that articles will have a Data and Software Availability section from January 1 onward (when the new editorial policy kicks in). This is a section that BMC journals have had for a very long time. In fact, I am pondering proposing an update for the Journal of Cheminformatics to change this. We need to move to proper data and software citations. Think DataCite.

Any step towards more Open Science is a good step, and this editorial is one. It is recognition for the people who have been supporting Open Science in chemistry for the past twenty-plus years. My once favorite journal now saying Open Science is to be encouraged is just awesome!

But we should be aware it is not an Open Science policy. It is quite different from the editorial standards of the Journal of Cheminformatics: while JCIM encourages open science, JCheminform expects it. The new editorial fits the hybrid open access nature of the journal.

Nevertheless, congratulations to the editorial team for this step towards Open Science!

Monday, December 28, 2020

21 Tips on how to sound #openscience

Jon, RIP.


One of the things around Open Science is how some people think they can use the term. To me, when I was introduced to the term, back in 1999, it was from a USA-centric view that originated from and built on the ideas of open source software: you can find this back in the literature. There is even earlier literature that uses the term in a more economic context, though.

And this USA community defined Open Science as something that provided rights to users: the right to use it (normally with some minimal restrictions), the right to modify it (for example, to curate the output), and the right to redistribute the result. Sounds pretty useful to me. In fact, I think this is the core of just doing science. This is where the slogan "Open Science is just science done right" comes from, I guess.

However, like any buzzword, it quickly got picked up by, ummm, creative people who like to benefit from the popularity of the term. Creative enough to brand themselves as Open Science. Well, to be fair, they have been openly fighting against Open Science. That's open science too, right? </not>

So, to raise a bit of awareness of what is important to keep in mind when doing Open Science, and to encourage equity among users, I wanted to highlight some of the creative uses of the term I have seen, each of them not really open science, though they sound like it. I hope it makes you wonder next time: "Is that really Open Science? Am I indeed personally and actively included in the dissemination of this research output? Can I use this in my own work and share those results with others?"

Making this series was actually harder than I imagined. The misuse turns out to have some common patterns, and I quickly ran into the notion that an earlier tweet already covered the essence. Anyway, please enjoy the tweets. You can jump to the opening tweet or use this conveniently unrolled thread by Thread Reader App.

Saturday, December 26, 2020

Bacting: Code coverage, JaCoCo, codecov.io, and API coverage

A good bit of work to do.

I have long been a fan of code coverage. When combined with (unit) testing, it indicates which code of your software has been run and therefore tested. Some 15 years ago, when I worked on making the Chemistry Development Kit code base more stable, I worked on various things: modularization, documentation, (unit) testing. I explored the options in Java. I even extended PMD with CDK-specific unit tests. And my StackOverflow question on JUnit test dependencies still gives me karma points :)

Fast forward to now. Routinely building software has become quite commonplace, as has unit testing. The tools to support this have changed the field. And tools come and go: Travis-CI will become rare for open science projects, and where GitHub replaced SourceForge, GitHub Actions steps in.

Then I submitted a manuscript to the Journal of Open Source Software, to learn from their submission system (which is open and just plain awesome). One reviewer urged me to test the test coverage of my code and gave me pointers to JaCoCo and codecov.io. I am not sure if the CDK used JaCoCo in the past too, but getting all the info on a website was not trivial back then, though we got that done; Rajarshi may remember that. With continuous building and codecov.io it is automatically available on a website, with every commit. Cool!

However, autumn had already started and I had plenty of project work to finish. But it is holiday now, and I could start working on the reviewer comments. It turned out the pointers were enough, and I got codecov.io working for Bacting. Mind you, not being tested with a test suite does not mean the code is not tested at all: I use Bacting daily, and this use will only grow in the coming year.
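For completeness, the setup follows the common Maven pattern: the jacoco-maven-plugin (with its prepare-agent and report goals) records during mvn test which lines were executed, and the continuous build uploads the resulting report to codecov.io. Below is a minimal sketch of the kind of JUnit 5 test that feeds such coverage; the CDKManager class, its workspace-root constructor, and the fromSMILES() method reflect my reading of the Bacting API, so treat the exact names and signatures as assumptions.

```java
import static org.junit.jupiter.api.Assertions.assertNotNull;

import org.junit.jupiter.api.Test;

import net.bioclipse.managers.CDKManager;

class CDKManagerTest {

    // Bacting managers take the workspace root as constructor argument
    // (an assumption about the API; adjust to the actual signature).
    private final CDKManager cdk = new CDKManager(".");

    @Test
    void fromSMILES() throws Exception {
        // Every line this test executes is marked as covered by JaCoCo.
        assertNotNull(cdk.fromSMILES("COC"));
    }
}
```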

That brings me to another reviewer question: how much of the Bioclipse 2 API does Bacting support? Now, that question is a bit tricky. There is the code of the Bioclipse 2.6 release (doi:10.1186/1471-2105-10-397), but there were a few dozen plugins with many more Bioclipse managers. So, I checked which managers I had locally checked out and created a GitHub Project for this with various columns. And for each manager I have (or want) in Bacting, I created an issue with checkboxes, one for each method to implement.
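Such a checklist can even be drafted programmatically. Here is a sketch of the idea (hypothetical, not the tooling actually used for the project board): compare a Bioclipse 2 manager interface against its Bacting counterpart with reflection, and print a GitHub checkbox for every method still missing. The two class names below are placeholders.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

public class ApiCoverage {
    public static void main(String[] args) throws Exception {
        // Placeholder names: a Bioclipse 2 manager interface and the
        // Bacting manager that (partially) implements it.
        Class<?> bioclipse = Class.forName("net.bioclipse.cdk.business.ICDKManager");
        Class<?> bacting = Class.forName("net.bioclipse.managers.CDKManager");

        Set<String> implemented = Arrays.stream(bacting.getMethods())
            .map(Method::getName).collect(Collectors.toSet());

        // One GitHub-flavored Markdown checkbox per missing method,
        // ready to paste into an issue.
        for (Method method : bioclipse.getMethods()) {
            if (!implemented.contains(method.getName())) {
                System.out.println("- [ ] " + method.getName());
            }
        }
    }
}
```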


I really hope the Maastricht University GitLab will become more visible to users in the next year.

Tuesday, December 15, 2020

new: "Can an InChI for Nano Address the Need for a Simplified Representation of Complex Nanomaterials across Experimental and Nanoinformatics Studies?"

Table of Contents graphics.

I do not like questions as titles of articles: you either found an answer or not. In this case, the answer is likely a maybe. The paper marks a milestone in a discussion that started some years ago with the aim to create a standardized identifier for nanomaterials. Quite ambitious. I am happy to have been invited to contribute to these discussions and the paper, but can hardly take much credit. The team did an awesome job in capturing the complexities of representing the chemistry of nanomaterials. Like with polymers and mixtures, the chemistry is not as straightforward as that of organic chemicals (well, most of them anyway; hello five-coordinated carbon).

As visible in the table of contents graphics at the top right of this blog post, and following the habits of the InChI family, the NInChI draft has layers. That allows grouping of nanomaterials at different levels of detail, pretty much like the InChI. Please do check out this article:

Lynch, I. et al. Can an InChI for Nano Address the Need for a Simplified Representation of Complex Nanomaterials across Experimental and Nanoinformatics Studies? Nanomaterials 2020, 10, 2493.

So, the next step is for the InChI Trust to respond to the core team. Here, I welcome comments too. For me, I will see if I can do something with the Chemistry Development Kit, if I can find the time :/
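To make the comparison with the InChI concrete: the slash-separated layers of a standard InChI (formula, connectivity, hydrogens, and optionally charge, stereochemistry, and isotopes) each add a level of detail, and the NInChI draft extends this layered idea to nanomaterials. A minimal sketch of generating such a layered identifier with the Chemistry Development Kit:

```java
import org.openscience.cdk.inchi.InChIGenerator;
import org.openscience.cdk.inchi.InChIGeneratorFactory;
import org.openscience.cdk.interfaces.IAtomContainer;
import org.openscience.cdk.silent.SilentChemObjectBuilder;
import org.openscience.cdk.smiles.SmilesParser;

public class InChILayers {
    public static void main(String[] args) throws Exception {
        // Parse ethanol from SMILES and generate its standard InChI.
        SmilesParser parser = new SmilesParser(SilentChemObjectBuilder.getInstance());
        IAtomContainer ethanol = parser.parseSmiles("CCO");

        InChIGenerator generator =
            InChIGeneratorFactory.getInstance().getInChIGenerator(ethanol);

        // Prints: InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3
        // layers: formula (C2H6O), connectivity (c1-2-3), hydrogens (h3H,2H2,1H3)
        System.out.println(generator.getInchi());
    }
}
```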