
Sunday, August 11, 2019

Structure of colibactin elucidated

Structure of colibactin.
Structure elucidation is still a thing. C&EN reported yesterday that a team has published the structure of colibactin (doi:10.1126/science.aax2685), which was previously unknown despite the compound's major impact on human health (cancer). Now, since the article does not seem to include a SMILES, InChI, InChIKey, or even an IUPAC name, I hope I redrew it correctly (see right). The manuscript and supplementary information are, btw, massive in experimental data. Sadly, little of that is FAIR :(

And because there is no open source IUPAC name generator, I cannot provide that either. But I've submitted the structure to PubChem, so hopefully we have the IUPAC name soon.

In the past I would have provided this info in my blog, but we now have Wikidata and Scholia. So, I created a new Wikidata item for the structure, with some initial info, like SMILES, InChI, and InChIKey (using Bacting, of course).
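To give an idea of what such initial statements look like, here is a minimal sketch in Python. It is not the Bacting script used for the colibactin item. It prints QuickStatements-style commands for a new compound item, using ethanol purely as a well-known stand-in. The function name `compound_quickstatements` is made up for this example, and the exact formatting Wikidata expects for the InChI value should be checked against the property's own constraints before submitting anything.

```python
# Sketch: build QuickStatements (V1, tab-separated) commands for a new
# Wikidata item describing a chemical compound.
# Wikidata identifiers used: P31 = instance of, Q11173 = chemical compound,
# P233 = canonical SMILES, P234 = InChI, P235 = InChIKey.

def compound_quickstatements(label, smiles, inchi, inchikey):
    """Return the QuickStatements lines that create one compound item."""

    def quoted(value):
        # QuickStatements expects string values wrapped in double quotes.
        return '"' + value + '"'

    return [
        "CREATE",                                        # start a new item
        "\t".join(["LAST", "Len", quoted(label)]),       # English label
        "\t".join(["LAST", "P31", "Q11173"]),            # a chemical compound
        "\t".join(["LAST", "P233", quoted(smiles)]),
        "\t".join(["LAST", "P234", quoted(inchi)]),
        "\t".join(["LAST", "P235", quoted(inchikey)]),
    ]


# Ethanol is only an illustration: it already exists in Wikidata,
# so these lines are not meant to be submitted.
lines = compound_quickstatements(
    "ethanol",
    "CCO",
    "InChI=1S/C2H6O/c1-2-3/h3H,1-2H3",
    "LFQSCWFLJHTTHZ-UHFFFAOYSA-N",
)
print("\n".join(lines))
```

The output can be pasted into the QuickStatements tool, which turns each `LAST` line into a statement on the item created by the preceding `CREATE`.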


The new publication does not seem to provide experimental physchem properties of colibactin but, without having read the article in detail yet, my impression is that they simply could not synthesize enough of the compound to do such measurements. They do provide NMR and MS data, though. A lot.

Colibactin is one of those compounds about whose biology a lot was already known: some 42 articles in Wikidata discuss the compound and its biological properties. I linked them to the new item for the compound and did some additional annotation, giving a nice Scholia page with a topic graph.


Sunday, August 04, 2019

Contributing to Climate Research?

As a chemist/biologist, my day-to-day work is not really related to climate research. Yet, the effects of the crisis are, of course. I have been pondering how I could contribute my small bits. And after some weeks, I realized that I could repurpose the Zika Corpus idea developed by Daniel Mietchen. And, of course, then there is our Scholia project, where annotations of research articles are visualized. So, given that the climate crisis is a truly global problem, I continued what others had started before me: annotating climate research articles with the region or location they are associated with. That way, you can look up the effects of the climate crisis in your own region.

Mind you, most literature is not annotated with main subject yet, let alone country. But that's at least something I can do (along with taking the train as often as possible, to replace the airplane). And you can join: here's the list of climate change articles without (additional) subject annotation. Another interesting annotation you can do: species.
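For those who prefer to build such a list themselves, here is a minimal sketch in Python of a query for the Wikidata Query Service. It is not the query behind the list linked above. It selects articles whose only "main subject" (P921) is the given topic, which is one reading of "without (additional) subject annotation". The function name is made up for this example, and the topic identifier Q125928 (assumed here to be the Wikidata item for climate change) should be double-checked before use.

```python
# Sketch: build a SPARQL query for the Wikidata Query Service that lists
# articles whose only main subject (P921) is the given topic.
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def articles_without_extra_subject(topic_qid, limit=100):
    """Return a SPARQL query string for articles annotated only with topic_qid."""
    # Guard against typos: Wikidata item identifiers look like Q123.
    if not (topic_qid.startswith("Q") and topic_qid[1:].isdigit()):
        raise ValueError("expected a Wikidata item identifier such as Q42")
    return f"""
SELECT ?article ?articleLabel WHERE {{
  ?article wdt:P921 wd:{topic_qid} .
  FILTER NOT EXISTS {{
    ?article wdt:P921 ?other .
    FILTER(?other != wd:{topic_qid})
  }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
LIMIT {limit}
""".strip()


def query_url(query):
    """Return a URL that asks the query service for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


# Assumption: Q125928 is the item for climate change; verify before relying on it.
query = articles_without_extra_subject("Q125928", limit=5)
print(query)
print(query_url(query))
```

Nothing is sent over the network here; the printed query can be pasted into the query service, and every article it returns is a candidate for a region, location, or species annotation.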

Europe




Africa (part of it; it's a huge continent!)



U.S.A.



Nanoinformatics page in Wikipedia

This spring I contributed to a joint project, coordinated by the NanoWG, to write a Wikipedia article about nanoinformatics (funded by NanoCommons). I dived into digging up the history of the term nanoinformatics, and isolated a few early sources where the term was first used, coined if you like. At the same time, the page needed to give an encyclopedic summary of the research field. Thanks to everyone who contributed, in particular John, Mark, and Fred!


I think we succeeded quite well, and the page has become a rich source of literature, though far from an exhaustive one. If you want a longer list of nanoinformatics literature, then perhaps check out the Scholia page about nanoinformatics (and notice the RSS feed, to get informed about new nanoinformatics articles).


Saturday, July 13, 2019

Standing on the shoulders: but the shoulders are 200 years old

"Houston, we have a problem. We're standing on the shoulders of old scholars, but it feels a bit shaky."

Well, no wonder. While rocket science has clear foundations, the physical laws of nature, for many other research fields it's trickier. We rely on hundreds of years of knowledge and assume (not trust) that work to be true. And that knowledge is seemingly disappearing very fast (remember my graveyard of chemical literature observation). Published literature, generally, is too hard to reproduce to be seen as an accurate capture of research history. In other words, these shoulders are 200 years old, and our support is failing. 

Open Science attempts to overcome these issues. It defines an environment where all research output is important, where everyone has access to shoulders, and trust can be replaced by reproducibility. This is a huge transition, ongoing for some 20 years now.

With my work as one of the two Editors-in-Chief of the Journal of Cheminformatics, I try to contribute to making this happen, sooner rather than later. It's not been an easy ride, and there is so much left to do. And I do not always agree with the effort put in by Springer Nature here, as is clear from this reply.

Figure 1 from the latest editorial.
But I am happy to work with Rajarshi, Nina, Matthew, and Samuel to support the Open Science community in chemistry, for example, by allowing publications that describe a piece of open source cheminformatics software (Software article type). We're limited by what BioMedCentral can offer us, but within that context try to make a change.

The journal has now existed for 10 years, as marked by our latest editorial. In it we describe our adoption of GitHub as a free, extra service, where we fork source code published in our journal, and announce that an ORCID is now obligatory for all authors.

These things bring me back to those shoulders. The full adoption of the ORCID allows research to be more easily found (more FAIR) and the copying of the source code aims at making the shoulders on which future cheminformatics stands more solid. Minor steps. But even minor steps matter.

Let's see where our journal takes open science cheminformatics.

Oh, and since you are reading this, I would love to see the American Chemical Society be more open to Open Science too. Please join me in requesting them to join the Initiative for Open Citations.

Saturday, June 22, 2019

Bacting: Bioclipse from the command line

Source: Wikipedia. Public Domain.
Because more and more of the cheminformatics I do is with Bioclipse scripts (see doi:10.1186/1471-2105-10-397) and because Bioclipse is currently unmaintained and has become hard to install, I decided to take the plunge and rewrite some stuff so that I could run the scripts from the command line. I wrote up the first release back in April.

Today, I release Bacting 0.0.5 (doi:10.5281/zenodo.3252486), which is the first release you can download from one of the main Maven repositories. I'm still far from a Maven or Grape expert, but at least you can now use Bacting like this, without having to download and compile the source code locally first:

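// Tell Grape where to find the Bacting artifacts
@GrabResolver(
  name='ossrh',
  root='https://oss.sonatype.org/content/groups/public'
)
// Fetch the CDK manager (and its dependencies) on the first run
@Grab(
  group='io.github.egonw.bacting',
  module='managers-cdk',
  version='0.0.5'
)

// The manager needs a folder that acts as the Bioclipse workspace
workspaceRoot = "."
cdk = new net.bioclipse.managers.CDKManager(workspaceRoot);

// Parse a SMILES string (ethanol) and print the resulting molecule
println cdk.fromSMILES("CCO")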

If you have been using Bacting before, then please note the change in groupId. If you want to check out all functionality, have a look at the changelogs of the releases.

If you want to cite Bacting, please cite the Bioclipse 2 paper and, for the specific release, follow the instructions on Zenodo. That is, pending an article. The Journal of Open Source Software? Sounds like a good idea!