Thursday, November 21, 2019

new preprint: "Wikidata as a FAIR knowledge graph for the life sciences"

Entity diagram of life sciences data in Wikidata.
I do not want to make it a habit to blog about preprints (you can see them on our BiGCaT website), and I would rather wait with blogging until the article is formally published. But since I have been mentioning this preprint to so many people now, it's likely worth checking out.

I'm happy to have been able to contribute to this story by Andrew Su's team and the many other people involved, as I do think Wikidata is a game changer. Our work lies in the corner of small compounds in Wikidata, as you may have seen in various presentations at conferences.

Some further posts about what I have been doing in Wikidata, related to this preprint:

Or just generally search for Wikidata in my blog, because there is a lot more to check up on.

Friday, November 01, 2019

Google Scholar has become a channel of spam

Apparently, some culprits have found a way to escape the otherwise impressive spam detection by Google, and managed to spam Google Scholar:

Now, Google Scholar has a long history of learning what a serious paper looks like, and I'm sure they will learn to recognize this kind of spam too. If not, it will be the end of Google Scholar, I'm afraid.

Monday, October 14, 2019

ChemCuration 2019 Poster Conference: Call for Posters

Twitter profile.
It giet oan! That is a Frisian phrase for something unlikely that is going to happen anyway, used particularly in relation to the Elfstedentocht.

ChemCuration 2019 is a go. The website is online, the Twitter account and hashtag are ready, we got a poster prize, and here is the call for posters!

    On December 3 the first ChemCuration conference will take place. ChemCuration 2019 is a one day, online-only conference around data curation and curated data in the chemistry domain. During the entire conference day, you can participate by tweeting about the poster that you uploaded, along with the meeting hashtag, and responding to questions about your poster in the 24 hours of the conference day. The poster must be available in an online repository (e.g. Zenodo or Figshare) under the CCZero, CC-BY or CC-BY-SA license prior to the conference.

    This is the meeting scope: anything around data curation and curated open science data in chemistry. This includes but is not limited to: 1. a new release of curated open data; 2. FAIR metadata around open data; and 3. open source tools for data curation.

    How do I participate in ChemCuration?
    You can participate in this online poster conference by presenting your poster on Twitter
    during the conference day. You do this by first archiving your poster via Figshare or Zenodo,
    with an open license (e.g. CCZero or CC-BY). Then, during the day you tweet an image of
    (part of) your digital poster with the #chemcur2019 hashtag, a short summary, and a link to
    your online poster with its DOI. The archived poster should be a regular A0 poster (WxH =
    841 x 1189 mm or 33.1 x 46.8 in).

    Do I need to register?
    Registration is not obligatory to participate. However, if you would like to be eligible for a poster prize, then registration is required, by Nov. 30th, 2019. The registration form is found at

    More information can be found on the website and on Twitter.

Wednesday, October 09, 2019

ChemCuration: a small trick to fix the SMILES of glucuronides

Glucuronide functional group.
With the ChemCuration 2019 online poster conference nearing, my upcoming talks about chemistry in Wikidata (which also needs curation), and the much longer process of curating metabolite(-like) structures in WikiPathways, I decided that a trick I tweeted earlier this week is actually quite useful, and therefore something I should really write up in my lab notebook.

Glucuronide is an example of a (biological) functional group, and several databases do not always represent its stereochemistry correctly. That is an interoperability (and thus FAIR) problem. Correcting this is not trivial, particularly if you have to redraw the same glucuronide group again and again.

So, not looking forward to that, I invested a bit of time in finding a SMILES trick. What if I had a SMILES snippet that I could easily copy/paste and join to the SMILES of the chemical structure the glucuronide is attached to? Here goes.


I just realized that the 3 I originally used is better replaced by a 9, which is less likely to occur in the SMILES of the rest of the molecule. The period at the end is also deliberate: that way, I can just copy-paste the SMILES of the rest of the molecule directly after that period. That gives me a disconnected structure, but I only have to put a 9 next to the atom that binds to the glucuronide. So, if the R group is methane, I get:


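To make the trick a bit more concrete, here is a minimal Python sketch of the copy/paste step, using plain string handling only. It relies on the SMILES rule that a ring-bond number may be shared across the '.' separator, so the two components end up joined by ring bond 9. The glucuronide SMILES below is a placeholder written for illustration: it carries no stereochemistry and is not the snippet from my tweet. The helpers `attach_tag` and `ring_bond_labels` are hypothetical names, and the check only guards against the situation I worried about above, namely the rest of the molecule already using 9 for a ring of its own.

```python
# Placeholder glucuronide "tag" (assumption, NOT the original snippet):
# the glycosidic oxygen opens ring bond 9, and the trailing period lets the
# rest of the molecule follow as a separate component. No stereochemistry.
GLUCURONIDE_TAG = "O9C1OC(C(=O)O)C(O)C(O)C1O."


def ring_bond_labels(smiles: str) -> list[str]:
    """Return the ring-bond labels found outside bracket atoms, in order."""
    labels = []
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            # Skip bracket atoms: digits there are isotopes, H counts or charges.
            i = smiles.index("]", i) + 1
            continue
        if ch == "%":
            # Two-digit ring-bond label, e.g. %10.
            labels.append(smiles[i + 1:i + 3])
            i += 3
            continue
        if ch.isdigit():
            labels.append(ch)
        i += 1
    return labels


def attach_tag(tag: str, rest: str, label: str = "9") -> str:
    """Join a tag and the rest of a molecule via a ring bond across the '.'."""
    if not tag.endswith("."):
        raise ValueError("tag must end with a period")
    if ring_bond_labels(tag).count(label) != 1:
        raise ValueError(f"tag must open ring bond {label} exactly once")
    if ring_bond_labels(rest).count(label) != 1:
        raise ValueError(f"rest must close ring bond {label} exactly once")
    return tag + rest


if __name__ == "__main__":
    # R group is methane: put the 9 on the carbon that binds the glucuronide.
    print(attach_tag(GLUCURONIDE_TAG, "C9"))
```

For the methane case this prints `O9C1OC(C(=O)O)C(O)C(O)C1O.C9`. A cheminformatics toolkit such as RDKit should read that as a single connected molecule, so something like `Chem.MolToSmiles(Chem.MolFromSmiles(...))` can turn it into one canonical, dot-free SMILES.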
Now, next stop: CoA and other common biological tags.