
Saturday, May 21, 2022

DECIMER: a new chemical OCR kid on the block

When I started blogging, it was primarily to reduce the time it took me to tell people about cool new open science. That became really hard: my group grew and the amount of open science nowadays is enormous. We were right in the late nineties that open science was the way forward. Anyway, I am trying to blog more about noteworthy things again. I am about two months behind and have not even blogged about the CDK meeting in April.

Today I want to highlight DECIMER. Steinbeck's group has been doing several awesome cheminformatics projects, and DECIMER is one of them (doi:10.1186/s13321-021-00538-8, Scholia summary). At the CDK meeting we had fun using it for ACS Disclosures.

This morning I passed it a Compound Interest infographic (about poisons in spring flowers). Because it has both structures and text, I converted it to a PDF first and then uploaded it to decimer.ai. It detected three of the four chemicals, and it struggled with the R-groups.


But since the model can be trained with more data, it is easy to see where this is going.
