For those who are not familiar with the CDK Literature series, the posts discuss recent literature that cites one of the two CDK papers (the first one is now Open Access). A short description explains what the paper is about and why the CDK is cited. For that I am using the CiTO, of which the data is available from CiteULike. That allows me to keep track how people are using the CDK, resulting, for example, in these wordles.
I will try to pick up this series again, but may be a bit more selective. The number of CDK citing papers has grown extensively, resulting in at least one new paper each week (indeed, not even close to the citation rate of DAVID). I aim at covering ~5 papers each week.
Ring perception
Ring perception has evolved in the CDK. Originally, there was the Figueras algorithm (doi:10.1021/ci960013p) implementation which was improved by Berger et al. (doi:10.1007/s00453-004-1098-x). Now, John May (the CDK release manager) has reworked the ring perception in the CDK, also introduction a new API which I covered recently. Also check John's blog.

Screening Assistant 2
A bit longer ago, Vincent Le Guilloux published the second version their Screening Assistant tool fo rmining large sets of compounds. The CDK is used for various purposes. The paper is already from 2012 (I am that much behind with this series) and the source code on SourceForge does not seem to have change much recently.
![]() |
Figure 2 of the paper (CC-BY) shows an overview of the Screening Assistant GUI. |

Using fingerprints for compound enrichment, i.e. finding the actives in a set of compounds, is a common cheminformatics application. This paper by Avram et al. introduces a new metric (eROCE). I will not go into details, which are best explained by the paper, but note that the CDK is used via PaDEL and that various descriptors and fingerprints are used. The data set they used to show the performance is one of close to 50 thousand inhibitors of ALDH1A1.

The International Chemical Identifier
It is only because Antony Williams advocated the importance of the InChI in this excellent slides that I list this paper again: I covered it here in more detail already. The paper describes work by Sam Adams to wrap the InChI library into a Java library, how it is integrated in the CDK, and how Bioclipse uses it. It does not formally cite the CDK, which now feels silly. Perhaps I did not add because of fear of self-citation? Who knows. Anyway, you find this paper cited on slide 30 in aforementioned presentation from Tony.

Predictive toxicology
Cheminformatics is a key tool in predictive toxicology. I starts with the assumption that compounds of similar structure, behave similarly when coming in contact with biological systems. This is a long-standing paradigm which turns out to be quite hard to use, but has not shown to be incorrect either. This paper proposes a new approach using Pareto points and used the CDK to calculate logP values for compounds. However, I cannot find which algorithm it is using to do so.

Cheminformatics in Python
ChemoPy is a tool to do cheminformatics in Python. This paper cites the CDK just as one of the tools available for cheminformatics. The tool is available from Google Code. It has not been migrated yet, but they still have about half a year to do so. Then again, given that there does not seem to have been activity since 2013, I recommend looking at Cinfony instead (doi:10.1186/1752-153X-2-24): exposed the CDK and is still maintained.

No comments:
Post a Comment