+Mark Fortner commented on Google+ that my script was missing a @Grab statement. I had seen that mentioned before, but never looked into it. It turns out to be very useful, and it makes Groovy scripts standalone. That is, it resolves the missing dependencies using Maven repositories. Fortunately, CDK modules are available from such repositories, e.g. the one at Plovdiv University, so I gave it a try.

The next CTR I picked is not particularly hard either, given the functionality provided by the CDK. In fact, the fingerprinting functionality I will use for this CTR is actually one of the most used and oldest features of the CDK. CiteULike has a list of 26 papers using the CDK fingerprinting functionality.

This one was relatively easy, and roughly based on the first CDK-JChemPaint tutorial. Key aspects are the SMILES parsing and the 2D coordinate generation with the StructureDiagramGenerator. The solution does not render the structure's title yet. I do not have a solution for that right now (the CDK may; I am not sure).

The first Chemistry Toolkit Rosetta task is to count the number of heavy atoms in the structures given in an MDL SD file. This task starts with an SD file and counts, for each structure in the file, the number of heavy atoms (non-hydrogen atoms). Because we simply handle the structures one by one, the solution uses the IteratingMDLReader. The input file (benzodiazepine.sdf.gz) is a gzipped file, which we handle by using a GZIPInputStream.
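
To make the task itself concrete, here is a minimal, CDK-free sketch in plain Java. It is not the CDK solution described above: instead of the IteratingMDLReader it reads the fixed-column V2000 atom block by hand, which is an assumption that only holds for well-formed V2000 records. The class name, the helper methods, and the toy methanol record are all hypothetical and only there for illustration. Anything whose element symbol is not "H" is counted as heavy.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class; a stdlib-only stand-in, not the CDK reader.
class HeavyAtomCount {

    /** Returns one heavy-atom count per record of a gzipped V2000 SD file. */
    static List<Integer> countHeavyAtoms(InputStream gzipped) throws IOException {
        List<Integer> counts = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(gzipped), StandardCharsets.ISO_8859_1))) {
            List<String> record = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith("$$$$")) {   // record delimiter in SD files
                    counts.add(countInRecord(record));
                    record.clear();
                } else {
                    record.add(line);
                }
            }
        }
        return counts;
    }

    /** Counts the non-hydrogen atoms in the atom block of one V2000 record. */
    static int countInRecord(List<String> record) {
        // The fourth line is the counts line; columns 1-3 hold the atom count.
        int atomCount = Integer.parseInt(record.get(3).substring(0, 3).trim());
        int heavy = 0;
        for (int i = 0; i < atomCount; i++) {
            // The element symbol sits in columns 32-34 of each atom line.
            String symbol = record.get(4 + i).substring(31, 34).trim();
            if (!symbol.equals("H")) {
                heavy++;
            }
        }
        return heavy;
    }

    /** Builds a V2000 atom line at the origin (toy data only). */
    static String atomLine(String symbol) {
        return String.format(Locale.ROOT,
                "%10.4f%10.4f%10.4f %-3s 0  0  0  0  0  0  0  0  0  0  0  0",
                0.0, 0.0, 0.0, symbol);
    }

    /** A made-up single-record SD file: C, O and one explicit H. */
    static String toyRecord() {
        return String.join("\n",
                "toy methanol", "", "",
                "  3  2  0  0  0  0  0  0  0  0999 V2000",
                atomLine("C"), atomLine("O"), atomLine("H"),
                "  1  2  1  0", "  2  3  1  0",
                "M  END", "$$$$") + "\n";
    }

    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.ISO_8859_1));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Reads the file given as first argument, or the toy record otherwise.
        try (InputStream in = args.length > 0
                ? new FileInputStream(args[0])
                : new ByteArrayInputStream(gzip(toyRecord()))) {
            for (int count : countHeavyAtoms(in)) {
                System.out.println(count);
            }
        }
    }
}
```

Run without arguments, this prints the count for the toy record; run with a path such as benzodiazepine.sdf.gz, it prints one count per structure. The streaming, record-by-record loop mirrors the one-structure-at-a-time approach, but a real toolkit reader also handles V3000 files, isotopes such as deuterium, and malformed records, which this sketch does not.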

The Chemistry Toolkit Rosetta wiki was set up some time ago by Andrew Dalke to demonstrate how certain basic cheminformatics tasks are done in the various cheminformatics toolkits around. I think it is a great idea, but unfortunately I never found enough time to do much with it. But it is holiday now, which is a time to take your mind off your work, and some random hacking with the CDK is what I like to do then.

This is already the 7th edition of my Groovy Cheminformatics with the Chemistry Development Kit book (and PDF eBook). It has been almost two years since the first release, and the book has grown from an initial 72 pages to 212 pages today. There is still a lot I want to write about, but I only have time for it during the holidays. New content includes: Chapter 6, Reactions; Chapter 7, From IChemObject to IChemFile; Section 17.1.2, Stereoisomerism (in InChIs); a rewritten Chapter 20, Documentation; Appendix D.1.

For some time I have been stealing an hour here and there for the rrdf package for R. This package is based on Apache Jena and allows reading and writing of RDF triples, as well as local and remote SPARQL querying. BTW, rrdf is not the only R package to provide SPARQL functionality, and another package will be demoed at SWAT4LS.

The past few months have seen an increasing paper trail for our Open PHACTS project. Lots of cool stuff is ongoing, and more and more is becoming openly available. There is a steep learning curve within the project on being Open, and the project makes sure it is done properly. But it takes time. With the Open Standards and Open Source getting out now, I think we have a reasonable start.

One day on, and still struggling with the chemistry behind gene regulation. Let no biologist ever tell me again not to use acronyms (yes, I am looking at you!). But it is interesting. I learned a lot about ChIP, histone modifications, etc, etc. This is an amazing world, where specific histone complex protein residues get methylated, acetylated, citrullinated, and phosphorylated.

I have started learning about epigenetics, and particularly the regulatory effects of DNA methylation and histone acetylation. It's cool, it's hot, it's everything we hope will explain genetics, because genes certainly did not.

The chemistry behind this involves interesting pathways and the storage of information that passes from one generation to another... epigenetic effects down to the grandchild generation have now repeatedly been shown.