Sunday, January 29, 2012

First month back in NL...

Moving country is exhausting. Living in a house full of boxes for a few weeks. Finding a house. Changing culture. Maybe it's a linguistic thing, but EU countries do not share the same culture. OK, we too have a McDonald's on every corner, but that's about it. But returning to The Netherlands was a culture shock. A shock? Yes. I thought I knew the country I lived in most of my life.

Then, switching position. Posthopping (=post-doc here and there, attempting to find some local optimum where you both work on exciting things and try to set up a research group) around Europe (I have pension in four EU states now), while trying to keep writing papers and on top of that trying to do something that in fact has impact on our science, means that for the three months before the end of a post-doc position, and the three months after you started the next, it's double work: finding your way around at the new university, while finishing those studies that were almost finished, in random, unpredictable order.

And, of course, being annoyed if your prime minister then claims he sometimes cannot get his work done in 40 hours. Well, one would actually think that the prime minister of a country in an economic crisis, with people eating up all their hard-worked-for savings just to get by, would do his very best to turn the future of the country around... oh well...

Sometimes I really wonder what I'm doing.

And then, in a spare hour here and there, do something for myself. Like writing up this post, in an attempt to give it all a place. Or finishing up a further paragraph of my book(let), or working on my contributions to the Pharmaceutical Bioinformatics book (molecular representation, semantic web for the life sciences). For my own Groovy Cheminformatics book(let): seventy more pages, and it's a book. Hard-cover, and I can start touring around Europe. BTW, I enjoy and can recommend reading Reinventing Discovery. Done the first 30 pages or so, and keep wondering how those examples can be scaled down to cheminformatics.

Sometimes I really wonder why I keep working in an area that everyone just takes for granted and hardly cares about.

I'm tired, and this is slowly becoming a really boring and depressing blog post. That's a shame, because I have had a really great time with Roland Grafström and Bengt Fadeel, working among and with one of the greatest, most enthusiastic research teams I have seen around Europe. Having to leave that makes me sad too. In fact, I have never ever been homesick, and now, going back to the country I grew up in, I am homesick. Well, it's a feeling I don't like.

Weirdly, I have many really exciting ideas, research-wise, and with my exciting daily work at BiGCaT, which is now on Open PHACTS, and the network in The Netherlands, I have much to enjoy here. Yes, it is again hopping to another application area of cheminformatics, after the interaction of cheminformatics and chemometrics (my thesis), more fundamental cheminformatics, metabolite identification, pharmaceutical research, toxicity, and now back to drug discovery but also the metabolome. But I love the complexity of the metabolome, and have so much detailed insight in the other fields now... oh, the endless possibilities!

And then I remember why I am doing this to myself.

All the endless possibilities! All the research we can do so much better than it is done now! The more accurate answers we will get, and actually being in a situation where we can start identifying the limitations of cheminformatics! Ha, and you know I love to look beyond the edge of the world.

But, then I realize again that I need funding, and wonder how I can live my dream, if no one believes in it.

Not that I have been completely unsuccessful. Au contraire. I did get funding, for travel on many occasions, and recently small bits for research too. But I am really eager to get some funding to have others research the ideas I have, rather than working on them myself. And eager to get a fixed position. Though I am grateful to Chris Evelo for offering the three-year position I am in now.

Next time someone starts talking about interdisciplinary research, get a trout out of your bag. Interdisciplinary research is a buzz word that only works when you already have a single-disciplinary fixed position. Advice to students: never start an interdisciplinary research topic. You will never be the expert people will want to fund, because interdisciplinary research can simply be done by single-discipline experts in a collaboration, and much better than you could, with your years of experience (n=1).

I also now realize that strengthening another project is also no good for your own career. Your hard work will just go to that project. You can contribute as much to some project as you like, but the corresponding Dr. Who will get the fame. No wonder people rename, brand, and use rather than collaborate. We desperately need #altmetrics.

Yes, I realize this applies to the CDK too. I am trying hard to get recognition to those who deserve it. But who reads a copyright statement? Who remembers blog posts with change logs and statistics on who did the work? Scientists in charge of funding remember only the top person.

Ha, you see that pattern applies to publishing too, right? Scientists only too often care more about the JIF of the top concept, the journal, than the actual work, your actual damn paper.

Oh well, fortunately it's almost Monday again, so that I can focus on science again, and don't have to think about these things.

And, I am deeply grateful to all who publicly support my output. A citation to one of my papers, a public review of my book, a new tool that stands on the shoulders of my work! That makes a difference!

Then I remember again why I am doing all this. I can make a difference.

Monday, January 16, 2012

CDK 1.5.0: the changes, the authors, and the reviewers

Yeah, I did it. I made the first new development release (1.5.0) for the CDK after the fork of the stable 1.4.x series. It had to happen after the removal of IMolecule and IMoleculeSet. Well, in fact, while the list just lists all the patches specific for the current master branch, it is still fairly long. Then again, quite a few of my 'commits' are probably just merges.

Just to make clear, this is a development release, and until we freeze this branch, we expect, and actually intentionally add, API changes. Of course, we intend those to improve things, so please shout out your wishes.

First of all, this release removed IMolecule and IMoleculeSet. That was a big effort, explaining why Rajarshi and I have so many commits in this release. This release also adds the LINGO fingerprint type and an atomic signature-based fingerprint. It removes the nonotify module, as the silent module should be used instead. IChemObjectIO now extends Closeable, making it more Java 7-friendly. Also noteworthy is the API for raw fingerprints, which are not of fixed length, but have keys and counts. On the implementation side, the QueryAtomContainer now uses local, custom implementations of IAtom and IBond, making the module independent from the data module, freeing the rest of the library for more code clean-up.
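To show what the Closeable change buys you, here is a minimal sketch of the Java 7 try-with-resources pattern. It does not use the CDK itself: SketchReader and CloseableSketch are hypothetical stand-ins I made up for illustration, and the only assumption is that the reader's type implements java.io.Closeable, as IChemObjectIO now does.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical stand-in for any reader whose type implements Closeable.
// This is NOT a CDK class; it only exists to show the pattern.
final class SketchReader implements Closeable {
    private final StringReader input;
    private boolean closed = false;

    SketchReader(String content) {
        this.input = new StringReader(content);
    }

    /** Reads all remaining characters as one string. */
    String readAll() throws IOException {
        StringBuilder text = new StringBuilder();
        int c;
        while ((c = input.read()) != -1) {
            text.append((char) c);
        }
        return text.toString();
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() throws IOException {
        input.close();
        closed = true;
    }
}

final class CloseableSketch {
    /** Reads everything from the reader; try-with-resources closes it afterwards. */
    static String readWithAutoClose(SketchReader reader) {
        try (SketchReader r = reader) {
            return r.readAll();
        } catch (IOException e) {
            // Rethrow unchecked so that callers need no throws clause.
            throw new UncheckedIOException(e);
        } // r.close() has run by now, whether or not readAll() threw
    }

    public static void main(String[] args) {
        SketchReader reader = new SketchReader("CCO");
        System.out.println(readWithAutoClose(reader));
        System.out.println(reader.isClosed());
    }
}
```

The point is that no explicit finally block with a close() call is needed anymore; any type that is Closeable can be used this way.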

Mind you, this release shows an increased number of failing unit tests on Nightly.

The changes

  • This commit puts a single comment line for the molecule @param in placeAliphaticHeavyChain() method. 61377fb
  • Fixed JUnit version 41075f3
  • Upgraded PMD to 4.3 79e4532
  • Upgraded JUnit to 4.10 67a1f23
  • Updated the @TestMethod to match the unit test method renames that recently happened 0ad0fd0
  • Fixed comparison, to use the proper double assertEquals method 1ecc22b
  • Renamed the overriding methods to match the method rename of the methods in AbstractChemModelTest they are overriding. This is why one should use @Override. 1d34aac
  • Switched order of IRing and IAtomContainer matching, to fix construction of IRing classes 4741983
  • Removed IMolecule 2fc6b61
  • Replaced more IMolecule by IAtomContainer, fixing more unit test regressions 611aaca
  • A set of casting fixes to reduce the number of unit test regressions 2599bc2
  • Proper typing of the DefaultIteratingChemObjectReader, so that other classes can safely extend it (thanx to Nina) de67121
  • Removed some more usages of Molecule b84587f
  • Updated casting to remove usage of Molecule 04b1542
  • Typed the iterator, removing the need for casting when used 6e6cf8e
  • Two s/IMolecule/IAtomContainer/ patches to fix a few unit test regressions f6ed3d7
  • More Molecule use removed 0a04d3f
  • Update some return types to not use Molecule 1d8ee25
  • Removed field declarations of Molecule 88a2935
  • Removes instanceof and .class usage of Molecule 436dba5
  • Another batch of s/IMolecule/IAtomContainer/ 5da1d1b
  • Made SmilesParser independent from IMolecule 6bf4aa2
  • Got rid of debug and silent Molecule implementations and respective tests 9f05e5c
  • Refactored classes extending Molecule to use AtomContainer 2ccbe95
  • Removed the last traces of the IMoleculeSet interface a8c03af
  • Got rid of MoleculeSet from the debug module 50aa11f
  • Got rid of MoleculeSet from the silent module 2250763
  • Updated IPolymer to extend IAtomContainer rather than IMolecule 5b7ad0f
  • cleaned up a bunch of MoleculeSet usages 150fee9
  • Updated various reaction classes to remove all usage of Molecule* 0a98a7f
  • Updated IReactionMechanism to use IAtomContainerSet. Also fixed some castings. 738eece
  • Convert IMolecule* to IAtomContainer* across many more classes c9847d2
  • Some IMolecule/IAtomContainer fixes, to get things to compile here c950448
  • Convert IMolecule* to IAtomContainer* across some charges and test classes f74a3ad
  • Removed the nonotify module (replaced by silent) 51af326
  • A bit more IMolecule removal d8d6eee
  • Fixed castings, to solve failing unit tests d273aa4
  • A quick fix. This class will soon be removed, so no peer review needed. 9e97f2d
  • Fixed a casting 3c1086a
  • Removed methods, to get the number of unit test fails down; these two classes will be removed completely, so skipping peer review 1bbfe5a
  • More modifications to use AtomContainer rather than Molecule f4647d8
  • Moved TetrahedralChirality from data to core. 46e3ddc
  • Replace IMolecule* with IAtomContainer* 27c6f7d
  • Updated io classes to use IAtomContainer* 68f2a65
  • Converted template handler to use atomcontainerset f0cf950
  • Refactored the setters from IReaction to use IAtomContainer* 04cbb29
  • Refactored the last getter from IReaction to use IAtomContainer* 91f8083
  • Made sure that we use IAtomContainerSet rather than casting back to Molecule* 104ecd1
  • Modified IReaction and related classes to migrate from IMoleculeSet to IAtomContainerSet 19b7673
  • Replaced IMolecule* with IAtomContainer* 7f6617d
  • Replaced IMolecule with IAtomContainer for some IReaction methods 3d3037f
  • Replaced use of addMolecule() with addAtomContainer() 2c1b31a
  • Replaced use of getMoleculeCount() with getAtomContainerCount() b27ee12
  • Replace use of IMoleculeSet by IAtomContainerSet in ConnectivityChecker.partitionIntoMolecules() 42861a9
  • Replaced IMoleculeSet with IAtomContainerSet, fixing a compile issue 2ac19f1
  • Patched bpol descriptor with cheminf ID a4db4a2
  • Make IChemObjectIO extend Closeable as suggested at a250cb3
  • Makes the smiles module independent from the data module a7920c1
  • More use of interfaces rather than implementations for the smiles module c3040e8
  • Removed non-existing dependency 3eaeaa7
  • Work with AtomContainers (fixes #2788763) c72d442
  • Allow IChemModels to contain IAtomContainerSets (addresses #2784940) 4cc5ff8
  • Replaces junit.framework.Assert from JUnit3 with org.junit.Assert from JUnit4 (fixing #2831081) 6ff1f78
  • Fix to permutation unit tests 45a7599
  • Split AtomContainerPermutorTest into one for the Atom and Bond permutors 40fe774
  • Unit tests for the Permutor class and fix to permutor. 88ceb38
  • ISBN number for the reference 217977c
  • Re-write of the atom container permutation classes. An extra class 'Permutor' is now the base class for permutations, and contains some more functionality than before. acd8e9b
  • Re-write of the atom container permutation classes 04adfca
  • Removed commented out code 16efbc0
  • Removed needless explicitly qualified vecmath types c1a69a4
  • Convenience method for getting an atom container directly from a formula string a1894f2
  • Handle a CDKException thrown by the Parser by throwing an IllegalArgumentException in the constructor face5c5
  • Removed hardcoded String use of InvPair constants in favor of their actual constants 4777958
  • Compile fix: get a builder from the input parameter 98f16f8
  • Removed dependency of io on data again df6ea2e
  • Corrected spelling from CanonicalLable to CanonicalLabel c0e59c8
  • Updated test case to note that SMARTS lexical errors now throw illegal argument exception dc60bc3
  • Added missing dependency fb4ba45
  • failing test for Tanimoto on "raw fingerprints" b42fbae
  • Implemented missing method, copied from Fingerprinter 2731d6f
  • Updated SMARTS parser JJT file, to drop the use of CDKException when a invalid SMARTS is provided. 1ef89fb
  • fix bug #3305550 - moved SMARTSQuery to where used dfb3e86
  • changed from no-op logger call to syserr 4c5beba
  • spelling ac6a9eb
  • Updated Copyright dates 51f2db8
  • Added another missing dependency 416a4cc
  • Added missing dependency 6fc4803
  • Split out unit tests only for fixed-length fingerprints 37e81e8
  • Added a signature-based fingerprinter 688da8b
  • MDL reader deals with query bond types, plus tests c95541f
  • QueryAtomContainer to allow atoms and bonds 6a8f4c0
  • Another MOL file with query bond types b6bf3b5
  • CTFile query bond class b0227ba
  • MOL file with query bond types 3546ba4
  • Use DefaultChemObjectBuilder directly, not using getBuilder() method d0dc1ef
  • Added constructor for QueryChemObject to prevent nullpointers on flags property 6a75767
  • Cleaning up testing of the isomorphism module, e.g. classes in the proper test module bc1e3e6
  • Minor clean up: use interfaces, depend on Query implementaions, and now compiles without dependency on the data module 6913b9d
  • Reworked QueryAtomContainer to not depend on the data module either a5439dc
  • Use interfaces instead of implementations 076a870
  • More QueryBond 0e14c77
  • Extend QueryAtom instead of Atom 0ee08a6
  • Extend the new QueryChemObject b0a97a6
  • A shared IChemObject implementation for QueryAtom and QueryBond b98f650
  • Added constructors for QueryAtom 3708c8d
  • Cleaned up the code a bit 1654e94
  • Added an abstract QueryAtom class ae347ce
  • Match the IBond interface, as I get a compile error here otherwise 43cb829
  • Query bonds b7e8e3a
  • Placed the LingoFingerprinterTest in the rest module, and added missing dependency 16e456d
  • Fixed wrong module assignment 81db713
  • moved lingo fps to smiles module 1eb9590
  • Replaced the NotImplemented exception with UnsupportedOperation 6071796
  • Updated Javadocs 15be38b
  • Updated issues in Javadocs, copyright and method names dee515e
  • Added similarity method for Lingo's 133f314
  • Updated Tanimoto similarity to handle feature,count type fingerprints f487c30
  • Added implementation of LINGO 72bbae3
  • Added copyright and ref 9707f66
  • Added implementation of LINGO 572018f
  • Updated fingerprinter interface to support access to raw fingerprint of the form Map. Updated implementing classes b0f4050
  • Replaced the broken links with new ones (fixes #3108471) dc504db
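
For those wondering what a Tanimoto similarity on feature,count fingerprints can look like, here is a minimal sketch. It assumes a raw fingerprint is a map from a feature string to its count, and it uses the dot-product form T = a.b / (a.a + b.b - a.b), which is one common generalization of the bit-based Tanimoto and not necessarily the exact formula used in the CDK patch. The class name CountTanimotoSketch and the feature strings are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class CountTanimotoSketch {
    /**
     * Tanimoto coefficient for count fingerprints, in the dot-product form
     * T = (a.b) / (a.a + b.b - a.b). A feature missing from a map counts as 0.
     */
    static double tanimoto(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            int countA = entry.getValue();
            normA += (double) countA * countA;
            Integer countB = b.get(entry.getKey());
            if (countB != null) {
                // Only features present in both fingerprints add to the dot product.
                dot += (double) countA * countB;
            }
        }
        for (int countB : b.values()) {
            normB += (double) countB * countB;
        }
        double denominator = normA + normB - dot;
        // Two empty fingerprints: treated as identical here, which is a convention
        // chosen for this sketch.
        return denominator == 0.0 ? 1.0 : dot / denominator;
    }

    public static void main(String[] args) {
        Map<String, Integer> first = new HashMap<>();
        first.put("c1cc", 2);
        first.put("CCO", 1);
        Map<String, Integer> second = new HashMap<>();
        second.put("c1cc", 1);
        second.put("CCN", 1);
        // dot = 2, a.a = 5, b.b = 2, so T = 2 / (5 + 2 - 2) = 0.4
        System.out.println(tanimoto(first, second));
    }
}
```

Because the fingerprints are maps rather than fixed-length bit sets, two molecules with no features in common simply get a similarity of zero, without any folding or hashing collisions.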

The authors

89  Egon Willighagen
37  Rajarshi Guha
 9  Mark Rijnbeek
 8  Gilleain Torrance
 4  Jonathan Alvarsson
 1  Daniel Szisz
 1  John May
 1  Ola Spjuth
 1  Christoph Steinbeck

The reviewers

66 Rajarshi Guha 
30 Egon Willighagen
 4 Gilleain Torrance
 1 Jonathan Alvarsson
 1 Nina Jeliazkova

Sunday, January 15, 2012

Groovy Cheminformatics 4th edition

Six months was not quite the amount of time I anticipated between the third and fourth edition, but I finally managed to upload edition 1.4.7-0 of my Groovy Cheminformatics book. The first three editions sold 37 copies, including two for myself. Enough to feel supported and to continue working on it.

So, this new edition is again thicker, summing up to 152 pages now, which is 28 pages more than the 3rd edition. Indeed, the table of contents is more than half a page longer in itself, though, just barely, still fitting on four pages. In fact, I had to remove one (new) subsection title, because it would otherwise take two further pages.

The new content is again a mix of sections and chapters. While writing new chapters, I find myself realizing I need to cover more basics. Those typically get added as new sections. I did not get many feature requests, except for one email pointing out that the text promised to explain how to interpret and handle failing atom type perception, which explains one of the new sections. The full list of new content is:
  • Section 2.1.4: explaining the three flavors of atomic coordinates
  • Extended Section 2.2: added detail about electron counts of bonds (partly in reply to this post by Rich)
  • Chapter 5 "Protein and DNA": four pages, mostly about PDB files, and the matching CDK data structure
  • Chapter 6 "IChemObjectBuilders": four pages explaining the four alternative builders CDK 1.4.7 has
  • Section 7.8: a new section with recipes on how to post-process read input, for now discussing MDL molfiles only. It talks about what information is present in the file format, and what steps must be undertaken to add missing information
  • Section 8.2.4 "No atom type perceived?!"
  • Section 11.4: describes how to depict aromatic rings
  • Section 11.5: describes how to change the background color of depictions
  • Section 13.4: explains how to calculate the Van der Waals volume of molecules
  • Section 18.1.3: discussing the API improvement in the iterating readers
  • Appendix C: a list of all descriptors provided by the CDK
  • Appendix D: a list of file formats known by the CDK, indicating which have readers and writers
On top of that, I improved other bits of the book too, such as the resolution of the depictions of molecules, as well as those of various diagrams. Also the number of scripts has seriously gone up, from 94 to 134!

Appendix C is a prelude to a chapter I am already writing, but have not finished yet: a chapter about descriptor calculation. But since I just started a new post-doc position, it may take another six months for that chapter to make it into print.

The paperback is available from, an on-demand publisher, as well as this ebook version.

Sunday, January 08, 2012

CDK-JChemPaint #10: background color

I found in the Groovy JChemPaint repository a script I had not blogged about yet, explaining how to change the default background color. It's fairly simple, and just uses parameters. Starting from the common pattern to set up a renderer, you set the background parameter:

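backgroundColor = Color.lightGray
model = renderer.getRenderer2DModel()
// pass the color to the model as the BackgroundColor parameter
model.set(BasicSceneGenerator.BackgroundColor.class, backgroundColor)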

The full script can be found here. The resulting output looks like that given below.