Tuesday, October 23, 2012

#oaweek2012 : Open Science: examples in bioinformatics

This is the presentation I will give in about 75 minutes at Maastricht University as part of #oaweeknl.

Sunday, October 07, 2012

CDK 1.5.1: the changes, the authors, and the reviewers

Before you continue reading this post, please be aware that the CDK 1.5.x series is not a new stable release, but the current unstable development release, where all the API changes happen. For stable releases, only look at 1.4.x, such as the just released CDK 1.4.14.

The first alpha release from the cdk-1.5.x series (we do not have a special branch for that; the matching branch is master) was 1.5.0, which was released in January!

It took me some effort to filter out all patches from the cdk-1.4.x branches, but I think the list below shows all changes since 1.5.0. Since that first alpha version, the changes in releases 1.4.8 through 1.4.14 have also been included. Therefore, you may also want to read the changelogs for 1.4.8, 1.4.9, 1.4.10, 1.4.11, 1.4.12, and 1.4.14.

Significant changes in this release include that IO settings now use an enum, as well as a matching new implementation for IO readers and writers to handle settings (done by John). The getHillString() API has been improved, to make it more consistent with the matching getString(). The iterating SD file reader has been renamed from IteratingMDLReader to IteratingSDFReader, and the Elements static fields for the elements in the periodic table now use a final class called NaturalElement, independent from any data interfaces implementation. Daniel worked a lot on the 3D structure builder, improving the code significantly, and Jonathan revamped the fingerprint stack and introduced two new interfaces, IBitFingerprint and ICountFingerprint, making the framework more uniform. On top of that, there is a new ShortestPathFingerprinter and new IO classes for the Mopac 7 input and output formats.
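To make the difference between the two new fingerprint kinds concrete: a bit fingerprint only records which bits are set, while a count fingerprint records how often each feature occurs. The sketch below computes a Tanimoto coefficient for both, returning a double. It uses plain java.util classes instead of the IBitFingerprint and ICountFingerprint interfaces, whose method signatures are not shown in this post, and the min/max formula for counts is just one common generalization, not necessarily the one the CDK implements. The class name TanimotoSketch is made up for illustration.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative only: plain JDK types stand in for the CDK fingerprint interfaces.
final class TanimotoSketch {

    /** Tanimoto on bit fingerprints: (bits set in both) / (bits set in either). */
    static double tanimoto(BitSet a, BitSet b) {
        BitSet and = (BitSet) a.clone();
        and.and(b);
        BitSet or = (BitSet) a.clone();
        or.or(b);
        int union = or.cardinality();
        // Assumption: two empty fingerprints give 0.0 rather than dividing by zero.
        return union == 0 ? 0.0 : (double) and.cardinality() / union;
    }

    /** Tanimoto on count fingerprints: sum of per-feature minima over sum of maxima. */
    static double tanimoto(Map<Integer, Integer> a, Map<Integer, Integer> b) {
        Set<Integer> features = new HashSet<>(a.keySet());
        features.addAll(b.keySet());
        long minSum = 0;
        long maxSum = 0;
        for (Integer feature : features) {
            int countA = a.getOrDefault(feature, 0);
            int countB = b.getOrDefault(feature, 0);
            minSum += Math.min(countA, countB);
            maxSum += Math.max(countA, countB);
        }
        return maxSum == 0 ? 0.0 : (double) minSum / maxSum;
    }

    public static void main(String[] args) {
        BitSet first = new BitSet();
        first.set(1); first.set(2); first.set(3);
        BitSet second = new BitSet();
        second.set(2); second.set(3); second.set(4);
        System.out.println(tanimoto(first, second)); // prints 0.5

        Map<Integer, Integer> countsA = new HashMap<>();
        countsA.put(1, 2); countsA.put(2, 1);
        Map<Integer, Integer> countsB = new HashMap<>();
        countsB.put(1, 1); countsB.put(3, 1);
        System.out.println(tanimoto(countsA, countsB)); // prints 0.25
    }
}
```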

All in all, quite a lot, but that was to be expected after 8 months. Mind you, like 1.5.0, this release too shows an increased number of failing unit tests on Nightly. Nothing severe, though, so if you are on a development branch, with the first betas a few months from now, it may be tempting to migrate.

The changes
  • s/Molecule/AtomContainer/ to fix a compile issue with the port of the SDG patch to master 4d8be8a
  • Added the missing MDL molfile :( 96382dc
  • Set bond order 4 bonds to UNSET e65ef9e
  • Introducing a new flag: SINGLE_OR_DOUBLE 55b1494
  • Minor cleanup for the migration of the SINGLE_OR_DOUBLE code to master 3429e61
  • - Simplified getMaximumBondOrder for bonds - Added exception when both bond orders are unset - Added null check for getMaximumBondOrder - Added additional unit test for getMaximumBondOrder 7734cb4
  • Removed NoNotification (deprecated) from test cases - these were calling failures as the NoNotify module is no longer a dependency on cdk-core a7c54a5
  • Finds the location of double bonds. 6e04d3a
  • Added testing that the new AT properties are set e70786f
  • Refactored the atom types to have bonding patterns explicit 71929f3
  • Restore original bond order sum and max bond order properties, as descriptors are not allowed to change the data structure 58cfaad
  • Additional testing, based on bond order information, now possible 8e00a75
  • Split the C.sp atom type into C.sp for (-C#) and C.allene for (=C=) 001f488
  • Introduced a more explicit way to define the number of connections and allowed bond orders 492a9cd
  • Added a few convenience methods to get the max bond order 4e45c7d
  • Marked the atoms, bonds and molecule with the SINGLE_OR_DOUBLE-flag if needed and tested it 6408139
  • Added a new flag: SINGLE_OR_DOUBLE 46b1e9e
  • Two more tests, reflecting the assumptions that the Number is different for different flags, and the same when the same flag is set ed7a3fc
  • Added another getFlagValue() method, here testing non-zero when a flag is set 96032d8
  • Added a test for a default 0 value of getFlagValue() 255e25c
  • Added missing tests, and particular one about a default 0 value of getFlagValue() 42db307
  • Added a new jar d7d9003
  • added ShortestPathFingerprinter with recommended changes Signed-off-by:Syed Asad Rahman ecfccea
  • added ShortestPathFingerprinter with recommended changes Signed-off-by:Syed Asad Rahman 29aabd3
  • added test case to test module Signed-off-by:Syed Asad Rahman ca4bd8d
  • added test case Signed-off-by:Syed Asad Rahman eff9e23
  • added code dependencies for the Shortest path fingerprinter Signed-off-by:Syed Asad Rahman b356d8c
  • added commons math library Signed-off-by:Syed Asad Rahman 5763df2
  • Implemented new flag storage on IChemObject implementations. Flags are now stored on a single numeric value (currently a short) and flags are accessed/mutated via bit shifting of this value. This implementation provides space saving over using a boolean array however getFlags() and setFlags() now have a overhead due to conversion from the array to the numeric value. Usage however indicates the singular setFlag/getFlag is used >1000 times where as the setFlags/getFlags is used ~50. 1a1b03d
  • Converted flags from incremental integers to bit masks 0352d30
  • Normalised all getFlag and setFlag usages to use the CDKConstants 2db7a94
  • Added unit testing for the dict module bac4e07
  • Removed unused class FixedSizeStack 3f37d8c
  • Removed local variable declarations d5c9a77
  • Replaced field declaration of Vector with List d796b4a
  • Removed output to STDOUT f97b493
  • Added a missing test, and updated a test for the getBitFingerprint method 47e2f1d
  • Point to the master branch on GitHub 56caf17
  • new junit test for @cdk.bug #3526295 d618d98
  • new test method added to test class 29366f0
  • Added the @cdk.module annotation and the class to the builder3d test suite 1ffe232
  • new builder3d template handler test class 21b9941
  • some import deletions to clean up test code dfb1e46
  • bug report and unit test 8dadc75
  • modified license header 6367939
  • Added the test classes to the appropriate test suites 5a27f24
  • new bug unit tests and bug references 5ae0bfe
  • new bug unit tests and bug references 069f06b
  • patch for @cdk.bug #3525144; see sourceforge for bug report 373d6ec
  • Patch for @cdk.bug 3523247 deebfab
  • Patch for @cdk.bug 3515122 by danielszisz 0b31c15
  • Added the new test class to the module test suite 428623d
  • changed license header with correct author and email dd45524
  • Header and modifications to MMFF94BasedParameterSetReaderTest.java e39a647
  • New MMFFBasedParameterSetReaderTest 3a33765
  • Added another thing to ignore: SWITCH_TABLE stuff 37628f0
  • Updated for the new FP api by Jonathan 8939f04
  • cleaned up imports 2abf5a1
  • changed variable name to all caps 2686a16
  • added missing dot in javadoc 1c7d281
  • whitespace fixes 227280e
  • Fixed JavaDoc fe466f1
  • fixed dependencies 2558411
  • Made test compile after rebase 1d839e0
  • s/Molecule/IAtomContainer/g 718fa78
  • updated to correspond to version of Junit jar f03d22c
  • added method for studying hashes in FP eafbd8c
  • made fingerprint Serializable e0a77e9
  • clean up imports c67a7ff
  • made 0 param constructor public 8bdb66d
  • constructor for creating the FP from an int array 3db0a18
  • changed to IBitFingerprint f0e45bf
  • changed to IBitFingerprint b1ccac6
  • added method for getting array of set bits 73a1be8
  • support IBitFingerprint instead of BitSet 0a1257d
  • provided general from-IBitFingerprint-constructor b6e92dc
  • use IntArrayFingerprint if fp to compare is that bce579c
  • fixed wrong index 91d0590
  • added equals and hashcode to the bit fingerprints 7ffa16e
  • tests and solution for bitfp tanimoto trouble 9b713be
  • let's use double for tanimoto coefficients 66a2e2a
  • fix for tanimoto method 2 + test c6e85e0
  • fixed cdk modules and deps f2b2067
  • Added second tanimoto method 2420a70
  • added merge method for count fingerprints f3f941f
  • tanimototests 0b41afa
  • cardinality should be an int 9891a77
  • light-weight binary version of signature fp 8597974
  • Introduced IBitFingerprint and ICountFingerprint 7004511
  • Also updated the test classes for the UIT change b8e6e1e
  • Make it compatible with the new UniversalIsomorphismTester. 299ce67
  • Make it compatible with the new UniversalIsomorphismTester. 5ac4197
  • Make it compatible with the new UniversalIsomorphismTester. 7439b08
  • Make it compatible with the new UniversalIsomorphismTester. 0a70c86
  • Make it compatible with the new UniversalIsomorphismTester. 542b3b8
  • Make it compatible with the new UniversalIsomorphismTester. 3802b28
  • Make it compatible with the new UniversalIsomorphismTester. cb2c58e
  • Make it compatible with the new UniversalIsomorphismTester. 4bec0a4
  • Make it compatible with the new UniversalIsomorphismTester. 99cc828
  • Make it compatible with the new UniversalIsomorphismTester. 48ba7e3
  • Make it compatible with the new UniversalIsomorphismTester. 2dca75a
  • Modify to make UniversalIsomorphismTester usable in a threaded environment. Remove keyword static from variables start and timeout. Remove keyword static from methods isIsomorph, getIsomorphMap, getIsomorphAtomsMap, getIsomorphMaps, getSubgraphMaps, getSubgraphMap, getSubgraphAtomsMaps, getSubgraphAtomsMap, isSubgraph, getOverlaps, search, getMaximum, setTimeout. Added constructor. c241461
  • Fixed the same problem as in DefaultChemObjectBuilder (see commit 3d0b0e5f329e9256638ce18e4b5024e2d348474a and 2a2aecc077add716309591e2fae9832dfcfc64cf) 510a753
  • Corrected method call in test 28b0629
  • Added Test[Class|Method] annotation 075d518
  • Added @link's to documentation 102bad6
  • spelling dc90dd2
  • Taking advantage of new has2DCoordinates behvaiour 32fda31
  • Changed behaviour of has2DCoordinates to mirror has3DCoordinates 19be6ca
  • Revert "Patched bpol descriptor with cheminf ID" (fixes #3541366) 6a7674c
  • updated entry for bpol to be in sync with other descriptors 1be5396
  • Added a missing test method for getFingerPrint() 591623f
  • Double comparisons needs an epsilon d648b3e
  • Resynched the test method name e9555ce
  • Read the test MDL V2000 molfile with the appropriate reader dd39f24
  • Fixed the name of the test class 862f478
  • Added a missing dependency d98db46
  • Fixed a typo in test name (Whit -> With) 27cb6fc
  • Marked the getHillString() method as deprecated, because it is a duplicate of the getString() method. feebefe
  • Added the option to specify element order as a parameter of getHTML() 6df66fb
  • Added two more recent authors 76d32b9
  • Added a new author 1457148
  • Update for a patch from cdk-1.4.x: s/NoNotification/Silent/ 3179394
  • fixes #3525144; from danielszisz 6626a2d
  • Simple period put at the end of the comment line of JavaDoc of method getHybridisationState() 1cfe4c1
  • Simple JavaDoc @param for molecule added to method zmatrixChainToCartesian(). eca9603
  • This commit puts a simple period at the end of sentence 'Method assigns 3Dcoordinates...' at the JavaDoc of placeAliphaticHeavyChain() method. 1a03972
  • A closing period is given to the JavaDoc of the method markPlaced(). 7afb7c9
  • Another missing @param added to findHeavyAtomsInChain() method into line 80. 3238e26
  • Patches for @cdk.bug #3524092 and #3524093 ec6d92d
  • s/IMolecule/IAtomContainer/ fbfab7b
  • Return a default array instead of null, fixing NullPointerExceptions caused otherwise when instantiating via e.g. IAtom(Elements.CARBON) 19bb74a
  • Updated to not use IMolecule 53aa18c
  • Replaced IMolecule* by IAtomContainer* 50536d4
  • It turns out that Java creates an inner class for switch blocks, for which we do not require testing; therefore, this additional excemption 2026a25
  • Set the electron counts when a bond order is set 7c6506c
  • Added a unit test that sets electron counts when a bond order is set 1faf986
  • Some minor changes to ForceFieldConfiguratorTest by danielszisz 4494af4
  • additions for AtomPlcer3D 36db127
  • Updated and corrected new AtomPlacer3D Test class by danielszisz ea2b55e
  • FurtherAtomPlacer3DTest corrections d83c937
  • New AtomPlacer3DTest class bc83829
  • TestClass annotation for AtomPlacer3D 08e6b58
  • Fixed 'dist-large' by removing non-existing include 7351e8e
  • Missing annotations are now given to the ForceFieldConfigurator class after new test class has been added f7654cb
  • Use interfaces instead of implementations 829da6c
  • Now independent from the data interface implementation b2838d5
  • Moved to the pdb module 4bcc074
  • Removed redundant dependencies. 026d9d2
  • The qsaratomic module is now also independent from the data module 1a2e93a
  • Made qsarmolecular independent from data, by creating a new fragment module, allowing to not depend on extra, and cleaning up some code to not use data module classes 646c37b
  • Moved Elements to standard, making it independent from the data module, by introducing a read-only IElement implementation 941a459
  • Removed a non-existing dependency 546b737
  • updated ForceFieldConfiguratorTest with bug annotations d30874c
  • New ModelBuilder3D package test 07dbddb
  • Makes the reaction module independent from the data module a21ccc2
  • Added some checks for null specification references. Code is now more robust to ill-formed dictionary elements faa5404
  • Removed the unneeded catching of CDKException 9fbf219
  • Added a new dependency 0040b00
  • Added test case to check for UIT failure when matching symmetric query 8e67dca
  • Some minor modification to the code to make it a little more readable a23d689
  • Implemented new dynamic settings for all usages d5e718a
  • Dynamic Settings 175bb47
  • Use the interface instead of the implementation 6955c25
  • Added a bit of missing test method annotation d9b1b08
  • Corrected JUnit version 751c1b9
  • Renamed the IteratingMDLReader to a IteratingSDFReader, matching the format name 08a06c0
  • Write a CHARGE command when the entity has a non-zero charge 5e81816
  • Added a unit test for charged entities b635613
  • Removed two static fields that are already provided by RingSizeComparator aec7fb3
  • Updated to CDK coding standards: 44764da
  • Updated to match the CDK hierarchy and classes; LEGO building block style: no processing, just writing 6391477
  • Added a Mopac 7.10 test file 4b57b57
  • Added code from Ambit2 for reading Mopac output files ee33e4e
  • Added code from Ambit2 for writing Mopac input files 04b687a
  • Added missing @TestMethod annotation, and two test methods cf7bf06
  • Added missing test classes for the POVRay, SVG, and CDKSourceCode formats 444af43
  • Fixed API: the setResourceFormat() should accept a IResourceFormat 9403d1c
  • Added missing TestMethod annotations for the matching methods 8392519
  • Added a missing and corrected two other test class annotations c11a1e3
  • Fixed extending the most specific test class 74a7dc4
  • Updated unit test for commit #3093241, where null's are always larger than an actual object ab8aa48
  • Removed two static fields that are already provided by RingSizeComparator 5e7772a
  • Very basic tests for the setWriter() methods (it cannot test if something is really written, as we do not know what objects are supported by a random reader; therefore, we just expect that no exception is thrown) 3cc4232
  • Use an enum, instead of ints 1cdd680
  • Updated the copyright range f878785
  • In Eclipse, use the Eclipse way to find JUnit db6b021
But while I was able to isolate the patches, the below two lists are based on all patches since 1.5.0, and thus include statistics on patches from the CDK 1.4.x series.
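Of the changes listed above, the new flag storage is perhaps the easiest to show in isolation. The commit messages describe flags packed into a single short, accessed and mutated with bit operations, with the flag constants converted from incremental integers to bit masks. A minimal sketch of that idea follows; the class name, flag names, and values are made up for illustration and are not the CDK's own constants or implementation.

```java
// Illustrative only: hypothetical flag names and values, not CDKConstants.
final class FlagBits {
    // Each flag is a distinct power of two, so it occupies its own bit.
    static final int IS_PLACED   = 0x0001;
    static final int IS_IN_RING  = 0x0002;
    static final int IS_AROMATIC = 0x0004;

    /** Returns the packed flag field with the given mask set or cleared. */
    static short setFlag(short flags, int mask, boolean value) {
        return (short) (value ? (flags | mask) : (flags & ~mask));
    }

    /** Tests a single flag without unpacking anything. */
    static boolean getFlag(short flags, int mask) {
        return (flags & mask) != 0;
    }

    /** Unpacks the field into a boolean array; this is the conversion overhead. */
    static boolean[] toArray(short flags, int size) {
        boolean[] out = new boolean[size];
        for (int i = 0; i < size; i++) {
            out[i] = (flags & (1 << i)) != 0;
        }
        return out;
    }

    public static void main(String[] args) {
        short flags = 0;
        flags = setFlag(flags, IS_AROMATIC, true);
        flags = setFlag(flags, IS_PLACED, true);
        System.out.println(getFlag(flags, IS_AROMATIC)); // prints true
        System.out.println(getFlag(flags, IS_IN_RING));  // prints false
        System.out.println(flags);                       // prints 5
    }
}
```

The single-flag methods touch only one bit, which matches the observation in the commit message that they are called far more often than the array-based variants.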

The authors

211  Egon Willighagen
 34  Daniel Szisz
 32  John May
 32  Jonathan Alvarsson
 17  Rajarshi Guha
 13  Arvid Berg
 13  Yap Chun Wei
  6  Syed Asad Rahman
  4  Klas Jönsson
  4  Tomas Pluskal
  3  Nina Jeliazkova
  3  Ralf Stephan
  3  Gilleain Torrance
  2  Kevin Lawson
  1  Jonty Lawson
  1  Stefan Kuhn
  1  Stephan Beisken

Yes, that is a respectable list of authors indeed! It also covers nine different countries, if I am not mistaken.

The reviewers

57  Egon Willighagen 
45  Rajarshi Guha 
25  John May 
 8  Nina Jeliazkova
 2  Arvid Berg 
 2  Ralf Stephan 
 1  Jonathan Alvarsson 

From now on, I will try to increase the frequency of 1.5.x releases to once every three months, or better.

CDK 1.4.14: the changes, the authors, and the reviewers

And here are the changes in CDK 1.4.14. Compared to 1.4.12/1.4.13 I think this release is much more interesting. For example, as of this release, we report details on the IO options for readers and writers automatically in the JavaDoc (see this post), it has improvements to the CML stack, and tetrahedral stereochemistry encoded with the ITetrahedralChirality interface is now reflected in generated InChIs.

Again, the number of changes is not that large, reflecting that we are really moving towards development in the master branch for the 1.5.x releases. The first alpha version was already released a while ago, and I will try to make a 1.5.1 release soon.

The changes
  • Added unit tests for two CML bugs - both use the same molecule to test - 3553328: Atoms missing explicit atomic number default to 1. - 3557907: Only support for bond stereo with attribute dictRef d953285
  • Implementing fix for bugs 3557907 and 3553328 3557907: Previously only the dictRef attribute of bondStereo was supported. This patch adds support for the 'content/text' of the bondStereo element to be set. This patch allows the bondStereo to be added from the charContent when the end of the element is detected. 3553328: Added support for CML files missing atomic number information. As the starting atom is a Hydrogen in the passer if no atomic number is provided the atomic number will default to '1'. This fix checks if the atom 'hasAtomicNumber' before the atom data is stored - if there is no atomic number specified but the symbol has been the atomic number is looked up in the periodic table (as per Atom constructor). 89ce74a
  • Added null check before input close. If the reader was created with a URL the input is never created and invoking '.close()' will throw a null pointer exception 345c0be
  • Implemented test for conversion of SMILES with a topological chiral centre. bd922f6
  • Added output of ITetrahedralChirality to InChI 3c0d36b
  • - Added properties for JVM arguments this allows us to switch on/off debugging and stdout via ant. This is useful as it can be seen from the run target debug had been commented out. The properties allow us to explicitly turn off debugging (on by default) - Used properties for junit-test, run-test and run targets - Added jarTestData as a required target before junit-test can be run 913c796
  • - StringBuilder instead of buffer - Separated the determination of the class name to a method - Separated the processing of the IOSetting to a method - Added default value as an extra column in the output - Added table headers for the output - Added closing - Replaced "/" with "File.Separator" - Added logging statements for null settings/no default constructors - Made toString(Tag[]) return empty string - Added @override/@inheritDoc (not really need) aed8a5d
  • Added @cdk.iooptions to all IO classes 086b8c8
  • Merged the toString(Tag) and expand(Tag) methods, and complete all output 1dc58bf
  • Added the dependencies to the taglet so that it can use reflection to extract the IO settings synamically; thanx to Stefan Ferstl, see http://stackoverflow.com/a/11819202/217943 2f3b542
  • A go at @cdk.ioopionts d8a5504
  • Make sure we continue with an IMolecule, fixing ClassCastException's later on 0015e6f
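The CML fix for bug 3553328 described above comes down to a simple fallback rule: keep an explicit atomic number when there is one, otherwise look it up from the element symbol, and only then give up. The sketch below shows that rule on its own. The class name, the method, and the four-element lookup table are hypothetical stand-ins and not the CDK's reader code or periodic table data.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a tiny stand-in for a periodic table lookup.
final class AtomicNumberFallback {
    private static final Map<String, Integer> PERIODIC_TABLE = new HashMap<>();
    static {
        PERIODIC_TABLE.put("H", 1);
        PERIODIC_TABLE.put("C", 6);
        PERIODIC_TABLE.put("N", 7);
        PERIODIC_TABLE.put("O", 8);
    }

    /**
     * Returns the explicit atomic number when present, otherwise the number
     * looked up from the symbol, or null when neither is known.
     */
    static Integer resolve(Integer explicitAtomicNumber, String symbol) {
        if (explicitAtomicNumber != null) {
            return explicitAtomicNumber;
        }
        if (symbol == null) {
            return null;
        }
        return PERIODIC_TABLE.get(symbol);
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, "O")); // prints 8, not the hydrogen default
        System.out.println(resolve(6, "C"));    // prints 6
    }
}
```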
The authors
8  Egon Willighagen
6  John May

The reviewers
3  Egon Willighagen 
2  John May 
2  Ralf Stephan 

CDK 1.4.12: the changes, the authors, and the reviewers

I was just about to write up the changes of CDK 1.4.14 I uploaded to SourceForge last week, when I noticed that I forgot to blog the changes for CDK 1.4.13 and 1.4.12. Well, fortunately, those two releases are identical, caused by me fighting the SourceForge file system. So, first I will post the changes of that release.

This release contains a few patches to improve the packaging on Debian, now reports the version too in the JavaDoc window title along with a few other JavaDoc fixes, adds the Co.plus atom type, has bug fixes for the MolecularFormulaManipulator, DebugChemObjectBuilder, and the NoNotificationChemObjectBuilder, and adds roundtripping of aromaticity in the CML format. All in all, unless you were affected by one of the fixed bugs, this release is not overly interesting.

But, also a big welcome to Tomáš Pluskal as a new CDK patch author!

The changes
  • Use properties for jar locations 13c63cd
  • Report the CDK version *and* the date 4e85157
  • Set the encoding (patch by Onkar Shinde) 08b9d92
  • Removed output to STDOUT 4e0f4a3
  • Split out Doxygen generation, so that creating JavaDoc does not require Doxygen 7ac0705
  • Synchronized referral to vecmath to all list vecmath*.jar 0050e24
  • Refactored to have jar file names as properties and thus customizable, and split out development targets into devel.xml, reducing the dependencies for compiling the CDK (via the taskdef, it dependend on JavaNCSS too) 18d4cb2
  • Fixed pointing to the development libraries, using the same customizable approach as in the rest of the build.xml a22f151
  • Fixed the labeling, which was the wrong way around c7ac530
  • Added roundtripping of atom-based aromaticity (fixes #1709130) 56b8f9c
  • Throw an IllegalArgumentException instead of a NullPointerException when an atom is configured of an unknown element 75b58bf
  • Added a missing dependency 8a041b5
  • Throw an exception when the given atom is not found in the chemmodel (fixes #3530861) 04f686f
  • Fixed the @cdk.githash links dd9cdb8
  • Fixed the target to be called 99c2171
  • Fixed the module assignment 77d8bd6
  • Fixed the same problem as in DefaultChemObjectBuilder (see commit 3d0b0e5f329e9256638ce18e4b5024e2d348474a and 2a2aecc077add716309591e2fae9832dfcfc64cf) b5caeba
  • Added proper module assignment for RandomGenerator 57117c3
  • Added a note about atom types to be perceived before using this class (closes #3513957) 6a1100c
  • Updated for the bug fix in commit 37ca43cc4ea0f0a7efaf10edbbe60ef57a44e8ce 74ee663
  • Updated test for the bug fix in commit 37ca43cc4ea0f0a7efaf10edbbe60ef57a44e8ce to include the implicit hydrogens. 948f02f
  • Added Co.plus atom type 631c399
  • Added a unit test for the perception of Co.plus (bug #3529082) 232657a
  • Added a new author 32b1b33
  • Updated the README (fixes #3538451) 378ffa9
  • Added Nina's modifcations to ensure that getting a molecular formula includes implicit hydrogens. Addressed bug 2983334. Also updated a unit test to take into account that H's are being considered 37ca43c
  • Updated a method based on Ninas suggestion to avoid modifiying a atom container when looping over it. This avoids a concurrent modification exception. Also updated some Javadocs eba09ca
  • Change the getHTML method to return the elements in the Hill System order (bug #3432131). 3f3fab5
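The last change in the list refers to the Hill system: when a formula contains carbon, carbon comes first, hydrogen second, and all other elements follow alphabetically; without carbon, every element, hydrogen included, is simply sorted alphabetically. A minimal sketch of that ordering, independent of the CDK and with a made-up class name, could look like this:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: not the MolecularFormulaManipulator implementation.
final class HillOrder {

    /** Formats element counts as a formula string in Hill system order. */
    static String format(Map<String, Integer> counts) {
        // A TreeMap iterates the element symbols alphabetically.
        TreeMap<String, Integer> sorted = new TreeMap<>(counts);
        List<String> order = new ArrayList<>();
        if (sorted.containsKey("C")) {
            order.add("C");
            if (sorted.containsKey("H")) {
                order.add("H");
            }
        }
        for (String symbol : sorted.keySet()) {
            if (!order.contains(symbol)) {
                order.add(symbol);
            }
        }
        StringBuilder formula = new StringBuilder();
        for (String symbol : order) {
            int count = sorted.get(symbol);
            formula.append(symbol);
            if (count > 1) {
                formula.append(count);
            }
        }
        return formula.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> ethanol = new HashMap<>();
        ethanol.put("O", 1);
        ethanol.put("H", 6);
        ethanol.put("C", 2);
        System.out.println(format(ethanol)); // prints C2H6O
    }
}
```

Note that without carbon the result can look unfamiliar: sulfuric acid comes out as H2O4S rather than H2SO4.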
The authors
28  Egon Willighagen
  3  Rajarshi Guha
  1  John May
  1  Tomas Pluskal
The reviewers
5  Egon Willighagen 
3  Rajarshi  Guha 
3  John May 
1  Nina Jeliazkova 

As always, I made these overviews with these scripts.

What would you do with 5M euro?

Two weeks ago an experienced researcher asked me this question. I was speechless. I am not sure whether the other understood that I was mostly wondering where to start, with so many things I want to get done, or thought that I literally had no idea how to spend it other than prolonging my current post-doc position.

Thoughts that blasted through my mind in random order: make the CDK 1000x faster, finish the orbital development kit, make a chemically decent Connectivity Map, run weekly, untargeted metabolomics on my body fluids for one full year (and three months) - that is about where my opponent began wondering if I had any ideas, I think -, do the same for the plants in my garden, do nanoQSAR (and in fact, I am writing this up for 4 million as we speak), develop a linked data, CCZero PubChem/ChemSpider alternative, make a database with the original literature on the first 1M organic compounds ever discovered (starting with urea)... seriously, I have many more ideas; don't get me started on doing boring things like studying a single disease; we have M.Sc. students for that.

Five million? Bring it on!

Saturday, October 06, 2012

LinkedChemistry.info now listed on identifiers.org as ChEMBL-RDF provider

As part of the Open PHACTS hackathon in Manchester last week, we worked on further integration with the identifiers.org project at the EBI. There was a lot of talk with Nick Juty and Camille Laibe on integration with BridgeDB and the OPS Identity Mapping Service (IMS), but I also asked Nick to get LinkedChemistry.info listed as provider of information from ChEMBL using the ChEMBL-RDF data: