Monday, December 21, 2009

BlueObelisk StackExchange: summary of the first month

The Blue Obelisk StackExchange (BOx) has had a reasonably good start, but the number of questions is dropping. The average number of unique visits is now about 23-30.

The number of registered users is not insignificant but also has not been growing much lately:

At the same time, the quality of the questions is high, and they are real questions from real users.

The overall state is 37 questions with about 50 different tags.

To make progress with BOx, we primarily need to promote it more as a central point of entry, both for people who want to know which free tools they can use to meet their needs, and for people who want to contribute to ODOSOS cheminformatics, by pointing out the unsolved problems.

Saturday, December 19, 2009

December wrap up. X-mas holidays at last!

Wow, I just saw it has been 17 days since my last post already :( That's a new record, I think! A lot has happened actually, but I have not had time to write things up. In fact, I still have SWAT4LS coverage left to do :(

Anyway, one of the things our group has been up to in the last two weeks is writing a book in support of the free, online Pharmaceutical Bioinformatics course. The material includes a good deal of cheminformatics (molecular representation: chemical graph theory, 3D geometries, file formats, line notations, InChI), bioinformatics (sequence analysis), and statistics (PLS, PCA, proteochemometrics), all in light of drug discovery. Of course, we're using LaTeX, and I asked around here and there about related things: for example, on StackOverflow about educational book styles, but also on FriendFeed about tautomerism in relation to drug activity.

I also hacked up a Bioclipse plugin that allows me to convert a Bioclipse matrix resource into LaTeX source code, but that will not be part of the Bioclipse 2.2 release, as it requires quite some updating of the statistics functionality. BTW, the LaTeX plugin is hosted at Gitorious, which is a GitHub alternative, but does not seem to have post-commit hooks :(
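
To give an idea of what such a conversion involves, here is a minimal Java sketch that renders a labelled numeric matrix as a LaTeX tabular. It is not the plugin's code: the class MatrixToLatex, its method names, and the three-decimal number format are assumptions made for illustration, and it works on plain arrays rather than on a Bioclipse matrix resource.

```java
import java.util.Locale;

/** Hypothetical helper; not part of Bioclipse or the LaTeX plugin. */
class MatrixToLatex {

    /** Renders a labelled numeric matrix as a LaTeX tabular environment. */
    static String toTabular(String[] colNames, String[] rowNames, double[][] values) {
        StringBuilder sb = new StringBuilder();
        // One left-aligned label column, then one right-aligned column per variable.
        sb.append("\\begin{tabular}{l");
        for (int i = 0; i < colNames.length; i++) sb.append('r');
        sb.append("}\n\\hline\n");
        // Header row: the first cell (above the row labels) stays empty.
        for (String col : colNames) sb.append(" & ").append(escape(col));
        sb.append(" \\\\\n\\hline\n");
        // One table row per matrix row, numbers fixed to three decimals.
        for (int row = 0; row < values.length; row++) {
            sb.append(escape(rowNames[row]));
            for (double value : values[row]) {
                sb.append(" & ").append(String.format(Locale.ROOT, "%.3f", value));
            }
            sb.append(" \\\\\n");
        }
        sb.append("\\hline\n\\end{tabular}\n");
        return sb.toString();
    }

    /** Escapes a few characters that are special in LaTeX text mode. */
    static String escape(String text) {
        return text.replace("_", "\\_").replace("%", "\\%").replace("&", "\\&");
    }

    public static void main(String[] args) {
        String[] cols = {"PC1", "PC2"};
        String[] rows = {"mol_1", "mol_2"};
        double[][] scores = {{0.5, -1.25}, {2.0, 0.125}};
        System.out.print(toTabular(cols, rows, scores));
    }
}
```

The output can be pasted into a LaTeX document or pulled in with \input; a real plugin would also have to deal with missing values and wide matrices, which this sketch ignores.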

Also, the Bioclipse2 paper "Bioclipse 2: A scriptable integration platform for the life sciences" has been published now in BMC Bioinformatics (DOI:10.1186/1471-2105-10-397)!

New student
I am also happy to have a second student starting in January, who will work primarily on an RDF version of the ChEMBL data. Her work will build on the excellent work being done right now by Samuel on comparing Prolog with DL reasoning.

CDK Licensing
Another thing that required my attention was the problem brought up by Andew on licensing. There were considerable problems with out-of-date statements the CDK makes about the license and copyright information of certain CDK modules, and with the implications this has for what the CDK project is required to do (e.g. link to source code of third party libraries) and for downstream CDK distributors, like the Debian and Ubuntu projects. For example, it became apparent that the Debian package cannot distribute the XML Schema of CML, which is licensed CC-BY-ND, which is not DFSG-compatible. A few bugs have been reported, and work is ongoing to fix the issues.

Wednesday, December 02, 2009

CDK 1.3.1: the changes

Two weeks ago, I released CDK 1.2.4. Anay reported failures when generating the JavaDoc from the packages, both of which I think I have fixed now; the uploaded packages on SourceForge include these fixes.

The 1.2.4 release was soon followed by 1.3.1. Unfortunately, uploading the packages to SourceForge over 3G with Chrome did not work well, so I only finished that today. CDK 1.3.1 is the second release in the development branch, and brings in new functionality but also API changes. Here are the changes since the 1.3.0 release:
  • Bumped version for 1.3.1 release c341095
  • Added some extra lines, hopefully fixing the conflicts all the time 6dab943
  • Fixed param name 743bad3
  • Updated the makefp3d target to work with the current build system bbb78ee
  • Set up a branch for the 1.2.4 release 4801d79
  • Fixes bug 2898399. Updates to the SMARTS parser to handle proper matching for explicit hydrogens (including H, 1H, 2H and 3H). SMARTSQueryVisitor updated to take into account different isotopes of H. Also updated unit tests to take into account proper H matching. Added a unit test to further check H matching. b67d76a
  • Added tests to match hydrogens 45a7f54
  • Fixed junior issue 1816529: Missing Java5 generics for atomContainers() Iterator 484619e
  • Reworked the tests for bug 2898032. Updated Javadocs for smiles generator 7f68b07
  • Added unit test to confirm and check for bug 2898032 924b563
  • Fixed junior issue 1802586: Misuse of assertTrue for tested strings 12bec4f
  • Made the AtomContainerPermutors IAtomContainer implementation independent 4748098
  • Updated UIT to handle single atom queries and added a unit test for bug 2888845. Also updated Javadocs to specifically note behavior of single atom queries dfb2805
  • Fixed the dist-large target: removed the no longer existing .libdepends after the log4j module patch 9dc13e3
  • Implemented instantiating custom loggers; example in the unit test class 2771eb9
  • Added the use of the SystemOutLoggingTool as back up acf5953
  • Added a ILoggerTool implementation for STDOUT 921447a
  • Dug up and updated the copyright history a3cc876
  • Factored out initialization of the tool, to allow reusing the code for other logger class names 2af5f24
  • Moved the log4j.jar depending LoggingTool into a separate module 112f64d
  • Introduces the ILoggingTool interface and a factory so that CDK code no longer needs to depend on LoggingTool which depends on Apache's Log4j library. c6c8d38
  • Added generation of java source jars e33fba2
  • Fixed matchers to allow XML without new lines (closes #2832835) f9a0552
  • Added unit tests for detection of PubChem XML files. 571f434
  • Fixed matchers to allow XML without new lines (closes #2832835) a1f25d8
  • Added unit tests for detection of PubChem XML files. 1cec794
  • Added reading of E/Z stereochemistry from double bonds in MDL V2000 molfiles. cb824f1
  • A minor fix to clean up a PMD warning 024499e
  • Overwrite unit tests, because there are no change events passed around at all for the NoNotification interface implementations 36f295b
  • Added missing unit tests for IChemModel event propagation for the ICrystal field 2993e0c
  • Fixed propagation of change events to IChemModel when modifications are made in child IChemObjects 0c8a88f
  • Fixed unit tests: the IChemModel.setFoo(null) should actually give a change event on the listener of the IChemModel, and not after unregistering of the Foo object. b833176
  • Synchronized with the Blue Obelisk version a91062b
  • Added unit test to the function of the new IO setting to force 2D coordinate output. 4e2b2bf
  • Added writer IO option to force writing of 2D coordinates if 3D coordinates are present too, which now are preferably outputted. 0e6aa2c
  • Added unit test to verify that if 2D and 3D coordinates are available, the 3D coordinates are outputted. 56852f8
  • Changed IBond.get/setStereo() to use a IBond.Stereo enumeration instead of an int (fixes #2855850): 46893ed
  • Fixed Taglets: only return HTML if the Tag is really given; the toString() method is given for all cases, not just when the tag is found 1107fb2
  • Added the Mannhold LogP descriptor 1e6b6cd
  • Added the Mannhold LogP descriptor to the ontology a7adc9f
  • Fixed a bug which was causing various parts of the DescriptorEngine to fail - it was trying to instantiate a non-descriptor class which happens to reside in the descriptor package directory. This fix is a bit kludgy - ideally only descriptors should be in that directory 0242d9a
  • Fixes ClassCastException when not IMolecule 6f3e848
  • Upgraded to PMD 2.4.5 with many bug fixes, giving more accurate error reports f29a66b
  • Added missing dependency on cdk-diff, being used in one of the unit tests 0e287dd
  • Fixed method names to match those in the test class 789a314
  • Fixed test method name to match the expected pattern, fixing a coverage test fail ac13619
  • Removed duplicate code: MolecularFormulaTest now extends AbstractMolecularFormulaTest b8651c7
  • Fixed test method annotation to point to the right method bb7d341
  • Added missing @TestMethod annotation f6f759b
  • Added modules that were missing from the PMD testing 073e5ec
  • Added modules that were missing from the doccheck testing 10dc19c
  • Added reference to IUPAC documentation about stereochemistry visualization. 56adf23
  • Patch for bug 2843445. Aims to fix generation of NaN coordinates by SDG d1397fe
  • Added missing dependency introduced by the use of AbstractFingerprinterTest in test-standard. b26eb93
  • Updated the unit test classes for all IFingerprinter implementations to use the new AbstractFingerprinter class; a few unit tests actually fail 1989fa5
  • Extracted an AbstractFingerprinterTest with unit tests that should really apply to all IFingerprinter implementations 8bc42dc
  • Clean up of layout. 5f7cb53
  • Fix the unit test to not give a 'input must support mark' exception on some platforms, by wrapping the InputStream in a BufferedInputStream. 6f6f41e
  • Added missing dependencies 8759481
  • Added ioformats to modules to test 56289e2
  • Use StringBuilder to aggregate the field data, which gives a huge performance boost for SD files where multiline field data is found. df35f02
  • Use StringBuilder to aggregate the field data, which gives a huge performance boost for SD files with very much field data, like the ChEBI_complete.sdf eac8266
  • Factored out steps in reading the SD file data block 678e7ca
  • Bumped version, to make it clear this is not the 1.2.3 release 8c8166a
  • Bumped version, to make it clear this is not the 1.3.0 release eeda652
  • Fixed registering on the cdk.threadnonsafe tag (closes #2796362) d451576
  • Removed obsolete pattern from old svnrev tag c8f5a72
  • Fixed JavaDoc to remove traces of the old svnrev Tag 1a70488
  • Synchronized exception message with implementation (fixes #2844333) c70b79c
  • Made class private again, per author's request fa7ba02
  • Any class will do, not just public, final and abstract dc9e8c5
  • Added ant task to calculate JavaNCSS code statistics a8b313e
  • Added JavaNCSS 32.53 (LGPL 3.0) 6753a8c
  • The Pauling Electronegativity is copied in configure as well. I can't see why not copy everything we have. 3fd2b17
  • Revert "added a test for bug 2831420": 2c2add6
  • Patch for bug 2843445. Aims to fix generation of NaN coordinates by SDG 963b0a7
  • added a test for bug 2831420 5d15222
  • added a test for bug #2831420 93536f0
  • Made InChIGeneratorFactory a singleton. 242da91
  • Layout. af4fac7
  • Added bug annotation 38d0235
  • test case for bug #2846213 f84c53b
  • Fixed perception of N.planar3 where N.sp2 was detected, by now taking into account the given hydrogen count. 1714de2
  • Fixed perception of benzene with all single bonds, but hydrogen count 1 and bonds flagged aromatic. In this case, the type is C.sp2 not C.sp3. 05e0be3
  • Added assertions to unit test for values being not null 863b0a5
  • Added two unit tests for the same problem: carbon atom types are not correctly perceived if bond order info is SINGLE only, and hydrogen count and aromaticity flag is set. f19a451
  • Moved class into a org.openscience.cdk package, which seems to work now. I'm puzzled why it did not before. Solved several unit test fails. b055c6b
  • Unsealed the XOM jar to allow having the CustomSerializer 3b82340
  • Fixed Javadocs error e0304bf
  • Fixed a wrong javadoc tag. Also removed svn tag in the SMARTS parser JJT file, replaced with git tag c888773
  • Added support for 'public enum's 4bf822d
  • corrected bug in bondtools.isStereo(IAtomContainer container, IAtom stereoAtom). A comparison of atom symbols in a nested loop was using the counter of the outer loop twice. Note it worked before, because there is a sort of fallback to Morgan numbers (fixes #2830287) 025fb47
  • added a new test for bondtools 13f72bd
  • Fixed inconsistency between accepts() and write: also support writing of IAtomContainerSet and IAtomContainer as accepts() indicates (fixes #2827745) 6380578
  • General test for testing consistency between write() and accepts(), testing that all accepted IChemObject's can also be written f0678eb
  • Added unit test for bug #2826961: inconsistent atom typing for two SMILES. Unit test does not show a fail, ruling out a CDK bug 42e45ef
  • Remove erroneous throws statement f8cfea8
  • Bug found calculating the exact mass given a molecular formula when it is negatively charged. 3d1de45
  • Fixed reading of the cdk/dict/data/elements.owl database which is now in OWL 73225a0
  • Fixed issue 2458210: use assertNotNull(foo) etc instead of assertTrue(foo != null). 182afe6
  • Added minimum equivalents for BondManipulator.getMaximumBondOrder() methods 6e12696
  • Fixes asserts: after removal *no* change should be recorded 3b9fa30
  • Added IO option to disable generator of XML declaration statements in the output CML. 74451b8
  • Added generics, and consistified code by always returning a List of the same '?'. (And some 80 chars fixes in the JavaDocs.) d6337cd
  • Added unit tests to test that when a [Molecule|Reaction|Ring]Set has been removed from a ChemModel, the ChemModel should unregister as listener. 63e6c01
  • Added unit tests for event propagation from [Molecule|Reaction|Ring]Sets to ChemModel. e011035
  • More testing of flags. abb5384
  • Fix for junior job id: [ 1837692 ] Test methods should throw only one Exception. 8c38536
  • Fixed missing imports and wrapped to 80 chars fd2d2df
  • Better exception handling in builder3d: bc5837d
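
The logging commits in this list (the ILoggingTool interface with its factory, the STDOUT back up, and the custom loggers) follow a common pattern: calling code depends only on a small interface, and a factory instantiates the actual implementation by class name, falling back to a dependency-free logger when that class is not available. Below is a rough Java sketch of that pattern. All class and method names in it (SketchLogger, SketchLoggerFactory, and so on) are hypothetical and chosen for illustration; this is not the CDK API or its implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Demo entry point; every name in this file is hypothetical, not CDK API. */
class LoggingFacadeDemo {
    public static void main(String[] args) {
        // Nothing configured: the factory hands out the STDOUT back up.
        SketchLogger log = SketchLoggerFactory.create(LoggingFacadeDemo.class);
        log.warn("using " + log.getClass().getSimpleName());

        // A backend that is not on the classpath: still the STDOUT back up.
        SketchLoggerFactory.setBackendClassName("com.example.MissingLogger");
        SketchLoggerFactory.create(LoggingFacadeDemo.class).warn("backend missing");

        // A custom logger, selected purely by class name.
        SketchLoggerFactory.setBackendClassName(MemorySketchLogger.class.getName());
        SketchLoggerFactory.create(LoggingFacadeDemo.class).debug("kept in memory");
        System.out.println(MemorySketchLogger.MESSAGES);
    }
}

/** The only type that calling code needs to know about. */
interface SketchLogger {
    void debug(String message);
    void warn(String message);
}

/** Back up implementation: writes to STDOUT, needs no third-party jar. */
class StdoutSketchLogger implements SketchLogger {
    private final String prefix;
    public StdoutSketchLogger(Class<?> source) { this.prefix = source.getSimpleName(); }
    public void debug(String message) { System.out.println(prefix + " DEBUG: " + message); }
    public void warn(String message) { System.out.println(prefix + " WARN: " + message); }
}

/** Example custom logger that collects messages in a list. */
class MemorySketchLogger implements SketchLogger {
    static final List<String> MESSAGES = new ArrayList<>();
    public MemorySketchLogger(Class<?> source) { }
    public void debug(String message) { MESSAGES.add("DEBUG: " + message); }
    public void warn(String message) { MESSAGES.add("WARN: " + message); }
}

/** Creates loggers reflectively so there is no compile-time backend dependency. */
class SketchLoggerFactory {
    private static String backendClassName = null;

    static void setBackendClassName(String name) { backendClassName = name; }

    static SketchLogger create(Class<?> source) {
        if (backendClassName != null) {
            try {
                // Assumed convention: a public constructor taking the calling class.
                Class<?> backend = Class.forName(backendClassName);
                return (SketchLogger) backend.getConstructor(Class.class).newInstance(source);
            } catch (ReflectiveOperationException | ClassCastException e) {
                // Backend missing or unsuitable: fall through to the back up.
            }
        }
        return new StdoutSketchLogger(source);
    }
}
```

The reflective lookup is what keeps the core code free of a hard dependency: a Log4j-backed logger could live in its own module and be picked up only when its jar is present.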