Tuesday, February 05, 2008

Performance: C, C++, C#, Java, Perl and Python

Mathieu Fourment (et al.) just published a paper in BMC Bioinformatics on performance testing of six programming languages: A comparison of common programming languages used in bioinformatics (doi:10.1186/1471-2105-9-82). The figure below is from the paper and shows a sequence alignment exercise (copyright with the paper's authors, Open Access license of the journal):

Nothing shocking, I'd say; Java is similar in performance to C++.

What I'd love to have seen is the performance of compiled Java too, using the Java compiler (gcj) that comes with GCC 4.1.1. No idea why that was left out. One could also question why they did not use Sun's 1.6 JVM, which is faster (see these results on running the CDK unit tests). And a major omission is Fortran.

Anyway, the authors provide the source code, so we can easily test the effects of that ourselves.

BTW, first post? :) update: At least I beat Carlos.


  1. Let the grudge match begin! I predict that this will cause a flurry across the blogosphere. I need to read the paper more closely, but I think they probably didn't use NumPy, which could potentially speed things up for Python. Also there's Jython, whose performance would be interesting to see.

  2. I've seen a bit of their code, and it's littered with Java-isms that indicate they aren't familiar with the basics of idiomatic Python.

    Here's an example: in the file, += is used to join strings, files are iterated inefficiently, and no-op function calls are performed.

    Maybe Python is much slower than these other languages even in the best case, but I have a feeling the authors' programming fluency is being tested more than the language itself.

  3. Various bits of commentary on the biology-in-python list.

    Short version - what's the point of the paper? That one person with different skill levels in different languages has varying abilities? The programs for a given problem don't even do the same thing.

    The psyco version of one of the Python programs (that's adding two lines) is over 10 times faster, and it's easy to make the line count shorter, the run time faster, or the memory use less (in some cases all three).

    My version - the reviewers should have sent this one back for a lot of improvements.

  4. Reading this paper made me feel like reading a comment marked 'troll' in .
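
To make the point in comment 2 concrete, here is a minimal sketch of the kind of difference being described: building a string with += inside an index-based loop versus iterating the file object directly and joining once. This is not the paper's code; the function names and the sample data are made up for illustration, and an in-memory io.StringIO stands in for a real input file.

```python
import io


def read_sequence_slow(handle):
    """Java-style: index loop over readlines() plus += concatenation."""
    lines = handle.readlines()       # reads the whole file into a list first
    seq = ""
    for i in range(len(lines)):
        seq += lines[i].strip()      # may copy the growing string each time
    return seq


def read_sequence_idiomatic(handle):
    """Iterate the file object lazily and join the pieces once."""
    return "".join(line.strip() for line in handle)


# Hypothetical sequence data standing in for a real input file.
data = "ACGT\nTTGA\nCCAT\n"
slow = read_sequence_slow(io.StringIO(data))
fast = read_sequence_idiomatic(io.StringIO(data))
print(slow == fast, fast)
```

Both functions return the same string; only the second avoids the intermediate list and the repeated concatenation. How much that matters for run time depends on the interpreter and the input size, so it would need to be measured on the authors' actual programs.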