Saturday, April 13, 2019

Bioclipse on the command line

Screenshot of Bioclipse 2.
Over the past seven years, I have done a lot of chemistry interoperability work using Bioclipse (doi:10.1186/1471-2105-8-59, doi:10.1186/1471-2105-10-397). The code is based on Eclipse, which gives a great GUI experience but also turned out to be hard to maintain. Possibly, that was because of a second neat feature: you could plug libraries into the Python, JavaScript, and Groovy scripting environments, which allow people to automate things in Bioclipse. Over time, so many libraries have been integrated that a wide range of scientific toolkits is available at your fingertips. Of the three programming languages, I have used Groovy the most, because it is close to Java but with a lot of syntactic goodies.

In fact, I have blogged about the scripts I wrote on many occasions, and in 2015 I wrote a few blog posts on how to install new extensions.

But publishing and installing new Bioclipse 2.6.2 extensions remained complicated (installing Bioclipse itself is quite trivial). That is a pity, because the scripts are so useful and I need others to start using them: I do not scale. Moreover, when I cited these scripts, they were too hard for reviewers and readers to use. To get an idea of a small subset of the functionality, read our book A lot of Bioclipse Scripting Language examples.

So, last X-mas I set out with two wishes: that others could run my scripts much more easily and that I could run them from the command line. To achieve that, installing and particularly publishing Bioclipse extensions had to become much easier. Ideally, it would be as easy as Groovy just Grab-bing the dependencies from the script itself, which means Bioclipse available from Maven Central, or something similar.

Of course, this approach would likely lose a lot of wonderful functionality, like the graphical UX, the plugin system, the language injection, and likely more. So, one important requirement was that any script run from the command line must be identical to the script in Bioclipse itself. Well, with a few permissible exceptions: we are allowed to inject the Bioclipse managers manually.

Well, of course, I would not be blogging this had I not succeeded in reaching these goals in some way. Indeed, following up on a wonderful metaRbolomics meeting organized by de.NBI (~ ELIXIR Germany), the powerful plans discussed with Emma Schymanski (and some ongoing work on persistent toxicants), and, to be fair, since I am not actually drowning in missed deadlines, just regularly way behind them, and since I have a research line to run, I dived into hack mode. In some 14 hours, mostly in the evenings of the past two days, I got a proof of principle up and running. The name is a reference to all the wonderful linguistic fun we had when I worked in Uppsala, thanks to Carl Mäsak, e.g. discussing the term Bioclipse Scripting Language and Perl 6.

It is not yet available from Maven Central, so for now a manual mvn clean install is involved. That command installs it in your local Maven repository, which Groovy recognizes. After that, you can get started with something like the script below. The first two lines, which set the workspace root and create the CDK manager, are the extra sugar needed on the command line; the rest runs as is in Bioclipse 2.6.2:


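// command-line sugar: create the manager that Bioclipse would otherwise inject
workspaceRoot = "."
def cdk = new net.bioclipse.managers.CDKManager(workspaceRoot)

// from here on, the script is identical to what runs in Bioclipse 2.6.2
list = cdk.createMoleculeList()
println list
println cdk.fromSMILES("COC")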

What now?
In the future, once it is available on Maven Central, you will be able to skip the local install command, and @Grab will just fetch things from that online repository. I will be tagging version 0.0.1 today, because I got an important script of mine running: it takes one or more SMILES strings, checks Wikidata, and makes QuickStatements to add missing chemicals. You may first have seen that script three years ago, in this blog post.
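The QuickStatements step of such a script can be sketched in a few lines. What follows is a minimal sketch in plain Java (which Groovy can call as is), not the actual script. The class and method names are hypothetical, and the Wikidata lookup is stubbed as a set of SMILES already known. A real script would also canonicalize the SMILES first, for example with the CDK. The sketch emits tab-separated QuickStatements V1 commands that create an item with instance of (P31) chemical compound (Q11173) and canonical SMILES (P233):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper: turns SMILES not yet in Wikidata into QuickStatements V1 commands.
class QuickStatementsSketch {

    // Wikidata identifiers used below: P31 = instance of,
    // Q11173 = chemical compound, P233 = canonical SMILES.
    static List<String> forMissing(List<String> smilesList, Set<String> knownInWikidata) {
        List<String> commands = new ArrayList<>();
        for (String smiles : smilesList) {
            if (knownInWikidata.contains(smiles)) continue; // already in Wikidata, skip it
            commands.add("CREATE");                          // create a new item
            commands.add("LAST\tP31\tQ11173");               // the new item is a chemical compound
            commands.add("LAST\tP233\t\"" + smiles + "\""); // string values are double-quoted
        }
        return commands;
    }
}
```

The actual lookup would query Wikidata, for example via its SPARQL endpoint, and return the SMILES that already have an item. Only the remaining ones end up as CREATE blocks.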

You may wonder: why?? I asked myself the same thing, but over the past 24 hours I found a few answers, which may sketch where this is going:

  1. the BSL book can actually run the code and show its output, just like my CDK book does;
  2. maybe we can use Bioclipse managers in Nextflow;
  3. Bioclipse offers interoperability layers, allowing me to pass a chemical structure from one Java library to another (e.g. from the CDK to Jmol to JOELib);
  4. it allows me to update library versions without having to rebuild a full new Bioclipse stack (I am already technically unable to do that, let alone time-wise);
  5. I can start sharing Bioclipse scripts with articles that people can actually run; and,
  6. all scripts are compatible, and all extensions I make can easily be copied into the main Bioclipse repository, if there ever is a next major Bioclipse version (which seems unlikely now).
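To illustrate point 3, one way to picture such an interoperability layer is a shared molecule abstraction that each toolkit wrapper implements, with a serialization such as SMILES as the common ground. This is a minimal sketch under that assumption. The Molecule interface and the wrapper classes are hypothetical toys, not Bioclipse's actual domain classes, and no real CDK, Jmol, or JOELib calls are made:

```java
// Hypothetical common abstraction that every toolkit wrapper implements.
interface Molecule {
    String toSMILES();
}

// Toy stand-in for a CDK-backed molecule.
class CdkLikeMolecule implements Molecule {
    private final String smiles;
    CdkLikeMolecule(String smiles) { this.smiles = smiles; }
    public String toSMILES() { return smiles; }
}

// Toy stand-in for a second toolkit. It accepts any Molecule by
// going through the shared serialization.
class OtherToolkitMolecule implements Molecule {
    private final String smiles;
    private OtherToolkitMolecule(String smiles) { this.smiles = smiles; }

    static OtherToolkitMolecule from(Molecule any) {
        if (any instanceof OtherToolkitMolecule) {
            return (OtherToolkitMolecule) any; // same toolkit: no conversion needed
        }
        return new OtherToolkitMolecule(any.toSMILES()); // convert via SMILES
    }

    public String toSMILES() { return smiles; }
}
```

The design choice here is that each toolkit only needs to know the shared interface, not every other toolkit. A real layer would likely prefer a richer format than SMILES (for example CML) when 3D coordinates matter, as they do for Jmol.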

Now, it is just a matter of being patient and migrating manager by manager. It may be possible to use the existing manager code, but that comes with so much language injection that I decided to take advantage of Open Science and just copy/paste the code. Most of the code is the same, minus the progress monitors, and with Eclipse IFile code replaced by regular Java code. But there are tons of managers, and at the speed I can offer, reaching even 50% coverage will take months. Therefore, I will focus on the scripts I share with others, with an eye on reuse and reproducibility.
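To give an idea of what replacing Eclipse IFile code with regular Java code can look like, here is a minimal sketch using java.nio.file. It assumes that file arguments are resolved against the workspaceRoot passed to the manager constructor, as in the CDKManager example. The WorkspaceFiles class is hypothetical and not actual manager code:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch: where an Eclipse-based manager takes an IFile,
// a command-line manager can resolve plain paths against the workspace root.
class WorkspaceFiles {
    private final Path root;

    WorkspaceFiles(String workspaceRoot) {
        this.root = Paths.get(workspaceRoot).toAbsolutePath().normalize();
    }

    // Resolve a workspace-relative path, refusing paths that escape the workspace.
    Path resolve(String relativePath) {
        Path p = root.resolve(relativePath).normalize();
        if (!p.startsWith(root)) {
            throw new IllegalArgumentException("Outside workspace: " + relativePath);
        }
        return p;
    }

    // Read a workspace file as UTF-8 text.
    String readText(String relativePath) throws IOException {
        return new String(Files.readAllBytes(resolve(relativePath)), StandardCharsets.UTF_8);
    }

    // Write UTF-8 text, creating parent folders as needed.
    void writeText(String relativePath, String content) throws IOException {
        Path p = resolve(relativePath);
        if (p.getParent() != null) Files.createDirectories(p.getParent());
        Files.write(p, content.getBytes(StandardCharsets.UTF_8));
    }
}
```

With workspaceRoot set to ".", as in the script above, a path like "data/mol.smi" simply ends up relative to the directory the script is run from.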

More soon!
