## Tuesday, October 11, 2011

### Blogs I Follow: Henry Rzepa

Just in case you have not run into Henry's blog yet, check it out. His blog makes me so jealous that I did not follow up on my basic quantum chemistry education. Implementing Hartree-Fock in Fortran is not nearly as interesting or useful as the stuff he has been blogging about. A second reason you should is his brilliant use of Jmol (Henry is one of those who have been using Jmol for more than 10 years). This is his blog in action:

His double gravatar is not vanity but a bug in one of the blog extensions he is using, the one that creates those blue z icons. I think the French in the blog title is deliberate, although I always understood that to be unlikely for a Brit.

Henry, if you're reading this... what about adding permalinks to those Jmol visualizations (or DOIs, like data cites)? Second, I'd love to be able to download a visualization and open it in Jmol or Bioclipse. Would that be possible in CML or JVXL (think datument)?

1. "Denouement" is the technical term for the conclusion in classic dramatic structure. We learned that term in my high school English class in the US, and its use here seems more a literary than French allusion.

2. Yes, the double Avatar is a bug! It seems to apply only to the latest post and seems to be due to the Chemicalize extension.

As for permalinks, a small number of the posts are in fact permanently archived using WebCite (http://www.webcitation.org/). It would be nice to have a WordPress extension that could do this routinely, but no such extension exists. Unfortunately, WebCite does not of itself allocate a DOI handle to the archived post.

I do try to quote a DOI handle for the digital repository entry for a calculation. This would in principle allow the entry to be scraped by a script (see this slide), via something like:

wget --quiet --no-check-certificate https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/to-$T/logfile.out -O to-$T.log;

I also wrote this post which describes how data can be extracted from a Jmol instance. But that method depends on a human; to my knowledge there is no way of scripting this to happen automatically. However, Bob Hanson has achieved miracles with scripting Jmol, and it might be possible.

Clearly, however, a mechanism is needed for exposing better metadata declarations of the data contained in a blog post, such that bots can acquire it without human intervention. If such mechanisms exist, do let me know!

3. Henry, I was thinking more about citations for the Jmol visualizations... but when you mention a repository for the calculations, you must be using Quixote, I guess... that would be a great combo and a boost for Quixote!

Maybe we should talk to Sam and ask him how a Quixote page can easily be embedded in blog posts :)

Regarding the wget command... how would I learn what URL to use?

4. The first point: citations for the Jmol visualisations. Unfortunately, when WebCite archives a blog post, it does not replicate the Jmol functionality. That aspect of it is not active in the archive.

Second point. We use DSpace, and have done so for 6 years (see here). Quixote is quite new. To my knowledge it does not have Handle functionality, and in effect cannot be resolved using the same mechanisms as a DOI. DSpace has many faults, but it has been stable for us over this period, and we now have some 10,000 deposits using it. In our case, the DSpace repository is actually linked to the batch queues for our high-performance computing cluster, which means the two talk to each other very effectively.

Third point. DSpace actually archives a whole collection of files. Thus see here for the files. These include an RDF declaration of appropriate metadata in the form of a METS manifest. In the wget command I listed, replace the string "logfile.out" with the name of any of the other files you might want to acquire. These include a CML file, and possibly other useful information.
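A minimal shell sketch of that substitution, assuming the URL pattern from the wget command above holds for every file in a deposit. The function names `deposit_file_url` and `fetch_deposit_file`, the local output file name, and the handle suffix `1234` used in the usage note are all hypothetical; only the base URL, the `to-$T` pattern, the wget flags and the file name `logfile.out` come from the command quoted earlier.

```shell
#!/bin/sh
# Base of the handle URLs, copied from the wget command quoted earlier.
BASE_URL="https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042"

# Print the URL of one file in a deposit (hypothetical helper).
#   $1: handle suffix, i.e. the value of $T in the original command
#   $2: file name inside the deposit; defaults to logfile.out
deposit_file_url() {
    printf '%s/to-%s/%s\n' "$BASE_URL" "$1" "${2:-logfile.out}"
}

# Download one file from a deposit into the current directory (hypothetical helper).
# The local name to-<suffix>-<file> is an arbitrary choice for this sketch.
fetch_deposit_file() {
    suffix=$1
    file=${2:-logfile.out}
    wget --quiet --no-check-certificate \
        "$(deposit_file_url "$suffix" "$file")" \
        -O "to-$suffix-$file"
}
```

With these definitions, `fetch_deposit_file 1234` would request `logfile.out` from the deposit whose handle ends in `to-1234`, and a second argument selects a different file. The names of the other files are not guessed here; they have to be read from the deposit's file listing.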

5. Ah, I did not know you had been using DSpace that actively! Did you ever write up how you've integrated it into your workflow? Particularly the HPC platform?

6. An article got half written once! But I have now been asked by a forward-looking publisher to write something called a data descriptor to help both prospective authors of research articles, and readers of journals, to understand the chemistry data life cycle better. It's almost finished, and it will be distributed under a CC 2.5 license. It contains something on the topic you allude to.

7. I'm looking forward to reading that!

8. You can read my take on a data-descriptor at www.ch.ic.ac.uk/rzepa/data-descriptors/.

It is not meant as a data schema, but more as a human-readable description of the data life cycle, linked to a real example. It could of course expand (possibly uncontrollably), but I would hope it could induce other chemists who create and share data to write something similar, as a way of improving the profile of molecular data amongst researchers, and not just amongst cheminformaticians.