Saturday, October 25, 2014

The Web - What is the issue?

From Wikipedia.
Last week I gave an invited presentation in the nice library of the Royal Society of Chemistry, at the What's in a Name? The Unsung Heroes of Open Innovation: Nomenclature and Terminology meeting. I was asked to speak about HTML in this context, something I have worked with as a channel for communicating scientific knowledge and data for almost 20 years now. Most of that work was in the area of small molecules, starting with the Dictionary of Organic Chemistry, which is interesting because I presented the web technologies behind this project also in London, in October 10 years ago!

As a spoiler, the bottom line of my presentation is that we're not even using 10% of what the web technologies have to offer us. Slowly we are getting there, but too slowly in my opinion. By some weird behavioral law, the larger the organization, the less innovation gets done (some pointers).

Anyway, I only had 20 minutes, and in that time you cannot do justice to the web technologies.

Papers that I mention in these slides are given below.
Wiener, H. Structural determination of paraffin boiling points. J. Am. Chem. Soc. 69, 17-20 (1947).
Murray-Rust, P., Rzepa, H. S., Williamson, M. J. & Willighagen, E. L. Chemical markup, XML, and the world wide web. 5. applications of chemical metadata in RSS aggregators. J. Chem. Inf. Comput. Sci. 44, 462-469 (2004).
Rzepa, H. S., Murray-Rust, P. & Whitaker, B. J. The application of chemical multipurpose internet mail extensions (chemical MIME) internet standards to electronic mail and world wide web information exchange. J. Chem. Inf. Comput. Sci. 38, 976-982 (1998).
Willighagen, E. et al. Userscripts for the life sciences. BMC Bioinformatics 8, 487 (2007).
Willighagen, E. L. & Brändle, M. P. Resource description framework technologies in chemistry. J. Cheminform. 3, 15 (2011).

The history of the Woordenboek Organische Chemie

Chemistry students at the Radboud University in Nijmegen (then called the Catholic University of Nijmegen) got internet access in spring 1994. BTW, the Catholic part was only reflected in the curriculum in that philosophy was an obligatory course. The internet access part meant a few things:
  1. xblast
  2. HTML and web servers
  3. email
Our university also had a campus-wide IT group that experimented with new technologies. So, many students had internet access via cable early on (though I do not remember when that got introduced).

During these years I was studying organic chemistry, and I started something to help me learn name reactions and trivial names. I realized that the knowledge base I had built up would be useful to others too, and hence I started the Woordenboek Organische Chemie (WOC). This project no longer exists, and is largely redundant with Wikipedia and other resources. The first public version goes back to 1996, but most of the history is lost, sadly.

Here are a few screenshots I have been able to dig up from the Internet Archive. A pretty recent version is from 2003 and this is what it looked like in those days:

The oldest version I have been able to dig up is from January 1998:

Originally, I started with specific HTML pages, but then quickly realized the importance of separating content from display. The first data format was a custom format which looks an awful lot like JSON, but we later moved to XML, which was easier to work with. The sources are still available from SourceForge, where we uploaded the data once we realized the importance of proper data licensing. This screenshot also shows that the website won Ralf Claessen's Chemistry Index award. That was in December 1997.
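To make the idea of separating content from display a bit more concrete, here is a minimal sketch in Python. It is not the WOC code, and the XML element and attribute names (dictionary, entry, term, definition) are invented for this illustration; the real WOC schema is not reproduced here. The point is only that the same data can be stored once and rendered in more than one way.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical entry format: these element and attribute names are
# assumptions made for this sketch, not the actual WOC schema.
ENTRY_XML = """
<dictionary>
  <entry id="aldol-reaction" type="name-reaction">
    <term>Aldol reaction</term>
    <definition>Addition of an enolate to a carbonyl compound.</definition>
  </entry>
  <entry id="acetone" type="trivial-name">
    <term>Acetone</term>
    <definition>Trivial name for propan-2-one.</definition>
  </entry>
</dictionary>
"""


def load_entries(xml_text):
    """Parse the content into plain dictionaries, independent of any display."""
    root = ET.fromstring(xml_text)
    return [
        {
            "id": entry.get("id"),
            "type": entry.get("type"),
            "term": entry.findtext("term"),
            "definition": entry.findtext("definition"),
        }
        for entry in root.findall("entry")
    ]


def render_html(entries):
    """One possible display: an HTML definition list."""
    items = "\n".join(
        f'  <dt id="{escape(e["id"])}">{escape(e["term"])}</dt>\n'
        f'  <dd>{escape(e["definition"])}</dd>'
        for e in entries
    )
    return f"<dl>\n{items}\n</dl>"


def render_text(entries):
    """Another display of the same content: one plain-text line per entry."""
    return "\n".join(f'{e["term"]}: {e["definition"]}' for e in entries)


if __name__ == "__main__":
    entries = load_entries(ENTRY_XML)
    print(render_html(entries))
    print(render_text(entries))
```

Because the entries live in a data file rather than in hand-written pages, adding a new output (a feed, an index, a different page layout) only needs a new render function.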

Unfortunately, I never published the website, which I should have, because I realize each day how nice the technologies we played with were. But at least I got it mentioned in two papers. The first time was in the 2000 JChemPaint paper (doi:10.3390/50100093). JChemPaint at the time had functionality to download 2D chemical diagrams from the WOC using CAS registry numbers. The second time was in the CMLRSS paper, where the WOC was one of the providers of a CMLRSS feed.

In 2004 I gave a presentation about which HTML technologies were being used in the WOC, also in London, almost 10 years ago! Darn, I should have thought of that, so that I could've mentioned that in my presentation this week! Here are the slides of back then:

Krause, S., Willighagen, E. L. & Steinbeck, C. JChemPaint - using the collaborative forces of the internet to develop a free editor for 2D chemical structures. Molecules 5, 93-98 (2000).
Murray-Rust, P., Rzepa, H. S., Williamson, M. J. & Willighagen, E. L. Chemical markup, XML, and the world wide web. 5. applications of chemical metadata in RSS aggregators. J. Chem. Inf. Comput. Sci. 44, 462-469 (2004).

Friday, October 03, 2014

Jenkins-CI: automating lab processes

Our group organizes public Science Cafes where people from Maastricht University can see the research it is involved in. Yesterday it was my turn again, and I gave a presentation showing the BiGCaT and eNanoMapper Jenkins-CI installations (set up by Nuno). I have been using these for a variety of processes, which Jenkins conveniently runs based on the input it gets.

For example, I have it compile and run test suites for a variety of software projects (like the CDK and NanoJava), but also have it build R packages, and even run Andra Waagmeester's code daily to create RDF for WikiPathways. And the use of Jenkins-CI is not limited to dry lab processes: Ioannis Moutsatsos recently showed nice work at Novartis that uses Jenkins for high-throughput screening and data/image analysis.