
Saturday, October 25, 2014

The Web - What is the issue?

From Wikipedia.
Last week I gave an invited presentation in the nice library of the Royal Society of Chemistry, at the What's in a Name? The Unsung Heroes of Open Innovation: Nomenclature and Terminology meeting. I was asked to speak about HTML in this context, something I have worked with as a channel for communicating scientific knowledge and data for almost 20 years now, mostly in the area of small molecules, starting with the Dictionary of Organic Chemistry. That is interesting, because I also presented the web technologies behind this project in London, in October, 10 years ago!

As a spoiler, the bottom line of my presentation is that we're not even using 10% of what the web technologies have to offer us. Slowly we are getting there, but too slowly in my opinion. By some weird behavioral law, the larger the organization, the less innovation gets done (some pointers).

Anyway, I only had 20 minutes, and in that time you cannot do justice to the web technologies.

Papers that I mention in these slides are given below.
Wiener, H. Structural determination of paraffin boiling points. Journal of the American Chemical Society 69, 17-20 (1947). URL http://dx.doi.org/10.1021/ja01193a005.
Murray-Rust, P., Rzepa, H. S., Williamson, M. J. & Willighagen, E. L. Chemical markup, XML, and the world wide web. 5. applications of chemical metadata in RSS aggregators. J Chem Inf Comput Sci 44, 462-469 (2004). URL http://repository.ubn.ru.nl/bitstream/2066/60101/1/60101.pdf.
Rzepa, H. S., Murray-Rust, P. & Whitaker, B. J. The application of chemical multipurpose internet mail extensions (chemical MIME) internet standards to electronic mail and world wide web information exchange. J. Chem. Inf. Comput. Sci. 38, 976-982 (1998). URL http://dx.doi.org/10.1021/ci9803233.
Willighagen, E. et al. Userscripts for the life sciences. BMC Bioinformatics 8, 487+ (2007). URL http://dx.doi.org/10.1186/1471-2105-8-487.
Willighagen, E. L. & Brändle, M. P. Resource description framework technologies in chemistry. Journal of cheminformatics 3, 15+ (2011). URL http://dx.doi.org/10.1186/1758-2946-3-15.

The history of the Woordenboek Organische Chemie

Chemistry students at the Radboud University in Nijmegen (then called the Catholic University of Nijmegen) got internet access in spring 1994. BTW, the Catholic part was reflected in the curriculum only in that philosophy was an obligatory course. The internet access part meant a few things:
  1. xblast
  2. HTML and web servers
  3. email
Our university also had a campus-wide IT group that experimented with new technologies. So, many students had internet access via cable early on (though I do not remember when that got introduced).

During these years I was studying organic chemistry, and I started something to help me learn name reactions and trivial names. I realized that the knowledge base I had built up would be useful to others too, and hence I started the Woordenboek Organische Chemie (WOC). This project no longer exists, and is largely redundant with Wikipedia and other resources. The first public version goes back to 1996, but most of the history is lost, sadly.

Here are a few screenshots I have been able to dig up from the Internet Archive. A pretty recent version is from 2003 and this is what it looked like in those days:



The oldest version I have been able to dig up is from January 1998:



Originally, I started with specific HTML pages, but then quickly realized the importance of separating content from display. The first data format was a custom format that looks an awful lot like JSON, but we later moved to XML, which is easier to work with. The sources are still available from SourceForge, where we uploaded the data once we realized the importance of proper data licensing. This screenshot also shows that the website won Ralf Claessen's Chemistry Index award. That was in December 1997.
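To illustrate what separating content from display means in practice, here is a minimal sketch in Python. It is not the WOC code or its data format: the entry, its field names, and the two functions are made up for this example. The point is only that one record is stored once as plain data and can then be written out as XML or as an HTML page.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical dictionary entry; the field names are illustrative only
# and do not reflect the actual WOC data format.
entry = {
    "term": "benzene",
    "kind": "trivial name",
    "description": "A six-membered aromatic ring, C6H6.",
}


def to_xml(record):
    """Serialize the record as a small XML document (the content)."""
    root = ET.Element("entry")
    for key, value in record.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


def to_html(record):
    """Render the same record as an HTML fragment (the display)."""
    return "<h1>{}</h1>\n<p><em>{}</em>: {}</p>".format(
        escape(record["term"]),
        escape(record["kind"]),
        escape(record["description"]),
    )


if __name__ == "__main__":
    print(to_xml(entry))
    print(to_html(entry))
```

Changing the layout then only touches the rendering function, while the data stays as it is and can be reused by other tools.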

Unfortunately, I never published the website, which I should have, because I realize each day how nice the technologies we played with were. But at least I got it mentioned in two papers. The first time was in the 2000 JChemPaint paper (doi:10.3390/50100093). JChemPaint at the time had functionality to download 2D chemical diagrams from the WOC using CAS registry numbers. The second time was in the CMLRSS paper, where the WOC was one of the providers of a CMLRSS feed.

In 2004 I gave a presentation about which HTML technologies were being used in the WOC, also in London, almost 10 years ago! Darn, I should have thought of that, so that I could've mentioned that in my presentation this week! Here are the slides of back then:


Krause, S., Willighagen, E. L. & Steinbeck, C. JChemPaint - using the collaborative forces of the internet to develop a free editor for 2D chemical structures. Molecules 5, 93-98 (2000).
Murray-Rust, P., Rzepa, H. S., Williamson, M. J. & Willighagen, E. L. Chemical markup, XML, and the world wide web. 5. applications of chemical metadata in RSS aggregators. J Chem Inf Comput Sci 44, 462-469 (2004).

Friday, October 03, 2014

Jenkins-CI: automating lab processes

Our group organizes public Science Cafes where people from Maastricht University can see the research the group is involved in. Yesterday it was my turn again, and I gave a presentation showing the BiGCaT and eNanoMapper Jenkins-CI installations (set up by Nuno). I have been using these for a variety of processes that Jenkins conveniently runs based on the input it gets.

For example, I have it compile and run test suites for a variety of software projects (like the CDK and NanoJava), but also have it build R packages, and even run Andra Waagmeester's code daily to create RDF for WikiPathways. And the use of Jenkins-CI is not limited to dry lab processes: Ioannis Moutsatsos recently showed nice work at Novartis that uses Jenkins for high-throughput screening and data/image analysis.

Thursday, September 25, 2014

Slides at the Open PHACTS community workshop (June 26)

First MSP graduates.
It seems I had not yet posted the slides of my presentation at the 6th Open PHACTS community workshop. At this meeting I gave an overview of the Programming in the Life Sciences course we give to 2nd and 3rd year students of the Maastricht Science Programme (MSP; some participants graduated this summer, see the photo on the right side).

This course will be given again this year, starting about a month from now, and I am looking forward to all the cool apps the students come up with! Given that the Open PHACTS API has been extended with pathways and disease information, they will likely be even cooler than last year.


OpenTox Europe 2014 presentation: "Open PHACTS: solutions and the foundation"

CC-BY 2.0 by Dmitry Valberg.
Where the OpenTox Europe 2013 presentation focused on the technical layers of Open PHACTS, this presentation addressed a key knowledge management solution to scientific questions and the Open PHACTS Foundation. I stress here too, as in the slides, that the presentation is on behalf of the full consortium!

For the knowledge management, I think Open PHACTS did really interesting work in the field of "identity" and I am happy to have been involved in this [Brenninkmeijer2012]. The platform implementation is, furthermore, based on the BridgeDb platform, which originated in our group [VanIersel2010]. The slides outline the scientific issues addressed by this solution:



PS, many more Open PHACTS presentations are found here.

Brenninkmeijer, C. et al. Scientific lenses over linked data: An approach to support task specific views of the data. a vision. In Linked Science 2012 - Tackling Big Data (2012). URL http://ceur-ws.org/Vol-951/paper5.pdf.

van Iersel, M. et al. The BridgeDb framework: standardized access to gene, protein and metabolite identifier mapping services. BMC Bioinformatics 11, 5+ (2010). URL http://dx.doi.org/10.1186/1471-2105-11-5.