Thursday, March 24, 2011

Supplementary files, publishing, and standards #2

You must read this previous post first.

Now, it is important to realize there are standards at many levels. Open specifications allow people to implement the specification without having to pay fees, run into patents, etc. To me, an Open Specification is something you can take, modify, and propose to the community as a new Standard.

Standards themselves are basically something orthogonal (IMHO, not uncriticized): whether something is a standard is just the result of the community picking up and using the specification. Something doesn't have to be Open to be a standard, nor does it have to undergo years-long debates (like HTML5). A standard doesn't even have to be fixed to a version (versions can be backwards compatible). There are therefore many kinds of standards, including de facto standards.

These distinctions are crucial. Another is that standards rarely cover everything. For example, it is ridiculous to talk about Excel as a de facto standard in data exchange in science. Now, the Microsoft formats are 'Open Standards' (they sneaked in, when the world was complaining). But they are standards at the wrong level for scientific computation: they define a standard container, and not any semantics. That's up to the user: "Hey, why should I waste a column on units... everyone knows we put in temperatures as Kelvin, ummm, Fahrenheit, ummm... Kelcius, or what is it again the eurotrash uses?"
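To make the container-versus-semantics point concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the two tiny CSV tables, the `unit` column, the unit labels `K`, `C` and `F`, and the helper `read_kelvin`. It is not taken from any real data set or standard. Both tables are perfectly valid containers, but only the one that spends a column on units can be interpreted without guessing.

```python
import csv
import io

# Hypothetical data: the same kind of readings, once as bare numbers
# and once with the unit stated explicitly on every row.
BARE = "sample,temperature\na,300\nb,27\nc,80\n"
WITH_UNITS = "sample,temperature,unit\na,300,K\nb,27,C\nc,80,F\n"

# Conversions to Kelvin for the (assumed) unit labels used above.
TO_KELVIN = {
    "K": lambda v: v,
    "C": lambda v: v + 273.15,
    "F": lambda v: (v - 32.0) * 5.0 / 9.0 + 273.15,
}


def read_kelvin(text):
    """Return {sample: temperature in Kelvin}, refusing to guess units."""
    result = {}
    for row in csv.DictReader(io.StringIO(text)):
        unit = row.get("unit")
        if unit is None:
            # The container parsed fine, but the meaning is missing.
            raise ValueError("no unit column: values cannot be interpreted")
        value = float(row["temperature"])
        result[row["sample"]] = round(TO_KELVIN[unit](value), 2)
    return result


if __name__ == "__main__":
    print(read_kelvin(WITH_UNITS))
    try:
        read_kelvin(BARE)
    except ValueError as error:
        print("bare table rejected:", error)
```

The file format is identical in both cases; what differs is whether the semantics travel with the data or stay in someone's head.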

This problem holds for very many fields, and you see it in many different formulations. Or the whole discussion about ScHTML and PDF, on what is more semantic. Or this one: "Oh, let's use a database so that our statisticians can do their work." Been there, done that.

Now, just to end a bit more positively, here's a group of bright, visionary people using the standards at the right levels in this spreadsheet:


  1. You mention Excel - we tried to avoid using it for our archives but couldn't find an open format for saving a Google Spreadsheet while retaining formulas and web service calls - do you know of any?

  2. No, that would make an interesting experiment... I guess the OpenDocument Format (OpenOffice) should capture some of it, if not all, but that depends not just on the format, but also on Google to implement the writer completely...

    More importantly... the MS Office Open XML format (.xlsx, .docx, etc.) is not that bad. It's not binary, it's XML-based, wrapped in a zip file. Anyone can make implementations, etc. The documentation is not perfect, and the format still allows for unspecified binary blobs, etc., but for your purposes it might be 'Open' enough.