Sunday, February 22, 2009

Solubility Data in Bioclipse #2: handling RDF

RDF is swiftly becoming the lingua franca of life sciences (see for example [1,2]). Bioclipse is an excellent platform to visualize results from analysis of the network, both for graph visualization (see [3]) and for visualization of domain-specific data types (e.g. sequences, molecules, ...).

Yesterday I uploaded a Bioclipse feature that adds an rdf manager to handle RDF content, which includes SPARQL support. The snippet below shows its application to the solubility data [3]:

See also:
  1. One Billion Biochemical RDF Triples!
  2. RDF-ing molecular space
  3. Solubility Data in Bioclipse #1


  1. Egon,

    I keep hearing about RDF, but nowhere have I seen a compelling case for why I need to care about it.

    Why does the average person (non-informatician) need to care about RDF? Is there a video, graphic, or site concisely explaining the problem RDF solves and why RDF is the best solution?

  2. The average person does not really need to care about RDF any more than they need to care about HTML. If you want to share visual representations of your data on the Web, then you might begin to care about HTML. If you want to share computationally useful representations of your data on the Web, then you might begin to care about RDF. If you are just a browser of information, then it is not your concern.

    RDF is valuable infrastructure for distributing and sharing data between different applications automatically. Sharing it as an upper-level data model (or another such model, but RDF is what we have) makes it much easier to integrate data generated by different sources. The extreme alternative is that everyone makes up their own non-interoperable equivalent, and every time we try to merge data we have to write another parser and another custom hack to figure out how items in the different sources might fit together. (Which is exactly what bioinformaticians have had to do over and over again, and is why they find RDF appealing.)

    If you don't believe infrastructure is valuable I suggest you avoid the use of trains, roads or road signs on your way to work tomorrow.

  3. Rich, RDF adds little to intradisciplinary research, where there is already a well-defined vocabulary, a few well-defined data types, etc.

    For interdisciplinary research, however, it makes a huge difference: every discipline speaks the same language, with 1) the same grammar (RDF), and 2) well-defined vocabularies (ontologies) and dictionaries (owl:sameAs).

    This allows linking QSAR activities to clinical records from hospital, in a meaningful way.

    Not sure if RDF is the best solution, but I think it is very powerful, and greatly complements what tools we have available right now.

    In the rest of this year, we'll see several really cool applications in chemistry... Rich, I can highly recommend making your vendor-oriented software RDF-capable. Please contact me offline, if you want to explore and chat about that direction.

  4. @Benjamin, I do get the part about RDF being useful as an information exchange infrastructure element. I just don't see why I should care.

    All of the descriptions about RDF I've seen so far have been, well, academic.

    Even in the earliest days of the Web, I could say to a complete novice: HTML is the code behind Web pages (but not before Web browsers).

    Today I can say: RSS is how your blog posts get delivered to a much wider audience (but not before blogs started to catch on).

    There may not be a non-academic use for RDF, which is OK. I'm just wondering if there's anything more to it than that.

    Which makes me think - either RDF is going nowhere, or it's just missing its natural application.


    "This allows linking QSAR activities to clinical records from hospital, in a meaningful way."

    This sounds like an interesting, specific example. In the simplest possible terms, how would it work? (It might make a good separate post.) I'll take you up on your offer, BTW - thanks.
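
To make the owl:sameAs idea from comment 3 a little more concrete, here is a minimal sketch in plain Python. It is only an illustration of the principle, not how Bioclipse or any RDF library does it: the identifiers (`chem:compound42`, `hosp:drugX`), predicates, and values are all made up, and the merge step is a simplification of what a real reasoner would do with owl:sameAs. Two sources describe the same compound under different names, and one linking statement is enough to combine what they know.

```python
# All identifiers and values below are hypothetical, for illustration only.
OWL_SAME_AS = "owl:sameAs"

# Source A: QSAR results from a chemistry group, as (subject, predicate, object).
qsar_triples = [
    ("chem:compound42", "chem:predictedActivity", "0.87"),
]

# Source B: hospital records, using the hospital's own name for the same drug.
clinical_triples = [
    ("hosp:drugX", "hosp:prescribedIn", "hosp:record1001"),
    ("hosp:drugX", "hosp:prescribedIn", "hosp:record1002"),
]

# The "dictionary": one statement saying both names refer to the same thing.
links = [("chem:compound42", OWL_SAME_AS, "hosp:drugX")]


def canonical_names(triples):
    """Return a function mapping each identifier to one representative
    of its owl:sameAs group (a small union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == OWL_SAME_AS:
            root_s, root_o = find(s), find(o)
            if root_s != root_o:
                # Pick the alphabetically smaller name as the representative.
                parent[max(root_s, root_o)] = min(root_s, root_o)
    return find


def merge(*sources):
    """Pool all triples and rewrite subjects and objects to canonical names."""
    all_triples = [t for src in sources for t in src]
    find = canonical_names(all_triples)
    return {(find(s), p, find(o)) for s, p, o in all_triples if p != OWL_SAME_AS}


def describe(graph, subject):
    """Everything the merged graph says about one subject, sorted."""
    return sorted((p, o) for s, p, o in graph if s == subject)


if __name__ == "__main__":
    graph = merge(qsar_triples, clinical_triples, links)
    for predicate, obj in describe(graph, "chem:compound42"):
        print(predicate, obj)
```

Neither source needed to know about the other's format: because both are lists of triples, merging is just pooling the statements, and the single owl:sameAs link lets the predicted activity and the clinical records end up attached to the same node.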