Sunday, November 16, 2014

Programming in the Life Sciences #18: Molecular weight distribution of compounds with measured activities against a target (and other examples)

Eating your own dog food is a rather useful concept in any field where a solution or product can change over time. This applies to science as much as to programming. Even when we think things are static, they may not really be, often because we underestimate, or are simply ignorant of, the factors that influence the outcome. By repeatedly dogfooding, the expert will immediately recognize the effect of different influencing factors.

Examples? A politician who actually lives in the neighborhood for which he develops policies. A principal investigator who personally tries to reproduce an experiment by one of her or his postdocs or PhD students. And, of course, the programmer, who should use his own libraries himself.

Dogfooding, however, is not the single solution to development; in fact, it can easily be integrated with other models. But it can serve as an early warning system, as the communication channels between you and yourself are typically much shorter than those between you and the customer: citizen, peer reviewer, and user, following the above examples. Besides that, it also helps you better understand the thing that is being developed, because you will see the influencing factors in action and everything becomes more empirical, rather than just theoretical ("making money scarce is a good incentive for people to get off the couch", "but we have been using this experiment for years", "that situation in this source code will never be reached", etc.).

And this also applies when teaching. So, you check the purity of the starting materials in your organic synthesis labs, and you check if your code examples still run. And you try things you have not done before, just to test the theory that if X is possible, Y should be possible too, because that is what you tell your students.

As an example, I told the "Programming in the Life Sciences" students that in the literature researchers compare properties of actives and inactives, for example the molecular weight. This ranges from just getting some idea of what data you are looking at, up to uses of things like Lipinski's Rule of Five. Therefore, I developed an HTML+JavaScript page using Ian Dunlop's excellent ops.js and the impressive d3.js library to use the Open PHACTS Application Programming Interface.
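
To make the comparison concrete, here is a minimal sketch of the binning step behind such a molecular weight distribution. It is not the code of the actual page: the record shape (`molecularWeight`, `active`), the function names, and the data values are all assumptions made up for illustration, and the calls to ops.js and d3.js are deliberately left out.

```javascript
// Made-up records for illustration only; a real page would fill this array
// from an Open PHACTS API response instead.
const exampleCompounds = [
  { molecularWeight: 151.2, active: true },
  { molecularWeight: 180.2, active: true },
  { molecularWeight: 310.4, active: false },
  { molecularWeight: 455.0, active: true },
  { molecularWeight: 620.7, active: false },
];

// Count how many records fall into each molecular weight bin of width binWidth.
function binMolecularWeights(records, binWidth) {
  const counts = new Map();
  for (const record of records) {
    const binStart = Math.floor(record.molecularWeight / binWidth) * binWidth;
    counts.set(binStart, (counts.get(binStart) || 0) + 1);
  }
  // Return the non-empty bins in ascending order, ready for a plotting library.
  return Array.from(counts.entries())
    .sort((a, b) => a[0] - b[0])
    .map(([binStart, count]) => ({ binStart, binEnd: binStart + binWidth, count }));
}

// Bin actives and inactives separately so the two distributions can be compared.
function compareActivesAndInactives(records, binWidth) {
  return {
    actives: binMolecularWeights(records.filter((r) => r.active), binWidth),
    inactives: binMolecularWeights(records.filter((r) => !r.active), binWidth),
  };
}

console.log(compareActivesAndInactives(exampleCompounds, 100));
```

The resulting arrays of bins are the kind of data a bar plot could be drawn from, one series for the actives and one for the inactives.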

And compared to last year, when only the source was available, all these examples can now be tested online on the following GitHub pages (using their brilliant gh-pages system):

  • Example 1: simple example where the Open PHACTS Identity Resolution System (name to identifier) is used
  • Example 4: uses d3.js to show a bar plot of the number of times a particular unit is used to measure activities of paracetamol
  • Example 5: the same as example 4, but as a pie chart
  • Example 6: the above molecular weight example
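
The counting behind the bar plot and pie chart of Examples 4 and 5 can be sketched along the following lines. This is only an illustration under assumptions: the `unit` field, the `countUnits` name, and the sample records are hypothetical and not taken from the actual example pages or from the Open PHACTS response format.

```javascript
// Made-up activity records; in the real examples these would come from
// the Open PHACTS API for a compound such as paracetamol.
const exampleActivities = [
  { unit: "nM" },
  { unit: "nM" },
  { unit: "%" },
  { unit: "nM" },
  {}, // some activities may have no unit at all
];

// Tally how often each unit occurs, most frequent first.
function countUnits(activities) {
  const counts = new Map();
  for (const activity of activities) {
    const unit = activity.unit || "(no unit)";
    counts.set(unit, (counts.get(unit) || 0) + 1);
  }
  return Array.from(counts.entries())
    .map(([unit, count]) => ({ unit, count }))
    .sort((a, b) => b.count - a.count);
}

console.log(countUnits(exampleActivities));
```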

Of course, what the students produced last year, and probably will produce this year, is much more impressive. And, of course, compared to full applications (I recommend browsing this list by the Open PHACTS Foundation), these are just mock-ups, and that is exactly what they are meant to be. These examples are just like figures in a paper, each making a specific point. And that is how these pages are used: as arguments to answer a biological question. In fact, and that is outside the scope of this course, just think of what you can do with this approach in terms of living research papers. Think Sweave!