Sunday, April 18, 2010

BitTorrents for Science

The idea has been lingering in the air for a long time now: sharing large science data sets using BitTorrent. Over the past couple of years I have seen a lot of science-related software being distributed over torrents, and their use in open source in general is abundant. Given a good network of so-called seeders, download times go down dramatically, and overall energy consumption goes down too, as the data has to travel a much shorter path.

It could very well be that the uptake of this technology for sharing data is only now coming about because we only recently started caring about Open Data licenses. These formally take care of redistribution rights, which are obviously crucial to setting up a torrent network. Initiatives like the Panton Principles are changing this, even though we have had a good deal of Open Source-licensed data for many years already.

So, with an increasing amount of Open Data, the time was now right, according to the authors Morgan and Jonathan, to set up BioTorrents, and publish a paper in PLoS ONE: BioTorrents: A File Sharing Service for Scientific Data (doi:10.1371/journal.pone.0010071). I have to admit that I do not particularly like the design of the website, and I think it could do with more social web integration, but importantly, they provide a tracker. Trackers are a key part of a torrent network (well, they are being made obsolete, though I am not up-to-date with the state of that evolution), and work as a service discovery hub. Additionally, the website provides the means to find data, and allows categorising torrents.
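
To make the "service discovery hub" role concrete, here is a toy sketch in Python. It is not how the BioTorrents tracker is implemented: real trackers speak HTTP or UDP, and the class name, host names and torrent identifier below are made up for illustration. It only shows the core idea that a tracker remembers which peers share a torrent and tells each newcomer about the others.

```python
from collections import defaultdict


class ToyTracker:
    """In-memory stand-in for a tracker: maps a torrent to the peers sharing it."""

    def __init__(self):
        # torrent identifier -> set of (host, port) pairs that announced for it
        self._swarms = defaultdict(set)

    def announce(self, info_hash, host, port):
        """Register a peer and return the other peers already known for the torrent."""
        swarm = self._swarms[info_hash]
        others = sorted(swarm - {(host, port)})
        swarm.add((host, port))
        return others


# Hypothetical peers: a library seeder announces first, a lab client second.
tracker = ToyTracker()
first = tracker.announce("dataset-abc", "library.example.org", 6881)
second = tracker.announce("dataset-abc", "lab.example.org", 6881)
```

The first peer gets an empty list back, the second one learns about the library seeder. Note that the tracker never holds the data itself, only this lookup table.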

It is worth noting that the uptake has been minimal so far, since the idea was posted last October. But it is slowly being picked up, or at least blogged about.

How to make BioTorrents work?
The success of BioTorrents will very much depend on the user base. This is common to social web applications, and a recent accidental loss of torrents is unforgivable; well, personally, I was happy to upload my torrent once more, but would not have done that if I had many torrents uploaded already. Torrent content is distributed, but the tracker information is not. Backup, backup, backup. Oh, and backup :) It happens to the best of us. Additionally, it is worth realising that the service needs to give something back to the user. Traditionally, I always thought this had to be of actual use, but a recent post by Rich actually suggested that even a game mechanic may be enough. Indeed, websites like ChemPedia.com and Blue Obelisk eXchange implement this by means of personal karma, allowing people to compete in high score lists. Also, APIs to integrate with other tools are crucial, such as personal RSS feeds to allow posting my new torrents to, for example, FriendFeed and Identica.

But by far the most important feature for BioTorrents will be a reliable network of seeders. I already mentioned this on FriendFeed, where I suggested that university libraries get involved. Ideally, every library will act as a seeder in torrent networks, so that typically you will download the data directly from your local library, instead of from the other end of the world. For data sets of GB size or larger, this is going to have an important environmental impact, on top of the much higher download speed.

Update: to make my message a bit more clear, please start uploading your torrents!

Langille, M., & Eisen, J. (2010). BioTorrents: A File Sharing Service for Scientific Data. PLoS ONE, 5(4). DOI: 10.1371/journal.pone.0010071

6 comments:

  1. Hi Egon,

    Thanks for the honest review of BioTorrents. Good in-depth feedback is definitely welcome. I think I agree with all of the points you have made, but I feel like I should respond to some of them.

    1) As you suggest, the style of the website could probably be improved. I am not much of an artist and I must confess that my design abilities are a bit limited. If you have specific suggestions on how to make the site look better I would gladly make the changes.

    2) Backups are made daily now and the previous problem was fixed. It was an error that I am glad I made before the BioTorrents manuscript came out.

    3) Yes, we need seeders. Institutions and libraries would be the most ideal seeders. However, I am surprised by the number of users who have shown interest in helping with seeding and are not even interested in the data.

    4) Growth since I first launched the site has been slow. However, I was waiting for the manuscript to come out before worrying about this too much. Indeed the number of registered users went from 60 to almost 500 in about 3 days, so I am hoping this burst in popularity will help with seeding and uptake of the technology.

    5) I think the biggest need is more people uploading torrents. I am working on setting up some pipelines to automatically add data from public repositories such as NCBI. However, the whole idea is that users help in creating torrents and are not just consumers. I think this is mostly a culture change in getting scientists to share data and also getting people used to sharing scientific data via BitTorrent.

    6) More social web integration would be ideal. The RSS feeds have come a long way since I began and they are available for almost every slice of torrents on BioTorrents (including per user).

    7) Lastly, I feel I should note that this project has been developed solely by me and is currently a side project to my regular research. I really don't want this to be an excuse for any shortcomings of BioTorrents, but rather that people understand that I don't have a dedicated team to improve all of the things I would like to. Hopefully, in the near future there may be a chance to get additional developers on the project, either through grants or by opening the project up for others to contribute to.

  2. Hej Morgan,

    ad 1: yeah, I know that feeling; I'm not much of an artist myself :)

    ad 2: indeed :)

    ad 3: I have set up a seeder on one of our servers, but will talk to Ola about covering more torrents.

    ad 4: that is good growth. I surely hope it will grow even more over the next weeks. Let's hope they start uploading torrents too. I'll try to remember to write a blog post on how to upload a torrent with KTorrent.

    ad 5: there was a thread this week on the Open Knowledge Foundation mailing list on using torrents for their CKAN packages.

    ad 6: happy to hear that! Should have checked that. Other integration could be buttons for easy bookmarking to Delicious, and posting to sites like FriendFeed and Twitter.

    ad 7: can you please comment on how people can help you and your project?

  3. Morgan,

    ad 6: what's the URL for account RSS feeds? I cannot access my account homepage without being logged in; does that mean I need to be logged in to see my personal RSS feed too?

    Otherwise, I updated the post to make clear that people should be starting to upload torrents :)

  4. 6) You don't need to be logged in to get the RSS feed for user-specific accounts. There is an RSS link from your profile (if you have uploaded a torrent). Also, if you click on a user's name under "upped by" on the browse page, it will show only torrents uploaded by that user. The RSS button at the top of the page can then be used for that user's RSS feed.

    All URL options for the browse page (browse.php) work for the rss.php page so any combination of search terms, categories, licenses, etc. can be used with the RSS feed.

    7) As far as programming goes, there isn't really a way yet. The project is not yet set up to be worked on as a collaboration (although it is in SVN), but I may set this up soon.
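
Point 6 above, that rss.php takes the same URL options as browse.php, could be scripted roughly as follows in Python. This is only a sketch under assumptions: the host is taken from the forum link elsewhere in this thread, the exact path of rss.php is a guess, and the parameter names `search` and `cat` are hypothetical placeholders, so copy the real ones from the address bar of a browse.php page.

```python
from urllib.parse import urlencode

# Host taken from the forum link in this thread; the path is an assumption.
RSS_ENDPOINT = "http://www.biotorrents.net/rss.php"


def rss_feed_url(**options):
    """Build an rss.php URL from the options one would pass to browse.php."""
    # Sort the options so the same combination always yields the same URL.
    query = urlencode(sorted(options.items()))
    return f"{RSS_ENDPOINT}?{query}" if query else RSS_ENDPOINT


# "search" and "cat" are hypothetical parameter names used for illustration.
url = rss_feed_url(search="genome", cat=3)
```

The resulting URL can be handed to any feed reader, or to a service that reposts new items elsewhere.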

  5. Hey guys: I'd love to help seed data. Shoot me an email at username mvl1014 and host gmail.com to explain how I can help.

  6. Hej amoebemike,

    it basically consists of two steps:

    1. create a .torrent file for your data
    2. upload that .torrent file to biotorrents.net

    The last step is explained in this FAQ item.

    The first step requires a local BitTorrent client; most allow you to create .torrent files now. For example:

    KTorrent: File -> New
    µTorrent: File -> Create new Torrent

    This will create a .torrent file for you, which you upload to biotorrents.net here.

    Alternatively, you can also help a great deal by mirroring existing torrents. You do this by running a BitTorrent client on a fast network, downloading as many torrents as possible, and keeping the client running so that it becomes a seed for those torrents.

    Morgan has written up a technical tutorial on how the latter can be automated:

    http://www.biotorrents.net/forums.php?action=viewtopic&topicid=14
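
For those who prefer a script over a client menu, step 1 above (creating a .torrent file) can be sketched in plain Python. This is a minimal sketch for a single file, following the standard BitTorrent metainfo layout (a bencoded dictionary with an `announce` URL and an `info` dictionary of SHA-1 piece hashes); it is not what KTorrent or µTorrent run internally. The tracker URL and the file name are placeholders, so substitute the announce URL that the site gives you.

```python
import hashlib
import os
import tempfile


def bencode(value):
    """Encode ints, bytes/str, lists and dicts in BitTorrent's bencoding."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda pair: pair[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def make_torrent(path, announce_url, piece_length=256 * 1024):
    """Return the bencoded bytes of a single-file .torrent for `path`."""
    pieces = []
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(piece_length)
            if not chunk:
                break
            # Each piece is identified by its 20-byte SHA-1 digest.
            pieces.append(hashlib.sha1(chunk).digest())
    info = {
        "name": os.path.basename(path),
        "length": os.path.getsize(path),
        "piece length": piece_length,
        "pieces": b"".join(pieces),
    }
    return bencode({"announce": announce_url, "info": info})


# Demo on a tiny throwaway file; the tracker URL is a placeholder.
with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, "dataset.csv")
    with open(data_path, "wb") as handle:
        handle.write(b"a,b\n1,2\n")
    torrent_bytes = make_torrent(data_path, "http://tracker.example.org/announce")
```

Writing `torrent_bytes` to a file ending in .torrent gives the file to upload in step 2. For anything beyond a quick experiment, a regular client is the safer choice, since it also handles multi-file torrents and optional metadata.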
