Wednesday, January 19, 2011

Re: How can cancer research be open-sourced?

Mark asked on Quora how cancer research can be open-sourced. So far, I have found Quora to be rather noisy, even after signing up only for science-related groups, themes, or whatever they are called. However, every now and then there is an interesting question like this one.

The question resonated with discussions I had earlier this week. During Peter's Symposium the discussion was restarted on why publishing data in databases is currently not rewarded. I think the answer is really simple: there is no independent organization counting citation statistics for data. What if Thomson did not calculate citation counts and impact factors? Would we be using them to judge the careers of fellow scientists? If FooBar calculated H-indices based on data citations, would we ignore that? I hardly think so. However, FooBar does not exist, and FooBar is not getting rich because of its citation counts.

From a scientist's perspective, we see people hold back data and source code, because releasing it reduces the time the scientist has to bring the idea to Nature and Science. Now, in cheminformatics this is hardly a problem, because Nature and Science generally do not recognize fundamental, methodological work from informatics and statistics, despite its now crucial role in many Nature and Science papers. However, for data this is different. By releasing your data Openly (think Panton Principles), you give up the intellectual property that earns you a nice list of co-authored papers for the long tail of your publication list. Mind you, this is not an argument I make up here, but actual practice: "Sure you can use my data/method, but I like to be co-author on your paper then."

Why is this actual practice? Even a paper in the long tail is rewarding. "Wow, he has 250 papers!" As Rich nicely characterizes it: game theory.

So, what if we replaced the papers in that long tail of the publication list with points for releasing Open Data and Open Source? I'm all in favor. And no worries about Handles and DOIs. Forget about them. We had Thomson calculating impact factors long before we had DOIs.

My reply to Mark's question?

    The first thing that needs to change is the academic reward system. At this moment, it is rewarding to hold back information, source code, etc., because if you do, you make yourself more competitive with respect to publishing in high-ranked journals. Now, if we rewarded releasing data into public (Open) databases, that would change. Likewise for software. The new journal is an attempt at changing this situation (disclaimer: I'm on the editorial board). Of course, there are many kinds of rewards. BMC giving out awards for Open Data is another. Another important reward would be financial. If organizations, foundations, etc., started giving out financial support for Open projects, that would be a great change too. We are starting to see this with a couple of national funding agencies in Europe that have dedicated funding for Open Access publishing.

1 comment:

  1. The Cancer Commons might be of interest to you. It comprises different community stakeholders in cancer research (patients, doctors, researchers) sharing data on the web. The idea is that the commons will be used in real time, allowing rapid dissemination of cutting-edge research. Currently, the Cancer Commons is only being rolled out for melanoma, but plans to add other types of cancer (particularly lung and breast, I believe) are in the works.

    This approach - which is basically an open data project, on a different kind of scale - will change cancer care. It represents a large shift in scientific practice: by aggregating treatment plans for different cancer types, we may be able to understand what drug combinations work best for specific cancer sub-types. Currently, many cancer treatment plans are "N of 1" - an oncologist prescribing an off-label cocktail of drugs to a late-stage cancer patient. Cancer Commons essentially aggregates those "N of 1" informal treatment plans in order to build more formal treatment plans for cancer sub-types.

    This is the type of model that relies on open data and should represent a strong incentive for researchers, who are largely concerned with rigorous testing and rapid dissemination of their findings. The Cancer Commons allows for both these things, cutting across institutional and geographic boundaries. It is a model that will hopefully change the way that cancer is treated.