Pages

Tuesday, February 24, 2015

Migrating to CDK 1.5: SSSR and two other ring sets

I arrived a bit earlier at the year 1 eNanoMapper meeting so that I could sit down with Nina Jealiazkova to work on migrating AMBIT2 to CDK 1.5. One of the big new things is the detection of ring systems, which John gave a major overhaul. Part of that is that the SSSRFinder is now retired into the legacy model. Myself, I haven't even gone into all the new goodies of this, but one of the things we needed to look at, is how to replace the use the SSSRFinder use for finding the SSSR set. So, I just had a check if I understood the replacement code, by using the SSSRFinderTest and replacing the code in those tests with the new code, and all seemed to work as I expect it to do.

So, here are a few examples of code snippets you need to replace. Previously, your code would have something like:

IRingSet ringSet = new SSSRFinder(someAtomContainer).findSSSR();

This code is replaced with:

IRingSet ringSet = Cycles.sssr(someAtomContainer).toRingSet();

Similarly, you could have had something like:

IRingSet ringSetEssential =
  new SSSRFinder(buckyball).findEssentialRings();
IRingSet ringSetRelevant =
  new SSSRFinder(buckyball).findRelevantRings();

This is changed to:

IRingSet ringSetEssential = Cycles.essential(buckyball).toRingSet();
IRingSet ringSetRelevant = Cycles.relevant(buckyball).toRingSet();

Sunday, January 11, 2015

Programming in the Life Sciences #21: 2014 Screenshots #1

December saw the end of this year's PRA3006 course (aka #mcspils). Time to blog some screenshots of the student projects. Like last year, the aim is to use the Open PHACTS API to collect data with ops.js and which should then be visualized in a HTML page, preferably with d3.js. This year, all projects reached that goal.

ACE inhibitors
The first team (Mischa-Alexander and Hamza) focused on the ACE inhibitors (type:"drug class") and the WP554 from WikiPathways. The use a tree structure to list inhibitors along with their activity:


The source code for this project is available under a MIT license.

Diabetes
The second team (Catherine and Moritz) looked at compounds hitting diabetes mellitus targets. They take advantage from the new disease API methods and first ask for all targets for the disease, and then query for all compounds. Mind you, the compounds are not filtered by activity, so it mostly shows interactions that real targets.


This product too is available with the MIT license.

Tuberculosis
The third project (Nadia en Loic) also goes from disease to targets and they looked at tuberculosis.

Asynchronous calls
If you know the ops.js, d3.js, and JavaScript a bit, you know that these projects are not trivial. The remote web service calls are made in an asynchronous manner: each call comes with a callback function that gets called when the server returns an answer, at some future point in time. Therefore, if you want to visualization, for example, compounds with activities against targets for a particular disease, you need two web service calls, with the second made in the callback function of the first call. Now, try to globally collect the data from that with JavaScript and HTML, and make sure to call the visualization call when all information is collected! But even without that, the students need to convert the returned web service answer into a format that d3.js can handle. In short: quite a challenge for six practical days!

Friday, January 09, 2015

"Royal Society of Chemistry grants journals access to Wikipedia Editors"

The Royal Society of Chemistry and Wikipedia just released an interesting press release:
    "The Royal Society of Chemistry has announced that it is donating 100 “RSC Gold” accounts – the complete portfolio of their journals and databases – to be used by Wikipedia editors who write about chemistry. The partnership is part of a wider collaboration between the Society’s members and staff, Wikimedia UK and the Wikimedia community. The collaboration is working to improve the coverage of chemistry-related topics on Wikipedia and its sister projects."
This leaves me with a lot of questions. I asked these in a comment awaiting moderation:
    Can you elaborate on the conditions? Is it limited to wikipedia.org or does it extend to other Wikimedia projects, like Wikidata? Does the agreement allow manual lookup of information only, or does it allow text mining on the literature as well as on the database? How should I put this in perspective with the UK law that allows text mining, and, in particular, can UK Wikipedia editors use text mining anyway, or is that restricted? Is there an overview of the details of what is allowed and not allowed, or a list of restrictions otherwise?
Details on how to apply to access can be found here.

Sunday, December 21, 2014

Data sharing

When you want to share data with others, the first thing you need to decide is under what terms. Do you want others be able just look at the data, change the format (Excel to CSV, for example)? Do you want this person to be allowed to share the data within his research group, or perhaps collaborators?

Very often this is arranged informally, in good faith, or by some consortium, confidentiality, or non-disclosure agreements. However, these approaches do not scale very well. When this matters, then data licenses are an alternative approach (not necessarily better!).

Madeleine Gray has written on her EUDat blog some words about the EUDAT License Wizard. This wizards talks you through the things you like to agree on, and in the end suggests a possible data license. It seems pretty well done, and the first few questions focus on an important aspect: do you even have the rights to change the license (i.e. you are the copyright owner).

Mind you, there are huge differences between countries around data copyright.


Saturday, December 06, 2014

Nature publications and the ReadCube: see but no touch. Or...

Altmetric.com score for the news
item by Van Noorden (see text).
The big (non-science) news this week was the announcement that papers from a wide selection of Nature Publishing Group (NPG) journals can now be shared allowing others without a subscription to read the paper (press release, news item). That is not Open Access to me (that requires the right to modify and redistribute modifications), but does remove the pay-wall and therefore speeding up dissemination. It depends on your perspective if this news is good or bad. I rather see more NPG journals go Open Access, like Nature Communications. But I have gotten used to the publishing industry moving slowly.

Thanks to Altmetric.com (a sister company of ReadCube) and the DOI we can easily find all discussion around the news item by Van Noorden (doi:10.1038/nature.2014.16460) in blogs. From the free, open dissemination of scientific knowledge ideal point of view the product is limited:
Of course, it is a free boon, and from that perspective this is a welcome move:
And it is a welcome move! We have all been arguing that there are so many people not able to access the closed-access papers, including politicians, SMEs, people from 3rd world countries, people with a deathly illness. These people can now access these Nature papers. Sort of. Because you still need a subscription to get one of these ReadCube links to read the paper without pay-wall. It lowers the barrier, but the barrier is not entirely gone yet.

Unless people with a subscription start sharing these links as they are invited too. At this moment it is very clear when and how these links can be shared. Now, this is an interesting approach. In most jurisdictions you are allowed to link to copyrighted material, but the user agreement (UA) can put additional restrictions. When and how this UA applies is unclear.

For example, Ross Mounce suggested to use PubMed Commons. Triggered by that idea, I experimented with the new offering and added a link on CiteULike. I asked if this is acceptable use, and it turned out to be unclear at this moment:
This tweet also shows the state of things, and I congratulate NPG with this experiment. Few established publishing companies seem willing to make these kind of significant steps! Changing the nature of your business is hard, and NPG trying to find a route to Open Science that doesn't kill the company is something I welcome, even if the road is still long!

A nice example of how long this road is, is the claim about "read-only". I think this is bogus. ReadCube has done a marvelous job at rewriting the PDF-viewer and manages to render a paper fully in JavaScript. That's worth a congratulations! It depends on strong hardware and a modern browser. Older browsers with older JavaScript engines will not be able to do the job (try opening a page with KDE's Konqueror). But clearly it is your browser that does the rendering. Therefore:
  1. you do download the paper
  2. you can copy the content (and locally save it)
  3. you can print the content
Because the content is sent to your browser. It only takes basic HTML/JavaScript skills to notice this. And the comment that you cannot print them?? Duh, never heard of screenshots?! Oh, and you will also notice a good amount of tracking information. I noted previously that it knows at least which institute originally shared the ReadCube link, possible in more detail. Others have concerns too.

Importantly, it is only a matter of time that someone clever and too much time (fortunately for ReadCube, scientists are too busy writing grant applications) that someone uses the techniques this ReadCube application uses to render it to something else, say a PDF, to make the printing process a bit easier.

Of course, the question is, will that be allowed. And that is what matters. In the next weeks we will learn what NPG allows us to do and, therefore, what their current position is about Open Science. These are exciting times!