
Sunday, August 30, 2015

Pimped website: HTML5, still with RDFa, restructuring and a slidebar!

My son did some HTML, CSS, JavaScript, and jQuery courses at Codecademy recently. Good for me: he pimped my personal website:


Of course, he used GitHub and pull requests (he had been using git for a few years already). His work:

  • fixed the columns to properly resize
  • added a section with my latest tweets
  • added menus for easier navigation of the information
  • made sections fold and unfold (most are now folded by default)
  • added a slide bar, which I use to highlight some recent output
Myself, I upgraded the website to HTML5. It used to be XHTML, but it seems XHTML+RDFa is not really established yet; or, at least, there is no good validator. So, it's now HTML5+RDFa (validation report; currently one bug). Furthermore, I updated the content and gave the first few collaborators ORCID ids, which are now linked as owl:sameAs in the RDF to the foaf:Person (RDF triples extracted from this page).
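
To make the owl:sameAs link concrete, here is a minimal sketch of the two triples involved, written out as N-Triples. The person URI is made up for illustration, the ORCID is the fictitious example researcher from the ORCID documentation, and the exact URI form for ORCID ids is an assumption; the triples actually extracted from the page may look different.

```python
# Sketch only: the person URI and the ORCID URI form are assumptions.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def person_triples(person_uri, orcid):
    """Return N-Triples lines typing a person and linking it to an ORCID."""
    orcid_uri = f"https://orcid.org/{orcid}"  # assumed URI pattern
    return [
        f"<{person_uri}> <{RDF_TYPE}> <{FOAF_PERSON}> .",
        f"<{person_uri}> <{OWL_SAME_AS}> <{orcid_uri}> .",
    ]


# Hypothetical collaborator, using the fictitious ORCID documentation example.
triples = person_triples("http://example.org/people#collaborator1",
                         "0000-0002-1825-0097")
print("\n".join(triples))
```

The first line says the resource is a foaf:Person, the second that it is the same thing as the ORCID resource.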

Linking papers to databases to papers: PubMed Commons and Ferret.ai

I argued earlier this year (doi:10.5281/zenodo.17892) in the Journal of Brief Ideas that measuring reuse of data and/or results in databases is a good measure of the impact of that research. Who knows, it may even beat the citation count, which does not measure quality or correctness of data (e.g. you may cite a paper because you disagree with its content; I have long advocated, and still advocate, the Citation Typing Ontology).

But making the link between databases and papers does not only benefit measuring reuse, it is also just critical for doing research. Without clear links, finding answers is hard. I experience that myself frequently, and so do others, like Christopher Southan, and it puzzles me that so few people worry about this. Of course, databases do a good part of the linking, but unless they expose an API (still rare, but upcoming), it is hard to use these links. PubMed Commons can be used to link to (machine readable) versions of the data in a paper. See, for example, these four comments by me.

Better is when the database provides an API. And that is what Ferret uses. I have no idea where this project is going; it does not seem to be Open Source, and I am not entirely sure how they implemented the history, but the idea is interesting. Not novel, as UtopiaDocs does a similar thing. The difference is that Ferret is not a PDF reader, but works directly in your Chrome browser. That makes it more powerful, but also more scary, which is why it is critical that they send a clear message about any involvement of Ferret servers, or whether everything is done locally (otherwise they can forget about (pharma) company uptake, and they'd have a hard time restoring trust). That said, their privacy policy document is already quite informative!

Last week, I asked them about their tool and if it was hard to add databases, as that is one thing Ferret does: if you open it up for a paper, it will show the databases that cite that paper (and thus likely have information or data from that paper, e.g. supplementary information). Here's an example:


This screenshot shows the results for a nanotoxicity paper and we see it picked up "titanium oxide" (accurately picking up actual nanomaterials or nanoparticles is an unsolved text mining issue). We get some impact statistics, but if you read my blog and my brief idea about capturing reuse, I think they got "impact" wrong. Anyway, they do have a knowledge graph section, which has the paper-database links, and Ferret found this paper cited in UniProt.

Thus, when I asked them if it would be hard to add new databases to that section, and mentioned Open PHACTS and WikiPathways, they replied. In fact, within hours they told me they had found the WikiPathways SPARQL end point that Andra started, which they find easier to use than the WikiPathways webservices :) They asked me for a webpage to point users to, and while I was thinking about that, they found another WikiPathways trick I did not know about: you can browse for WP2371 OR WP2059. Tina then replied that, given a PubMed ID, there was an even nicer way: just browse for all pathways with a particular PubMed ID.
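
For readers who want to try the same two tricks, here is a minimal sketch that only builds the query strings; it does not contact any server. The SPARQL pattern (a wp:PublicationReference that is dcterms:isPartOf a pathway, with PubMed entries as identifiers.org URIs) is my assumption about how the WikiPathways RDF is modelled and may need adjusting, the function names are made up, and the PubMed ID in the example is a placeholder.

```python
# Sketch only: builds query strings, assumes a WikiPathways RDF layout.
def browse_query(pathway_ids):
    """Combine pathway identifiers into one 'A OR B' browse/search string."""
    return " OR ".join(pathway_ids)


def pathways_for_pubmed(pmid):
    """Build a SPARQL query listing pathways that cite a given PubMed ID."""
    pmid = str(pmid)
    if not (pmid.isascii() and pmid.isdigit()):
        raise ValueError(f"not a PubMed ID: {pmid!r}")
    return (
        "PREFIX wp: <http://vocabularies.wikipathways.org/wp#>\n"
        "PREFIX dcterms: <http://purl.org/dc/terms/>\n"
        "SELECT DISTINCT ?pathway WHERE {\n"
        # Assumed modelling: the publication node is part of the pathway.
        f"  <http://identifiers.org/pubmed/{pmid}> a wp:PublicationReference ;\n"
        "    dcterms:isPartOf ?pathway .\n"
        "}"
    )


print(browse_query(["WP2371", "WP2059"]))
print(pathways_for_pubmed("12345678"))  # placeholder PubMed ID
```

The resulting SPARQL string can be pasted into the query form of the end point, or sent with any HTTP client.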

Well, a bit later, they released Ferret 0.4.2 with WikiPathways support. The screenshot below shows the output for a paper (doi:10.2174/1389200214666131118234138) by Rianne (who did internships in our group, and now does her PhD in toxicology):


The Ferret infobar shows seventeen WikiPathways that are linked to this paper, which happens to be the collection that Rianne made during her internship leading to this paper, and uploaded to WikiPathways some months ago. Earlier this year we sat down with her, Freddie, and Linda to make them more machine readable. This is what this list looks like in the browse functionality:


Ferret version 0.4.2 did not work for me, but they fixed the issue within a day, and the above screenshot was made with version 0.4.3. So, besides being a bunch of good hackers, they also seem to listen to their customers. So, what databases do you feel they should add? Leave a comment here, or tweet them at @getferret (pls cc me).

Willighagen, E., Capturing reuse in altmetrics. J. Brief Ideas. May 2015. URL http://dx.doi.org/10.5281/zenodo.17892
Fijten, R. R. R., Jennen, D. G. J., Delft, Dec. 2013. Pathways for ligand activated nuclear receptors to unravel the genomic responses induced by hepatotoxicants. Current Drug Metabolism, 1022-1028. URL http://dx.doi.org/10.2174/1389200214666131118234138

Journal of Brief Ideas: an excellent idea!

Journals, in the past, published what researchers wanted to talk about. That is what dissemination is about, of course. Like everything, over time, the process became more restricted and more bureaucratic. All for quality, of course. To acknowledge and formalize that scientific communication is diverse, many journals have different article types: Letters to the Editor, Brief Communications, etc. Posting a brief idea, however, is for many journals not of enough interest.

Hence, a niche for the Journal of Brief Ideas. It's a project in beta, and may never find sustainability, but it is worth a try:


I can see why this may work:
  • they teamed up with ZENODO to provide DOIs
  • you log in with your ORCID
  • it is Open Access (CC-BY)
  • it fills the niche of ideas you will not test and that would otherwise never see the light of day (so, this journal will contribute to more efficient scholarly communication)
I can also see why it may not work:
  • it is too easy to post an idea, leading to too much noise
  • it will not be indexed and therefore not fulfill a key requirement for many scientists (WoS, etc)
  • you cannot add references like with papers
I can also see some features I would love to see:
  • bookmarking buttons for CiteULike, Mendeley, etc
  • #altmetrics output on this site
  • provide #altmetrics from this site (view statistics, etc)
  • integrate with peer review sites (for post-publication peer review)
  • allow annotation of entities in papers (like PDB, gene, protein codes, metabolite identifiers, etc; and whatever else for other scholarly domains)
Things I am not sure about:
  • allow a single ToC-like graphic (as it will give papers more coverage and more impact)
Anyway, what it needs now is momentum. It needs a business model, even if the turnover can be kept low because of good choices of technology. I am looking forward to seeing where the team is going, and how the community will pick up this idea. (For example, although I know that some ideas are tweeted, I haven't found a donut from Altmetric.com for one of the idea DOIs yet.)

For my readers, please give it a try. You know you have that idea you would like to get some feedback on, but you know you will not have funding for it, and it does not really match your general research plans. It would be a shame to let that idea rot on the shelf. Get it out, get cited!

I tried it too, see below my brief idea as found on ZENODO (where they automatically get deposited), and my experiences are a bit mixed. I like the idea, but it also takes getting used to. The number of words is limited, and I really find it awkward not to cite prior art, the things I built on. The above points reflect a good deal of my reservations.


Friday, August 21, 2015

Internet-aided serendipity in science (was: How the Internet can help chemists with serendipity)

The ACS Central Science RSS feed in Feedly.
Finding new or useful knowledge to solve your scientific problem, question, etc, is key to research. It also is what struck me as a university student as so badly organized (mid-nineties). In fact, technologically there was no issue, so why are scientists not using these technologies then?? This question is still relevant, and readers of this blog know this is a toy research area to me, and I have previously experimented with a lot of technologies to see how they can support research, and, well, basically, serendipity. Hence, internet-aided serendipity.

This happened to be the topic of an article by Prof. Bertozzi (@CarolynBertozzi), editor-in-chief of the gold Open Access ACS Central Science: How the Internet can help chemists with serendipity, part of the internet.cen.org website. I left a comment, which is currently awaiting moderation, but to keep the discussion on twitter going, here is what I left (the comment on the article may turn out to have lost the formatting still present here):
    Dear Prof Bertozzi,

    the browsing of TOCs is not a lost art, and neither has the Internet solved everything. Where I fully agree that Twitter and other social media have filled a niche in finding interesting literature, it is basically kind of a majority vote and does not really find you the papers interesting to your research. This has to extend, of course, to #altmetrics, which capture the attention on social media and allow creating TOCs on the fly, as do (good) paper bookmarking services like CiteULike (see http://www.citeulike.org/citegeist?days=7). Similarly, people developed tools to find science in blog posts, like the no longer existing Postgenomic.com, continued/forked as Chemical blogspace (see http://cb.openmolecules.net/inchis.php, but consider this code has not been updated in the past 2-3 years). So, creating cross-journal TOCs is a daily habit for many of us still. (BTW, will ACS Central Science fully adopt #altmetrics, as data provider as well as showing #altmetrics on the website?)

    Returning to the single journal TOCs. Here, RSS feeds have proven to be critical, and I am happy to find an RSS feed for ACS Central Science (http://feeds.feedburner.com/acs/acscii). It is good to see that the journal's RSS feed for the ASAP papers contains for each paper the title, authors, the TOC image, and the DOI (possibly, it could also include the abstract and ORCIDs of the authors). Better, it should adopt CMLRSS and include InChIs, MDL molfiles, or SMILES of the chemical compounds discussed in that paper (see this ACS JCIM paper: http://dx.doi.org/10.1021/ci034244p). With proper adoption of CMLRSS, chemists could define substructures and be alerted when papers are published containing chemicals with that substructure (and it does not have to stop there, as cheminformatically it is trivial to extend this to chemical reactions, or any other chemistry). After all, we don't want to miss the chemistry that sparks our inspiration!

    I personally keep track of a number of journals via RSS feeds which I aggregate in Feedly, which filled the gap after GoogleReader was closed down. Feedly does not support CMLRSS (unfortunately, but I have other tools for that) and there are a few alternatives.

    So, I hope the ACS Central Science journal will pick up your challenge and continue to support modern (well, CMLRSS was published in 2004) technologies to support your past workflows! For example, make the link to the ACS Central Science RSS feed more prominent, and write an editorial about how to use it with, for example, Feedly.

    Egon
    Maastricht University
    The Netherlands
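
The feed-reading part of that comment can be sketched in a few lines. This is a toy example under stated assumptions: the sample feed below is made up (it is not the ACS feed, whose exact element layout I do not assume here), the DOI is placed in a dc:identifier element purely for illustration, and the alert is a plain keyword match on titles. A real CMLRSS-based alert would instead match substructures against the chemical structures embedded in the feed, which needs a cheminformatics toolkit.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed; titles and DOIs are invented placeholders.
SAMPLE_FEED = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Hypothetical journal feed</title>
    <item>
      <title>Example paper on quinone synthesis</title>
      <dc:identifier>doi:10.0000/example.1</dc:identifier>
    </item>
    <item>
      <title>Example paper on peptide folding</title>
      <dc:identifier>doi:10.0000/example.2</dc:identifier>
    </item>
  </channel>
</rss>"""

DC = "{http://purl.org/dc/elements/1.1/}"


def feed_items(xml_text):
    """Extract title and DOI (assumed to be in dc:identifier) per feed item."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": item.findtext("title", default=""),
            "doi": item.findtext(DC + "identifier", default=""),
        }
        for item in root.iter("item")
    ]


def alerts(items, keyword):
    """Keep the items whose title mentions the keyword (case-insensitive)."""
    return [i for i in items if keyword.lower() in i["title"].lower()]


for hit in alerts(feed_items(SAMPLE_FEED), "quinone"):
    print(hit["doi"], hit["title"])
```
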
Of course, there is a lot more. It should not surprise you that I see the adoption of PDF and ReadCube as killing internet-aided serendipity, where HTML+RDF, microformats, schema.org, etc would in fact enable serendipity. Chemistry publishers do not particularly have a track record in enabling the kind of serendipity Prof. Bertozzi is looking for. The good thing is that, as editor-in-chief of an ACS journal, she can restore this serendipity, and I kindly invite her to the Blue Obelisk community to discuss how all the technologies that have been developed in the past 15 years can help chemists. Because we have plenty of ideas. (And where is that website again aggregating chemistry journal RSS feeds...?)

Or, just browse the posts in this blog, where I have frequently written about innovation with publishers (in general; some do better than others).

Update: Other perspectives

Friday, July 31, 2015

WikiPathways and two estrone-x,y-quinones added to Wikidata

WikiPathways does a lot of curation, with a team growing in size. A number of regular jobs is performed weekly by one of a group of some 15-20 curators. On top of that, some curators do much more than this weekly task, e.g. Kristina Haspers. Since I joined the BiGCaT team of Chris Evelo in Maastricht, I have been looking into the metabolites and other small molecules, and did quite a bit of work to make that information machine readable. See, for example, these open notebook science posts.

This curation is partly supported by tools, e.g. bots and tests. Tests are, among others, being run nightly on a Jenkins instance (in various configurations). One of the bots creates this report, which Martina Kutmon recently reminded me of. Starting at the end of it, I browsed it for unrecognized metabolites (for various reasons). My eyes fell on two compounds in the estrogen metabolism pathway, originally created by Pieter Giesbertz: estrone-2,3-quinone and estrone-3,4-quinone (in green):


The website was not showing mappings to other databases for the cross-references from PubChem. A quick check confirmed that HMDB, KEGG and ChEBI did not have these compounds. HMDB has an entry for one of the compounds, given the name, but the chemical graph has undefined stereochemistry. That certainly explains why it did not map to the PubChem compound ID. And, indeed, PubChem does have the HMDB entry as a substance, but not linked to a compound. So, I added them to Wikidata: Q20739847 and Q20742851.


Then, when I make the next metabolite ID mapping database for BridgeDb, it will have mappings from the cross-references in WikiPathways for these two compounds to, at the time of writing, ChemSpider and, for one of the two, the CAS registry number. Please also note that Wikidata allowed me to store the information source.
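
As an illustration of how such mappings could be pulled out of Wikidata, here is a minimal sketch that builds a SPARQL query for the identifiers of one item. It only builds the query string and does not say how the BridgeDb mapping database is actually made. The function name is made up; the property numbers (P231 for CAS registry number, P661 for ChemSpider ID, P662 for PubChem CID) are the Wikidata properties as I know them, so double-check them before relying on this.

```python
# Sketch only: builds a query string, does not contact Wikidata.
IDENTIFIER_PROPERTIES = {
    "cas": "P231",          # CAS registry number
    "chemspider": "P661",   # ChemSpider ID
    "pubchem_cid": "P662",  # PubChem compound ID
}


def identifier_query(qid):
    """Build a SPARQL query for the chemical identifiers of a Wikidata item."""
    if not (qid.startswith("Q") and qid[1:].isascii() and qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item id: {qid!r}")
    variables = " ".join("?" + name for name in IDENTIFIER_PROPERTIES)
    lines = [
        "PREFIX wd: <http://www.wikidata.org/entity/>",
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>",
        f"SELECT {variables} WHERE {{",
    ]
    # OPTIONAL, because an item may have only some of the identifiers.
    for name, prop in IDENTIFIER_PROPERTIES.items():
        lines.append(f"  OPTIONAL {{ wd:{qid} wdt:{prop} ?{name} . }}")
    lines.append("}")
    return "\n".join(lines)


print(identifier_query("Q20739847"))
```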

Thus, for me, Wikidata is the place to add new mappings, and I herald work by Andra Waagmeester, Andrew Su, and others to use Wikidata for this kind of purpose. If you agree, you can add your support here.