Tuesday, March 05, 2019

New paper: "Beyond Pathway Analysis: Identification of Active Subnetworks in Rett Syndrome"

Figure 4 of the article.
Ryan Miller and Friederike Ehrhart worked together on this paper to further our understanding of Rett syndrome (doi:10.3389/fgene.2019.00059). They looked at the following: our biological pathways are social constructs that help us think and talk about the biology in our body (and other animals, plants, etc, of course). What if we ignore the boundaries of those constructs: can we learn the pathways? Turns out, sort of.

Using PathVisio, WikiPathways, and Cytoscape's jActiveModules they developed new modules that capture a particular aspect of the biology, and, as usual, color the transcriptional change on top of that. The Methods section is richly annotated and everything is open source.

The authors conclude with mostly bioinformatics conclusions. No shocking new insights into Rett syndrome (yet, unfortunately), but they indicate that by taking advantage of our interoperability approaches (e.g. the ELIXIR Recommended Interoperability Resource BridgeDb, using mappings from Ensembl, HMDB, ChEBI, and Wikidata) pathway resources can be integrated, enabling these approaches.

Mind you, for each pathway, and regularly down to the gene, metabolite, and interaction level, the information is not just built in collaboration with research communities and curated, but also backed by literature: 22494 unique PubMed references, of which almost 4000 are unique to WikiPathways (i.e. not in Reactome).

Have fun!

Monday, March 04, 2019

New projects: RiskGONE and NanoSolveIT

This January two new Horizon 2020 projects started for me: RiskGONE and NanoSolveIT. They kept me busy in the past few weeks, with the kick-off meeting of the latter last week in Athens. Both continue on previous work of the EU NanoSafety Cluster, and I'm excited to continue with research done during the eNanoMapper project.

NanoSolveIT "aspires to introduce a ground-breaking in silico Integrated Approach to Testing and Assessment (IATA) for the environmental health and safety of Nanomaterials (NM), implemented through a decision support system packaged as both a stand-alone open software and via a Cloud platform."

I will be involved here in the knowledge infrastructure. Plenty of research there to be done around the representation of chemical composition of the nanomaterials, the structuring and consistency of ontologies to capture and integrate everything, how to capture our knowledge around the adverse outcome pathways, and how to use this all in predictive computation.

"The focus of RiskGONE will be to produce nano-specific draft guidance documents for application to ENM RA; or, alternatively, to suggest ameliorations to OECD, ECHA, and ISO/CEN SOPs or guidelines. Rather than producing assays and methods ex novo, this will be achieved through Round Robin exercises and multimodal testing of OECD TGs and ECHA methods supporting the “Malta-project”, and on methods not yet considered by OECD." (from the CORDIS website)

Here our involvement will be around similar topics.

Oh, and like in all new H2020 projects, FAIR and Open Data are central terms.

Sunday, February 17, 2019

Browsing like it's 1990

Ruben Verborgh pointed me to this nice CERN side project ("my particles are colliding"): browsing like it's 1990. This is what WikiPathways would have looked like back then:

Comparing Research Journals Quality #2: FAIR metrics of journals

Henry Rzepa pointed me to this helpful CrossRef tool that shows publisher and journal level metrics for FAIRness (see also this post):
FAIR metrics for the Journal of Cheminformatics.
The Journal of Cheminformatics is doing generally well. This is what FAIR metrics are about: they show you what you can improve. They show you how you can become a (better) open scientist. And our journal has a few attention points:
J. Cheminform. does not do well with sending these bits of information to CrossRef.
It's nice to see we already score well on ORCIDs and funder identifiers. I am not sure why the abstracts are not included, and text mining URLs could point to something useful too, I guess. The license URL sounds a bit redundant, since all articles are CC-BY, but downstream aggregators should not guess this from a journal name (or ISSN), and I'd welcome this proper annotation too.
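To make the idea of these journal-level metrics concrete, here is a minimal sketch of how such coverage numbers could be computed from a batch of article metadata records. The records below are made up, and the field names ("author", "ORCID", "funder", "license", "abstract", "link") are my assumption of what CrossRef-style work records look like; check them against real output before relying on this.

```python
# Hypothetical metadata records, loosely shaped like CrossRef-style "work" items.
# DOIs, names, and ORCIDs are invented for illustration only.
SAMPLE_WORKS = [
    {
        "DOI": "10.1234/example.1",
        "author": [{"family": "Doe", "ORCID": "http://orcid.org/0000-0000-0000-0001"}],
        "funder": [{"name": "Example Funder"}],
        "license": [{"URL": "https://creativecommons.org/licenses/by/4.0/"}],
    },
    {
        "DOI": "10.1234/example.2",
        "author": [{"family": "Roe"}],
        "abstract": "An example abstract.",
    },
]

# One yes/no check per metadata aspect (field names are assumptions).
CHECKS = {
    "orcids": lambda w: any("ORCID" in a for a in w.get("author", [])),
    "funders": lambda w: bool(w.get("funder")),
    "licenses": lambda w: bool(w.get("license")),
    "abstracts": lambda w: bool(w.get("abstract")),
    "text_mining_links": lambda w: bool(w.get("link")),
}


def coverage(works, checks=CHECKS):
    """Return, per check, the fraction of records that pass it."""
    total = len(works)
    if total == 0:
        return {name: 0.0 for name in checks}
    return {
        name: sum(1 for w in works if check(w)) / total
        for name, check in checks.items()
    }


if __name__ == "__main__":
    for name, fraction in coverage(SAMPLE_WORKS).items():
        print(f"{name}: {fraction:.0%}")
```

A low fraction for one aspect is exactly the kind of attention point the tool shows: it tells the journal which bits of information to start sending.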

Saturday, February 09, 2019

Comparing Research Journals Quality #1: FAIRness of journal articles

What a traditional research article looks like. Nice layout, hard to reuse the knowledge from. Image: CC BY-SA 4.0.
After Plan S was proposed, there finally was a community-wide discussion on the future of publishing. Not everyone is speaking out clearly on whether they want open access or not, but there's a start for more. Plan S aims to reform the current model. (Interestingly, the argument that not a lot of journals are currently "compliant" is sort of the point of the Plan.) One thing it does not want to reform is the quality of the good journals (at least, I have not seen that as one of the principles). There are many aspects to the quality of a research journal. There are also many things that disguise themselves as aspects of quality but are not. This series discusses the quality of a journal. We skip the trivial ones, like peer review, for now, because I honestly do not believe that the cOAlition S funders want worse peer review.

We start with FAIRness (doi:10.1038/sdata.2016.18). This falls, if you like, under the category of added value. FAIRness does not change the validity of the conclusions of an article, it just improves the rigor of the knowledge dissemination. To me, a quality journal is one that takes knowledge dissemination seriously. All journals have a heritage of being printed on paper, and most journals have been very slow in adopting innovative approaches. So, let's put down some requirements of the journal of 2020.

First, about the article itself:

About findable

  • uses identifiers (DOI) at least at article level, but possibly also for figures and supplementary information
  • provides data of an article (including citations)
  • data is actively distributed (PubMed, Scopus, OpenCitations, etc)
  • maximizes findability by supporting probably more than one open standard
About accessible
  • data can be accessed using open standards (HTTP, etc)
  • data is archived (possibly replicated by others, like libraries)
About interoperable
  • data is using open standards (RDF, XML, etc)
  • data uses open ontologies (many open standards exist, see this preprint)
  • uses linked data approaches (e.g. for citations)
About reusable
  • data is as complete as possible
  • data is available under an Open Science compliant license
  • data uses modern and widely used community standards
Pretty straightforward. For author, title, journal name, year, etc, most journals apply this. Of course, bigger publishers that invested in these aspects many moons ago can be compliant much more easily, because they already were.

Second, what about the content of the article? There we start seeing huge differences.

About findable
  • important concepts in the article are easily identified (e.g. with markup)
  • important concepts use (compact) identifiers
Here, the important concepts are entities like cities, genes, metabolites, species, etc, etc. But also reference data sets, software, cited articles, etc. Some journals only use keywords, some journals have policies about use of identifiers for genes and proteins. Using identifiers for data and software is rare, sadly.
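As a small illustration of what a compact identifier buys you, here is a sketch that splits one into its prefix and accession and turns it into a resolvable URL. The pattern for valid prefixes and the choice of identifiers.org as resolver are assumptions of this sketch, not something a journal necessarily uses.

```python
import re

# Assumed shape of a compact identifier: "<prefix>:<accession>", with a
# lowercase prefix. Real registries may allow more (or less) than this.
COMPACT_ID = re.compile(r"^(?P<prefix>[a-z0-9._]+):(?P<accession>\S+)$")


def parse_compact_id(compact_id):
    """Split a compact identifier into (prefix, accession)."""
    match = COMPACT_ID.match(compact_id)
    if match is None:
        raise ValueError(f"not a compact identifier: {compact_id!r}")
    return match.group("prefix"), match.group("accession")


def to_resolver_url(compact_id, resolver="https://identifiers.org/"):
    """Build a URL for the identifier; the default resolver is an assumption."""
    prefix, accession = parse_compact_id(compact_id)
    return f"{resolver}{prefix}:{accession}"


if __name__ == "__main__":
    # taxonomy:9606 is the NCBI Taxonomy entry for Homo sapiens.
    print(parse_compact_id("taxonomy:9606"))
    print(to_resolver_url("taxonomy:9606"))
```

The point is that a marked-up concept with such an identifier can be found and followed by a machine, where a bare keyword cannot.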

About accessible
  • articles can be retrieved by concept identifiers (via open, free standards)
  • article-concept identifier links are archived
  • table and figure data is annotated with concept identifiers
  • table and figure data can be accessed in an automated way
Here we see a clear problem. Publishers have been actively fighting this for years, even today. Text miners and projects like Europe PMC are stepping in, but are severely hampered by copyright law and by publishers not wishing to make exceptions.

About interoperable
  • concepts are described using common standards (many available)
  • table and figure data is available as something like CSV, RDF
Currently, the only serious standards used by the majority of (STM?) journals are MeSH terms for keywords and perhaps CrossRef XML for citations. Tables and figures are more than just graphical representations. Some journals are experimenting with this.
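To show what "more than a graphical representation" could mean in practice, here is a sketch of a table serialized as CSV with a concept identifier column next to the human-readable label. The table content and column names are invented for illustration.

```python
import csv
import io

# Hypothetical table from an article: labels for readers, identifiers for machines.
COLUMNS = ["label", "identifier", "value"]
ROWS = [
    {"label": "Homo sapiens", "identifier": "taxonomy:9606", "value": 1.5},
    {"label": "Mus musculus", "identifier": "taxonomy:10090", "value": 0.7},
]


def table_to_csv(rows, columns):
    """Serialize table rows to CSV text, header line first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def csv_to_table(text):
    """Read the CSV text back into a list of dicts (all values as strings)."""
    return list(csv.DictReader(io.StringIO(text)))


if __name__ == "__main__":
    print(table_to_csv(ROWS, COLUMNS))
```

With the identifier column in place, someone reusing the table does not have to guess which species, gene, or metabolite a label refers to.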

About reusable
  • the content of the article has a clear licence, Open Science compliant
  • the content is available using the relevant standards of today
This is hard. These community standards are a moving target. For example, how we name concepts changes over time, and identifiers themselves change over time too. But a journal can be specific and accurate, which ensures that even 50 years from now, the context of the content can be determined. Of course, with proper Open Science approaches, translation to the then-modern community standards is simplified.

There are tons of references I can give here. If you really like these ideas, I recommend:
  1. continue reading my blog with many, many pointers
  2. read (and maybe sign) our Open Science Feedback to the Guidance on the Implementation of Plan S (doi:10.5281/zenodo.2560200), of which many of these ideas are part