
Sunday, March 29, 2020

Tackling SARS-CoV-2 with big data

This blog post contains my translation of the short "our story" piece Coronavirus te lijf met big data, written by André Leblanc for the MUMC+ website. The Maastricht University Medical Center+ (MUMC+) is a collaboration involving our Maastricht University Faculty of Health, Medicine and Life Sciences, of which our BiGCaT research group is part.

Wikidata is a community project and I only use and contribute to it. Scholia is a project started by Finn Nielsen (Technical University of Denmark - DTU), and now has funding from the Alfred P. Sloan Foundation, coordinated by Daniel Mietchen and Lane Rasberry (University of Virginia). Further acknowledgements go to Andra Waagmeester (Micelio) and Jasper Koehorst (Wageningen University) for a great collaboration on coronavirus information (see also Wikidata:WikiProject_COVID-19), and to my WikiPathways colleagues, including, of course, Prof. Chris Evelo and Dr. Martina Kutmon in Maastricht, but also Dr. Alex Pico and others in San Francisco. For me, WikiPathways was one of the selling points of the research group when I joined in 2012.

Tackling the coronavirus with big data


Scholars around the world are working relentlessly on the development of a vaccine against the new SARS-CoV-2 coronavirus. Together with colleagues at the BiGCaT Department of Bioinformatics in Maastricht, chemist and assistant professor Egon Willighagen contributes by making data and knowledge easier for other scholars to find. How does that work?

Big data is the new buzzword in the scholarly community. Think, for example, of collecting data worldwide on the treatment of cancer, and extracting from it the best personal, unique treatment. In the case of the new coronavirus, there is a more general need: simply having access to data. Since the virus outbreak in Wuhan, China, there has been an explosion of new research articles on COVID-19 and the SARS-CoV-2 virus that causes it. The total number of scientific publications about coronaviruses has reached some 29 thousand. These are not only about the new virus, but also about the coronaviruses that roamed the world before, like SARS and MERS. Either way, this makes it practically impossible to read all these articles. Instead, access to this literature has to be provided in a different way, one that allows researchers to find the knowledge and data they need for their research.

Filter
Willighagen does this by organizing the scientific literature, linking information, and filtering the collection of data and publications, making it searchable for scholars. He annotates publications with search terms and author names, and uses unique, global identifiers (comparable to personal identification numbers) to support this, not unlike the way phone numbers or dictionaries work.

Various tools

Willighagen uses Wikidata as the database to link the information resources, and Scholia to visualize the results. For example, Wikidata organizes data around the new virus in the item Q82069695, which Scholia shows at https://tools.wmflabs.org/scholia/topic/Q82069695. With these two tools, Willighagen visualizes what this database knows about specific topics.

Researchers can also take advantage of a new open access resource edited by Willighagen: https://egonw.github.io/SARS-CoV-2-Queries/. Social media are used as well: Twitter helps to increase awareness and mobilize people. Willighagen: "That comes from a personal motivation. I tweet articles that show important changes, or that emphasize how unique and urgent the situation is." Finally, there is WikiPathways, a project initiated by Willighagen's colleagues, which collects even more specific knowledge about SARS-CoV-2. Here is the pathway about the SARS-CoV-2 virion: https://www.wikipathways.org/index.php/Pathway:WP4846

Thursday, March 19, 2020

new paper: "Wikidata as a knowledge graph for the life sciences"

A figure from the article, outlining the idea of using SPARQL queries to extract data from the open knowledge base.
As a reader of my blog, you know I have been doing quite some research in which Wikidata plays a role. I am preparing a paper on my work around chemicals in Wikidata, based on the poster I presented at the ICCS. So, I was delighted when Andra and Andrew asked me to contribute to a paper outlining the importance of Wikidata to the life sciences. The paper was published in eLife, which I'm excited about too, as they do a significant amount of publishing innovation.

I'll keep this post brief, as I have plenty of work to do, including on SARS-CoV-2 data in Wikidata. Join that project after you read the paper: Wikidata as a knowledge graph for the life sciences (doi:10.7554/eLife.52614, or in Scholia).



I'll write up some more queries for this eBook now: Wikidata Queries around the SARS-CoV-2 virus and pandemic.
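To give an idea of what such a query looks like, here is a minimal Python sketch (my own illustration, not a query taken from the paper or the eBook). It assumes the public Wikidata Query Service endpoint at https://query.wikidata.org/sparql and uses the "main subject" property (P921) to find works about SARS-CoV-2 (Q82069695, the item behind the Scholia topic page). The function names are hypothetical, and the code only builds the request URL, so it runs offline:

```python
from urllib.parse import urlencode

# Assumption: the public Wikidata Query Service SPARQL endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"
# Wikidata item for SARS-CoV-2 (same Q-number as in the Scholia topic URL).
SARS_COV_2 = "Q82069695"


def topic_works_query(qid: str, limit: int = 10) -> str:
    """Build a SPARQL query for works whose main subject (P921) is `qid`.

    The wd:, wdt:, wikibase: and bd: prefixes are predefined by the
    Wikidata Query Service, so no PREFIX declarations are needed there.
    """
    return f"""SELECT ?work ?workLabel WHERE {{
  ?work wdt:P921 wd:{qid} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}"""


def query_url(sparql: str) -> str:
    """Encode the query as a GET request URL asking for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})


def scholia_topic_url(qid: str) -> str:
    """Scholia's topic page for the same Wikidata item."""
    return f"https://tools.wmflabs.org/scholia/topic/{qid}"


if __name__ == "__main__":
    q = topic_works_query(SARS_COV_2, limit=5)
    print(q)
    print(query_url(q))
    print(scholia_topic_url(SARS_COV_2))
```

To actually run the query, the URL can be passed to urllib.request.urlopen and the JSON parsed; Wikimedia's User-Agent policy asks clients to send a descriptive User-Agent header. Pasting the query text into the query service's web interface works too.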

Sunday, March 15, 2020

SARS-CoV-2, stuck at home, flu, and snowstorms

Scholia linking articles about the disease COVID-19.
Okay, okay, the snowstorm was ten years ago, when we were living in Sweden. We had two snowstorms, and each time we were stuck at home, unable to leave the house. That was okay. We knew that the next day the streets would be cleared, and we could continue living our lives.

Now it's different. I've been in 'social distancing' mode since the evening of Friday the 6th, so a bit over a week now, because I have the flu. Presumably. Testing for SARS-CoV-2 is not done routinely; it is reserved for risk groups and patients with severe COVID-19 symptoms.

But the current situation is once in a lifetime. In the bad way. My generation has not had a situation like this yet: a real national emergency. But The Netherlands is coping. The data is scary. The situation in Northern Italy shows that humans are humans, and the virus doesn't care where it spreads. What matters is how each country deals with it. And let me make clear: we must learn from the countries that have already been in the firing line.

(Northern) Italy has a health care system in the top 5% according to the OECD. Still, they were taken by surprise. But even the countries that had been warned have been hesitant. The discussion is complex. A smaller economy (a 1% contraction is currently estimated) also means, as a Dutch professor pointed out two or three days ago, that there is less tax money to spend on the health care system.

The sad fact is, we are no longer talking about how to stop SARS-CoV-2. We are now talking about minimizing the number of casualties. A storm it is.

Keep safe, keep in contact electronically with the people around you (mental health), and, above all, wash your hands and practice social distancing. Let's not let the storm grow much further. This storm will not be over by the next morning. We're in for a rough ride.

Saturday, January 25, 2020

MetaboEU2020 in Toulouse and the ELIXIR Metabolomics Community assemblies

This week I attended the European RFMF Metabomeeting 2020, aka #MetaboEU2020, held in Toulouse. Originally, I had hoped to travel there by train, but that turned out to be unfeasible. Co-located with this meeting were ELIXIR Metabolomics Community meetings. We're involved in two implementation studies, together amounting to less than a month of work. But both this community and the conference are great places to talk about WikiPathways, BridgeDb (our website is still disconnected from the internet), and cheminformatics.

Toulouse was generally great. It comes with its big-city issues, like fairly expensive hotels, but also with a very frequent public transport system. It also had a great food market where we had our "gala dinner". Toulouse is also home to Airbus, so it was hard to miss the Beluga.


The MetaboEU2020 conference itself had some 400 participants and, of course, a lot of wet-lab metabolomics. As a chemist with a good deal of training in analytical chemistry, I find it great to see the progress. From a data analysis perspective, however, the community still has a long way to go. We're still talking about known knowns, unknown knowns, and unknown unknowns. The posters were often cryptic, e.g. stating that the authors found 35 interesting metabolites without actually listing them. The talks were really interesting, too.

Now, if you are reading this, there is a good chance you were not at the meeting. You can check the above-mentioned hashtag for coverage on Twitter, but we can do better. I loved Lanyrd, but their business model was not scalable and the service no longer exists. Scholia (see doi:10.3897/rio.5.e35820), however, could fill the gap (it uses Wikidata's RDF and SPARQL queries). I followed Finn's steps, created a page for the meeting, and started associating speakers with it (I've done this in the past for other meetings too).


Finn has also created proceedings pages in the past, and I followed that example too. So, I asked people on Twitter to post their slide decks and posters on Figshare or Zenodo, and so far we have ended up with 10 "proceedings" (thanks to everyone who did!!!).



There is also an RSS feed which you can follow (e.g. with Feedly) to get updates when more materials appear online! I wish all conferences did this!

Thursday, January 16, 2020

Help! Digital Object Identifiers: Usability reduced if given at the bottom of the page

The new SpringerNature article template (new for J. Cheminform.) puts the Digital Object Identifier (DOI) at the bottom of the article page. So, every time I want to use the DOI, I have to scroll all the way down the page. That could be fine for abstract-only pages, but it is totally unusable for Open Access articles.

So, after our J. Cheminform. editors' telcon this Monday, I started a Twitter poll.


Where do I want the DOI? At the top, with the other metadata.
Recent article in the Journal of Cheminformatics.
If you agree, please vote. With enough votes, we can engage with upper SpringerNature management to let journals choose where they want the DOI to be shown.

(Of course, the DOI as semantic data in the HTML is also important, but that is annotated quite well in the HTML <head>. A link out to RDF about the article is still missing, I think.)
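Unlike me, a machine does not have to scroll: it can read the DOI straight from the <head>. Here is a minimal Python sketch using only the standard library's html.parser. It assumes the page carries the DOI in a citation_doi or dc.identifier meta tag (common conventions on publisher pages; which tags a particular SpringerNature page uses is an assumption here), and the class and function names are hypothetical:

```python
from html.parser import HTMLParser

# Assumption: DOIs are exposed via these <meta name="..."> conventions.
DOI_META_NAMES = ("citation_doi", "dc.identifier")
# Prefixes stripped to normalise values to a bare "10.xxxx/..." DOI.
DOI_PREFIXES = ("https://doi.org/", "http://dx.doi.org/", "doi:")


class DoiMetaParser(HTMLParser):
    """Collect DOIs found in <meta name="..." content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.dois: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing
        # <meta ... /> tags are routed here as well.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name not in DOI_META_NAMES:
            return
        content = (attributes.get("content") or "").strip()
        for prefix in DOI_PREFIXES:
            if content.lower().startswith(prefix):
                content = content[len(prefix):]
        # dc.identifier can hold non-DOI identifiers; keep DOIs only.
        if content.startswith("10.") and content not in self.dois:
            self.dois.append(content)


def dois_from_head(html: str) -> list[str]:
    """Return the unique DOIs declared in the page's meta tags, in order."""
    parser = DoiMetaParser()
    parser.feed(html)
    parser.close()
    return parser.dois
```

Feeding it the HTML of an article page returns the DOI regardless of where (or whether) it is shown in the visible layout, which is exactly why the <head> annotation matters so much.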