
Thursday, July 02, 2020

Bioclipse git experiences #2: Create patches for individual plugins/features

Carrying around git patches is hard work.
Source: Auckland War Memorial Museum, CC-SA.
This is a series of two posts repeating some content I wrote up back in the Bioclipse days (see also this Scholia page). They both deal with something we were facing: restructuring version control repositories while actually keeping the history. For example, you may want to copy or move code from one repository to another. A second use case is a file that must be removed (there are valid reasons for that). Because these posts are based on Bioclipse work, there will be some specific terminology, but the approach is one I regularly apply in other situations.

This second post talks about how to migrate code from one repository to another.

Create patches for individual plugins/features

While the approach from the first post works pretty well, a good alternative in situations where you only need a repository-with-history for a few plugins is to use patch sets.
  • first, initialize a new git repository, e.g. bioclipse.rdf:
 mkdir bioclipse.rdf
 cd bioclipse.rdf
 git init
 nano README
 git add README
 git commit -m "Added README with some basic info about the new repository" README
  • then, for each plugin you need, discover what the commit was where the plugin was first committed, using the git-svn repository created earlier:
 cd your.gitsvn.checkout
 git log --pretty=oneline externals/com.hp.hpl.jena/ | tail -1
  • then create patches starting from that first commit by appending '^1' (its parent) to the commit hash. For example, the first commit of the Jena libraries was 06d0eb0542377f958d06892860ea3363e3316389, so I type:
 rm 00*.patch
 git format-patch 06d0eb0542377f958d06892860ea3363e3316389^1 -- externals/com.hp.hpl.jena
(tune the filter when removing old patches if there are more than 99!)
The previous two steps can be combined into a Perl script:
#!/usr/bin/perl
# gfp: create git patches for a single plugin or feature, starting from
# the commit in which it was first added to the repository.
use diagnostics;
use strict;

my $plugin = $ARGV[0];

if (!$plugin) {
  print "Syntax: gfp <plugin|feature>\n";
  exit(0);
}

die "Cannot find plugin or feature $plugin !" if (!(-e $plugin));

# remove patches left over from a previous run
`rm -f *.patch`;
# find the oldest commit that touched this plugin
my $hash = `git log --follow --pretty=oneline $plugin | tail -1 | cut -d' ' -f1`;
$hash =~ s/\n|\r//g;

print "Plugin: $plugin \n";
print "Hash: $hash \n";
# create one patch per commit, starting from the parent of that oldest commit
`git format-patch $hash^1 -- $plugin`;
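If you save that script as gfp in the root of your git-svn checkout and make it executable (both assumptions on my part), a run for one plugin, reusing the Jena example from above, could look like this:
 chmod +x gfp
 ./gfp externals/com.hp.hpl.jena
 ls 0*.patch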
  • move these patches into your new repository:
 mv 00*.patch ../bioclipse.rdf
(tune the filter when moving the patches if there are more than 99! Also customize the target folder name to match your situation)
  • apply the new patches in your new git repository:
 cd ../bioclipse.rdf
 git am 00*.patch
(You're on your own if that fails... and you may have to fall back to the other approach then)
  • repeat the create, move, and apply steps for each plugin you want in your new repository (a consolidated sketch of this loop follows below)
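To make that repetition concrete, here is a minimal shell sketch of the loop, assuming it is run from the git-svn checkout with the gfp script saved there; the plugin names and the ../bioclipse.rdf target folder are only examples to adapt to your situation:
 # loop over the plugins you want to carry over to the new repository
 for plugin in externals/com.hp.hpl.jena plugins/net.bioclipse.rdf.ui ; do
   ./gfp "$plugin"                  # create one patch per commit for this plugin
   mv 0*.patch ../bioclipse.rdf/    # move the patches into the new repository
   ( cd ../bioclipse.rdf/ && git am 0*.patch && rm 0*.patch )  # apply and clean up
 done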

Bioclipse git experiences #1: Strip away unwanted plugins

This is a series of two posts repeating some content I wrote up back in the Bioclipse days (see also this Scholia page). They both deal with something we were facing: restructuring version control repositories while actually keeping the history. For example, you may want to copy or move code from one repository to another. A second use case is a file that must be removed (there are valid reasons for that). Because these posts are based on Bioclipse work, there will be some specific terminology, but the approach is one I regularly apply in other situations.

For this first post, think of a plugin as a subfolder, though it even applies to files.

Strip away unwanted plugins

  • first, make a local bare clone of the repository you want to strip down:
 git clone --bare --no-hardlinks old.local.clone/ new.local.clone/
  • then remove everything you do not want in your new git repository:
 git filter-branch --index-filter 'git rm -r -q --cached --ignore-unmatch plugins/net.bioclipse.actionHistory plugins/net.bioclipse.analysis' HEAD
It often happens that you need to run the above command several times, when there are many subdirectories to be removed.
When you have removed all the bits you want gone, you can clean up the repository and reduce its size considerably with:
 git repack -ad; git prune
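Putting the pieces together, a minimal end-to-end sketch could look as follows; the two plugins being stripped are just the examples from above, and the last two commands are only a quick check of the result:
 git clone --bare --no-hardlinks old.local.clone/ new.local.clone/
 cd new.local.clone/
 git filter-branch --index-filter 'git rm -r -q --cached --ignore-unmatch plugins/net.bioclipse.actionHistory plugins/net.bioclipse.analysis' HEAD
 git repack -ad; git prune
 git ls-tree -r --name-only HEAD | head   # confirm the unwanted plugins are gone
 git count-objects -vH                    # check the reduced repository size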

Thursday, May 07, 2020

new project: "COVID-19 Disease Maps"

Project logo by Marek Ostaszewski.
CC-BY.
Already started a few weeks ago, but the COVID-19 Disease Maps project now has a sketch published, outlining the ambitions: COVID-19 Disease Map, building a computational repository of SARS-CoV-2 virus-host interaction mechanisms  (doi:10.1038/s41597-020-0477-8).

I've been focusing on the experimental knowledge we have about the components of the SARS-CoV-2 virion and how they interact with the human cell. I'm at least two weeks behind on reading literature, but hope to catch up a bit this week. The following diagram shows one of the pathways on the WikiPathways COVID-19 Portal:

wikipathways:WP4846, CC0
This has led to collaborations with Andra Waagmeester, Jasper Koehorst and others, resulting in this preprint that needs some tweaking before submission, to an awesome collaboration with Birgit Meldal around the Complex Portal (preprint pending), and to a Japanese translation of a book with a number of search queries against Wikidata (CC-BY/CC0). The latter two were started at the recent online BioHackathon.

Oh, boy, do I love Open Science.

Monday, April 27, 2020

new paper: "NanoSolveIT Project: Driving nanoinformatics research to develop innovative and integrated tools for in silico nanosafety assessment"

Fig. 1. Schematic overview of the workflow for toxicogenomics modelling and how these models feed into the subsequent materials modelling and IATA. Open Access.
NanoSolveIT is an H2020 project that started last year. Our BiGCaT group is involved in the data integration to support the systems biology part of the Integrated Approaches to Testing and Assessment (IATA) for engineered nanomaterials in Work Package 1. This paper gives an overview of the project, the work, and the goals.

Of course, doing this is not trivial at all: we have to bridge a lot of different research data, concepts, etc. As such, it is clear how it relates to the other nanosafety projects we have been involved in, such as eNanoMapper, NanoCommons, and RiskGONE.

Sunday, March 29, 2020

Tackling SARS-CoV-2 with big data

This blog post contains a translation I made of this short "our story" Coronavirus te lijf met big data at the MUMC+ website, written by André Leblanc. The Maastricht University Medical Center+ (MUMC+) is a collaboration that includes our Maastricht University Faculty of Health, Medicine and Life Sciences, of which our BiGCaT research group is part.

Wikidata is a community project and I only use and contribute to it. Scholia is a project started by Finn Nielsen (Technical University of Denmark - DTU), and now has funding from the Alfred P. Sloan Foundation, coordinated by Daniel Mietchen and Lane Rasberry (University of Virginia). Further acknowledgements to Andra Waagmeester (Micelio) and Jasper Koehorst (Wageningen University) for a great collaboration on coronavirus information (see also Wikidata:WikiProject_COVID-19), and to WikiPathways colleagues including, of course, Prof. Chris Evelo and Dr. Martina Kutmon in Maastricht, but also Dr. Alex Pico and others in San Francisco. For me, WikiPathways was one of the selling points of the research group when I joined in 2012.

Tackling the corona virus with big data


Scholars around the world are working relentlessly on the development of a vaccine against the new SARS-CoV-2 coronavirus. Chemist and assistant professor Egon Willighagen contributes, in collaboration with colleagues at the BiGCaT Department of Bioinformatics in Maastricht, by making data and knowledge easier to find for other scholars. How does that work?

Big data is the new buzzword in the scholarly community: think of collecting worldwide data around the treatment of cancer and extracting from that the best personal, unique treatment. In the case of the new coronavirus there is a more general need to simply have access to data. Since the virus outbreak in Wuhan, China, there has been an explosion of new research articles on COVID-19 and the SARS-CoV-2 virus that causes it. The total number of scientific publications about coronaviruses has reached some 29 thousand. These are not only about the new virus, but also about the coronaviruses that roamed the world before, like SARS and MERS. Either way, this makes it practically impossible to read all these articles. Instead, access to this literature has to be provided in a different way, allowing researchers to find the knowledge and data they need for their research.

Filter
Willighagen does this by organizing scientific literature, linking information, and filtering the collection of data and publications, making it searchable for scholars. He annotates publications with search terms and author names, and uses unique, global identifiers (like personal identification numbers) to support this. This is not unlike the use of phone numbers or dictionaries.

Various tools

Wikidata is the database used by Willighagen to link the information resources, along with Scholia to visualize the results. For example, Wikidata organizes data around the new virus with the https://tools.wmflabs.org/scholia/topic/Q82069695 entry. Willighagen uses these two tools to visualize what this database knows about specific topics.

Research can take advantage of a new open access resource edited by Willighagen: https://egonw.github.io/SARS-CoV-2-Queries/. Social media are used as well: Twitter is used to increase awareness and mobilize people. Willighagen: "That is from a personal motivation. I tweet articles that show important changes. Or if they emphasize aspects that show how unique and urgent the situation is." And finally there is WikiPathways, a project initiated by colleagues of Willighagen, to collect even more specific knowledge about COVID-19. Here is the pathway about the SARS-CoV-2 virion: https://www.wikipathways.org/index.php/Pathway:WP4846