Sunday, July 12, 2020

Journals performance, individual articles, assessment and ranking, and researchers

Sign here. Image: CC-BY-SA.
Okay, there it is: journals performance, individual articles, assessment and ranking, and researchers. It has it all. Yes, it is journal impact factor season.

Most scholars now know when and when not to use the impact factor. But old habits die slowly, and the journal impact factor (JIF or IF) is still used a lot to rank journals, rank universities, rank articles, rank researchers.

I signed DORA, but that does not mean I do not know that the IF (and its change year over year) hints at how a journal is doing. Yes, a median is better than an average. A citation count distribution is even better. After all, a stellar IF still means that tens of percent of the articles from the same period are cited only once or not at all.
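
To make the average-versus-median point concrete, here is a small sketch with made-up numbers. The ten citation counts are purely hypothetical (not data from any journal), the helper name citation_stats is invented for illustration, and a plain mean over ten articles is only a simplification of how an IF is actually calculated:

```shell
#!/bin/sh
# Hypothetical helper: reads one citation count per line on stdin and prints
# the mean, the median, and the share of articles cited at most once.
citation_stats() {
  sort -n | awk '
    { c[NR] = $1; sum += $1; if ($1 <= 1) low++ }
    END {
      if (NR == 0) exit 1
      # Middle value for an odd count, mean of the two middle values otherwise.
      median = (NR % 2) ? c[(NR + 1) / 2] : (c[NR / 2] + c[NR / 2 + 1]) / 2
      printf "mean=%.1f median=%.1f cited_at_most_once=%d%%\n", sum / NR, median, 100 * low / NR
    }'
}

# Made-up citation counts for ten articles: one highly cited outlier, many barely cited.
printf '%s\n' 0 0 1 1 1 2 3 4 8 80 | citation_stats
# prints: mean=10.0 median=1.5 cited_at_most_once=50%
```

In this invented example a single article with 80 citations lifts the mean to 10, while half of the articles are cited once or not at all.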

One striking voice was angry that the Journal of Cheminformatics tweeted its new IF. We did not do so without internal discussion and deliberation. Readers of the journal know we do not mention the IF on our front page (as many journals do). We are working on displaying the citation distribution on a subpage of the website. And we want authors to submit to our journal because we value Open Science and have reviewers who value that too. We want articles in our journal to be easily reproduced.

But I know reality. I know many researchers are still expected to report IFs along with their articles. I am one of them (in the past 8 years, articles in a journal with IF>5 were "better"). I've been objecting to this for many years, and fortunately there is a path away from IFs in The Netherlands. If you must rank articles and researchers, then rank them according to their own work, and not based on the work of others. So, I decided that I had no objection to tweeting the J. Cheminform. IF.

Interestingly, if you really want to push this, you should also not mention journal names in your publication list. Let the scholars ranting against the IF but still cheering a Nature, Cell, Science (etc) article rethink their reasoning.

So, what should we do? How should we move forward? Of course I have some ideas about this. Just (re)read my blog. Progress is slow. But I ask everyone who rants about the IF to not just propose better solutions, but actively disseminate them. Implement those solutions and get other people to use them. For example, send your journal an open Letter to the Editor to make a clear statement against the use of the IF as a reason to publish in that journal.

If that is too much for you, at least sign DORA and ask your peers to do so too.

Thursday, July 02, 2020

Bioclipse git experiences #2: Create patches for individual plugins/features

Carrying around git patches is hard work.
Source: Auckland War Memorial Museum, CC-SA.
This is a series of two posts repeating some content I wrote up back in the Bioclipse days (see also this Scholia page). They both deal with something we were facing: restructuring of version control repositories, while actually keeping the history. For example, you may want to copy or move code from one repository to another. A second use case can be a file that must be removed (there are valid reasons for that). Because these posts are based on Bioclipse work, there will be some specific terminology, but I regularly apply the approach in other situations too.

This second post talks about how to migrate code from one repository to another.

Create patches for individual plugins/features

While the above works pretty well, a good alternative in situations where you only need to get a repository-with-history for a few plugins, is to use patch sets.
  • first, initialize a new git repository, e.g. bioclipse.rdf:
 mkdir bioclipse.rdf
 cd bioclipse.rdf
 git init
 nano README
 git add README
 git commit -m "Added README with some basic info about the new repository" README
  • then, for each plugin you need, discover the commit in which the plugin was first committed, using the git-svn repository created earlier:
 cd your.gitsvn.checkout
 git log --pretty=oneline externals/com.hp.hpl.jena/ | tail -1
  • then create patches starting from the tree just before that first commit, by appending '^1' to the commit hash. For example, the first commit of the Jena libraries was 06d0eb0542377f958d06892860ea3363e3316389, so I type:
 rm 00*.patch
 git format-patch 06d0eb0542377f958d06892860ea3363e3316389^1 -- externals/com.hp.hpl.jena
(tune the filter when removing old patches if there are more than 99!)
The previous two steps can be combined into a Perl script:
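#!/usr/bin/perl
# gfp: create a patch set with the history of one plugin or feature.
use diagnostics;
use strict;

my $plugin = $ARGV[0];

# Without an argument, show the usage and stop.
if (!$plugin) {
  print "Syntax: gfp <plugin|feature>\n";
  exit(0);
}

die "Cannot find plugin or feature $plugin !" if (!(-e $plugin));

# Remove patches left over from a previous run.
`rm -f *.patch`;
# The last line of the log is the oldest commit that touched the plugin.
my $hash = `git log --follow --pretty=oneline $plugin | tail -1 | cut -d' ' -f1`;
$hash =~ s/\n|\r//g;

print "Plugin: $plugin \n";
print "Hash: $hash \n";
# Create patches for everything since the parent of that commit.
`git format-patch $hash^1 -- $plugin`;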
  • move these patches into your new repository:
 mv 00*.patch ../bioclipse.rdf
(tune the filter when moving the patches if there are more than 99! Also customize the target folder name to match your situation)
  • apply the new patches in your new git repository:
 cd ../bioclipse.rdf
 git am 00*.patch
(You're on your own if that fails... and you may have to default to the other alternative then)
  • repeat the steps from creating the patches onward for every plugin you want in your new repository
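
The steps above could also be wrapped in a small shell function. This is only a sketch under a few assumptions: the function name migrate_path and its three arguments are made up for illustration, git must be recent enough to support `git -C`, the target repository already has a commit (such as the README one), and, like the manual steps, it assumes the path was not added in the very first commit of the source repository (otherwise '^1' has no parent to point at).

```shell
#!/bin/sh
# Hypothetical helper: copy the history of one path from a source repository
# into a target repository by way of a patch set.
migrate_path() {
  src_repo=$1   # e.g. your.gitsvn.checkout
  dst_repo=$2   # e.g. bioclipse.rdf
  path=$3       # e.g. externals/com.hp.hpl.jena

  # Keep the patches in a fresh temporary folder, so no old ones need removing.
  patch_dir=$(mktemp -d) || return 1

  # git log lists the newest commit first, so the last line is the oldest
  # commit that touched the path.
  first=$(git -C "$src_repo" log --pretty=format:%H -- "$path" | tail -n 1)
  [ -n "$first" ] || { echo "No history found for $path" >&2; return 1; }

  # Create patches for everything since the parent of that commit,
  # limited to changes under the given path.
  git -C "$src_repo" format-patch -o "$patch_dir" "$first^1" -- "$path" >/dev/null || return 1

  # Apply the patches, in order, in the target repository.
  git -C "$dst_repo" am "$patch_dir"/*.patch
}
```

Calling it as `migrate_path your.gitsvn.checkout bioclipse.rdf externals/com.hp.hpl.jena` from the folder that holds both repositories would mirror the example above. If `git am` fails, you are in the same situation as with the manual steps.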

Bioclipse git experiences #1: Strip away unwanted plugins

This is a series of two posts repeating some content I wrote up back in the Bioclipse days (see also this Scholia page). They both deal with something we were facing: restructuring of version control repositories, while actually keeping the history. For example, you may want to copy or move code from one repository to another. A second use case can be a file that must be removed (there are valid reasons for that). Because these posts are based on Bioclipse work, there will be some specific terminology, but I regularly apply the approach in other situations too.

For this first post, think of a plugin as a subfolder, though it applies to individual files too.

Strip away unwanted plugins

  • then you remove everything you do not want in your new git repository. Do:
 git clone --bare --no-hardlinks old.local.clone/ new.local.clone/
then use:
 git filter-branch --index-filter 'git rm -r -q --cached --ignore-unmatch plugins/net.bioclipse.actionHistory plugins/net.bioclipse.analysis' HEAD
When there are many subdirectories to remove, you often need to run the above command several times.
Once you have removed everything you need removed, you can clean up the repository and reduce the size considerably with:
 git repack -ad; git prune
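
Since the filter command may need several runs, it can help to check which plugin folders still occur anywhere in the history before repacking. A minimal sketch, assuming the plugins/<name> layout used above; the function name paths_in_history is hypothetical and not part of git:

```shell
#!/bin/sh
# Hypothetical helper: list the first two path levels (e.g. plugins/<name>)
# of every file that was ever touched in the history reachable from HEAD.
paths_in_history() {
  repo=$1   # e.g. new.local.clone
  git -C "$repo" log --pretty=format: --name-only HEAD |
    grep -v '^$' |      # drop the empty lines between commits
    cut -d/ -f1-2 |     # keep only the first two path components
    sort -u             # each folder once
}
```

Running `paths_in_history new.local.clone` after each filter run should show the list shrinking until only the plugins you want to keep are left.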

Thursday, May 07, 2020

new project: "COVID-19 Disease Maps"

Project logo by Marek Ostaszewski.
CC-BY.
It already started a few weeks ago, but the COVID-19 Disease Maps project now has a sketch published, outlining the ambitions: COVID-19 Disease Map, building a computational repository of SARS-CoV-2 virus-host interaction mechanisms (doi:10.1038/s41597-020-0477-8).

I've been focusing on the experimental knowledge we have about the components of the SARS-CoV-2 virion and how they interact with the human cell. I'm at least two weeks behind on reading literature, but hope to catch up a bit this week. The following diagram shows one of the pathways on the WikiPathways COVID-19 Portal:

wikipathways:WP4846, CC0
This has led to collaborations with Andra Waagmeester, Jasper Koehorst and others, resulting in this preprint that needs some tweaking before submission; to an awesome collaboration with Birgit Meldal around the Complex Portal (preprint pending); and to a Japanese translation of a book around a number of search queries against Wikidata (CC-BY/CC0). The latter two were started at the recent online BioHackathon.

Oh, boy, do I love Open Science.

Monday, April 27, 2020

new paper: "NanoSolveIT Project: Driving nanoinformatics research to develop innovative and integrated tools for in silico nanosafety assessment"

Fig. 1. Schematic overview of the workflow for toxicogenomics modelling and how these models feed into the subsequent materials modelling and IATA. Open Access.
NanoSolveIT is an H2020 project that started last year. Our BiGCaT group is involved in the data integration to support the systems biology part of the Integrated Approaches to Testing and Assessment (IATA) for engineered nanomaterials in Work Package 1. This paper gives an overview of the project, the work, and the goals.

Of course, doing this is not trivial at all. And we have to bridge a lot of different research data, concepts, etc. As such, it is clear how it relates to the other nanosafety projects we have been involved in, such as eNanoMapper, NanoCommons, and RiskGONE.