Friday, July 31, 2020

New Editorial: "Adoption of the Citation Typing Ontology by the Journal of Cheminformatics"

My first blog post about the Citation Typing Ontology was already more than 10 years ago. I have been fascinated by finally being able to add some semantics to why we cite a certain article. For years, I had been tracking why people were citing the Chemistry Development Kit articles. Some were citing the articles because the Chemistry Development Kit was an important thing to mention, while others cited them because they actually used the Chemistry Development Kit. I also started using CiTO predicates in RDF models, and you might find them in various ongoing semantic web projects.

Unfortunately, scholarly publishers did not show much interest. One project that did was CiteULike. I had posted it as a feature request and it was picked up by CiteULike, something I am still grateful for. CiteULike no longer exists, but I had a lot of fun with it while it did:
  1. CiteULike CiTO Use Case #1: Wordles
  2. CiTO / CiteULike: publishing innovation
But I would also like to stress that it has more serious roles in our scientific dissemination workflow:
  1. "What You're Doing Is Rather Desperate"
So, I am delighted that we are now starting a pilot with the Journal of Cheminformatics to use CiTO annotation on the journal side. You can read about it in this new editorial.

It is a first step of a second attempt to get CiTO off the ground. Had CiteULike still existed, this would have been a wonderful mashup, but Wikidata might be a good alternative. In fact, I have already trialed a data model and developed several SPARQL queries. Support in Scholia is a next step on this front.

Now, citation networks in general have received a lot of attention, and with projects like OpenCitations we increasingly have access to this information. That allows visualisation, for example with Scholia, here for the 2010 paper:


More soon!

For now, if you would like to see the CiTO community grow too, please tweet, blog, or message your peers about our new editorial:

Willighagen, E. Adoption of the Citation Typing Ontology by the Journal of Cheminformatics. J Cheminform 12, 47 (2020). https://doi.org/10.1186/s13321-020-00448-1

Tuesday, July 28, 2020

New Paper: "Risk Governance of Emerging Technologies Demonstrated in Terms of its Applicability to Nanomaterials"

Design of the Council and the processes around it. CC-BY.
In April I reported about a paper outlining NanoSolveIT. Now another paper outlining plans has come out, this time detailing the Risk Governance Council which the European Commission asked three H2020 projects to set up. One is RiskGONE, in which our group is involved; the other two are Gov4Nano and NANORIGO:

Isigonis P, Afantitis A, Antunes D, Bartonova A, Beitollahi A, Bohmer N, et al. Risk Governance of Emerging Technologies Demonstrated in Terms of its Applicability to Nanomaterials. Small. 2020 Jul 23;2003303. 10.1002/smll.202003303

What I personally hope we will achieve with this Council is that all our governance is clearly linked to the underlying experiments via FAIR data, provenance, and reproducible research. This requires many FAIR approaches, something RiskGONE and Gov4Nano have already been working on closely together in the NanoSafety Cluster WGF.

Of course, all this requires good data (think NanoCommons) and good computation (think NanoSolveIT).

Sunday, July 12, 2020

Journals performance, individual articles, assessment and ranking, and researchers

Sign here. Image: CC-BY-SA.
Okay, there it is: journals performance, individual articles, assessment and ranking, and researchers. It has it all. Yes, it is journal impact factor season.

Most scholars now know when and when not to use the impact factor. But old habits die slowly, and the journal impact factor (JIF or IF) is still used a lot to rank journals, rank universities, rank articles, and rank researchers.

I signed DORA, but that does not mean I do not know that the IF (and its change year over year) hints at how a journal is doing. Yes, a median is better than an average. A citation count distribution is even better. After all, even a stellar IF still means that tens of percent of the articles in the same period are cited not at all or just once.
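To make that concrete, here is a toy sketch in plain POSIX shell. The citation counts are hypothetical, chosen to mimic a skewed journal: one runaway paper inflates the IF-style average, while the median reflects the typical article.

```shell
#!/bin/sh
# Hypothetical citation counts for ten articles in the same period.
counts="0 0 1 1 2 2 3 4 5 82"

# IF-style average: total citations divided by number of articles
mean=$(echo "$counts" | tr ' ' '\n' | awk '{ s += $1 } END { print s/NR }')

# Median: the middle of the sorted distribution
median=$(echo "$counts" | tr ' ' '\n' | sort -n | \
  awk '{ a[NR] = $1 } END { print (a[NR/2] + a[NR/2 + 1]) / 2 }')

echo "mean=$mean median=$median"
# prints: mean=10 median=2
```

Note how four of the ten articles are cited at most once, yet the average alone would suggest every article gathers ten citations.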

One striking voice was angry that the Journal of Cheminformatics tweeted its new IF. We did not do so without internal discussion and deliberation. Readers of the journal know we do not mention the IF on our front page (as many journals do). We are working on displaying the citation distribution on a subpage of the website. And we want authors to submit to our journal because we value Open Science and have reviewers that value that too. We want articles in our journal to be easily reproduced.

But I know reality. I know many researchers are still expected to report IFs along with their articles. I am one of them (in the past 8 years, articles in a journal with IF>5 were "better"). I have been objecting to this for many years, and fortunately there is a path away from it in The Netherlands. If you must rank articles and researchers, then rank them according to their own work, not based on the work of others. So, I decided that I had no objection to tweeting the J. Cheminform. IF.

Interestingly, if you really want to push this, you should also not mention journal names in your publication list. Let the scholars ranting against the IF but still cheering a Nature, Cell, or Science article rethink their reasoning.

So, what should we do? How should we move forward? Of course I have some ideas about this. Just (re)read my blog. Progress is slow. But I ask everyone who rants about the IF to not just propose better solutions, but to actively disseminate them. Implement those solutions and get other people to use them. For example, send your journal an open Letter to the Editor to make a clear statement against the use of the IF as a reason to publish in that journal.

If that is too much for you, at least sign DORA and ask your peers to do so too.

Thursday, July 02, 2020

Bioclipse git experiences #2: Create patches for individual plugins/features

Carrying around git patches is hard work.
Source: Auckland War Memorial Museum, CC-SA.
This is a series of two posts repeating some content I wrote up back in the Bioclipse days (see also this Scholia page). They both deal with something we were facing: restructuring of version control repositories, while actually keeping the history. For example, you may want to copy or move code from one repository to another. A second use case can be a file that must be removed (there are valid reasons for that). Because these posts are based on Bioclipse work, there will be some specific terminology, but the approach is one I regularly apply in other situations.

This second post talks about how to migrate code from one repository to another.

Create patches for individual plugins/features

While the approach in the first post works pretty well, a good alternative, in situations where you only need a repository-with-history for a few plugins, is to use patch sets.
  • first, initialize a new git repository, e.g. bioclipse.rdf:
 mkdir bioclipse.rdf
 cd bioclipse.rdf
 git init
 nano README
 git commit -m "Added README with some basic info about the new repository" README
  • then, for each plugin you need, discover the commit where the plugin was first committed, using the git-svn repository created earlier:
 cd your.gitsvn.checkout
 git log --pretty=oneline externals/com.hp.hpl.jena/ | tail -1
  • then create patches starting just before that first commit, by appending '^1' to the commit hash. For example, the first commit of the Jena libraries was 06d0eb0542377f958d06892860ea3363e3316389, so I type:
 rm 00*.patch
 git format-patch 06d0eb0542377f958d06892860ea3363e3316389^1 -- externals/com.hp.hpl.jena
(tune the filter when removing old patches if there are more than 99!)
The previous two steps can be combined into a Perl script:
#!/usr/bin/perl
use diagnostics;
use strict;

my $plugin = $ARGV[0];

if (!$plugin) {
  print "Syntax: gfp <plugin|feature>\n";
  exit(0);
}

die "Cannot find plugin or feature $plugin !" if (!(-e $plugin));

# remove patches left over from a previous run
`rm -f *.patch`;

# find the first commit that touched this plugin
my $hash = `git log --follow --pretty=oneline $plugin | tail -1 | cut -d' ' -f1`;
$hash =~ s/\n|\r//g;

print "Plugin: $plugin \n";
print "Hash: $hash \n";

# create one patch per commit, starting just before the first one
`git format-patch $hash^1 -- $plugin`;
  • move these patches into your new repository:
 mv 00*.patch ../bioclipse.rdf
(tune the filter when moving the patches if there are more than 99! Also customize the target folder name to match your situation)
  • apply the new patches in your new git repository:
 cd ../bioclipse.rdf
 git am 00*.patch
(You're on your own if that fails... and you may then have to fall back to the other alternative)
  • repeat those two steps for all plugins you want in your new repository
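The steps above can be sketched as a self-contained demo. All names here (src, dst, externals/jena) are hypothetical stand-ins for the Bioclipse repositories, and two toy commits stand in for real history:

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d)
cd "$tmp"

# A source repository with two commits touching one subfolder
git init -q src
cd src
git config user.email demo@example.org
git config user.name "Demo"
mkdir -p externals/jena
echo "lib" > externals/jena/lib.txt
git add externals
git commit -q -m "Added the Jena external"
echo "more" >> externals/jena/lib.txt
git commit -q -a -m "Updated the Jena external"

# One patch per commit touching just that subfolder (--root: from the start)
git format-patch -q --root -- externals/jena

# A fresh target repository; apply the patches to carry over the history
cd "$tmp"
git init -q dst
cd dst
git config user.email demo@example.org
git config user.name "Demo"
mv ../src/00*.patch .
git am -q 00*.patch

# the subfolder arrives with its full commit history
git log --oneline -- externals/jena
```

The key point is that git am replays the patches as real commits, so author names, dates, and messages survive the move.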

Bioclipse git experiences #1: Strip away unwanted plugins

This is a series of two posts repeating some content I wrote up back in the Bioclipse days (see also this Scholia page). They both deal with something we were facing: restructuring of version control repositories, while actually keeping the history. For example, you may want to copy or move code from one repository to another. A second use case can be a file that must be removed (there are valid reasons for that). Because these posts are based on Bioclipse work, there will be some specific terminology, but the approach is one I regularly apply in other situations.

For this first post, think of a plugin as a subfolder, though it even applies to files.

Strip away unwanted plugins

  • first, make a local bare clone of the repository, and then remove everything you do not want in your new git repository. Do:
 git clone --bare --no-hardlinks old.local.clone/ new.local.clone/
then use:
 git filter-branch --index-filter 'git rm -r -q --cached --ignore-unmatch plugins/net.bioclipse.actionHistory plugins/net.bioclipse.analysis' HEAD
It often happens that you need to run the above command several times, when there are many subdirectories to be removed.
When you have removed all the bits you wanted removed, you can clean up the repository and considerably reduce its size with:
 git repack -ad; git prune
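The whole sequence can be sketched as a self-contained demo; the repository and plugin names are hypothetical stand-ins, and a single toy commit stands in for real Bioclipse history:

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d)
cd "$tmp"

# A repository with one plugin to keep and one to strip
git init -q old.local.clone
cd old.local.clone
git config user.email demo@example.org
git config user.name "Demo"
mkdir -p plugins/net.bioclipse.core plugins/net.bioclipse.actionHistory
echo "keep" > plugins/net.bioclipse.core/a.txt
echo "drop" > plugins/net.bioclipse.actionHistory/b.txt
git add plugins
git commit -q -m "Added two plugins"

# Bare clone, then rewrite history without the unwanted plugin
cd "$tmp"
git clone -q --bare --no-hardlinks old.local.clone/ new.local.clone/
cd new.local.clone
# (newer git prints a deprecation warning for filter-branch; squelch it)
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --index-filter \
  'git rm -r -q --cached --ignore-unmatch plugins/net.bioclipse.actionHistory' HEAD

# Clean up and shrink the rewritten repository
git repack -adq
git prune
```

After the rewrite, the stripped plugin is gone from every commit in the new clone, while the kept plugin retains its history.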