Saturday, August 26, 2017

Updated HMDB identifier scheme

I have not found further details about it yet, but I noticed half an hour ago that the Human Metabolome Database (doi:10.1093/nar/gks1065) seems to have changed all its identifiers: they added extra zeros. The screenshot for D-fructose on the right shows how the old identifiers are now secondary identifiers. We will face a period of a few years in which some resources still use the old identifiers (archives, supplementary information, other databases, etc.) while others use the new ones.

This change has huge implications, including that mere string matching of identifiers becomes really difficult: we need to know whether an identifier uses the old scheme or the new one. Of course, we can see this simply from the identifier length, but we will likely need a bit of extra logic ("artificial intelligence") in our software.
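To make the length check concrete, here is a minimal Python sketch. It assumes that old identifiers are "HMDB" followed by five digits and new ones are "HMDB" followed by seven digits, which matches what I see on the website but is not yet confirmed by any official documentation. The function names `hmdb_scheme` and `same_hmdb_entry` are made up for this illustration.

```python
import re

# Assumption: old scheme = "HMDB" + 5 digits, new scheme = "HMDB" + 7 digits.
_HMDB_PATTERN = re.compile(r"HMDB(\d{5}|\d{7})")


def hmdb_scheme(identifier):
    """Return "old", "new", or None if the string is not an HMDB identifier."""
    match = _HMDB_PATTERN.fullmatch(identifier.strip())
    if match is None:
        return None
    return "old" if len(match.group(1)) == 5 else "new"


def same_hmdb_entry(a, b):
    """Compare two HMDB identifiers regardless of which scheme each one uses."""
    if hmdb_scheme(a) is None or hmdb_scheme(b) is None:
        return False
    # Comparing the numeric parts ignores the difference in zero padding.
    return int(a.strip()[4:]) == int(b.strip()[4:])
```

With this, an old-style identifier such as HMDB00660 and its new-style form HMDB0000660 compare as the same entry, where plain string matching would treat them as different.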

I ran into the change just now, because I was working on the next BridgeDb metabolite identifier mapping database. This weekend's release will certainly not have the new identifiers: I first need more information and more detail.

For now, if you use HMDB identifiers in your database, get prepared! Using old identifiers to link to the HMDB website seems to work fine, as they have a redirect working at the server level. Updating your identifiers internally (by adding two zeros) is likely something to put on the agenda.
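For those who want to start experimenting, the internal update could look something like the Python sketch below. It rests on the same unconfirmed assumption that the only change is two extra zeros right after the "HMDB" prefix (five digits becoming seven). The function name `to_new_hmdb_id` is hypothetical.

```python
def to_new_hmdb_id(identifier):
    """Convert an old-style HMDB identifier to the new style.

    Assumption: the new scheme only inserts two zeros after the "HMDB" prefix.
    Identifiers that already have seven digits are returned unchanged.
    """
    identifier = identifier.strip()
    prefix, digits = identifier[:4], identifier[4:]
    if prefix != "HMDB" or not (digits.isascii() and digits.isdigit()):
        raise ValueError("not an HMDB identifier: %r" % identifier)
    if len(digits) == 7:
        return identifier
    if len(digits) == 5:
        return prefix + "00" + digits
    raise ValueError("unexpected HMDB identifier length: %r" % identifier)
```

Keeping the old identifier next to the converted one, rather than overwriting it, is probably the safer choice until HMDB documents the new scheme.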
