Earlier this year I gave Mendeley a try, after having been a happy JabRef user, an unhappy Connotea user (the main problem was that any URI can be bookmarked, not just papers, so it is very noisy), and a happy CiteULike user (which I still am). But the client did not bring me what I needed, and I canceled my account again.

Since then, Mendeley has undergone a transformation, and there is talk about OpenSourcing the client (or not), Open Data, and an Open Standard API. But, importantly, I no longer need the client and can do everything in the browser.

Moreover, Mendeley has momentum and is starting to provide interesting apps around the API, such as readermeter.org. And since being a scientist means playing the publishing game, one simply must add one's papers to these systems, if only to advertise them:

This brings us to problem #1: author identity, which is a general problem and addressed by projects like ORCID. So, besides the page shown above, I have a second page under an entry with just my first name.

But, as the title of the post suggests, Mendeley suffers from a second problem, which was recently brought up by Duncan in his How many unique papers are there in Mendeley? post. Mendeley, apparently, claims 36M papers, but the number of unique papers is much smaller, as outlined in detail by Duncan. Mr. Gunn replied that "[d]uplicates are understandably enriched among the popular papers, such as yours, and it’s harder to go from 6 duplicates to 1 canonical document than from 2 to one, because the variability is higher" (see this comment), but I do not buy that.

I replied on the blog to that claim and also made a suggestion: this dereplication should really be a crowd-sourcing effort. But I found it impossible to find a place to report duplication, so I had to use the message-to-support form with the uninformative category 'Other'. If I were working at Mendeley, I would make this reporting a key technology behind their dereplication efforts.

Anyway, the duplication goes deep, very deep into the long tail. My papers are fairly well received in general (many of my papers in BMC journals are 'Highly Accessed'; I did request some distinction there, using the StackOverflow gold, silver, bronze system), but they are not comparable to the highly bookmarked papers in Mendeley. I know this is probably not something Mendeley likes to hear, but a majority of my papers show duplicates. A semi-exhaustive scan showed me duplication for the XMPP paper (here and here), the Blue Obelisk paper (here, here, and here; yes, three copies), the CDK-Taverna paper (here and here), the Bioclipse 2 paper (here and here), the userscripts paper (here and here), the CDK I paper (here and here), and the CDK II paper (here and here).

Hopefully, by the time you read this post, at least some of the above links no longer work. In that respect, I would also like to request URIs based on the DOI instead.
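To make the DOI idea concrete, here is a minimal sketch of how catalog entries could be grouped by a normalized DOI to flag candidate duplicates. It is not how Mendeley does it; the record layout (`id` and `doi` keys), the function names, and the example DOIs are all made up for illustration.

```python
import re
from collections import defaultdict


def normalize_doi(raw):
    """Lower-case a DOI and strip a leading 'doi:' or doi.org resolver URL."""
    doi = raw.strip().lower()
    return re.sub(r"^(https?://(dx\.)?doi\.org/|doi:\s*)", "", doi)


def group_duplicates(records):
    """Return {normalized DOI: [record ids]} for DOIs seen more than once.

    `records` is a hypothetical list of dicts with 'id' and optional 'doi'.
    Records without a DOI are skipped; they would need fuzzy title matching.
    """
    groups = defaultdict(list)
    for rec in records:
        doi = rec.get("doi")
        if doi:
            groups[normalize_doi(doi)].append(rec["id"])
    return {doi: ids for doi, ids in groups.items() if len(ids) > 1}


# Made-up records: three spellings of one DOI, one unique DOI, one without.
records = [
    {"id": "a", "doi": "10.1234/Example.1"},
    {"id": "b", "doi": "doi:10.1234/example.1"},
    {"id": "c", "doi": "https://doi.org/10.1234/EXAMPLE.1"},
    {"id": "d", "doi": "10.1234/example.2"},
    {"id": "e"},
]
duplicates = group_duplicates(records)
```

Entries without a DOI fall through here, which is exactly where crowd-sourced reporting would still be needed.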

  1. I think it would be interesting to look further at the relationship between popularity and duplication, but let's not get caught up in trying to estimate numbers for something that's changing so rapidly.

    We've begun to address the existing dupes, and, as you might have guessed, we are also looking to crowdsource the Dupuis detection. There's a working demo of this already I can show if you run into me.

  2. Very much looking forward to that, particularly now that Mendeley seems to become more Open every day!

  3. now, that's a bit embarrassing, but I'll definitely try to merge alternate spellings as of the next upgrade

  4. These duplications are difficult. Some can be caught, but there should also be good, easy means to have the Social Web remove duplication, both in paper space and in author space.

  5. Egon, glad to see I'm not the only one with duplicates, and didn't realise it was a "known issue" to the extent you describe. BTW identifying authors accurately is even harder than identifying individual papers - see http://pubmed.gov/20072710

  6. @Duncan: that databases need curation is known; the actual errors can differ from one database to another; citation databases suffer from duplication.

    Regarding the author identity, yes, that is harder. The same initial and last name combination may belong to different authors. But this is where Mendeley's database comes in, which has a 'My Publication' section; I'd say they have all the technical means to address author identity.

    @Mr Gunn: is Mendeley formally involved in the ORCID effort, or going to implement it anyway?

  7. Yes, Mendeley is a participating member in ORCID. We plan to support ORCID in Mendeley as well, although it's a little too early yet to say how it will be implemented.

    (You can tell I wrote my earlier comment from my phone because Android adds the names of people in your contacts to the autocomplete dictionary, hence the autocorrect of "dupe" to "Dupuis", as in science librarian John Dupuis.)


Hi all, as posted about a year ago, I moved this blog to a different domain and a different platform. I note that I still have many followers on this domain (and not on my new domain), including over 300 on Feedly.com alone.

This is my last post on blogger.com. At least, that is the plan. It has been a great 18 years. I would like to thank the owners of blogger.com, and later Google, for providing this service. I am continuing chem-bla-ics on a new domain: https://chem-bla-ics.linkedchemistry.info/

I, like so many others, struggle with choosing open infrastructure versus the freebie model. Of course, we know these things come and go. Google Reader, FriendFeed, Twitter/X (see doi:10.1038/d41586-023-02554-0).

Some days ago, I started adding boiling points to Wikidata, referenced from Basic Laboratory and Industrial Chemicals (wikidata:Q22236188), David R. Lide's 'a CRC quick reference handbook' from 1993 (well, the edition I have). But Wikidata wants the pressure (wikidata:P2077) at which the boiling point (wikidata:P2102) was measured. Rightfully so. But I had not added those yet, because it slows me down and can be automated with QuickStatements.
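As a rough sketch of what that automation could look like, the snippet below builds one tab-separated command in what I understand to be the QuickStatements V1 syntax (quantities written as value, `U`, unit item number; qualifiers as extra property/value pairs; `S248` for 'stated in'). The function name is made up, and the item, values, and unit QIDs in the usage are placeholders to be checked against Wikidata before use, not data from the handbook.

```python
def quickstatements_line(item, value, unit_qid, pressure, pressure_unit_qid, source_qid):
    """Build one QuickStatements V1 line: boiling point with a pressure qualifier.

    Assumed layout: item, P2102 (boiling point), quantity with unit,
    P2077 (pressure) qualifier, then S248 (stated in) with the source item.
    """
    fields = [
        item,
        "P2102", f"{value}U{unit_qid.lstrip('Q')}",
        "P2077", f"{pressure}U{pressure_unit_qid.lstrip('Q')}",
        "S248", source_qid,
    ]
    return "\t".join(fields)


# Placeholder usage: the item is meant to be the Wikidata sandbox item, and the
# unit QIDs are assumed to be degree Celsius and kilopascal (verify both).
line = quickstatements_line(
    item="Q4115189",
    value=100,
    unit_qid="Q25267",
    pressure=101.325,
    pressure_unit_qid="Q21064807",
    source_qid="Q22236188",
)
print(line)
```

Generating one such line per compound from a spreadsheet keeps the pressure qualifier and the reference attached to every statement.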

Just a quick note: I just love the level of detail Wikidata allows us to use. One of the marvels is the practice of 'named as', which can be used in statements for subjects and objects. The notion and importance here is that things are referred to in different ways, and these properties allow us to link the interpretation with the source.

I am still an avid user of RSS/Atom feeds. I use Feedly daily, partly because of their easy to use app. My blog is part of Planet RDF, a blog planet. Blog planets aggregate blogs from many people around a certain topic. It's like a forum, but open, free, community driven. It's exactly what the web should be.

This blog is almost 18 years old now. I have long wanted to migrate it to a version control system and at the same time have more control over things. Markdown would be awesome. In the past year, I learned a lot about the power of Jekyll and needed to get more experienced with it to use it for more databases, like we now do for WikiPathways.

So, time to migrate this blog :) This is probably a multiyear project, so feel free to continue reading it here.

The role of a university is manifold. Being a place where people can find knowledge, and the track record of how that knowledge was reached, is often seen as part of that. Over the past decades universities have outsourced this role, for example to publishers. This is seeing a lot of discussion and I am happy to see that the Dutch Universities are taking back control fast now.

I am pleased to learn that the Dutch Universities are starting to look at rankings in a more scientific way. It is long overdue that we take scientific peer review of the indicators used in those rankings seriously, instead of hiding behind FUD around the decline of quality of research.

So, what defines the quality of a journal? Or better, of any scholarly dissemination channel? After all, some databases do better peer review than some journals.

A bit over a year ago I got introduced to Qeios when I was asked to review an article by Michie, West, and Hasting: "Creating ontological definitions for use in science" (doi:10.32388/YGIF9B.2). I wrote up my thoughts after reading the paper, and the review was posted openly online and got a DOI. Not the first platform to do this (think F1000), but it is always nice to see some publishers taking publishing seriously. Since then, I reviewed two more papers.
This blog deals with chemblaics in the broader sense. Chemblaics (pronounced chem-bla-ics) is the science that uses computers to solve problems in chemistry, biochemistry and related fields. The big difference between chemblaics and areas such as chem(o)?informatics, chemometrics, computational chemistry, etc, is that chemblaics only uses open source software, open data, and open standards, making experimental results reproducible and validatable. And this is a big difference!