Friday, June 08, 2007

Scientific Literature: searching, ranking, storage

Dealing with scientific literature has been an important theme in Chemical blogspace. For example, how to rank articles and how to store your personal PDF archive have been topics of discussion. In this post I will summarize bits of the discussion, and give my personal view on things.

Searching literature is traditionally done in systems like Chemical Abstracts and Web-of-Science. The open nature of a growing number of repositories (e.g. the Dutch DARE) and indexing facilities like PubMed make these proprietary tools obsolete.

It is incorrect to assume that these paid services are the only trustworthy sources. Even WoS fails to make all the links between entries in its database. For example, I am aware of two missing citations to articles I have written, even though both the cited and the citing articles are available in the system. One of the citing articles was in Angewandte Chemie!

Additionally, some search services, like Google Scholar, have the advantage that they find copies and close variants of articles from proprietary journals on home pages and in open repositories. Today, I learned about Scientific Commons, which indexes and links to a staggering 1.5M publications, using, among others, PubMed and university repositories. Where possible, it links directly to PDF versions of the article.

Mitch set up ChemRank, to which Peter, the ChemBlog and I replied. Afterwards, I learned that other services are available too which, in addition to letting you set up an online personal literature database, allow voting and commenting on articles.

Apparently, CiteULike (CUL) supports this too. In contrast to ChemRank, CUL requires a login, which I personally see as an advantage, because I can browse literature bookmarked by other accounts I trust. There is also Connotea, but I never liked that site that much (e.g. it allows bookmarking any web page); Rich has his comments too. I would also like to mention BioWizard, which is based on the PubMed content, which actually covers a good deal of chemistry literature nowadays too.

Local Storage
The above-mentioned systems can be used as an alternative to offline bibliographic database systems, like EndNote and JabRef. The latter is my favorite: it is based on BibTeX, which I use for my LaTeX-based publications, and it is open source and contains a few patches from yours truly. Jungfreudlich wondered how people organize their PDF archive, and I commented on how I do it:
  • a directory hierarchy based on journal name and year
  • file names that include last name of the first author and year
  • JabRef for the bibliographic database
  • Strigi for full text search
Jörg and the power of goo replied too.
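
The directory and file naming scheme in the list above can be sketched in a few lines of Python. This is only an illustration of the layout, not a tool I actually use: the function names `slug` and `archive_path`, the `papers` root directory, the choice of underscores as separators, and the author name "Doe" are all assumptions made for the example.

```python
import re
from pathlib import Path


def slug(text: str) -> str:
    """Replace runs of characters that are not letters, digits,
    underscores, or hyphens with a single underscore."""
    return re.sub(r"[^\w-]+", "_", text).strip("_")


def archive_path(root: str, journal: str, year: int, first_author: str) -> Path:
    """Build <root>/<journal>/<year>/<LastName><year>.pdf.

    The hierarchy is journal name, then year; the file name combines
    the last name of the first author with the year.
    """
    return Path(root) / slug(journal) / str(year) / f"{slug(first_author)}{year}.pdf"


if __name__ == "__main__":
    # Hypothetical article: first author "Doe", published in 2007.
    print(archive_path("papers", "Angew. Chem.", 2007, "Doe").as_posix())
    # prints: papers/Angew_Chem/2007/Doe2007.pdf
```

Two articles by the same first author in the same journal and year would collide under this scheme, so in practice a suffix (a, b, ...) or a title keyword would have to be added by hand.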

I have accounts on several online tools now (with some duplication, which I don't like), and I have no idea which of the options will stay around. Time will tell. The good news is that the open character of many of these tools allows making mashups, and generally integrating tools. For example, JabRef allows downloading citations from PubMed, and Noel suggested using Greasemonkey scripts to link to the supplementary information for his articles, instead of using the mechanisms journals have. I can see the advantage of this, as, for example, Wiley takes full copyright of the data in SI material, while Noel's mechanism would keep the data open.

For now, however, I would very much like to see a meta service where I can query rankings and comments for articles using any or all of the above tools.


  1. For the record, I think you misquote Rich who never says that he does not like Connotea. In fact, he says "I think both Connotea and CiteULike are great services".