Thursday, December 23, 2010

Status update on BJOC analysis with Oscar and ChemicalTagger #3

The two earlier posts in this series showed screenshots of results of Oscar, but the title also promised results by Lezan's ChemicalTagger. Sam helped with getting the HTML pages online via the Cambridge Hudson installation. Where Oscar finds named entities (chemical compounds, processes, etc.), ChemicalTagger finds roles, like solvent, acid, base, catalyst. Roles are properties of chemical compounds in certain situations. Ethanol is not always a solvent, sometimes it is a Xmas present. The current output is not entirely where I want it to be yet, but it makes it easy to see which solvents are frequently found in the BJOC corpus:

This screenshot of an analysis of 15 BJOC papers shows that AcOEt (is that the same as EtOAc?) is mentioned as a solvent three times in PMC1399459. Brine, however, is mentioned as a solvent in three papers.
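
The difference between "three times in one paper" and "in three papers" is easy to make concrete. Below is a minimal sketch in Python of that kind of tally, not the actual code behind the pages. It assumes the role annotations are already available as (paper, compound, role) triples, which is a made-up format and not ChemicalTagger's real output. Only the AcOEt count for PMC1399459 comes from the screenshot; the identifiers paper-B, paper-C and paper-D and the Pd/C entry are placeholders for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical input: one (paper id, compound, role) triple per tagged mention.
# Only the three AcOEt mentions in PMC1399459 mirror the screenshot; the
# "paper-B/C/D" identifiers and the Pd/C entry are invented placeholders.
mentions = [
    ("PMC1399459", "AcOEt", "solvent"),
    ("PMC1399459", "AcOEt", "solvent"),
    ("PMC1399459", "AcOEt", "solvent"),
    ("paper-B", "brine", "solvent"),
    ("paper-C", "brine", "solvent"),
    ("paper-D", "brine", "solvent"),
    ("paper-B", "Pd/C", "catalyst"),
]


def role_summary(mentions, role="solvent"):
    """Return {compound: (total mentions, number of distinct papers)} for one role."""
    totals = Counter()
    papers = defaultdict(set)
    for paper, compound, tagged_role in mentions:
        if tagged_role != role:
            continue  # the same compound may play a different role elsewhere
        totals[compound] += 1
        papers[compound].add(paper)
    return {compound: (totals[compound], len(papers[compound])) for compound in totals}


summary = role_summary(mentions)
for compound, (n_mentions, n_papers) in sorted(summary.items()):
    print(f"{compound}: {n_mentions} mention(s) in {n_papers} paper(s)")
```

Both compounds end up with three mentions, but only the per-paper count shows that brine is spread over the corpus while AcOEt is concentrated in a single paper.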

As said, these two pages contain RDF and the tables are sortable. Hudson rebuilds them automatically whenever I update the source code that creates the HTML+RDFa. So, go ahead, send me bug reports, feature requests, and patches!