Two and a half months after the CDK milestone, the Blue Obelisk paper has also reached 100 citations. The lucky paper this time is Design, Synthesis, and Preclinical Evaluation of New 5,6- (or 6,7-) Disubstituted-2-(fluorophenyl)quinolin-4-one Derivatives as Potent Antitumor Agents by Chou et al. (doi:10.1021/jm100780c). The Blue Obelisk paper (doi:10.1021/ci050400b) is cited because the authors used OpenBabel. In fact, about half of the 100 citations exist because OpenBabel was used, even though the Blue Obelisk paper mentions OpenBabel only as one of the Blue Obelisk-associated unprojects.

I am not sure how this habit started, but given how citation practices work, it is unlikely to go away. But who am I to complain...

Oscar is a text miner. It mines text for chemistry. Oscar4 is the next iteration of the Oscar code, which I worked on over the past three months with Lezan, Sam, and David.

Today is the last day I work on Oscar in my position in Cambridge (tomorrow I have a day off and fly back to Sweden). Three months go by quickly indeed. Next Monday I start my position in Stockholm at the IMM department at the Karolinska Institutet, working on predictive toxicology. Back in Sweden, it is. Well, of course, I worked from home most of the time anyway.

So, today it is time for me to write up a report for the last three months. This blog item is basically a prelude, or procrastination, or so.

Mark's new CC0/RDF hosting functionality (see also my post two days ago) requires the RDF/XML format, so I updated my code to convert the Chempedia Substances data into RDF/XML instead of N3 (I have asked Rich to put a new download link online).

What if scientists could host small amounts of CC0 data for free? Something like computation results, e.g. output as HTML+RDFa? Without having to worry about setting up a triple store, etc.? Well, that future might be near. The above screenshot shows a first go. Not by me, but in response to a feature request by me. So, the question right now is: what would you like to see on the summary page?

One of my first encounters with open source cheminformatics was the XYZ file viewer applet by Sun. I extended it back then with minimal PDB support for our Woordenboek Organische Chemie website (started in 1995, now extinct). This applet dates back to at least 1997, as shown by the screenshot.

Oscar uses a Maximum Entropy Markov Model (MEMM) based on n-grams. Peter Corbett has written this up (doi:10.1186/1471-2105-9-S11-S4). So, it basically is statistics once more. If you really want a proper bioinformatics education, do your PhD at a (proteo)chemometrics department.

N-grams are word parts of n characters. For example, the trigrams of acetic acid include ace, cid, tic, eti, and aci. N-grams of length four include acid, etic, and acet.
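A minimal sketch of how such character n-grams can be enumerated is given below. It is not Oscar's code: the function names are made up for illustration, and it assumes that n-grams are taken per word and do not cross the space between words, which matches the examples above.

```python
def char_ngrams(word, n):
    """Return all contiguous substrings of length n in word, in order."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def phrase_ngrams(phrase, n):
    """Collect character n-grams for each whitespace-separated word.

    Assumption: n-grams do not span word boundaries.
    """
    grams = []
    for word in phrase.lower().split():
        grams.extend(char_ngrams(word, n))
    return grams


print(phrase_ngrams("acetic acid", 3))
print(phrase_ngrams("acetic acid", 4))
```

For acetic acid this lists six trigrams (ace, cet, eti, tic, aci, cid) and four n-grams of length four (acet, ceti, etic, acid). A word shorter than n simply contributes nothing.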

The two earlier posts in this series showed screenshots of results of Oscar, but the title also promised results by Lezan's ChemicalTagger. Sam helped with getting the HTML pages online via the Cambridge Hudson installation. Where Oscar finds named entities (chemical compounds, processes, etc.), ChemicalTagger finds roles, like solvent, acid, base, catalyst. Roles are properties of chemical compounds in certain situations. Ethanol is not always a solvent; sometimes it is a Xmas present.

OK, the second paper I ran into today is a perfect match for the paper by Khanna and Ranganathan I just discussed in the Commercial or Proprietary? post. So perfect, in fact, that I really should have combined them. But since the other post is already infecting the WWW, I'll have to post this update.

Khanna and Ranganathan wrote up a review paper on molecular similarity (doi:10.1002/ddr.20404). I have not fully read it yet, but my eye fell on Table 1, which lists a number of programs that can be used to calculate QSAR descriptors, both open source and proprietary. However, the table features a column Availability which has two options: Public and Commercial. They qualify Bioclipse, CDK, and RDKit as public, and Dragon, MOE, CODESSA and others as commercial.
This blog deals with chemblaics in the broader sense. Chemblaics (pronounced chem-bla-ics) is the science that uses computers to solve problems in chemistry, biochemistry and related fields. The big difference between chemblaics and areas such as chem(o)?informatics, chemometrics, computational chemistry, etc, is that chemblaics only uses open source software, open data, and open standards, making experimental results reproducible and validatable. And this is a big difference!