Saturday, September 05, 2009

NMRShiftDB RDF #2: Some statistics

This morning I had some more fun, and since the statistics view on the NMRShiftDB server is down, I thought I could recalculate the statistics myself. Because the current RDF version of the data does not include all information yet, I cannot reproduce all of them. On the other hand, I can determine some other interesting statistics.

Spectra per spectrum type
One of the statistics given on the aforementioned page is the number of spectra per nucleus. This can be recalculated with the following SPARQL:

The results for the 1.3.3 release are:

nucleus count
13C 21958
1H 3031
11B 326
17O 131
15N 79
195Pt 68
19F 50
31P 38
73Ge 18
33S 8
29Si 5
I am a bit surprised by the count for the silicon NMR spectra, as I would have thought I alone had entered more than just five.
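
The per-nucleus count is a plain group-and-count over the spectra. A minimal sketch of that logic in Python, assuming each spectrum is available as a (spectrum id, molecule id, nucleus) record; the record layout, the identifiers and the toy values are hypothetical and are not taken from the NMRShiftDB RDF:

```python
from collections import Counter

# Hypothetical toy records: (spectrum id, molecule id, nucleus).
# Illustrative values only, not NMRShiftDB data.
SPECTRA = [
    ("s1", "m1", "13C"),
    ("s2", "m1", "1H"),
    ("s3", "m2", "13C"),
    ("s4", "m2", "13C"),
    ("s5", "m3", "29Si"),
]


def spectra_per_nucleus(spectra):
    """Return (nucleus, count) pairs, most frequent nucleus first."""
    counts = Counter(nucleus for _, _, nucleus in spectra)
    # Sort by descending count, then by nucleus label for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


for nucleus, count in spectra_per_nucleus(SPECTRA):
    print(nucleus, count)
```

Run against the real data, the same grouping should give a table of the shape shown above.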

Molecules with the most spectra
It turns out that molecules in the 1.3.3 NMRShiftDB release have at most 7 spectra, as I can calculate with:

That is going to change, as the paper I am digitizing now (doi:10.1021/jo971176v) has carbon and hydrogen NMR spectra in 7 solvents for each compound :) It should be possible to summarize the number of molecules for each number of spectra per molecule, but I did not manage to get that SPARQL query to work.
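
The summary I could not get out of SPARQL is a two-level aggregation: first count the spectra per molecule, then count how many molecules share each count. A rough sketch in Python rather than SPARQL, again assuming hypothetical (spectrum id, molecule id, nucleus) records with made-up values:

```python
from collections import Counter

# Hypothetical toy records: (spectrum id, molecule id, nucleus).
# Illustrative values only, not NMRShiftDB data.
SPECTRA = [
    ("s1", "m1", "13C"),
    ("s2", "m1", "1H"),
    ("s3", "m2", "13C"),
    ("s4", "m2", "13C"),
    ("s5", "m3", "29Si"),
]


def spectra_per_molecule(spectra):
    """First level: number of spectra for each molecule."""
    return Counter(molecule for _, molecule, _ in spectra)


def molecules_per_spectrum_count(spectra):
    """Second level: how many molecules have 1, 2, 3, ... spectra."""
    histogram = Counter(spectra_per_molecule(spectra).values())
    return dict(sorted(histogram.items()))


per_molecule = spectra_per_molecule(SPECTRA)
print("most spectra on one molecule:", max(per_molecule.values()))
print("molecules per spectrum count:", molecules_per_spectrum_count(SPECTRA))
```

The maximum of the first level corresponds to the "at most 7 spectra" figure; the second level is the summary table.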

BTW, did you know you can find reprint PDFs of a paper (if any; this one happens to have a PDF copy) with Google using the title in quotes and filetype:pdf? Try this query.

The top hit for the molecules with the most spectra was molecule 10016314 (RDF), which has four 13C spectra, one 15N spectrum and two proton NMR spectra.

Molecules with the most different nuclei
As we already saw in the first SPARQL query, there are 11 different nuclei in the database, though carbon and hydrogen spectra are by far the most abundant. I like diversity, so one statistic I find interesting is which molecules have spectra for the most different nuclei. This is done with the query:

It shows that molecule 10023801 (RDF) has 5 different NMR types: 13C spectra, one 15N, 29Si spectra, one 17O, and 1H spectra. Unfortunately, the compound also contains chlorine, so it does not qualify as a molecule for which NMR spectra are available for all of its elements.
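
Both steps here, ranking molecules by the number of distinct nuclei and checking whether the spectra cover every element in the compound, can be sketched in Python. This assumes hypothetical (spectrum id, molecule id, nucleus) records plus a hypothetical table of element symbols per molecule; neither the layout nor the values come from the NMRShiftDB RDF:

```python
from collections import defaultdict

# Hypothetical toy records: (spectrum id, molecule id, nucleus).
SPECTRA = [
    ("s1", "m1", "13C"),
    ("s2", "m1", "1H"),
    ("s3", "m1", "13C"),
    ("s4", "m2", "13C"),
    ("s5", "m2", "1H"),
    ("s6", "m2", "15N"),
    ("s7", "m2", "17O"),
    ("s8", "m2", "29Si"),
]

# Hypothetical element symbols present in each molecule.
ELEMENTS = {
    "m1": {"C", "H"},
    "m2": {"C", "H", "N", "O", "Si", "Cl"},
}


def nuclei_per_molecule(spectra):
    """Map each molecule to the set of distinct nuclei it has spectra for."""
    nuclei = defaultdict(set)
    for _, molecule, nucleus in spectra:
        nuclei[molecule].add(nucleus)
    return dict(nuclei)


def rank_by_diversity(spectra):
    """Return (molecule, number of distinct nuclei), most diverse first."""
    counts = {m: len(n) for m, n in nuclei_per_molecule(spectra).items()}
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def element_of(nucleus):
    """Strip the mass number: '13C' -> 'C', '29Si' -> 'Si'."""
    return nucleus.lstrip("0123456789")


def covers_all_elements(molecule, spectra, elements):
    """True if every element of the molecule has at least one spectrum."""
    measured = {element_of(n) for n in nuclei_per_molecule(spectra).get(molecule, set())}
    return elements[molecule] <= measured


print(rank_by_diversity(SPECTRA))
print(covers_all_elements("m2", SPECTRA, ELEMENTS))
```

In this toy data the most diverse molecule fails the coverage check because of its chlorine, which mirrors the situation described above.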