Now, addressing the limitations of the current citation databases is technically simple; it is blocked purely by social and commercial aspects. The Citation Typing Ontology (CiTO) by David Shotton provides a framework for defining citation types, independent of any existing database. Semantic web technologies take it from there and allow aggregation, etc.
There are some things to think about when using such citation networks, though. If we calculate the impact of the CDK project, we should combine the citation counts for the website(s), papers, etc., after removing duplicates. The cito:cites predicate links resources, and the CDK paper resource is not the same as the CDK website resource. But we could define a Project class, of which both are foo:partOf. Then we could define that the triple chain the:citingWork cito:cites the:CDKArticle foo:partOf the:CDKProject implies the triple the:citingWork cito:cites the:CDKProject.
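To make the chaining idea concrete, here is a minimal Python sketch that works on plain (subject, predicate, object) tuples rather than a real triple store. The the: and foo: names are the placeholders used above; the:CDKWebsite and the function name infer_project_citations are made up for illustration, and the rule is applied in a single pass (nested foo:partOf relations are not followed).

```python
# Sketch only: triples are plain tuples, not a real RDF store.
CITES = "cito:cites"
PART_OF = "foo:partOf"


def infer_project_citations(triples):
    """Add (x cito:cites project) for every chain
    x cito:cites y, y foo:partOf project."""
    # Index which resources belong to which project(s).
    part_of = {}
    for s, p, o in triples:
        if p == PART_OF:
            part_of.setdefault(s, set()).add(o)

    # A set removes duplicates: citing both the paper and the
    # website yields only one inferred citation of the project.
    inferred = set(triples)
    for s, p, o in triples:
        if p == CITES:
            for project in part_of.get(o, ()):
                inferred.add((s, CITES, project))
    return inferred


triples = {
    ("the:citingWork", CITES, "the:CDKArticle"),
    ("the:citingWork", CITES, "the:CDKWebsite"),  # hypothetical resource
    ("the:CDKArticle", PART_OF, "the:CDKProject"),
    ("the:CDKWebsite", PART_OF, "the:CDKProject"),
}

closed = infer_project_citations(triples)
citing_works = {s for s, p, o in closed
                if p == CITES and o == "the:CDKProject"}
```

Because the inferred triples live in a set, a work that cites both the article and the website counts only once towards the project, which is the de-duplication mentioned above.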
Typed Citations
Now, while writing up this blog post, I realize that my fork of this morning, A BIBO Citation Typing Ontology, might actually be counter-productive in the long run, as I was only working out a solution to a simpler but different problem, which the CiTO also addresses: a citation is not typed. When a paper cites the CDK paper, we still do not know whether it uses the CDK, merely mentions it as related but unused work, or even refutes it.
Now, as I am leaning towards the Bibliographic Ontology (BIBO) as the RDF-based system for my references, and have already been using it in the RDF store hosting the ChEMBL data, I forked the CiTO to define rdfs:domain and rdfs:range on bibo:Document. The CiTO 1.5 actually defines a large set of document types too, and I would rather see BIBO reused.
This indeed has the downside that bibocto:cites cannot be used for the chaining described above, and this might bite me seriously later. Well, nothing wrong with a failing experiment, right? For now, it will serve my purpose: setting up a citation database for the CDK project papers.
The CDK citation database
So, here goes (it's RDFa-enabled; check this RDF pulled out):
@prefix bibo: <http://purl.org/ontology/bibo/>.
@prefix bibocto: <http://github.com/egonw/bibo-cto/>.
<urn:doi:10.1186/1758-2946-2-1> a bibo:Article ;
bibocto:cites <urn:doi:10.1021/ci025584y> .
I am not entirely happy about the error-prone XHTML+RDFa behind the above example, and asked for a better solution on SemanticOverflow.
While the above example merely states that Peter Ertl's article cites the CDK paper (whether that is valid or not... would he have cited the other paper perhaps?), citation typing allows me to state how the CDK paper is cited. Now, Peter states:
- It is also gratifying to see the advent of open source movement in cheminformatics on the Internet, as advocated for example by the Blue Obelisk Group (40) and witnessed by collaborative projects like Chemistry Development Kit CDK (41), Jmol (42), Bioclipse (43) and several others.
<urn:doi:10.1186/1758-2946-2-1> bibocto:credits <urn:doi:10.1021/ci025584y> .
which is very much appreciated!
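As a rough illustration of what typed citations buy us, the Python sketch below tallies how a given paper is cited, grouped by predicate. It uses the two triples from this post as plain tuples; the helper name citation_types is hypothetical, and a real setup would query the RDF store instead.

```python
from collections import Counter

ERTL_PAPER = "urn:doi:10.1186/1758-2946-2-1"
CDK_PAPER = "urn:doi:10.1021/ci025584y"

# The two statements from this post, as (subject, predicate, object).
triples = [
    (ERTL_PAPER, "bibocto:cites", CDK_PAPER),
    (ERTL_PAPER, "bibocto:credits", CDK_PAPER),
]


def citation_types(triples, cited):
    """Count incoming citations of `cited`, per citation type."""
    return Counter(p for s, p, o in triples if o == cited)


counts = citation_types(triples, CDK_PAPER)
for predicate, n in sorted(counts.items()):
    print(predicate, n)
```

With more citing papers in the list, the same tally would separate the works that merely cite the CDK paper from those that credit it.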