In April this year I blogged about a SPARQL query that matters to many chemists: getting CAS registry numbers from Wikidata. This is relevant for two reasons:
- CAS works together with Wikimedia on a large, free CAS-to-structure database
- Wikidata is CCZero
Since the post in April, Wikidata has put a new SPARQL endpoint online and created "direct" property links. This way, you lose the provenance information, but the query becomes simpler:
    PREFIX wdt: <http://www.wikidata.org/prop/direct/>
    SELECT ?compound ?id WHERE {
      ?compound wdt:P231 ?id .
    }
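To run this query outside the web interface, something like the following Python sketch could work. It assumes the public endpoint at https://query.wikidata.org/sparql and the standard SPARQL JSON results layout. The function names and the User-Agent string are made up for illustration, so adjust them to your own setup.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Wikidata SPARQL endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"

CAS_QUERY = """
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?compound ?id WHERE {
  ?compound wdt:P231 ?id .
}
"""


def build_url(query, endpoint=ENDPOINT):
    """Encode the query as a GET URL that asks for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return endpoint + "?" + params


def parse_bindings(payload):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    data = json.loads(payload)
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in data["results"]["bindings"]
    ]


def run_query(query):
    """Send the query over HTTP and return the flattened rows."""
    request = urllib.request.Request(
        build_url(query),
        headers={
            # Hypothetical identifier; use one that describes your own tool.
            "User-Agent": "cas-wikidata-example/0.1",
            "Accept": "application/sparql-results+json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return parse_bindings(response.read().decode("utf-8"))
```

Calling `run_query(CAS_QUERY)` should then return one dict per match, with the keys `compound` and `id` taken from the variable names in the `SELECT` clause.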
The same pattern works for other identifier properties. The number of matches per property:
- CAS registry number (P231): 19420
- PubChem ID (CID) (P662): 16616
- InChI (P234): 14312
- ChemSpider ID (P661): 11566
- ChEBI ID (P683): 4313
- KEGG ID (P665): 3983
- Drugbank ID (P715): 2518
- KNApSAcK ID (P2064): 9
- HMDB ID (P2057): 6
- ZINC ID (P2084): 4
- LIPID MAPS ID (P2063): 3
- Leadscope ID (P2083): 3
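The post does not show how these counts were produced. One way to get comparable numbers, as an assumption rather than the method used here, is a `COUNT` query per property. The helper below only builds the query text; `count_query` and `IDENTIFIER_PROPERTIES` are hypothetical names.

```python
import re

PREFIX = "PREFIX wdt: <http://www.wikidata.org/prop/direct/>"


def count_query(property_id):
    """Return a SPARQL query that counts all values of one direct property."""
    # Guard against typos before the ID is pasted into the query text.
    if not re.fullmatch(r"P[1-9][0-9]*", property_id):
        raise ValueError(f"not a Wikidata property ID: {property_id!r}")
    return (
        f"{PREFIX}\n"
        "SELECT (COUNT(?id) AS ?count) WHERE {\n"
        f"  ?compound wdt:{property_id} ?id .\n"
        "}"
    )


# Property IDs copied from the list above (only a few shown).
IDENTIFIER_PROPERTIES = {
    "CAS registry number": "P231",
    "PubChem ID (CID)": "P662",
    "InChI": "P234",
}

queries = {label: count_query(pid) for label, pid in IDENTIFIER_PROPERTIES.items()}
```

Each generated query can be pasted into the endpoint's web form. Wikidata keeps growing, so counts obtained today will very likely differ from the ones listed here.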
Because there is also a predicate for SMILES, we can also create a query that puts the CAS registry number alongside the SMILES (or any other identifier):
    PREFIX wdt: <http://www.wikidata.org/prop/direct/>
    SELECT ?compound ?id ?smiles WHERE {
      ?compound wdt:P231 ?id ;
                wdt:P233 ?smiles .
    }
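Once the rows from this query are available as plain dicts keyed by the variable names `id` and `smiles`, they can be turned into a CAS-to-SMILES lookup. The sketch below assumes that row shape. The function name is hypothetical, and the two example rows are typed in by hand for illustration, not copied from query output. Values are collected in lists because nothing in the query guarantees one SMILES per CAS number.

```python
from collections import defaultdict


def group_smiles_by_cas(rows):
    """Map each CAS registry number to the sorted list of SMILES seen with it."""
    lookup = defaultdict(set)
    for row in rows:
        # A set drops exact duplicates that come from repeated result rows.
        lookup[row["id"]].add(row["smiles"])
    return {cas: sorted(smiles) for cas, smiles in lookup.items()}


# Hand-written illustrative rows (ethanol and water).
example_rows = [
    {"compound": "http://www.wikidata.org/entity/Q153", "id": "64-17-5", "smiles": "CCO"},
    {"compound": "http://www.wikidata.org/entity/Q283", "id": "7732-18-5", "smiles": "O"},
]

lookup = group_smiles_by_cas(example_rows)
```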
refreshed! @pubchem now contains 25,758,525 purchasable molecules from ZINC15 #docking #chemoinformatics
— John Irwin Chemistry (@chem4biology) December 22, 2015
PubChem compound 100 million comes from ZINC! https://t.co/lCoYZ7P34e -> https://t.co/vJ1qX5cUNB #zinc15 #win #watchoutcas #woot
— John Irwin Chemistry (@chem4biology) December 17, 2015
