Comments on chem-bla-ics: Downloading Domoic Acid from PubChem
Blog author: Egon Willighagen (http://www.blogger.com/profile/07470952136305035540). Feed updated 2024-03-13T07:14:55.283+01:00.

[2009-04-21 13:23 +02:00]
Some information regarding name lookups in PubChem: the presented names are not in random order. Rather, a reliability score is computed, and the name identified as probably the most trustworthy is listed first. So using the first name is usually a reasonable choice (but of course not foolproof; this is not hand-curated data).
-- W. D. Ihlenfeldt (http://www.xemistry.com)

[2009-04-21 13:20 +02:00]
IMHO this example demonstrates serious problems with the CDK methodology. In order to set this up, you need precise and specific knowledge about:

a) 3 import packages
b) 1 specific reader object and its methods
c) 1 molecule object and its attributes
d) 1 download URL (and it reads the XML data, which is slow and not always kosher; ASN.1 data is the gold standard)

Compare this with the much shorter, more robust and equivalent Cactvs script:

echo "Atom count: [ens get [ens create 5282253] E_NATOMS]"

Or:

echo "Atom count: [ens get [ens create {domoic acid}] E_NATOMS]"
-- W. D. Ihlenfeldt (http://www.xemistry.com)

[2009-04-20 13:44 +02:00]
That's indeed an unsatisfying effect of large deposition databases, and I would love to see any suggestion on how to perform text searches via interfaces.
Whenever it comes to searching for names in PubChem or ChemSpider, you will probably get more than one hit, so either a human has to decide which one you wanted, or you evaluate a second parameter of your entity afterwards.
-- Oliver Koepler (https://www.blogger.com/profile/05647630778156154419)

[2009-04-18 14:48 +02:00]
Bioclipse has a free text search option, which returns the first 15 hits for the text you search on. Regarding what the real true one is... that depends on curation indeed. I will try to write up similar material for interaction with ChemSpider. Something like a Bioclipse ChemSpider plugin will be easier to do when a good programming API comes online, but something like what is shown in this blog should not be hard. Still, I would be very interested in drawing a substructure in Bioclipse and searching in ChemSpider using that.
-- Egon Willighagen (https://www.blogger.com/profile/07470952136305035540)

[2009-04-18 14:37 +02:00]
Egon, in order to download the correct structure of Domoic Acid, do you have to know the exact record on PubChem? A text-based search for Domoic Acid gives 5 structures on PubChem, so without knowing the exact record you'd be stuck. How would you figure out what the appropriate record is on PubChem in general?
-- ChemSpiderman (https://www.blogger.com/profile/12619309311131629965)
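
The idea raised in this thread, doing a name search and then evaluating a second parameter to pick the right record, could be sketched roughly as follows. This is only an illustration in Python, not something any commenter posted. It assumes PubChem's PUG REST interface (which is not mentioned in the thread), and all function names are made up for this sketch. The "second parameter" is whatever property you already trust, for example an InChIKey taken from another source.

```python
import json
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: PubChem PUG REST base URL (not part of the original thread).
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def name_to_cids_url(name):
    """Build the URL that lists every compound ID (CID) matching a name."""
    return f"{PUG_REST}/compound/name/{quote(name, safe='')}/cids/JSON"


def properties_url(cids, properties=("MolecularFormula", "InChIKey")):
    """Build the URL that returns the given properties for a list of CIDs."""
    cid_list = ",".join(str(cid) for cid in cids)
    return f"{PUG_REST}/compound/cid/{cid_list}/property/{','.join(properties)}/JSON"


def parse_cids(payload):
    """Extract the CID list from a decoded name-lookup response."""
    return payload.get("IdentifierList", {}).get("CID", [])


def parse_properties(payload):
    """Extract the per-compound property records from a decoded response."""
    return payload.get("PropertyTable", {}).get("Properties", [])


def filter_by(records, key, expected):
    """Keep only the CIDs whose property `key` equals the trusted value."""
    return [rec["CID"] for rec in records if rec.get(key) == expected]


def fetch_json(url):
    """Download and decode one JSON document."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def resolve(name, key, expected):
    """Name search followed by a check against a second, trusted parameter."""
    try:
        cids = parse_cids(fetch_json(name_to_cids_url(name)))
    except HTTPError as err:
        # Assumption: an unknown name is reported as HTTP 404.
        if err.code == 404:
            return []
        raise
    if not cids:
        return []
    records = parse_properties(fetch_json(properties_url(cids, (key,))))
    return filter_by(records, key, expected)
```

A call such as `resolve("domoic acid", "InChIKey", expected_key)`, with `expected_key` taken from a source you trust, would then return only the matching CIDs. Filtering on `MolecularFormula` instead would merely narrow the hits, since stereoisomers share a formula, so a human may still have to make the final call.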