tag:blogger.com,1999:blog-17889588.post7565414772611065375..comments 2024-03-13T07:14:55.283+01:00
Comments on chem-bla-ics: New InChI software beta: license issues resolved and InChIKey
Egon Willighagen http://www.blogger.com/profile/07470952136305035540 noreply@blogger.com
Blogger14125

tag:blogger.com,1999:blog-17889588.post-5226853956321130064 2007-09-14T05:15:00.000+02:00
A number of web services exposing InChI-related capabilities have been provided this evening at<BR/><BR/>http://www.chemspider.com/inchi.asmx<BR/><BR/>The services include the ability to search for the appropriate ChemSpider ID based on the InChI string and InChIKey.<BR/><BR/>Further comments are available at:<BR/><BR/>http://www.chemspider.com/blog/?p=135
ChemSpiderman https://www.blogger.com/profile/12619309311131629965

tag:blogger.com,1999:blog-17889588.post-3537176421310146581 2007-09-12T12:53:00.000+02:00
Sorry I didn't catch the URL4InChI stuff at the time, Egon, I was away around that time. I would have voiced a preference for making them 'proper' URLs, i.e. including the InChI (or info URI) before the ?<BR/><BR/>It will be interesting to see practical collision rates for InChIKey, but I wonder whether the benefits of InChIKey are really needed.
I've blogged about this at http://wwmm.ch.cam.ac.uk/blogs/downing/?p=126 (it's a bit long to include here).
Anonymous

tag:blogger.com,1999:blog-17889588.post-2386655744648591648 2007-09-12T09:29:00.000+02:00
Jim, I agree with the URL4InChI, and have proposed in a different blog item to <A HREF="http://chem-bla-ics.blogspot.com/2007/07/rdf-ing-molecular-space.html" REL="nofollow">use rdf.openmolecules.net for resolving InChIs</A>.<BR/><BR/>This service can easily be extended for InChIKey support, and I will do this shortly.<BR/><BR/>BTW, a good estimate of collision properties could be made by randomly generating a lot of molecular structures and building a huge database of InChIKeys. I will try to set something up for that using the CDK next week, when I'll be in Ulm.
Egon Willighagen https://www.blogger.com/profile/07470952136305035540

tag:blogger.com,1999:blog-17889588.post-1407682448989172336 2007-09-12T09:20:00.000+02:00
PS... I forgot to add, since there would be a centralized point for assigning InChIURL (or an agreed protocol for dealing with collisions), they would be unique. The problem with InChIKey isn't so much that collisions can happen, it's that you don't know when they'll collide.
Anonymous

tag:blogger.com,1999:blog-17889588.post-2367780949491071698 2007-09-12T09:17:00.000+02:00
Ah, I see. So instead of InChIKey, how about something like TinyURL for InChIs, where there's a convenient URI for each InChI (short, digest based, non-semantic, conveniently embeddable in text, useful for semantic web etc).
<BR/><BR/>There would need to be a lookup service to find the URL for each InChI, and doing GET on the InChIURL would return a very small chunk of CML containing the InChI, the InChI in text or a chunk of neatly marked up XHTML that made it clear where the InChI is.<BR/><BR/>It has the disadvantage that you couldn't algorithmically calculate the InChIURL from the InChI, but it is convenient in text and has the added benefit of being more useful than a string literal for semantic web applications.
Anonymous

tag:blogger.com,1999:blog-17889588.post-1641788608949427226 2007-09-11T17:03:00.000+02:00
Sam, thanx for those details. That is useful. It does not allow for so many InChI versions, but that is, I guess, not intended anyway. Yes, I would prefer a much more obvious layer indication.<BR/><BR/>Jim, it is not so much a problem of decreasing clash probability, as it is a problem of converting the key back to an InChI. I could look up the key in translation tables and find the InChI to which the InChIKey corresponds.<BR/><BR/>Now, my worry was that I could not do this, as I overlooked the version info available from one of the chars. So, if this character is in the range A-H, then I should look at an InChI=1/... table, if I-P, then InChI=2/... etc. That should do for a while.<BR/><BR/>Using InChIKey=1/... would make the correspondence clearer.
Egon Willighagen https://www.blogger.com/profile/07470952136305035540

tag:blogger.com,1999:blog-17889588.post-8072766508085886426 2007-09-11T16:09:00.000+02:00
Egon, I understand the problems that arise from hash collisions, but from what you're saying InChIKey is basically a digest anyway.
Does including the version layer really increase the chance of a collision considerably?
Anonymous

tag:blogger.com,1999:blog-17889588.post-7542292914597611765 2007-09-11T15:17:00.000+02:00
There is still version information included through the flag character. This indicates which combination of isotopic, fixedH and stereo layers was included in the InChI. For version 1, the flag takes values A-H, for version 2, I-P, and for version 2+ Q-X. The full table of values is in the release notes. However, I'd agree that InChIKey=1/... seems like a better way to go about it.
Anonymous

tag:blogger.com,1999:blog-17889588.post-4762642108112448587 2007-09-11T14:15:00.000+02:00
Jim, including the version layer prior to MD5 calculation has this disadvantage:<BR/><BR/>Say we have InChI=1/foo and InChI=2/bar. Say they both create InChIKey=BLA. The key would be identical, and effectively it would be impossible to decide if the key would refer to foo or to bar.
Egon Willighagen https://www.blogger.com/profile/07470952136305035540

tag:blogger.com,1999:blog-17889588.post-6620029123613457177 2007-09-11T13:53:00.000+02:00
Could somebody explain to me how the InChIKey is a better idea than just agreeing a length for an MD5 sum of the whole InChI string?
There are standard implementations of MD5 and using the whole string would mean that the version layer was included.
Anonymous

tag:blogger.com,1999:blog-17889588.post-8360123157350687958 2007-09-10T16:48:00.000+02:00
Just set up my Blogger profile and didn't think about the name showing up :-) So, dah-dah...I'm now ChemSpiderman (I'm on the web a lot)<BR/><BR/>We will be looking for clashes this week. Higher priorities right now I'm afraid. We owe Joerg an update to the single structure deposition system to try out. It's coming...
ChemSpiderman https://www.blogger.com/profile/12619309311131629965

tag:blogger.com,1999:blog-17889588.post-7830955843933750285 2007-09-09T14:14:00.000+02:00
Hi Database Guy,<BR/><BR/>(weird name :)<BR/><BR/>Did you find any clashes? From the statistics I would expect a few clashes (1.3 in 10 million, right...)?
Egon Willighagen https://www.blogger.com/profile/07470952136305035540

tag:blogger.com,1999:blog-17889588.post-3687652123658651053 2007-09-09T04:01:00.000+02:00
There are now over 17.6 million InChI keys posted to ChemSpider. We generated these yesterday and posted them this morning.<BR/><BR/>As I commented on the blog posting<BR/>http://www.chemspider.com/blog/?p=125<BR/> this does resolve the previous issues in regards to different erectile dysfunction drugs giving bigger SMILES as expected with larger InChIs (http://www.chemspider.com/blog/?p=19).
Now the InChI key will be the same length but the size of the SMILE can still vary based on the nature of the chemical structure :-)
ChemSpiderman https://www.blogger.com/profile/12619309311131629965

tag:blogger.com,1999:blog-17889588.post-1688656551837717735 2007-09-08T14:54:00.000+02:00
I completely agree! Especially on the versioning and the uniqueness.<BR/><BR/>Joerg
Anonymous https://www.blogger.com/profile/09112376168632883058
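
The flag-character ranges quoted in the thread (A-H for version 1, I-P for version 2, Q-X for version 2+) suggest a simple lookup for deciding which InChI translation table to consult. Below is a minimal Python sketch of that idea, assuming the flag character has already been extracted from the key. The function name `inchi_version_from_flag` is hypothetical and not part of the InChI software, and the ranges are only those stated in the comments for the beta release, so the release notes remain the authority.

```python
def inchi_version_from_flag(flag: str) -> str:
    """Map a beta InChIKey flag character to the InChI version label.

    Assumption: the ranges are exactly those quoted in the comment
    thread (A-H -> "1", I-P -> "2", Q-X -> "2+"). This is an
    illustrative helper, not an API of the InChI software.
    """
    # Reject anything that is not a single uppercase letter in A-X.
    if len(flag) != 1 or not ("A" <= flag <= "X"):
        raise ValueError(f"flag must be a single character in A-X, got {flag!r}")
    if flag <= "H":
        return "1"
    if flag <= "P":
        return "2"
    return "2+"


if __name__ == "__main__":
    # Print the version label for one flag from each quoted range.
    for flag in ("C", "K", "T"):
        print(flag, inchi_version_from_flag(flag))
```

A caller could use the returned label to choose between an `InChI=1/...` and an `InChI=2/...` translation table, which is the lookup described in the thread.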