## Friday, March 31, 2006

### InChI's in LaTeX and CDK News

An InChI (see also the FAQ) is a line notation for a molecular structure that was recently developed by NIST and IUPAC. In principle they can be applied to proteins too (see below), but because proteins would give lengthy InChI's and are quite well defined in terms of connectivity anyway, those are better described by their amino acid sequence.

The March 2006 issue of CDK News, the Chemistry Development Kit project newsletter, will be released later today, and for the second time it required authors to provide InChI's for the molecular structures mentioned in their articles. What differs from the previous issue is how InChI's are marked up in LaTeX. I've set up an \inchi{} command for this that automatically creates a Google search query as the link behind the InChI:
\newcommand{\inchi}[1]{\href{http://www.google.com/search?q=#1}{\normalfont\texttt{InChI=#1}}}

Now, googling for InChI's only works if one removes the InChI= part of the InChI, so the command leaves that prefix out of the query and only shows it in the link text. As an example I will show how it works for methane. The InChI for this compound is InChI=1/CH4/h1H4, so in LaTeX one enters \inchi{1/CH4/h1H4}. This will create a link like: InChI=1/CH4/h1H4.
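To try the command outside of the CDK News sources, a minimal sketch of a complete document could look like the one below. The article document class and the sentence around the InChI are my own choices for illustration; the only real requirement is the hyperref package, which provides \href. InChI's containing characters that are special to LaTeX may need extra care, which this sketch does not attempt.

```latex
\documentclass{article}
\usepackage{hyperref} % provides \href

% Link text shows the full InChI; the Google query omits the "InChI=" prefix.
\newcommand{\inchi}[1]{\href{http://www.google.com/search?q=#1}{\normalfont\texttt{InChI=#1}}}

\begin{document}
Methane has the identifier \inchi{1/CH4/h1H4}.
\end{document}
```

Compiling this with pdflatex should give a PDF in which the methane InChI is a clickable link to the corresponding Google search.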

BTW, if you are interested in InChI's for proteins, here is the InChI for 1CRN, created with OpenBabel:
InChI=1/C202H439N55O64S6/c1-28-92(12)149-188(308)237-127-84-323-324-
85-128(176(296)225-114(46-37-63-212-202(209)210)165(285)232-122(69-89(6)7)195(315)253-64-38-
47-132(253)179(299)215-80-143(274)241-158(107(27)265)199(319)257-68-42-51-136(257)182(302)226-
115(60-61-144(275)276)164(284)218-100(20)162(282)244-149)236-187(307)148(91(10)11)242-172(292)
120(74-138(204)269)229-168(288)117(70-108-43-34-33-35-44-108)228-169(289)119(73-137(203)268)
230-173(293)124(81-258)234-166(286)113(45-36-62-211-201(207)208)224-159(279)99(19)221-186(306)
147(90(8)9)243-189(309)150(93(13)29-2)245-174(294)125(82-259)235-183(303)135-50-41-66-255(135)
196(316)130-87-326-322-83-126(223-142(273)79-216-185(305)154(103(23)261)251-171(291)118(72-
110-54-58-112(267)59-55-110)231-192(312)155(104(24)262)250-163(283)101(21)220-175(127)295)178
(298)246-151(94(14)30-3)190(310)247-152(95(15)31-4)191(311)248-153(96(16)32-5)198(318)256-67-
40-49-134(256)181(301)213-77-140(271)217-97(17)161(281)249-156(105(25)263)194(314)240-131
(88-327-325-86-129(177(297)239-130)238-193(313)157(106(26)264)252-184(304)146(206)102(22)260)197
(317)254-65-39-48-133(254)180(300)214-78-141(272)222-121(76-145(277)278)170(290)227-116(71-
109-52-56-111(266)57-53-109)167(287)219-98(18)160(280)233-123(200(320)321)75-139(205)270/h89-
202,211-252,258-321H,28-88,203-210H2,1-27H3/t92-,93-,94-,95-,96-,97-,98-,99-,100-,101-,102+,
103+,104+,105+,106+,107+,109-,110-,111+,112+,113-,114-,115-,116-,117-,118-,119-,120-,121-,122-,
123-,124-,125-,126-,127-,128-,129-,130-,131-,132-,133-,134-,135-,136-,137?,138-,139-,140-,141+,
142-,143+,146-,147-,148-,149-,150-,151-,152-,153-,154-,155-,156-,157-,158-,159+,160?,161-,162?,
163-,164-,165?,166+,167?,168+,169+,170+,171-,172+,173+,174+,175?,176-,177?,178+,179+,180-,
181?,182-,183+,184?,185+,186+,187-,188-,189-,190+,191?,192-,193?,194-,195-,196-,197-,198-,199-/m0/s1

## Saturday, March 25, 2006

### The Cologne University BioInformatics Center (CUBIC)

As of April 3, I will be working as a postdoc in the group of Christoph Steinbeck at the Cologne University BioInformatics Center, or simply CUBIC, for a year. Though no exact plans have been decided upon, the work will include CDK, CML, ontologies, Bioclipse, semantic web technologies, Jmol, and other interesting things. Research areas will at least include QSAR, but I hope to touch bits of bioinformatics too.

## Saturday, March 18, 2006

### How to make money from Open Source scientific software

Dan (the original Jmol author) has an interesting blog series: How to make money from Open Source scientific software I, II and III. Three more blog items are planned. They deal with how to make money from open source scientific software. He wants to be able to skeptically review the software in his field, hence open source. But open source software development, at least in chemistry, needs funding, because there are too few people working on such software on a voluntary basis.

The articles discuss possible scenarios. Article I discusses 'Sell hardware' that comes with open source software, and article II discusses the 'Sell services' scenario, which still works in the GNU/Linux OS world. He argues that selling support does not fit the chem-bla-ics world: "First, scientific software targets a relatively small group of users, and at the same time, the development and support costs are often quite large." and "Why would a researcher spend \$10000 on a support contract if the problem could be solved by throwing a graduate student at the open source version of the code for a few months?" Interesting arguments indeed.

Instead, he suggests, the service sold should be knowledge. A company based on open source should sell knowledge and solve customer problems using open source software. Each problem will come with specific needs, allowing indirect funding of open source development. And, yes, this is indeed how open source chemo-/bioinformatics software is currently developed: as a means to solve scientifically challenging problems.

I'm looking forward to his next articles in this series.

## Thursday, March 16, 2006

### The PDB protein database uses Jmol

The PDB beta site has been using Jmol as one of its viewers for ages already, but it is a beta no longer: it is now the new interface for the PDB database.

## Sunday, March 12, 2006

### Open source in drug discovery

Geldenhuys et al. published an article in Drug Discovery Today titled Optimizing the use of open-source software applications in drug discovery (DOI: 10.1016/S1359-6446(05)03692-5), and approached the review from a bench chemist's point of view. Unfortunately, the article discusses free but closed-source programs in one go with the open-source ones.

The article discusses the advantages and problems of open source, and mentions the often lacking user-friendly GUI (true) and the lack of literature to validate the programs. It was unclear to me whether the last argument applied to the free tools or to the open-source programs; I thought open-source projects like the CDK, JOELib, Jmol and PyMol were quite strong in this area, at least compared to the commercial software I have seen.

## Saturday, March 11, 2006

### Classpath 0.90 makes the Jmol application run

A few days back, Classpath 0.90 was released, the first release after the 0.20 release. Earlier Classpath releases could run the rendering engine, but running the application had failed so far.

Today it hit Debian unstable, so I upgraded my sid32 chroot and had Cacao run Jmol. I had some memory issues opening a small molecule [1], and the rendering was a factor of 100 or so slower than with Sun's JVM, but it runs!

Using the command cacao -Xmx512M -jar Jmol.jar triplebond.mol I got:

Note the exceptions copied to the console. I'll paste the full stack trace as a comment.
Many thanx to the Classpath team!

[1] InChI=1/C6H10/c1-4-5-6(2)3/h6H,1-3H3

### More chemistry in KDE

After Kalzium and kfile_chemical, KDE has now been extended with kparts for 3D structure and spectrum display: Kryomol. It is written in C++ and licensed under the GPL. It supports several chemistry formats, among which are quantum chemical formats like Gaussian03, NWChem and ACES, and 3D structure formats such as MDL molfile and XYZ.

## Monday, March 06, 2006

### Progress with CMLRSS plugin for Bioclipse

With quite some help from Ola (thanx!), I made good progress with the CMLRSS plugin. The current result looks like:

An issue in the transition from Jumbo 5.0 to 5.1 currently prevents it from showing a 3D model or 2D diagram, but that will follow soon.