
Thursday, December 30, 2010

The 100th Blue Obelisk 2006 paper citation

Two and a half months after the CDK milestone, the Blue Obelisk paper also reached 100 citations. Here the lucky paper is Design, Synthesis, and Preclinical Evaluation of New 5,6- (or 6,7-) Disubstituted-2-(fluorophenyl)quinolin-4-one Derivatives as Potent Antitumor Agents by Chou et al (doi:10.1021/jm100780c). The Blue Obelisk paper (doi:10.1021/ci050400b) is cited because the authors used OpenBabel. About half of the 100 citations are because OpenBabel was used, whereas OpenBabel is only mentioned as one of the Blue Obelisk-associated unprojects in the Blue Obelisk paper.

I am not sure how this habit started, but given current citation practices, it is unlikely to go away. But who am I to complain....

Text mining chemistry from Dutch or Swedish texts

Oscar is a text miner. It mines text for chemistry. Oscar4 is the next iteration of the Oscar code, which I worked on in the past three months with Lezan, Sam, and David. I blogged about aspects of Oscar4 on several occasions:


These posts will serve as some initial critical mass for a draft report I plan to finish today. I might have to blog some further posts with diagrams here and there. This post is actually one of them, and discusses where Oscar can be expected to go next, now that the design is cleaned up (though this effort has not halted) and it has become possible again to extend it. The over 250 unit tests make this a lot easier too.

One aspect where I expect Oscar to go in 2011 is support for other languages. To a very large extent this is based on multi-language support in the dictionaries, as well as having training data in a particular language. This also provides some context for my earlier post about the need for an Oscar training data repository.

This extension opens a number of options: analysis of patent literature in other languages, monitoring of press releases in other languages, news items in local newspapers, etc. For example, it could analyse this C2W news item on yeast cells:


There are many use cases for such localized text mining. And it surely matters for determining the impact of research.

Oscar has various places where language specifics are found, for example in the tokenization of a text. One step here is the detection of sentence ends. In most western languages this is done with a period, exclamation mark, question mark, etc. But periods (dots) are also used in abbreviations. Similarly, colons can be used in chemical names. And every language comes with different abbreviations that need to be recognized.

Currently, some abbreviations are found in NonSentenceEndings. In the past three months, we have been cleaning up and restructuring the source code, making it easier to detect such places. This class will likely undergo further refactoring, to make the list of such non-sentence-endings configurable via files or so. What I expect to see is that you initiate Oscar like this:
Oscar oscar = new Oscar(Locale.US);
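
Something like the following is what I have in mind: a locale-aware replacement for NonSentenceEndings that reads its abbreviations from a per-language file. This is only a sketch of a possible design, not existing Oscar4 API; the class name and resource naming scheme are made up.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch, not part of Oscar4: a locale-aware list of
// non-sentence-endings, loaded from a plain text file per language.
public class LocalizedNonSentenceEndings {

    private final Set<String> abbreviations = new HashSet<String>();

    public LocalizedNonSentenceEndings(Locale locale) throws IOException {
        // assumed resource naming scheme, e.g. nonsentenceendings_nl.txt
        String resource = "nonsentenceendings_" + locale.getLanguage() + ".txt";
        InputStream stream = getClass().getResourceAsStream(resource);
        if (stream == null) return; // no list for this language yet
        BufferedReader reader = new BufferedReader(new InputStreamReader(stream, "UTF-8"));
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().length() > 0) abbreviations.add(line.trim());
        }
        reader.close();
    }

    // a period directly after one of these tokens does not end a sentence
    public boolean isNonSentenceEnding(String token) {
        return abbreviations.contains(token);
    }
}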

This might actually even make a nice student summer project. The biggest challenge will be in making a good corpus of training data, like the SciBorg training data that was used for training Oscar3.

But the whole normalization is tainted with English language specifics too. For example, the normalizer will have to 'normalize' the question marks, for which several Unicode variations exist. But the normalized variant is language dependent: Greek and Armenian, for example, have different characters (see this page), and then we have not even started talking about right-to-left scripts.
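
To make this concrete, here is a tiny illustration (my own sketch, not Oscar's actual normalizer) of collapsing question mark variants onto a canonical character; for an English model '?' is the natural target, while a Greek or Armenian normalizer would need a different one.

// Illustration only, not Oscar code: map a few Unicode question mark
// variants onto the plain ASCII '?'. A Greek normalizer would instead
// target U+037E (greek question mark), an Armenian one U+055E.
public class QuestionMarkNormalizer {

    public static String normalize(String text) {
        return text
            .replace('\uFF1F', '?')  // fullwidth question mark
            .replace('\uFE56', '?'); // small question mark
    }

    public static void main(String[] args) {
        System.out.println(normalize("Is this chemistry\uFF1F"));
    }
}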

Besides localized dictionaries, such an Oscar will also benefit from a localized OPSIN. It seems to recognize the Dutch propaan, but not benzeen. I am not going to look at that soon, but if you are interested, I recommend checking out Rich's posts about forking OPSIN and writing patches.

Getting Oscar going for other languages is a challenge, but also offers new opportunities. Just email the oscar mailing list if you are interested and need help.

Blogging as part of your workflow

Today is the last day I work on Oscar in my position in Cambridge (tomorrow I have a day off and fly back to Sweden). Three months go by quickly indeed. Next Monday I start my position in Stockholm at the IMM department at the Karolinska Institutet on predictive toxicology. Back in Sweden, it is. Well, of course, I worked from home most of the time anyway.

So, today it is time for me to write up a report for the last three months. This blog item is basically a prelude, or procrastination, or so. People sometimes ask me how I find time to blog so much. The trick is to just make blogging part of your workflow. So, small scripts I use to finish another task become a blog post (e.g. Converting JSON to RDF/XML with Groovy). I started blogging (in 2005) to actually optimize my workflow. I was sending the same message ("have you seen this interesting webpage") to several mailing lists, often tuning it a bit to the audience. Now, by just posting it on my blog, I removed the need for tuning and, as a bonus, reach a much larger audience too. Actually, with more than 300 unique visitors a week, I cannot complain.


Neither can I complain about the amount of discussion it triggers. It's like having my own private symposium:


Anyways, time to go back to blogging about Oscar...

Wednesday, December 29, 2010

Converting JSON to RDF/XML with Groovy

Mark's new CC0/RDF hosting functionality (see also my post two days ago) requires RDF/XML format, so I updated my code to convert the Chempedia Substances data into RDF/XML instead of N3 (I have asked Rich to put a new download link online). This is the Groovy code I used:

import groovy.xml.MarkupBuilder
import groovy.util.IndentPrinter
import groovy.json.JsonSlurper

// read the Chempedia Substances dump
input = new File("substances.json")
json = new JsonSlurper().parse(input);

def writer = new StringWriter()
def xml = new MarkupBuilder(
  new IndentPrinter(new PrintWriter(writer))
)
xml.'rdf:RDF'(
  'xmlns:rdf':
    'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
  'xmlns:dc' :
    'http://purl.org/dc/elements/1.1/',
  'xmlns:iupac' :
    'http://www.iupac.org/',
  'xmlns:cp' :
    'http://rdf.openmolecules.net/chempedia/onto#',
  'xmlns:owl' :
    'http://www.w3.org/2002/07/owl#'
) {
  json.each { substance ->
    xml.'rdf:Description'(
      'rdf:about': substance.uri
    ) {
      xml.'dc:identifier'(substance.gsid)
      xml.'owl:sameAs'(
        'rdf:resource' :
        'http://rdf.openmolecules.net/?' +
        substance.inchi
      )
      xml.'iupac:inchi'(
        'http://rdf.openmolecules.net/?' +
        substance.inchi
      )
      for (int i = 0; i<substance.namings.size(); i++)
      {
        naming = substance.namings.get(i);
        namingURI = substance.uri + "/naming" + i;
        xml.'cp:hasNaming' {
          xml.'rdf:Description' {
            xml.'cp:hasName'(naming.name)
            xml.'cp:hasStatus'(naming.status)
            xml.'cp:hasScore'(naming.score)
          }
        }
      }
    }
  }
}
println writer.toString();

Monday, December 27, 2010

What should a free CC0 RDF hosting for scientists look like?

What if scientists could host small amounts of CC0 data for free? Something like computation results, e.g. outputted as HTML+RDFa? Without having to worry about setting up a triple store, etc.? Well, that future might be near. The above screenshot shows a first go. Not by me, but in response to a feature request by me. So, the question right now is: what would we like to see on the summary page? Some things I can think of are:

  • a clear statement of the CC0 waiver, possibly the Open Data icon
  • basic stats: # of triples, # unique predicates, etc
  • a Wordle of all rdfs:labels in the data set
  • the upload page should probably be a separate tab
That would be a good start. What else should be there?

Posted via email from Egon's posterous

My first encounter with Open Source cheminformatics


One of my first encounters with open source cheminformatics was the XYZ file viewer applet by Sun. I extended it back then with minimal PDB support for our Woordenboek Organische Chemie website (started in 1995, now extinct). This applet dates back to at least 1997, as shown by the screenshot.

Posted via email from Egon's posterous

Sunday, December 26, 2010

Oscar: training data, models, etc

Oscar uses a Maximum Entropy Markov Model (MEMM) based on n-grams. Peter Corbett has written this up (doi:10.1186/1471-2105-9-S11-S4). So, it basically is statistics once more. If you really want a proper bioinformatics education, do your PhD at a (proteo)chemometrics department.

N-grams are word parts of n characters. For example, the trigrams of acetic acid include ace, cid, tic, eti, and aci. N-grams of length four include acid, etic, and acet. The MEMM assigns weights to these n-grams, and based on that decides if something is indeed a named entity (in Oscar terminology). For example, consider the acet n-gram: acetone should be matched, but facet should not.
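
A minimal illustration (my own sketch, not Oscar code) of extracting such character n-grams:

import java.util.ArrayList;
import java.util.List;

// Sketch: extract all character n-grams of length n from a word.
public class NGrams {

    public static List<String> ngrams(String word, int n) {
        List<String> grams = new ArrayList<String>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("acetone", 4)); // [acet, ceto, eton, tone]
        System.out.println(ngrams("facet", 4));   // [face, acet]
    }
}

Both words contain the acet n-gram, which is exactly why the MEMM weighs many n-grams together instead of matching on any single one.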

Put this in the perspective of the ongoing refactoring of the Oscar software. We are changing normalization (e.g. converting all Unicode hyphen alternatives into one specific hyphen) and updating the tokenizer (e.g. changing the list of non-sentence-endings like Prof.). It is clear this changes the n-grams typical for chemical-like things. Worse, the weights are tuned towards the known n-grams, and statistical models are generally a bit overtrained on the data, or at least specific to it.

Now, if the distribution of n-grams changes, the weights in the model need to be updated too, to not degrade the model performance. So, Oscar is useless if we cannot retrain its MEMM component after a refactoring. If that were impossible, we would have effectively created an intellectual monopoly.

Thus, what the Oscar project needs is one or more free sets of annotated literature, which can be used to train new MEMM models. The SciBorg corpus was used to train the current Oscar3 and Oscar4 models. This data (copyright RSC) will very likely be available under a Creative Commons license (RSC++), but may have the NC clause, which would not be good for developing a business model around the open source Oscar (such as providing a high-performance web service via a subscription). I have recently written up the problems the NC clause introduces, and some examples of commercial Open Source cheminformatics projects.

We need not focus only on this SciBorg data, however. In fact, we will need multiple models anyway. For example, the SciBorg papers (42, if I am not mistaken) cover a particular kind of literature. So, there is the risk of using the model to analyse papers outside its application domain. Furthermore, I am very interested (and others have indicated so too) in using Oscar for other languages. Surely, English is the major language, but there are many use cases for Oscar if it works for other languages too.

Therefore, what we need in the Oscar project is a registry of training (and test) data, itself annotated with metadata about how that data was created (what quality assurance, what kind of named entity types, how many domain experts were involved, etc.), test results for those data sets, etc. My time on the Oscar project is almost over, and I have no clue when I will be able to invest the same amount of time into the project as I did in the past three months. But the creation of this registry is a clear step that must be taken in the Oscar4 development.

Corbett, P., & Copestake, A. (2008). Cascaded classifiers for confidence-based chemical named entity recognition BMC Bioinformatics, 9 (Suppl 11) DOI: 10.1186/1471-2105-9-S11-S4

Thursday, December 23, 2010

Status update on BJOC analysis with Oscar and ChemicalTagger #3

The two earlier posts in this series showed screenshots of results of Oscar, but the title also promised results from Lezan's ChemicalTagger. Sam helped with getting the HTML pages online via the Cambridge Hudson installation. Where Oscar finds named entities (chemical compounds, processes, etc.), ChemicalTagger finds roles, like solvent, acid, base, and catalyst. Roles are properties of chemical compounds in certain situations: ethanol is not always a solvent, sometimes it is a Xmas present. The current output is not entirely where I want to go yet, but it makes it easy to see which solvents are frequently found in the BJOC corpus:


This screenshot of an analysis of 15 BJOC papers shows that AcOEt (is that the same as EtOAc?) is mentioned as solvent three times in PMC1399459. Brine, however, is mentioned as solvent in three papers.

As said, these two pages contain RDF and the tables are sortable. Hudson recompiles them automatically when I update the source code to create the HTML+RDFa. So, go ahead, send me bug reports, feature requests, and patches!

Tuesday, December 21, 2010

re: Commercial or Proprietary?

OK, the second paper I ran into today is a perfect match for the paper by Khanna and Ranganathan I just discussed in the Commercial or Proprietary? post. So perfect, in fact, that I should really have combined them. But since the other post is already infecting the WWW, I'll have to post this update.

Yap wrote up a paper on PaDEL-descriptor: An open source software to calculate molecular descriptors and fingerprints (doi:10.1002/jcc.21707), and Table 2 is quite like Table 1 in the paper by Khanna and Ranganathan. Not only does Yap correctly differentiate between product cost and license, the paper also details the descriptor type count and descriptor value count. It is a good exercise to compare those two tables yourself.

Yap, C. (2010). PaDEL-descriptor: An open source software to calculate molecular descriptors and fingerprints Journal of Computational Chemistry DOI: 10.1002/jcc.21707

Commercial or Proprietary?

Khanna and Ranganathan wrote up a review paper on molecular similarity (doi:10.1002/ddr.20404). I have not fully read it yet, but my eye fell on Table 1, which lists a number of programs that can be used to calculate QSAR descriptors, both open source and proprietary. However, the table features a column Availability which has two options: Public and Commercial. They qualify Bioclipse, CDK, and RDKit as public, while Dragon, MOE, CODESSA, and others are commercial. Effectively, it seems to suggest that they classify them as open source versus commercial, though I am not entirely sure what they mean by public.

The authors and referees would not be the first to make this common mistake: not expressing the difference between two orthogonal axes: free-versus-commercial and open-source-versus-proprietary. To clarify these axes, I created this diagram (CC-BY, SVG source available upon request):


It is very important to realize that Open Source software can be commercial. For example, you can get commercial support for Bioclipse and the CDK from GenettaSoft. It is also really important to realize that free software (public?) does not mean it is Open Source (or vice versa). E-Dragon is an example here: you can use it freely, but the source code is proprietary. Some years after open source cheminformatics took off, commercial providers started to provide free-for-academic-use packages, which fit into this category too.

Readers of my blog know that I advocate Open Source, not gratis software (see also Re: Why I and you should avoid NC licences), even though you can download many of the Open Source cheminformatics tools I worked on for free. Here it is important to realize that the CDK and Bioclipse are not free of cost: it is just that the taxpayer covered the cost, mostly via academic institutes, along with hobbyists working out-of-office, like I have done for many, many years, and companies who saw mutual benefit. Maybe something to consider the next time you are wondering about donating money to an Open Source cheminformatics project; pay some respect to the contributors of the software you use.

Off-topic: there is a second inaccuracy in this table. For each piece of software, they list the number of descriptors, but without units. Units, units?? Yes. For example, for the CDK they list ">40" descriptors, while for Dragon "3,224" (it puzzles me why you can count accurately above 3000, but not below 50). But the point here is that the CDK count is really the number of Java classes, reflecting descriptor algorithms. One algorithm can calculate more than one descriptor value, and it is those values that are counted for Dragon. The column is comparing apples with oranges. While I have never really counted it, and every CDK user can in fact tune it, the number of calculated CDK descriptor values approaches a thousand. Well, I guess that is ">40" too :(
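
For the curious, a rough way to count values per CDK descriptor class is to look at getDescriptorNames(). This is only a sketch; it assumes the two descriptor classes below are on the classpath, and the exact counts depend on the CDK version:

import org.openscience.cdk.qsar.descriptors.molecular.ALOGPDescriptor;
import org.openscience.cdk.qsar.descriptors.molecular.BCUTDescriptor;

// Sketch: one descriptor class (one "algorithm") can return several
// descriptor values; getDescriptorNames() lists the values it calculates.
public class CountDescriptorValues {

    public static void main(String[] args) throws Exception {
        System.out.println("BCUT values:  " +
            new BCUTDescriptor().getDescriptorNames().length);
        System.out.println("ALOGP values: " +
            new ALOGPDescriptor().getDescriptorNames().length);
    }
}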

Khanna, V., & Ranganathan, S. (2010). Molecular similarity and diversity approaches in chemoinformatics Drug Development Research DOI: 10.1002/ddr.20404

Sunday, December 19, 2010

Re: Why I and you should avoid NC licences

Peter blogged about an issue that recently came up: the role of the Non-Commercial clause in Creative Commons licenses. This clause goes like:
    NonCommercial (nc)
    You let others copy, distribute, display, perform, and (unless you have chosen NoDerivatives) modify and use your work for any purpose other than commercially unless they get your permission first.
Originally, I was tempted to use this clause too, and maybe I forgot to remove it here and there, but I agree with Peter that it is better not to use it. Peter outlines several of the things involved. One important point is indeed that 'commercial' use is not well-defined.

Moreover, one of the arguments outlined is that "[the NC clauses] are unlikely to increase the potential profit from your work". I think this is true (I could have said 'I believe this is true', but that would only introduce unqualified trust): with the material already being free to some extent, the material itself is not profitable, but the services around it would be. But by making it impossible for others to set up 'services' around the material (education at a high-fee university, books, software you sell on CD, a web service for which you require fees for high-volume use... remember, 'commercial' is not well-defined), you also make it impossible to build a community around the material, and as such you reduce the value of the material, also for yourself. Facebook would not be what it is right now without community building (neither would the CDK).

These issues are acknowledged by Creative Commons themselves too, and they wrote up an interesting report last year about how commercial use is understood. The bottom line is that no one really knows, I guess. I have not fully read the report yet, but anticipate it is a must-read. Here is a bit of the executive summary as a teaser:

    The most notable differences among subgroups within each sample of creators and users are between creators who make money from their works, and those who do not, and between users who make money from their uses of others’ works, and those who do not. In both cases, those who make money generally rate the uses studied less commercial than those who do not make money. The one exception is, again, with respect to personal or private uses by individuals: users who make money consider these uses more commercial than those who do not make money.

Tuesday, December 14, 2010

Open Research Computation, ORC, #openrescomp

Yesterday, a new journal launched, but not your regular new journal. This journal is about scientific software. Tested software. Documented software. Open Source software. I am on the editorial board, so I am biased here. But, this journal is special. This new journal is called Open Research Computation (ORC?). Several others blogged about it too.

My talk at Imperial College

Yesterday I was a guest of Judy at the Department of Surgery & Cancer at Imperial College, where I presented what I am working on. As I had met up in a pub with Ron (my former PhD supervisor, who now works on metabolomics at the Istituto Agrario San Michele all'Adige in Italy) the night before, the presentation shares many slides with previous talks, but also adds a few new ones.


It was great talking to Judy about her PhD research on NMR in metabolomics, and to Tim about further work in this research. It was interesting to learn that here too they have problems in metabolite identification, quite like those encountered when analyzing *C/MS data, and I was really happy to hear he is in contact with Christoph about an Open repository for metabolomics data.

Another 'new' slide was one advertising Open Research Computation, launched yesterday, which I'll blog about in more detail later today.

Saturday, December 11, 2010

Status update on BJOC analysis with Oscar and ChemicalTagger #2


A quick update on the post of this morning. The above screenshot shows the progress of the reporting of text mining results using Oscar on the BJOC literature. I think I am almost ready to analyze the full corpus, with a blacklist put in place for large papers. What you see is the same kind of jQuery-enabled sortable list in the HTML view, and a SPARQL query in RDFaDev to list all papers that mention DHMO (in the first 10 of all 350 BJOC papers) by its InChI.

Importantly, IMHO, it is using the CHEMINF ontology.

Supramolecular chemistry

Some smart software developer once said to not optimize your code too early. However, not caring about it at all does not help either. Some basic knowledge of memory management can keep you going. That is, I just ran into the limits of Oscar and ChemicalTagger. As I blogged earlier today, I am analyzing the BJOC literature, but Lezan and I are running into a reproducible out-of-memory exception. At first I thought it was a memory leak, as it was the 95th paper it fell over on, but after we optimized our code a bit by reusing class instances, the problem remained and turned out not to lie in recreating objects (though the code is significantly faster now), but in a single BJOC paper being too large.

The particular paper is not even ridiculously large, though it has an amazing 800 references! The paper, Molecular recognition of organic ammonium ions in solution using synthetic receptors (doi:10.3762/bjoc.6.32), is in fact an interesting review paper on supramolecular chemistry. The molecules I worked on (see one below) in my own supramolecular chemistry time (doing an M.Sc. minor (a 6-month practical) with Peter Buijnsters in organic chemistry in the group of Prof. Nolte) are actually of the type they review, though surfactants are not particularly covered in it.
Yeah, supramolecular chemistry has this nice level of complexity; it is so supramolecular that it is currently outside the scope of the molecular analysis of Oscar and ChemicalTagger ;)

Späth, A., & König, B. (2010). Molecular recognition of organic ammonium ions in solution using synthetic receptors. Beilstein Journal of Organic Chemistry, 6. DOI: 10.3762/bjoc.6.32

Buijnsters, P. J. J. A.; García-Rodríguez, C. L.; Willighagen, E. L.; Sommerdijk, N. A. J. M.; Kremer, A.; Camilleri, P.; Feiters, M. C.; Nolte, R. J. M.; Zwanenburg, B. (2002). Cationic Gemini Surfactants Based on Tartaric Acid: Synthesis, Aggregation, Monolayer Behaviour, and Interaction with DNA. European Journal of Organic Chemistry, 2002 (8), 1397-1406. DOI: 10.1002/1099-0690(200204)2002:8%3C1397::AID-EJOC1397%3E3.0.CO;2-6

Status update on BJOC analysis with Oscar and ChemicalTagger

This screenshot shows the current status of the Oscar analysis results of the BJOC literature. The results are logged as an HTML+RDFa page, as I explained before in Scripts logs as HTML+RDFa: mix free text reporting with CSV. The page is interactive, using jQuery goodies to allow table sorting.

Posted via email from Egon's posterous

Friday, December 10, 2010

Trust has no place in science #2

Thanks to all who replied and shared their views. Particular thanks to Christina, who replied in her blog. Along with Sam, Cameron, and Bill, she thinks this is about semantics. Linguistic tricks. I hope not; this is too serious to get away with such. "Reliable, trustworthy, assumptions": it is all working around the real issue. Similarly, splitting up 'trust' into 'blind trust' and 'smart trust' is just working around the real problem.

Indeed, my point is different. The key of science is to replace trust by facts. Or, when talking about databases, software, and research papers in Nature, it is replacing trust with traceability. Actually, we seem to have lost the long-standing tradition of citing the previous work on which we base our argumentation. Facts are backed up with references, providing the required traceability.

Now, compare that to the current electronic sciences. We 'trust' our databases to have done something sane. Well, don't. They made an attempt, but they made errors. As they say with software, having zero bugs just means you have not found them yet.

The real point with 'trust' is that it is completely irrelevant. It adds zero to the scholarly discussion. Whether you trust the highly curated ChEMBL database or not, it has errors (Noel pointed out one source of ambiguity in the ChEMBL database this week). What does matter, instead, is whether those errors are significant. Do they affect the conclusions I draw when I use this data? That is what actually matters. Trust has no place in science. Error has.

Sadly, this is basically the hypothesis of the VR grant I wrote up but did not get awarded. But I trust I will do better next time.

Why does this matter? Well, this is what ODOSOS is about: bringing traceability back into science, and getting rid of trust.

How many hydrogen-bond acceptor groups does triazole have?

Andrew asked me about the H-bond donor capabilities of 1,2,4-triazole:


Apparently, the way he uses the CDK library, it returns zero H-bond donor groups, as is visible in his application at the bottom of the page. When starting from the PubChem entry, it correctly detects one H-bond donor group, so it seems to have to do with implicit hydrogens (not uncommon). Debugging will continue later, when I am done analyzing the Beilstein Journal of Organic Chemistry literature with Oscar.
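
For those who want to reproduce the numbers, something along these lines should work with the CDK QSAR descriptors. This is only a rough sketch (not Andrew's code), and the exact package names and signatures may differ between CDK versions; the key step is configuring atom types and implicit hydrogens before the descriptors run:

import org.openscience.cdk.DefaultChemObjectBuilder;
import org.openscience.cdk.interfaces.IAtomContainer;
import org.openscience.cdk.qsar.descriptors.molecular.HBondAcceptorCountDescriptor;
import org.openscience.cdk.qsar.descriptors.molecular.HBondDonorCountDescriptor;
import org.openscience.cdk.smiles.SmilesParser;
import org.openscience.cdk.tools.CDKHydrogenAdder;
import org.openscience.cdk.tools.manipulator.AtomContainerManipulator;

// Sketch: count H-bond donors and acceptors for 1H-1,2,4-triazole,
// making sure implicit hydrogens are configured first.
public class TriazoleHBonds {

    public static void main(String[] args) throws Exception {
        SmilesParser parser = new SmilesParser(DefaultChemObjectBuilder.getInstance());
        IAtomContainer triazole = parser.parseSmiles("C1=NC=NN1"); // Kekule SMILES

        // without this configuration step, the donor count easily ends up as zero
        AtomContainerManipulator.percieveAtomTypesAndConfigureAtoms(triazole);
        CDKHydrogenAdder.getInstance(DefaultChemObjectBuilder.getInstance())
            .addImplicitHydrogens(triazole);

        System.out.println("donors:    " +
            new HBondDonorCountDescriptor().calculate(triazole).getValue());
        System.out.println("acceptors: " +
            new HBondAcceptorCountDescriptor().calculate(triazole).getValue());
    }
}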

But, I do like to put this question out in the open: how many H-bond acceptor groups does this triazole have? The CDK calculates 3 groups, while PubChem counts 2. ChemSpider thinks 3 too.

Intuitively, I would agree with the CDK and ChemSpider: the nitrogen which acts as the single H-bond donor still has a free electron pair. What do you think? Is PubChem wrong, or are the CDK and ChemSpider wrong? Can this special nitrogen be both a donor and an acceptor at the same time? I think so. However, I do not know how I can easily search CrystalEye for this. Bonus points for answering the Blue Obelisk eXchange question.

Monday, December 06, 2010

Trust has no place in science

One discussion I have often had in the past year is about trust in science. I, for one, believe (hahahaha; you see the irony? ;) that trust has nothing to do with science. Likewise, any scholar should be, IMHO, suspicious when someone talks about trust. A scholarly scientist will never trust any result: hes will accept it as true or false, but will take responsibility for that decision; hes will not hide behind 'but I trusted him' or 'but it was published in Nature'.

Antony asked the community last week to answer a questionnaire, which turned out to be about our trust in online chemical databases. He presented the results at the EBI. This is the slide that summarizes the results from that questionnaire:


We see that trust clearly has a very significant place in science. How disappointing. You can spot me in these results easily: I am the one that consistently answered 'Never Trust' for all databases. It is not that I do not value those databases, but there is no need for me to trust them. I verify. This is actually a point visible in Tony's presentation: we can compare databases.

This is the point that I and others have been making for more than a decade now: if we do things properly, we can do this verification. Anyone can. With Open Data, Open Source, and Open Standards we can. I can only stress once more how important this is. We trust people, we trust governments, but repeatedly this trust is taken advantage of. Without transparency, people can hide. By being able to hide, humans lose their ability to decide what is right. With transparency, we see things return to normal, as we saw this week with UK politicians.

Further reading in my blog:

Update: if you liked this post, you will also like blog posts like this one from Björn.

What is 'hes'?

hes - pronoun

1. Used to refer to a person whose gender is unspecified, unknown, or irrelevant.

Typical use: hes was a scientist born in the late 20th century.

Sunday, December 05, 2010

Konqueror Web Shortcuts for CHEMINF

In 2004 I wrote a short CDK News article on how to set up Konqueror web shortcuts for the CDK (Windows users can download Konqueror here). They are very handy, and I just found another simple use case. As I have not seen the feature get much attention recently, and it is a great productivity tool, here goes.

CHEMINF is a cheminformatics ontology under development by people at the EBI and in Canada and Sweden. I was aggregating examples, and when you browse these, you immediately run into the problem that all OWL resources have rather cryptic names, like CHEMINF_000000. Of course, any decent OWL tool will just show the rdfs:label, but I and others prefer to work in plain text editors rather than, for example, Protege.

Fortunately, Michel and/or Leonid have made the CHEMINF ontology Linked Data, which is where the web shortcuts come in. So, CHEMINF_000000 has the URI http://semanticscience.org/resource/CHEMINF_000000. For RDF and ontology users, a web shortcut is just like defining a namespace (actually, a bit more general), so we could define cheminf:CHEMINF_000000. Actually, let's skip that step, make full use of web shortcuts, and define cheminf:000000. After all, a web shortcut is nothing more than a simple expansion.

Open Konqueror's Settings -> Configure Konqueror dialog, and select the Web Shortcuts page:


Click the New button on the right:


and fill out the dialog like this:


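In case the screenshot does not reproduce well: the essence of the dialog is the expansion pattern below, where \{@} is the placeholder Konqueror substitutes for whatever follows the shortcut prefix (the exact field labels may differ per KDE version):

Shortcut:   cheminf
Search URI: http://semanticscience.org/resource/CHEMINF_\{@}

cheminf:000000  ->  http://semanticscience.org/resource/CHEMINF_000000
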
Now, I can open the cheminf:000000 URI in Konqueror and get the information about a CHEMINF resource:

Friday, December 03, 2010

ChemWriter, Google Chrome, and Many Eyes in Open Source

Update: Wow, how tired can you be? I have to apologize for this post: as Andrew points out in the comments, Rich did not analyze the Chrome source code, but his own source code. That is not so special indeed. I misread Rich's post. This completely ruins the point I was making. He did not take advantage of Chrome being Open Source and find the problem that way, but found it in an old-fashioned debugging session on ChemWriter. The below could have happened, but it didn't.

This was the old post:

Linus' law:
    given enough eyeballs, all bugs are shallow.

Rich of MetaMolecular works on Open Source and closed source cheminformatics solutions. ChemWriter is one product he is working on which uses JavaScript and SVG (two Open Standards), and he recently asked for feedback on the new version. Test users found a problem in Google's Chrome browser, and Rich then did something that is only possible in an Open Source environment: he downloaded the buggy product (Chrome), started looking for the cause, found it, and filed a detailed bug report. Just think what would have happened if this problem was in MS Internet Explorer...

Well done!