Friday, August 16, 2013

Analyzing WikiPathways metabolites in Bioclipse is easy with Groovy

Assume you have downloaded a set of GPML pathway files from WikiPathways (doi:10.1371/journal.pbio.0060184) and placed them in a Bioclipse (doi:10.1186/1471-2105-10-397) workspace project. You can then easily analyse all metabolites:

Well, genes and proteins too, but I just happen to like metabolites more.
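
As a minimal sketch of that kind of analysis outside Bioclipse, plain Groovy is enough to pull the database source and identifier out of each metabolite DataNode. The GPML fragment below is made up for illustration and its identifiers are placeholders. The `Type="Metabolite"` filter and the `Xref` attributes follow the GPML format. The `groovy.xml.XmlParser` import assumes Groovy 3 or later; on older versions the import can be dropped.

```groovy
// Groovy 3+ assumed; on Groovy 2.x drop this import (XmlParser lives in groovy.util there).
import groovy.xml.XmlParser

// Hypothetical, heavily trimmed GPML fragment standing in for a real pathway file.
// The identifiers are placeholders, not real database entries.
def gpml = '''<Pathway xmlns="http://pathvisio.org/GPML/2013a" Name="Example">
  <DataNode TextLabel=" ethanol " Type="Metabolite">
    <Xref Database="HMDB" ID="ID-1"/>
  </DataNode>
  <DataNode TextLabel="ADH1A" Type="GeneProduct">
    <Xref Database="Ensembl" ID="ID-2"/>
  </DataNode>
</Pathway>'''

// Namespace awareness is switched off so that element names are plain strings.
def data = new XmlParser(false, false).parseText(gpml)

// Keep only the DataNode elements that GPML types as metabolites.
def metabolites = data.DataNode.findAll { it.'@Type' == 'Metabolite' }

// Build one "label: database identifier" line per metabolite from its Xref child.
def lines = metabolites.collect { node ->
  def xref = node.Xref[0]
  "${node.'@TextLabel'.trim()}: ${xref?.'@Database'} ${xref?.'@ID'}".toString()
}
lines.each { println it }
```

Dropping the `findAll` filter, or testing for another `Type` value, would list the genes and proteins as well.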

In fact, more interesting than printing the database source and identifier is perhaps opening them in a molecule table. Because I have not yet updated the BridgeDb plugin to easily load identifier mapping databases, let's just use OPSIN (which recently saw its 1.5.0 release) and accept that we don't get to see all metabolites just yet:
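
    // Resolve the workspace folder with the GPML files to a file system path.
    dataMap = bioclipse.fullPath("/WikiPathways/data/")
    gpmlFiles = new File(dataMap).listFiles()

    structureList = cdk.createMoleculeList()
    gpmlFiles.each { file ->
      def data = new XmlParser().parse(file)
      // Assumed filter: GPML marks metabolites with Type="Metabolite".
      def metabolites = data.DataNode.findAll {
        it.'@Type' == 'Metabolite'
      }
      metabolites.each { node ->
        name = node.'@TextLabel'.trim()
        try {
          molecule = opsin.parseIUPACName(name)
          structureList.add(molecule) // assumed: collect the parsed structure
          js.print("IUPAC name found: $name \n")
        } catch (Exception exception) {
          // OK, it was not an IUPAC name
        }
      }
    }
    ui.open(structureList) // assumed: show the collected structures in a molecule table
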
Then we get to see this (for pathways with names starting with a "B"):

Anyway, this is just playing around. The point is, we can now hook up metabolite information in WikiPathways with any of the other functionality in Bioclipse, such as toxicity prediction, decision support, structural analysis (with the CDK), database lookups, etc.

Or, and that was actually my primary goal this afternoon, to find all GPML Label elements with IUPAC names. But more on that next week.