Pages

Tuesday, January 07, 2014

Handling SD files with JavaScript in Bioclipse

I finally got around to continuing with a task to create an SD file for WikiPathways. The problem is more finding the time, than doing it, and the tasks are basically:
  1. iterating over all metabolites in the GPML files
  2. extract the Xref's database and database identifier (see previous link)
  3. extract the molfile from the database SD file
  4. give the WikiPathways metabolite a unique identifier
  5. record that WikiPathways metabolite has a molfile
  6. append that molfile along with the new WikiPathways metabolite ID in a new SD file
It turns out that I can use Uppsala's excellent SD functionality in Bioclipse (using indexing, it opens 2 GB SD files for me) is also available from the JavaScript command line:

  hmdbIndex = molTable.createSDFIndex(
    "/WikiPathways/hmdb.sdf"
  );
  
  idIndex = new java.util.HashMap();
  molCount = hmdbIndex.getNumberOfMolecules();
  for (i=0; i<molCount; i++) {
    mol = hmdbIndex.getMoleculeAt(i);
    if (mol != null) {
      hmdbID = mol.getAtomContainer().getProperty(
        "HMDB_ID"
      );
      idIndex.put(hmdbID, i);
    }
  }

Using this approach, I can create an index by HMDB identifier of molfiles in the HMDB SD file extract just those molfiles which are found in WikiPathways, and create a new WikiPathways dedicated SD file. When I have the HMDB identifiers done, ChEBI, PubChem, and ChemSpider will follow.