Tuesday, April 29, 2014

Open PHACTS community workshop

Of course, you can start hacking with Open PHACTS any day, but please see this invitation (I will not be present myself):
    You are cordially invited to the upcoming Open PHACTS Community Workshop on 26.
    The Open PHACTS Discovery Platform is a freely accessible infrastructure that semantically integrates publicly available data for applied life science R&D. The Platform provides a powerful Application Programming Interface (API) which allows application builders and researchers to query the integrated data using existing applications, to build new applications and to access the API using workflow tools (e.g. KNIME). Examples of such applications, which illustrate what can be achieved, include the Open PHACTS Explorer, ChemBioNavigator, and PharmaTrek.

    The Open PHACTS Community Workshop in London on 26 June aims to introduce members of the academic community to the Open PHACTS Discovery Platform. The workshop will be of interest to: 
    • Researchers who would benefit from directly querying the Open PHACTS API using scripting languages or by developing applications to consume the data.
    • Lecturers & Principal Investigators who can use the Open PHACTS application ecosystem to access the data within the Open PHACTS Discovery Platform.

    The Community Workshop will introduce attendees to the Open PHACTS API and showcase how it can be used to create new or enhance existing applications. We will demonstrate, using real life use-cases, how universities can use the Open PHACTS API and associated tools for teaching and research in drug discovery.

    The Workshop is free to attend, please register your interest by replying to Feel free to forward this email to interested members of the academic community.

    Please find a preliminary agenda attached. We very much look forward to seeing you in London.

    Kind regards
The next chance to discuss Open PHACTS with me in person is the International Conference on Chemical Structures, in the first week of June. And if you would like to learn about and/or contribute to the R client "ropenphacts" I have been hacking on, just let me know!

Sunday, April 27, 2014

Changes in CDK 1.6 #3: Constructors that now require a builder

The advantage of the builders in the CDK is that code can be independent of data class implementations (and we have three of them in CDK 1.6, at this moment). Over the past years more and more code started using this approach, which means that more and more class constructors take an IChemObjectBuilder. CDK 1.6 has two more constructors that now take a builder.

The DescriptorEngine constructor now takes an IChemObjectBuilder, which is needed to initialize descriptor instances.

The second constructor that now needs an IChemObjectBuilder is that of the SMARTSQueryTool. Here it is passed on to the SMARTSParser, which needs it to create the data structure used for the matching.
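To illustrate why a tool that creates objects wants the builder handed in via its constructor, here is a minimal, self-contained sketch of the pattern. Note that Atom, ObjectBuilder, PlainBuilder, CountingBuilder, and SymbolParser are hypothetical stand-ins made up for this example; they are not the CDK classes, and ObjectBuilder only plays the role that IChemObjectBuilder plays in the CDK.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT the CDK interfaces.
interface Atom {
    String symbol();
}

// Plays the role of IChemObjectBuilder: the only place where data objects are created.
interface ObjectBuilder {
    Atom newAtom(String symbol);
}

// A simple data class implementation.
final class PlainAtom implements Atom {
    private final String symbol;
    PlainAtom(String symbol) { this.symbol = symbol; }
    public String symbol() { return symbol; }
}

// A builder that just creates PlainAtom instances.
final class PlainBuilder implements ObjectBuilder {
    public Atom newAtom(String symbol) { return new PlainAtom(symbol); }
}

// An alternative builder that also counts how many objects it created.
final class CountingBuilder implements ObjectBuilder {
    int created = 0;
    public Atom newAtom(String symbol) {
        created++;
        return new PlainAtom(symbol);
    }
}

// The tool never calls "new PlainAtom(...)" itself; it only uses the
// builder it received in its constructor.
final class SymbolParser {
    private final ObjectBuilder builder;

    SymbolParser(ObjectBuilder builder) { this.builder = builder; }

    // Parses whitespace-separated element symbols, e.g. "C C O".
    List<Atom> parse(String symbols) {
        List<Atom> atoms = new ArrayList<>();
        String trimmed = symbols.trim();
        if (trimmed.isEmpty()) return atoms;
        for (String s : trimmed.split("\\s+")) {
            atoms.add(builder.newAtom(s));
        }
        return atoms;
    }
}

class BuilderDemo {
    public static void main(String[] args) {
        CountingBuilder counting = new CountingBuilder();
        List<Atom> atoms = new SymbolParser(counting).parse("C C O");
        System.out.println(atoms.size() + " atoms, builder created " + counting.created);
    }
}
```

Swapping PlainBuilder for CountingBuilder requires no change to SymbolParser, which is the kind of independence from data class implementations the builders are meant to give.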

Earlier posts in this series:

Saturday, April 26, 2014

Changes in CDK 1.6 #2: IteratingMDLReader rename

This post in a series about API changes in CDK 1.6 is about the iterating reader for SD files, which are basically a list of MDL molfiles (Symyx, ... I lost track) complemented with properties for each structure. Since the CDK IO readers have a representation of the file format in the class name, this class was renamed from IteratingMDLReader to IteratingSDFReader.


Saturday, April 05, 2014

Every PhD student must use Git (aka research data management)

Last Thursday and Friday the SURFAcademy Masterclass Research Data Management in Nederland took place, and Chris Evelo and I presented some biology-world use cases. He focused more on the larger projects (e.g. ISA-TAB, GSCF, and FAIRPort) while I exposed my day-to-day data management. My day-to-day work habit looks more or less like this.

Day 0 is to think about how to do it, but the answer is pretty simple: use a version control system, like Git. It tracks every bit of what you do, allows for easy backups, and makes it easy to continue working on a different machine in case you forget to take your laptop adapter home :)

  • Day 1: keep an electronic lab notebook (e.g. a version control system; read Git from the Bottom Up)
  • Day 2: carefully select data you build on (can you indeed share it with the rest of your arguments in your next paper?)
  • Day 3: do your research and store everything
  • Day 4: integrate data repositories in your data analyses, e.g. rrdf and knitr
  • Day 5: if you like scientific dissemination, collaboration, and progressing science, share your data in a public repository, like FigShare, Data Dryad, Dutch Dataverse, 3TU.Datacentrum, DANS, etc. (that's a lot of D-D-D-Data...) or in a domain specific database, like WikiPathways, XMetDb, or DrugMet. Also think about data copyright and licenses and, in particular, whatever you choose, be explicit about it and don't let others guess (wrong).
  • Day 6: think ahead of reuse, and suitable formats. Consider semantic web and linked data.
  • Day 7: did you get impact? Think DataCite, ImpactStory, and Altmetric (and ORCID and DOI along the way).
And here are the slides:

Tuesday, April 01, 2014

Permission to put Jmol in a paper's Supporting information

A random email correspondence (thanks to the author for asking, giving me the chance to blog the answer!):
    I want readers of a paper I am writing to see several molecules in Jmol. I could instruct them to download Jmol but this is cumbersome as Jmol is 54 Mb. Apparently all that is needed is Jmol.jar, which is only 4.5 Mb. Can I get permission to add a zip file to my paper that contains jmol.jar plus a number of .pdb files?
My answer:
    Dear Jmol user,

    the Open Source license of Jmol defines the permission you ask for.

    In an ordinary world, you would have to ask *all* authors, but one of the virtues of an Open Source license is that you do not need to ask such permission, because the license explicitly provides you with the permission to redistribute the software.

    I cannot make a claim about the PDB files, of which I do not know the source.

    Hoping to have informed you sufficiently,

    with kind regards,

Network of BioThings: the EU hackathon

Very soon an international hackathon will take place: the Network of BioThings, with events in the USA and in Maastricht, The Netherlands. As I am traveling back from the NanoTox 2014 meeting, I will not be able to join in person, sadly, but will try to join online from Eindhoven.

The hackathon includes lunch and pizza, is synchronized between continents, and is aimed at:
  • Hackers and Mentors
  • Biologists, Text Miners and Data Wranglers
  • Ontologists, Terminologists, and Data Linkers
  • Semantic Web novices and experts
  • Systems and Network Biologists
  • Crowdsourcing experts and functional game designers
  • Skills in Large Text/Data Indexing, Facet Search and Browse, and REST APIs
  • Domain experts to advise on motivating use cases
Registration is open!