Missouri Botanical Garden Open Conference Systems, TDWG 2013 ANNUAL CONFERENCE

Workshop: Sharing & Delivery of Reusable Phylogenetic Knowledge
Rutger Vos, Nico Cellinese, Hilmar Lapp

Building: Grand Hotel Mediterraneo
Room: America del Nord (Theatre I)
Date: 2013-10-31 11:00 AM – 12:30 PM
Last modified: 2013-10-21

Abstract


Phylogenetic trees are applied in a variety of research fields whose practitioners are not phylogeneticists themselves and who therefore do not want to duplicate effort reconstructing what is to them, in essence, a "nuisance parameter". Despite this, phylogenetic knowledge is rarely re-used. Among the barriers to re-use are practical impediments, notably a dearth of tools that simplify data re-use and integration. This workshop will showcase recent progress in the development of open source software tools that address different aspects of this issue.

Firstly, phylogenetic inferences can be based on the recent and ongoing surge in the availability of DNA sequence data in public databases, but mining these databases and correctly identifying and aligning homologous sequences is a considerable challenge. Alexandre Antonelli will demonstrate how the SUPERSMART pipeline harvests data from GenBank and identifies and aligns homologous, phylogenetically informative sequences in a multi-step process. Suzi Lewis will demonstrate how the Quest For Orthologs project benchmarks different approaches to a key challenge in this process: distinguishing orthologous sequences (whose origin lies in successive speciation events) from paralogous sequences (which result from gene duplication events).

Secondly, inferring large phylogenies is computationally intensive, and the correct application of sophisticated statistical methods poses barriers to non-experts. The BioVeL project aims to simplify many aspects of computationally intensive biodiversity research. One of these is phylogenetic inference, for which the project is developing user-friendly, advanced workflows that can be executed on powerful computational infrastructure. Saverio Vicario will demonstrate the recent developments and current status of these workflows.

Thirdly, a big challenge in the integration of any kind of biological data is that most biological knowledge is organised around the Linnaean system of taxonomy. This system translates poorly to the integration of data by machines (as opposed to human experts) due to the vagaries of homonyms, synonyms, alternate spellings, alternate suffixes, and so on. Several online taxonomies provide services to address this, though they are often limited to only some higher taxa (e.g. plants, or mammals) or to taxa for which DNA sequence data are available. One of the outcomes of the PhyloTastic project is a "meta" web service that integrates multiple such taxonomies in a modular architecture so that input names can be matched against all of them. Rutger Vos will demonstrate its usage.
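
The idea of matching input names against several taxonomies through one modular entry point can be illustrated with a minimal sketch. This is not the PhyloTastic service or its API: the function names, the source labels "toy_plants" and "toy_mammals", and the small lookup tables are all hypothetical, and each "taxonomy" here is just an in-memory synonym table standing in for a remote service.

```python
from typing import Callable, Dict, List, Optional

# A resolver takes an input name and returns the accepted name, or None if unknown.
# (Hypothetical interface, chosen for illustration only.)
Resolver = Callable[[str], Optional[str]]


def normalise(name: str) -> str:
    """Collapse whitespace and case so trivial spelling variants compare equal."""
    return " ".join(name.split()).lower()


def dict_resolver(synonyms: Dict[str, str]) -> Resolver:
    """Build a resolver from a name -> accepted-name table (toy stand-in for one taxonomy)."""
    table = {normalise(key): accepted for key, accepted in synonyms.items()}
    return lambda name: table.get(normalise(name))


def resolve_all(names: List[str], resolvers: Dict[str, Resolver]) -> Dict[str, Dict[str, str]]:
    """Match every input name against every registered taxonomy and collect the hits."""
    results: Dict[str, Dict[str, str]] = {}
    for name in names:
        hits: Dict[str, str] = {}
        for source, resolver in resolvers.items():
            accepted = resolver(name)
            if accepted is not None:
                hits[source] = accepted
        results[name] = hits
    return results


# Toy data: two small "taxonomies", each covering only some higher taxa.
resolvers = {
    "toy_plants": dict_resolver({
        "Solanum lycopersicum": "Solanum lycopersicum",
        "Lycopersicon esculentum": "Solanum lycopersicum",  # synonym
    }),
    "toy_mammals": dict_resolver({
        "Homo sapiens": "Homo sapiens",
    }),
}

matches = resolve_all(["lycopersicon  esculentum", "Homo sapiens", "Foo bar"], resolvers)
for name, hits in matches.items():
    print(name, "->", hits)
```

Because each taxonomy is registered as an independent resolver, a new source can be added without changing the matching loop, which is the modularity the paragraph above describes. A real service would also need to handle homonyms and fuzzy spelling matches, which this sketch does not attempt.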

Lastly, even if source data are integrated and analysed successfully, situations still exist where the resulting trees need to be integrated to form larger, composite estimates of phylogeny. Numerous supertree methods and tree grafting algorithms exist that can enable this, but their application at the largest scale has so far been lacking. The Open Tree of Life project represents the "moonshot" approach to this problem by developing the tools to integrate large phylogenies in graph databases in order to derive a rough estimate of the entire tree of life. Jonathan Rees will demonstrate the latest advances in this project.