Missouri Botanical Garden Open Conference Systems, TDWG 2016 ANNUAL CONFERENCE

A reference taxonomy for phylogenetic data aggregation
Jonathan A Rees

Last modified: 2016-09-29


Any large biodiversity data project requires one or more taxonomies for discovery and data integration purposes, as in "find me records for primates" or "follow this record's link to IRMNG (the Interim Register for Marine and Nonmarine Genera)".  For example, the GBIF occurrence record database and the NCBI GenBank sequence database both have dedicated taxonomy efforts, while Encyclopedia of Life is taxonomy-agnostic, supporting multiple taxonomies.  We present the design and application of the Open Tree Taxonomy, which serves a store of phylogenetic trees (currently drawn from about 3,500 published studies) called 'Phylesystem'.  In order to resolve as many as possible of the names occurring in the phylogenetic trees to taxa, the taxonomy is a synthesis of seven source taxonomies, each with its own strengths.  Automatic taxonomy synthesis gives a unified view of the tree store, and in addition has allowed creation (in conjunction with a phylogenetic supertree) of a comprehensive summary "tree of life".  The synthesis process is repeatable, so updates to source taxonomies can be incorporated easily.

Taxonomy synthesis has been technically challenging in unexpected ways. The taxonomy contains a number of unfortunate artifacts, and making the taxonomy transparent for users has been a struggle.  I will report on experience that may help others considering this kind of synthetic taxonomy project.
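
To make the idea of repeatable synthesis from several source taxonomies concrete, here is a minimal sketch in Python. It is not the Open Tree implementation, which the abstract does not describe. It assumes, purely for illustration, that sources are merged in a fixed priority order and that the first source to mention a name decides that name's parent. The function name `synthesize`, the source labels, and the tiny example taxonomies are all hypothetical, and real synthesis must also handle homonyms, synonyms, and conflicting hierarchies, which this sketch ignores.

```python
from typing import Dict, List, Optional, Tuple

# A hypothetical source taxonomy: taxon name -> parent name (None for a root).
Taxonomy = Dict[str, Optional[str]]


def synthesize(
    sources: List[Tuple[str, Taxonomy]],
) -> Dict[str, Tuple[Optional[str], str]]:
    """Merge source taxonomies in priority order (assumed rule, for illustration).

    The first source in the list that mentions a name wins. The result maps
    each name to (parent, name of the source that supplied it), so the origin
    of every placement stays visible.
    """
    merged: Dict[str, Tuple[Optional[str], str]] = {}
    for source_name, taxonomy in sources:
        # Iterate in sorted order so repeated runs visit names identically.
        for name in sorted(taxonomy):
            if name not in merged:
                merged[name] = (taxonomy[name], source_name)
    return merged


# Two made-up sources; "source_a" has higher priority than "source_b".
source_a: Taxonomy = {"Mammalia": None, "Primates": "Mammalia", "Homo": "Primates"}
source_b: Taxonomy = {"Hominidae": "Primates", "Homo": "Hominidae", "Pan": "Hominidae"}

merged = synthesize([("source_a", source_a), ("source_b", source_b)])
```

In this toy run, `Homo` keeps the placement from the higher-priority source, while `Pan` and `Hominidae` are filled in from the second source. Rerunning the function after a source is updated reproduces the merge, and recording which source supplied each placement is one simple way to keep the result inspectable.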