Missouri Botanical Garden Open Conference Systems, TDWG 2016 ANNUAL CONFERENCE

CANCELLED: What’s in a name? The foundation of telling machines what we mean instead of how someone calls it
Nico Cellinese, Gaurav Vaidya, Hilmar Lapp

Last modified: 2016-12-01


Improving our understanding of Life, whether the biology of individual species or the mechanisms and processes governing biodiversity at large, critically depends on integrating, querying, and aggregating biological data from many different organisms. To this day, the most fundamental and common way to accomplish this goal relies on organism names, making them one of the pillars of querying and managing our biological knowledge and data. However, traditionally defined names, based on Linnaean nomenclature, suffer from major limitations when it comes to integrating and communicating data. Because names are simple text strings, the meaning intended by those who coin a name and those who apply it is inaccessible to machines. As a result, exactly which organisms a name is or is not meant to include is often ambiguous, and names are therefore applied inconsistently. The continued failure to unambiguously reconcile taxon names with the concepts they purportedly denote has consequently plagued biodiversity informatics for decades, despite sustained – and expensive – efforts to solve this problem. More than 280 years after the first publication of Linnaeus’ Systema Naturae, a canonical, comprehensive, and authoritative taxonomy remains elusive even for some of the most charismatic groups (such as vertebrates), let alone for all of life. Moreover, many groups of organisms may never have a Linnaean name, much less one denoting a stable and universally agreed-upon taxon concept. Yet scientists may discover molecular or macroscopic characteristics for them that constitute valuable biological data and knowledge. How can such data and knowledge be communicated, found, aggregated, and properly reused if their most important metadata link, the taxon identification, is also their weakest?
We propose that these challenges can be addressed by what we refer to as phyloreferencing, which allows users to refer to any group of organisms of shared evolutionary history by a machine-interpretable definition of the unique pattern of descent that distinguishes that group from all others. Phyloreferences build on a large body of theoretical and applied work on phylogenetic nomenclature. Although phyloreferences can, and often will, be named, their names are used solely to aid human communication, not to define their semantics.
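To make the idea of a machine-interpretable definition concrete, the following is a minimal sketch in Python and not the authors' implementation; the abstract does not specify how phyloreferences are encoded. It assumes a definition can be written as a set of organisms that must be included and a set that must be excluded, and it resolves that definition against a small illustrative tree. The tree, the `PARENT` mapping, and the `ancestors` and `resolve` functions are all hypothetical names introduced for this example.

```python
# Toy phylogeny stored as child -> parent links.
# The tree and its node labels are illustrative assumptions only.
PARENT = {
    "Homo": "Hominini",
    "Pan": "Hominini",
    "Hominini": "Homininae",
    "Gorilla": "Homininae",
    "Homininae": "Hominidae",
    "Pongo": "Hominidae",
}


def ancestors(node):
    """Return the path from `node` up to the root, starting with `node`."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def resolve(include, exclude=()):
    """Resolve a definition to the smallest clade containing every `include` tip.

    Returns the node at the root of that clade, or None if the clade
    would also contain a tip listed in `exclude`.
    """
    paths = [ancestors(tip) for tip in include]
    shared = set(paths[0]).intersection(*paths[1:])
    # Walking up from the first tip, the first shared node is the
    # most recent common ancestor of all included tips.
    mrca = next(node for node in paths[0] if node in shared)
    if any(mrca in ancestors(tip) for tip in exclude):
        return None
    return mrca


# The definition, not the label, determines which group is meant.
print(resolve(["Homo", "Pan"]))
print(resolve(["Homo", "Gorilla"], exclude=["Pongo"]))
```

In this sketch the node labels serve only as human-readable handles: the same definition could be resolved against a differently labelled or differently shaped tree, and the result would follow the pattern of descent in that tree.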