Missouri Botanical Garden Open Conference Systems, TDWG 2011 Annual Conference

Linking semantic phenotypes to character matrices and specimens
James P. Balhoff, Matthew J. Yoder, Andrew R. Deans

Last modified: 2011-09-10

Abstract


Phenotype descriptions in the large body of published systematic biology literature are traditionally reported as free text. As a consequence, they are largely inaccessible to computational methods for large-scale integrative analysis, including even seemingly basic steps such as linking them to biological knowledge maintained in databases for genetics, development, and other domains. Pioneered by the model organism genomics community, the Entity-Quality (EQ) model is an emerging standard for describing phenotype data within an ontological framework. Ontologies have become a foundational technology for establishing shared semantics and, more generally, for capturing and computing with biological knowledge. An EQ expression that corresponds to a phenotype observation asserts that an entity, a term drawn from an organism-specific ontology (e.g., 'fin', 'vertebra', or 'skull' from the Teleost Anatomy Ontology), bears a particular quality, a term drawn from the taxonomically agnostic Phenotype and Trait Ontology (e.g., 'serrated', 'curved', or 'blue'). The Phenoscape project (http://www.phenoscape.org/) has established the applicability of EQ descriptions to comparative data by ontologically annotating a body of published phylogenetic character matrices from the ichthyological literature (http://kb.phenoscape.org/).
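
As a rough illustration of the EQ pairing described above, the following minimal Python sketch models an entity term and a quality term as a single phenotype expression. The class names `Term` and `EQ`, the method names, and the `inheres_in` rendering are assumptions made for illustration; the abstract does not specify a concrete encoding, and the string produced is only one commonly used way of writing such an expression.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """An ontology term, identified here only by its source ontology and label."""
    ontology: str
    label: str


@dataclass(frozen=True)
class EQ:
    """An Entity-Quality phenotype: the entity bears the quality."""
    entity: Term
    quality: Term

    def describe(self) -> str:
        # Plain-language reading of the EQ expression.
        return f"{self.entity.label} bears quality {self.quality.label}"

    def class_expression(self) -> str:
        # Hypothetical class-expression rendering; 'inheres_in' is an assumed
        # property name, not one taken from the abstract.
        return f"'{self.quality.label}' and inheres_in some '{self.entity.label}'"


# Example terms taken from the abstract's own examples.
fin = Term("Teleost Anatomy Ontology", "fin")
serrated = Term("Phenotype and Trait Ontology", "serrated")
phenotype = EQ(entity=fin, quality=serrated)

print(phenotype.describe())
print(phenotype.class_expression())
```

Keeping the entity and quality as separate terms is what lets the same quality vocabulary be reused across anatomy ontologies for different taxa.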

For EQ phenotype descriptions to be applied consistently across comparative biology, their formal semantic relationships to individual organisms, taxonomic groups, taxon-by-character matrices, and other information artifacts must be made explicit. Using the Web Ontology Language (OWL), we have extended the Comparative Data Analysis Ontology (CDAO) to relate character states to the phenotypic classes (EQs) that they denote, and to relate operational taxonomic units (OTUs) to organismal specimens and taxonomic groups. We provide OWL property chain axioms that allow phenotypic descriptions to propagate from taxon-by-character matrix annotations to the specimens represented by OTUs in the matrix. Our model tentatively incorporates elements of the TDWG Darwin Core vocabulary to describe organismal specimens; however, it would greatly benefit from a Resource Description Framework (RDF) version of the Darwin Core data model. Our OWL framework provides consistent semantics for phenotypic descriptions across both the published literature annotated by Phenoscape and the new semantics-based taxonomic descriptions being developed by the Hymenoptera Anatomy Ontology project (http://hymao.org/).
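
The propagation enabled by a property chain can be sketched in a few lines of Python. This is a simplified illustration, not the authors' OWL model: the property names `represented_by`, `has_state`, `denotes`, and `exhibits`, the identifiers in the example data, and the function `apply_property_chain` are all hypothetical, and the phenotype is treated as a plain graph node rather than an OWL class. The sketch only shows the general idea that following a fixed sequence of properties from a specimen can yield a direct specimen-to-phenotype statement.

```python
from collections import defaultdict


def apply_property_chain(triples, chain, inferred_property):
    """Infer (s, inferred_property, o) for every path s -chain[0]-> ... -chain[-1]-> o."""
    # Index objects by (subject, property) for quick traversal.
    index = defaultdict(set)
    for s, p, o in triples:
        index[(s, p)].add(o)

    inferred = set()
    for start in {s for s, _, _ in triples}:
        frontier = {start}
        # Follow each property in the chain in order.
        for prop in chain:
            frontier = {o for node in frontier for o in index[(node, prop)]}
        inferred.update((start, inferred_property, end) for end in frontier)
    return inferred


# Hypothetical matrix annotation: a specimen is represented by an OTU, the OTU
# has a character state, and the state denotes an EQ phenotype.
triples = {
    ("specimen_1", "represented_by", "otu_A"),
    ("otu_A", "has_state", "state_1"),
    ("state_1", "denotes", "EQ(fin, serrated)"),
}

inferred = apply_property_chain(
    triples,
    chain=["represented_by", "has_state", "denotes"],
    inferred_property="exhibits",
)
print(inferred)
```

In an actual OWL setting, a reasoner would derive such statements from the declared axioms; the explicit traversal here merely mimics that inference for a single chain.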