Missouri Botanical Garden Open Conference Systems, TDWG 2011 Annual Conference

The final frontier of taxonomic standards: descriptive content
Matthew Yoder

Last modified: 2011-09-09

Abstract


Standards that facilitate taxonomy now exist for many elements of a revisionary treatise. Taxon name and specimen metadata frameworks are particularly well developed, as evidenced by entities like GBIF, which harvest data from myriad sources that all use TDWG-developed standards. Document-level metadata standards are on course to be equally widely used (e.g. TaxPub, http://sourceforge.net/projects/taxpub/). One major (arguably the most important) component of taxon descriptions, however, remains untouched: the descriptive elements of a paper. While we can use existing standards (e.g. Structure of Descriptive Data (SDD)) to return individual columns or rows of descriptive matrices, or to return the block of text that contains a diagnosis or description, we can only algorithmically estimate answers to questions like “show me all the statements pertaining to the head of an insect”.

We propose that the production of formalized “semantic phenotypes” (SPs), i.e. descriptive statements that can be reasoned or computed across, will significantly enhance taxonomic products and ultimately make the hard work of taxonomists available to a nearly universal audience of biological researchers. Developing a framework that facilitates the production of SPs while minimizing the additional work required from taxonomists is a grand challenge with many hurdles. To the extent that they are understood, we enumerate the steps necessary to meet this challenge. These include the development of 1) multi-species anatomy ontologies (e.g. Hymenoptera Anatomy Ontology, HAO) and phenotype and trait ontologies (e.g. PATO); 2) models to formalize the instantiation of these ontologies with respect to individuals (e.g. specimens); 3) applications to produce SP statements in the broader context of other taxonomic data (e.g. specimens, names); and 4) repositories to warehouse SPs. While implementing a broad-scale SP system is a long-term goal, many intermediate steps are possible, for example embedding URIs that point to anatomical concepts within TaxPub. The formalization of these intermediate steps would benefit greatly from TDWG’s expertise. We provide real-world examples with experiments in Hymenoptera taxonomy using the HAO.
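
As a rough illustration of what reasoning across such statements could look like, the following minimal Python sketch answers the head query from the first paragraph over a toy set of entity-quality statements. Everything in it is an assumption made for illustration: the `EX:` identifiers are placeholders rather than real HAO or PATO terms, and the `PART_OF` table, the `SemanticPhenotype` record, and the helper functions are hypothetical, not part of any existing standard or tool.

```python
from dataclasses import dataclass

# Hypothetical part_of relations between anatomical concepts (child -> parent).
# The identifiers are placeholders, not real anatomy ontology terms.
PART_OF = {
    "EX:antenna": "EX:head",
    "EX:scape": "EX:antenna",
    "EX:compound_eye": "EX:head",
    "EX:fore_wing": "EX:mesosoma",
}


@dataclass(frozen=True)
class SemanticPhenotype:
    """One descriptive statement: an individual, an anatomical entity, a quality."""
    specimen: str  # the individual the statement is about
    entity: str    # anatomical concept (placeholder for an anatomy ontology term)
    quality: str   # quality term (placeholder for a phenotype ontology term)


def is_part_of(entity, ancestor):
    """Return True if entity is ancestor itself or is transitively part of it."""
    seen = set()  # guards against accidental cycles in the toy table
    while entity is not None and entity not in seen:
        if entity == ancestor:
            return True
        seen.add(entity)
        entity = PART_OF.get(entity)
    return False


def statements_about(statements, ancestor):
    """Select the statements whose entity falls under the given concept."""
    return [s for s in statements if is_part_of(s.entity, ancestor)]


# Toy statements; specimens and qualities are invented for the example.
STATEMENTS = [
    SemanticPhenotype("specimen-1", "EX:scape", "EX:elongate"),
    SemanticPhenotype("specimen-1", "EX:fore_wing", "EX:hyaline"),
    SemanticPhenotype("specimen-2", "EX:compound_eye", "EX:setose"),
]

head_statements = statements_about(STATEMENTS, "EX:head")
for s in head_statements:
    print(s.specimen, s.entity, s.quality)
```

The statement about the scape is returned even though it never mentions the head, because the part_of chain links it there. That inference is what a free-text description cannot offer directly; a real system would presumably delegate it to an ontology reasoner rather than a hand-written lookup.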