Missouri Botanical Garden Open Conference Systems, TDWG 2013 ANNUAL CONFERENCE

Semantic tools for aggregation of morphological characters across studies
James Balhoff, Alex Dececchi, Paula Mabee, Hilmar Lapp

Building: Grand Hotel Mediterraneo
Room: Sala dei Continenti
Date: 2013-11-01 11:29 AM – 11:36 AM
Last modified: 2013-10-07

Abstract


Comparative descriptions of vertebrate morphology have been formalized in the phylogenetic systematic literature, yet the information embedded within character statements is typically opaque to computational approaches. Semantic annotation, using an Entity–Quality approach, provides the means to classify these character states along anatomical, developmental, qualitative, and other axes. The Phenoscape project has semantically annotated over 16,000 published character statements describing phenotypes across more than 4,000 vertebrate taxa, recently focusing on the fin-to-limb transition. Here we describe the application of semantic web technologies to produce automatically aggregated supermatrix exports from this dataset, which is housed in the Phenoscape Knowledgebase. Building on the knowledge representation frameworks RDF and OWL, we use the open-source OWL API, the ELK reasoner, and the Bigdata RDF triple store in custom software that allows investigators to extract, from multiple studies, character states involving arbitrarily defined aspects of anatomy, e.g. “any part of the limb or fin”, along with their taxonomic associations. We also generate presence/absence character states when these can be automatically inferred from information in other statements. The resulting supermatrix can be automatically mapped onto a default phylogeny according to a provided taxonomy. Supporting queries that are efficient yet allow semantically rich reasoning presents performance challenges at the scale of our knowledgebase. We describe reasoning approaches and software tools we have developed to address trade-offs between logical expressivity and application performance, and we give examples of the data generated and its potential significance to the question of the origin of the tetrapod limb.
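
The aggregation idea can be illustrated with a minimal sketch in plain Python. It is not the Phenoscape software and does not use the OWL API, ELK, or Bigdata: the toy anatomy hierarchy, the annotations, and the names PARENTS, ancestors, and build_supermatrix are all hypothetical, and the presence rule shown (a statement about a structure implies that the structure and the structures above it in the hierarchy are present) is a simplifying assumption standing in for real ontology reasoning.

```python
from collections import defaultdict

# Hypothetical toy anatomy: each term maps to its parent terms.
# is_a and part_of are collapsed into one relation for simplicity.
PARENTS = {
    "humerus": ["forelimb"],
    "forelimb": ["limb"],
    "pectoral fin ray": ["pectoral fin"],
    "pectoral fin": ["fin"],
    "limb": ["paired appendage"],
    "fin": ["paired appendage"],
    "skull roof": ["skull"],
}

# Hypothetical annotations: (study, taxon, entity, quality).
ANNOTATIONS = [
    ("study A", "Tiktaalik", "humerus", "robust"),
    ("study B", "Danio", "pectoral fin ray", "branched"),
    ("study B", "Danio", "skull roof", "ornamented"),
    ("study C", "Acanthostega", "humerus", "flattened"),
]


def ancestors(term):
    """Return the term itself plus every term reachable through PARENTS."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def build_supermatrix(annotations, query_terms):
    """Aggregate states for entities that fall under any of the query terms."""
    matrix = defaultdict(dict)
    for _study, taxon, entity, quality in annotations:
        lineage = ancestors(entity)
        if not lineage & query_terms:
            continue  # entity lies outside the requested anatomy
        matrix[taxon].setdefault(entity, set()).add(quality)
        # Assumed inference rule: describing a structure implies it exists,
        # as do the structures above it in the toy hierarchy.
        for structure in lineage:
            matrix[taxon].setdefault(structure + " presence", set()).add("present")
    return {taxon: dict(row) for taxon, row in matrix.items()}


if __name__ == "__main__":
    result = build_supermatrix(ANNOTATIONS, {"limb", "fin"})
    for taxon, row in sorted(result.items()):
        print(taxon, row)
```

Querying with the set {"limb", "fin"} plays the role of “any part of the limb or fin”: statements from different studies are merged into one taxon-by-character table, and the skull roof statement is excluded because none of its ancestors matches the query.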