Missouri Botanical Garden Open Conference Systems, TDWG 2013 ANNUAL CONFERENCE

Practical interoperability across semantic stores of data for ecological, taxonomic, phylogenetic, and metagenomics research
Cynthia Parr

Building: Grand Hotel Mediterraneo
Room: Sala dei Continenti
Date: 2013-11-01 11:36 AM – 11:43 AM
Last modified: 2013-10-07

Abstract


EOL's TraitBank™ aggregates and manages attribute (trait) data across the tree of life in a Virtuoso triple store. Attributes of organisms include morphological descriptors, life history characteristics, habitat preferences, and interactions with other organisms. In this talk we focus on how we add to and improve the semantics of both data and metadata in order to improve interoperability across the domains of morphology, ecology, and genomics. At least initially, most data aggregated by TraitBank will not have been "born semantic." Wherever possible, for each dataset, staff will select Uniform Resource Identifiers (URIs) for terms in existing ontologies (e.g. those registered in bioportal.bioontologies.org) to anchor the type of the attribute (e.g. habitat from the Environment Ontology). We also use terms from ontologies or other controlled vocabularies for the values of attributes (e.g. a particular type of habitat) as well as for most metadata describing the context of the measurement (e.g. life stage, geographic scope). As large datasets are ingested, we will propose new terms to the managers of existing ontologies where needed. Using a customized interface, we ensure that terms that don't yet have good definitions and labels receive them, and we can share those definitions and labels. We also use this interface to promote good practice when others choose URIs for directly added data. However, we will remain flexible and allow new community-generated terms. We anticipate iterative processes to relate new terms to each other and to existing ontologies. Our use of semantic reasoning will initially be quite light, limited to unit conversion and inverse relationships. Eventually it could be expanded to infer values based on phylogeny. A prime example of the approach of reusing ontologies is the Global Biotic Interactions group (GLoBI, http://globalbioticinteractions.wordpress.com/), which reuses and extends classes and relations from existing biomedical and genomic ontologies.
In particular, Globi.owl draws interaction processes from the Gene Ontology, taxonomic ranks from the Open Biomedical Ontologies (OBO) taxrank ontology, relations from the OBO Relations Ontology, life cycle stages and body parts from UBERON, observation and specimen terms from various ontologies, behaviors from NeuroBehaviorOntology, and habitat keywords from the Environment Ontology. GLoBI standardizes data and then passes it on to EOL. Though challenges remain to be addressed, the ultimate goal is to expose semantically annotated, contextualized data so that it can contribute to 1) phylogenetic analyses aimed at understanding evolutionary responses and evolutionary history, 2) facilitation of new species discovery, 3) metagenomic analyses aimed at an integrated understanding of ecosystem processes, and 4) global biotic models.
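
The light reasoning mentioned in the abstract (unit conversion and inverse relationships) might look roughly like the following Python sketch. This is an illustration under stated assumptions, not TraitBank's implementation: the conversion table, the predicate labels (`eats`, `is_eaten_by`, and so on), the `ex:` identifiers, and both function names are hypothetical stand-ins for ontology URIs and for whatever the triple store actually does.

```python
# Hypothetical conversion table: unit -> (canonical unit, multiplier).
# These entries are illustrative, not TraitBank's actual unit handling.
TO_CANONICAL = {
    "mg": ("g", 0.001),
    "g": ("g", 1.0),
    "kg": ("g", 1000.0),
    "mm": ("m", 0.001),
    "cm": ("m", 0.01),
    "m": ("m", 1.0),
}

# Hypothetical predicate labels standing in for ontology relation URIs.
_INVERSE_PAIRS = [("eats", "is_eaten_by"), ("parasitizes", "is_parasitized_by")]
INVERSE_OF = {}
for forward, backward in _INVERSE_PAIRS:
    INVERSE_OF[forward] = backward
    INVERSE_OF[backward] = forward


def to_canonical(value, unit):
    """Convert a measurement to the canonical unit of its dimension."""
    canonical_unit, multiplier = TO_CANONICAL[unit]
    return value * multiplier, canonical_unit


def add_inverse_statements(triples):
    """Return the input triples plus the inverse of every invertible one."""
    expanded = set(triples)
    for subject, predicate, obj in triples:
        inverse = INVERSE_OF.get(predicate)
        if inverse is not None:
            # (A eats B) implies (B is_eaten_by A), and vice versa.
            expanded.add((obj, inverse, subject))
    return expanded


if __name__ == "__main__":
    print(to_canonical(2.5, "kg"))
    print(add_inverse_statements({("ex:taxonA", "eats", "ex:taxonB")}))
```

In a triple store, the same effect would more likely be expressed declaratively (for example with inverse-property axioms) than in application code; the sketch only shows what the two inferences amount to.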