Missouri Botanical Garden Open Conference Systems, TDWG 2014 ANNUAL CONFERENCE

Complex Data Modeling for Simpler Data Access
Ramona Walls, Robert Guralnick

Building: Elmia Congress Centre, Jönköping
Room: Rydbergsalen
Date: 2014-10-29 11:50 AM – 12:05 PM
Last modified: 2014-10-03

Abstract


Biodiversity data is commonly collected using complex sampling or survey schemes. For example, vegetation plot surveys may include a nested design with data recorded at different levels but no specimens taken, while ocean sampling often includes water samples collected at multiple depths with separate aliquots (subsamples) taken from each sample to measure physicochemical parameters, microscopic organisms, and microbial DNA. Data from such processes can be of high value if stored in a manner that allows future researchers to interpret it correctly. Traditionally, relational databases have been used to model, store, and serve sampling or survey data. These databases retain the complexity of the original data but require customized interfaces to access the data and have serious limitations for integrating data stored under different schemas. The advent of semantic web technologies such as ontologies and RDF allows the development of data models and exchange formats for sampling and survey data that are flexible and extensible without any loss of information. Although there is a high overhead associated with adoption of ontology-based models, the benefits in data access and integration have been amply demonstrated in allied domains. The Biological Collections Ontology (BCO) models biodiversity specimen and data collection processes, and ongoing work aims to integrate the BCO with existing models such as Observations and Measurements (O&M) and the Extensible Observation Ontology (OBOE). In addition, we have developed several prototype tools to collect and share data using BCO. In this presentation, we will discuss the basic model underlying BCO and some of the ways we are using it to store and query data. In particular, we discuss ongoing work defining taxonomic surveying processes, and how those relate to other concepts such as collecting event processes.
We also present results from recent concerted work that clearly shows that existing flat standards such as Darwin Core are inadequate for accurately conveying the semantic content of biodiversity data. Going forward, we urge the biodiversity community to contribute to BCO development and refine standards in a way that makes them more compatible with semantic web technologies.
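
As a rough illustration of the nested ocean-sampling case described in the abstract, the following minimal sketch stores a sampling process, its water samples, and their aliquots as subject-predicate-object triples, then walks the links to answer a question a single flat record would struggle to express. All identifiers and predicate names (`cast1`, `derived_from`, `depth_m`, and so on) are hypothetical and chosen for readability; they are not actual BCO, O&M, or Darwin Core terms, and plain Python tuples stand in for a real RDF store.

```python
# Hypothetical identifiers and predicates for illustration only;
# none of these are real BCO or Darwin Core terms.
TRIPLES = [
    ("cast1", "type", "SamplingProcess"),
    ("sample_10m", "collected_by", "cast1"),
    ("sample_10m", "depth_m", 10),
    ("sample_50m", "collected_by", "cast1"),
    ("sample_50m", "depth_m", 50),
    ("aliquot_a", "derived_from", "sample_10m"),
    ("aliquot_a", "measured", "salinity"),
    ("aliquot_b", "derived_from", "sample_10m"),
    ("aliquot_b", "measured", "microbial_dna"),
    ("aliquot_c", "derived_from", "sample_50m"),
    ("aliquot_c", "measured", "salinity"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return every object linked from `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(predicate, obj, triples=TRIPLES):
    """Return every subject linked to `obj` by `predicate`."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def depths_measured_for(parameter, triples=TRIPLES):
    """Follow aliquot -> sample -> depth for each aliquot measuring `parameter`."""
    depths = []
    for aliquot in subjects("measured", parameter, triples):
        for sample in objects(aliquot, "derived_from", triples):
            depths.extend(objects(sample, "depth_m", triples))
    return sorted(depths)


if __name__ == "__main__":
    print(depths_measured_for("salinity"))
    print(depths_measured_for("microbial_dna"))
```

Because each aliquot keeps an explicit link to its parent sample, a new measurement type or an extra level of nesting can be added as more triples without changing a schema. This is only meant to convey the general idea; a real implementation would presumably use RDF with ontology terms rather than ad hoc strings.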