Missouri Botanical Garden Open Conference Systems, TDWG 2016 ANNUAL CONFERENCE

Global Biotic Interactions: a case study in ecological data aggregation
Jorrit Poelen, Katja Sabine Schulz, Jennifer Hammock

Last modified: 2016-09-29

Abstract


Although large-scale ecological data sets are essential for enhancing our understanding of biodiversity in a changing world, efforts to aggregate and harmonize ecological data from heterogeneous sources are still in their infancy. Global Biotic Interactions (GloBI) contributes to progress in this area by building an extensible, open-source infrastructure for the integration and sharing of species interaction data such as predator-prey, parasite-host, and pollinator-plant associations. In collaboration with the Encyclopedia of Life (EOL), GloBI lowers technical barriers for data contributors by accommodating a variety of data types and formats. Interaction data may be based directly on vouchered occurrence records, or they may be derived from journal articles, books, or scientific reports. If source data are not sufficiently structured, technical help is available to prepare a data set for ingestion. The GloBI integration process leverages multiple taxonomic name services and hierarchies (e.g., World Register of Marine Species, Integrated Taxonomic Information System, and National Center for Biotechnology Information, accessed via EOL and Global Names Architecture services), community ontologies (e.g., OBO Relations Ontology, Uberon, and Environment Ontology), controlled vocabularies (e.g., GeoNames and the Coastal and Marine Ecological Classification Standard), and registries (e.g., Crossref Digital Object Identifiers). The data model is flexible and readily accommodates novel data and metadata types uncovered by evolving aggregation strategies. Integration into the unified GloBI framework is continuous: the data are rebuilt and updated from the original data sources on a daily basis. As the volume of interaction records grows, the integration process is continually optimized to ensure scalability. Since its launch in 2013, GloBI has collected more than 2 million interaction records for over 100,000 taxa from almost 300 sources.
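The harmonization step of the integration process can be illustrated with a small sketch. It is not GloBI's implementation: it assumes a simplified raw record of (source name, free-text interaction label, target name), and it stands in for external taxonomic name services with hypothetical in-memory lookup tables tried in order. Interaction labels are mapped to OBO Relations Ontology term IDs (e.g., RO:0002470 for "eats"), which should be checked against the current ontology release.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Assumed mapping from free-text interaction labels to OBO Relations Ontology
# term IDs; verify against the current RO release before relying on it.
INTERACTION_TERMS: Dict[str, str] = {
    "eats": "RO:0002470",
    "preys on": "RO:0002439",
    "parasite of": "RO:0002444",
    "pollinates": "RO:0002455",
}


@dataclass(frozen=True)
class Interaction:
    """A harmonized interaction record (hypothetical, simplified layout)."""
    source_taxon_id: str
    interaction_type_id: str
    target_taxon_id: str
    citation: str


# A resolver takes a verbatim taxon name and returns a taxon ID, or None.
Resolver = Callable[[str], Optional[str]]


def table_resolver(table: Dict[str, str]) -> Resolver:
    """Wrap a lookup table as a resolver (stand-in for a real name service)."""
    return lambda name: table.get(name.strip())


def resolve(name: str, resolvers: List[Resolver]) -> Optional[str]:
    """Try each name service in order and return the first match, if any."""
    for resolver in resolvers:
        taxon_id = resolver(name)
        if taxon_id is not None:
            return taxon_id
    return None


def harmonize(raw: Tuple[str, str, str], citation: str,
              resolvers: List[Resolver]) -> Optional[Interaction]:
    """Normalize one raw (source, label, target) record.

    Returns None when the label or either taxon name cannot be resolved,
    so unmatched records can be reported back to the data contributor.
    """
    source_name, label, target_name = raw
    term = INTERACTION_TERMS.get(label.strip().lower())
    source_id = resolve(source_name, resolvers)
    target_id = resolve(target_name, resolvers)
    if term is None or source_id is None or target_id is None:
        return None
    return Interaction(source_id, term, target_id, citation)
```

For example, with two placeholder tables, `harmonize(("Homo sapiens", "Eats", "Thunnus albacares"), "doi:placeholder", [table_resolver({"Thunnus albacares": "WORMS:0001"}), table_resolver({"Homo sapiens": "NCBI:0002"})])` yields a record linking `NCBI:0002` to `WORMS:0001` via `RO:0002470`. Trying services in a fixed order is one simple way to combine several name sources; a production pipeline would likely also record which service matched each name.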
These data are openly shared through an application programming interface (API) and various data archives (Darwin Core, Turtle, Neo4j), always accompanied by comprehensive information about data sources and contributors. GloBI welcomes new data contributors (see Ways to Contribute Data) and data users (see Accessing Species Interaction Data). While the focus of GloBI is on the efficient mobilization of as much interaction data as possible in support of large-scale scientific analyses and visualizations (e.g., Gulf of Mexico Species Interactions), the data also reach a wider audience through dissemination on Encyclopedia of Life taxon pages and use in a variety of educational tools that advance students' understanding of food webs and ecosystems.
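To show how interaction records can travel together with their source information in a Darwin Core export, here is a minimal sketch that writes tab-separated rows using the Darwin Core ResourceRelationship terms `resourceID`, `relationshipOfResource`, `relatedResourceID`, and `relationshipAccordingTo`. The column selection and the `to_dwc_tsv` helper are hypothetical; GloBI's actual archive layout may differ.

```python
import csv
import io
from typing import Iterable, Tuple

# Darwin Core ResourceRelationship terms used as column headers
# (a hypothetical subset chosen for illustration).
DWC_COLUMNS = ("resourceID", "relationshipOfResource",
               "relatedResourceID", "relationshipAccordingTo")


def to_dwc_tsv(rows: Iterable[Tuple[str, str, str, str]]) -> str:
    """Serialize (source, relation, target, citation) rows as tab-separated text.

    The citation goes into relationshipAccordingTo so that every exported
    interaction keeps a pointer to the source that established it.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(DWC_COLUMNS)
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()
```

Calling `to_dwc_tsv([("NCBI:0002", "eats", "WORMS:0001", "doi:placeholder")])` with placeholder identifiers produces a header line followed by one tab-separated data line.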