Missouri Botanical Garden Open Conference Systems, TDWG 2016 ANNUAL CONFERENCE

A new power balance is needed for trustworthy biodiversity data
Nico Franz, Beckett Sterner

Building: CTEC
Room: Auditorium
Date: 2016-12-09 11:30 AM – 11:45 AM
Last modified: 2016-10-16


Biologists' trust and use of aggregated biodiversity data are suffering because of persistent criticisms of the quality of these data for basic and applied analyses. Individually, one can interpret each criticism as a problem of data quality local to some taxonomic group or geographic region. Indeed, biodiversity aggregators often respond by pointing critics toward correcting errors at their source. We will show, however, that these disputes over data quality are better understood as reflecting systemic flaws in the design of the aggregation process. As a result, fundamental change is needed to effectively address issues of trust in big biodiversity data. In particular, the design change must expand the roles available to researchers as established by data aggregators, such that the interests and views of bottom-up, high-quality content providers are more directly represented. We will outline steps towards alternative, provenance-aware design solutions that promote the formation and maintenance of high-quality biodiversity data packages.

Our discussion focuses on the unitary taxonomic syntheses ("backbones") created by biodiversity data aggregators. We show how the aggregation process can lead to a loss of data unity at the system level when different data sources adhere to conflicting taxonomic perspectives.

Many aggregators follow a design paradigm that requires one taxonomic hierarchy to organize all data at a given time. They achieve this unitary representation of the data using combinations of algorithmic and social practices governed by feasibility constraints rather than principles grounded in taxonomic theory. Eliminating taxonomic conflict between input sources in this manner often results in a hierarchy that no longer corresponds to the view of any particular source – it is a synthesis nobody believes in. Biodiversity data users and contributors frequently regard the quality of these novel classification theories as deficient.
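The effect described above can be illustrated with a toy example. The following Python sketch is only an illustration under stated assumptions: the two checklists, the taxon names, and the per-taxon preference rule are all hypothetical and do not describe the procedure of any actual aggregator. It shows how resolving conflicts taxon by taxon can produce a single hierarchy that matches neither input source.

```python
# Hypothetical child -> parent placements asserted by two sources.
# All names here are invented for illustration.
SOURCES = {
    "checklist_A": {
        "sp1": "GenusX", "sp2": "GenusX",
        "GenusX": "FamilyP", "GenusY": "FamilyP",
    },
    "checklist_B": {
        "sp1": "GenusX", "sp2": "GenusY",
        "GenusX": "FamilyQ", "GenusY": "FamilyQ",
    },
}

# Assumed feasibility rule: species placements are taken from B,
# genus placements from A. Real aggregators use other rules.
PREFERRED = {
    "sp1": "checklist_B", "sp2": "checklist_B",
    "GenusX": "checklist_A", "GenusY": "checklist_A",
}


def conflicting_taxa(sources):
    """Return the taxa whose parent differs between sources."""
    taxa = set().union(*sources.values())
    return sorted(
        t for t in taxa
        if len({placements.get(t) for placements in sources.values()}) > 1
    )


def build_backbone(sources, preferred):
    """Build one hierarchy by picking each placement from its preferred source."""
    return {taxon: sources[name][taxon] for taxon, name in preferred.items()}


def matching_sources(backbone, sources):
    """Return the names of sources whose full hierarchy equals the backbone."""
    return [name for name, placements in sources.items() if placements == backbone]


backbone = build_backbone(SOURCES, PREFERRED)
print("Conflicts:", conflicting_taxa(SOURCES))
print("Backbone:", backbone)
print("Sources that agree with the backbone:", matching_sources(backbone, SOURCES))
```

In this toy case the backbone places sp2 in GenusY (following checklist B) but both genera in FamilyP (following checklist A), so the list of agreeing sources is empty even though every individual placement came from one of the inputs.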

We will show how the Darwin Core (DwC) standard plays a critical role in the design of the aggregation process. We carefully separate causes of poor aggregation that are rooted in failures of data provision or DwC implementation from systemic flaws of DwC in the context of aggregation. For the latter, we outline specific syntactic and semantic solutions – often but not always represented in the Taxonomic Concept Transfer Schema – to achieve suitable aggregation outcomes. We conclude that improved aggregation designs must increase the power allocated to individual (or co-authoring) experts and their heterogeneous views to act as intermediary license-providers for the formation of trusted, big biodiversity data.