Missouri Botanical Garden Open Conference Systems, TDWG 2016 ANNUAL CONFERENCE

Progress in Standardizing Sampling-Event Data
Kyle Braak

Building: Computer Science
Room: Computer Science 3
Date: 2016-12-06 09:15 AM – 09:30 AM
Last modified: 2016-10-16


Scientists can now share sampling data on GBIF.org, making it available for other researchers while showing a commitment to open access and reproducibility, which are integral to scientific inquiry.

GBIF.org is the world's largest source of species occurrence data, providing free and open access to more than 600 million occurrences from more than 29,000 datasets published by over 800 institutions. Its near real-time infrastructure is now widely used, supporting more than one substantive use in peer-reviewed research per day.

Over the past two years, the GBIF Secretariat has been working with European Biodiversity Observation Network (EU BON) partners and the wider biodiversity informatics community to enable sharing of “sampling-event datasets”. These data are derived from environmental, ecological and natural resource investigations that follow standardized protocols for measuring and observing biodiversity. Because the sampling methodology and sampling units are precisely described, the resulting data are comparable and thus better suited for measuring trends in habitat change and climate change. Previously, GBIF.org did not support this type of data because of the complexity of encoding the underlying protocols in consistent ways.

In March 2015, TDWG ratified changes to the Darwin Core (DwC) standard to enable the mobilization of sampling-event data, particularly species abundance. In September 2015, GBIF released a new version of the Integrated Publishing Toolkit (IPT), its free, open-source data publishing software, allowing publication of sampling-event datasets in connection with updates to GBIF.org that enhanced indexing and discovery of these datasets.
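As a rough illustration of why explicit sampling units make abundance data comparable, the sketch below links occurrence records to their sampling events and normalizes counts by sampled area. The field names mirror Darwin Core terms (eventID, samplingProtocol, sampleSizeValue, sampleSizeUnit, organismQuantity, organismQuantityType); the classes, the helper function and all values are hypothetical and do not come from any published dataset or from the IPT.

```python
from dataclasses import dataclass


@dataclass
class Event:
    event_id: str             # DwC eventID
    sampling_protocol: str    # DwC samplingProtocol
    sample_size_value: float  # DwC sampleSizeValue
    sample_size_unit: str     # DwC sampleSizeUnit


@dataclass
class Occurrence:
    event_id: str                # links the record to its sampling event
    scientific_name: str         # DwC scientificName
    organism_quantity: float     # DwC organismQuantity
    organism_quantity_type: str  # DwC organismQuantityType


def density_per_unit(events, occurrences):
    """Divide each quantity by its event's sample size (hypothetical helper)."""
    by_id = {event.event_id: event for event in events}
    densities = {}
    for occ in occurrences:
        event = by_id[occ.event_id]
        key = (occ.event_id, occ.scientific_name)
        densities[key] = occ.organism_quantity / event.sample_size_value
    return densities


# Invented example: two quadrats of different sizes, same protocol.
events = [
    Event("plot-1", "quadrat count", 4.0, "square metre"),
    Event("plot-2", "quadrat count", 10.0, "square metre"),
]
occurrences = [
    Occurrence("plot-1", "Bellis perennis", 8, "individuals"),
    Occurrence("plot-2", "Bellis perennis", 5, "individuals"),
]

densities = density_per_unit(events, occurrences)
for (event_id, name), value in densities.items():
    print(f"{event_id} {name}: {value} individuals per square metre")
```

The raw counts (8 and 5) are not directly comparable because the plots differ in size; dividing by the recorded sample size puts them on a common footing. This assumes both events report the same unit, which a real workflow would need to check.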

Early adopters began publishing the first round of sampling-event datasets in late 2015. Based on feedback collected from these publishers, four additional DwC terms were proposed in June 2016 in order to more faithfully represent a wider range of sampling protocols. Their input also helped guide the development of documentation to support publishers interested in sharing sampling-event data.

This presentation will highlight recent improvements GBIF has made to support the publication of sampling-event datasets. It will also show how upcoming changes to GBIF.org may improve the discovery and reuse of this type of dataset. Drawing on some exemplar datasets, the presentation also aims to promote this new data standard, demonstrating, for example, how it can faithfully represent vegetation plot data.