Biodiversity Data Integration from an Aggregator’s Perspective
Tim Robertson

Date: 2016-12-05 11:15 AM – 11:30 AM
GBIF’s fundamental charge is to make all of the world’s biodiversity data (as much as people are willing to share) behave as though it were managed in a single, consistent database, linked to similar resources in the biological and earth sciences. The ability to query and summarize these data, with answers that are as complete and accurate as possible, is made much more difficult by the fact that people record and publish data so differently.

We will summarize GBIF data ingestion and integration operations, and highlight how standards, particularly vocabulary standards, could simplify the integration effort and vastly improve the quantity and quality of data that are represented consistently.

GBIF harvests more than 32,000 data resources from over 800 providers. At the first level, the exchange standards these resources follow (Darwin Core, ABCD, and various extensions) standardize the larger concepts, but at the value level the contents are still very heterogeneous.
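The gap between standardized concepts and heterogeneous values can be illustrated with a small example. The sketch below maps free-text values supplied for the Darwin Core basisOfRecord term onto controlled terms. The synonym table and the function names are hypothetical illustrations, not GBIF’s actual interpretation code; a production table would be far larger.

```python
import re
from typing import Optional

# Hypothetical synonym table: verbatim values as they might appear in
# published data, mapped to Darwin Core basisOfRecord terms.
BASIS_OF_RECORD_SYNONYMS = {
    "preservedspecimen": "PreservedSpecimen",
    "specimen": "PreservedSpecimen",
    "herbarium sheet": "PreservedSpecimen",
    "humanobservation": "HumanObservation",
    "observation": "HumanObservation",
    "field observation": "HumanObservation",
    "fossil": "FossilSpecimen",
    "fossilspecimen": "FossilSpecimen",
}


def normalize_key(value: str) -> str:
    """Lower-case, turn underscores/hyphens into spaces, collapse whitespace."""
    value = value.strip().lower().replace("_", " ").replace("-", " ")
    return re.sub(r"\s+", " ", value)


def interpret_basis_of_record(verbatim: str) -> Optional[str]:
    """Return a controlled term, or None if the value cannot be interpreted."""
    key = normalize_key(verbatim)
    # Try the key as-is, then with spaces removed ("preserved specimen").
    return (BASIS_OF_RECORD_SYNONYMS.get(key)
            or BASIS_OF_RECORD_SYNONYMS.get(key.replace(" ", "")))


for raw in ["Preserved Specimen", "PRESERVED_SPECIMEN", "Field  observation", "bone fragment"]:
    print(repr(raw), "->", interpret_basis_of_record(raw))
```

Every value that falls through to None is a record that a query on basisOfRecord cannot count, which is why the coverage of such lookup tables matters as much as the standard itself.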

The key concepts that GBIF standardizes include Decimal-Latitude, Decimal-Longitude, Country, Taxon-Name (ranks of the taxonomic hierarchy?), and Collecting-Date (and Time?). Beyond Specimen, Observation, and Taxon-Name, which other key classes do we need to standardize?
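Coordinates show why even a numeric concept such as Decimal-Latitude needs interpretation before it can be queried. A minimal sketch, assuming verbatim coordinates arrive as degree-minute-second strings such as 12°30'S; the pattern and the function name dms_to_decimal are hypothetical, and real verbatim coordinates take many more forms than this covers.

```python
import re
from typing import Optional

# Hypothetical pattern for strings such as 12°30'S, 45 15 36 N, or
# 12°30′15″S. Minutes and seconds are optional; a hemisphere letter is required.
DMS_PATTERN = re.compile(
    r"^\s*(?P<deg>\d+(?:\.\d+)?)\s*°?\s*"
    r"(?:(?P<min>\d+(?:\.\d+)?)\s*['′]?\s*)?"
    r"(?:(?P<sec>\d+(?:\.\d+)?)\s*[\"″]?\s*)?"
    r"(?P<hem>[NSEW])\s*$",
    re.IGNORECASE,
)


def dms_to_decimal(verbatim: str) -> Optional[float]:
    """Convert a DMS string to signed decimal degrees, or None if uninterpretable."""
    match = DMS_PATTERN.match(verbatim)
    if match is None:
        return None
    degrees = float(match["deg"])
    minutes = float(match["min"] or 0)
    seconds = float(match["sec"] or 0)
    if minutes >= 60 or seconds >= 60:
        return None  # reject impossible minute/second values
    value = degrees + minutes / 60 + seconds / 3600
    hemisphere = match["hem"].upper()
    # Latitudes (N/S) must lie within 90 degrees, longitudes (E/W) within 180.
    if value > (90 if hemisphere in "NS" else 180):
        return None
    # Southern and western hemispheres are negative in decimal degrees.
    return -value if hemisphere in "SW" else value


print(dms_to_decimal("12°30'S"))  # -12.5
print(dms_to_decimal("95°N"))     # None (out of range for a latitude)
```

Returning None rather than guessing keeps doubtful values out of spatial queries; whether to flag or discard them is a policy choice the sketch does not make.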

The process of standardizing content has been expensive, and fields that remain unstandardized impede producing complete and accurate results.
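One way to see how unstandardized fields impede complete results: a query over interpreted dates silently misses every record whose date could not be parsed. The sketch below assumes a small, hypothetical set of accepted date formats; interpret_event_date and count_in_year_range are illustrative names only.

```python
from datetime import date, datetime
from typing import Iterable, Optional, Tuple

# Hypothetical list of unambiguous formats accepted by this sketch.
# Orderings such as 03/04/2005 are deliberately not attempted, because
# day/month order cannot be resolved without the publisher's convention.
ACCEPTED_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%d %B %Y", "%d %b %Y", "%B %d, %Y")


def interpret_event_date(verbatim: str) -> Optional[date]:
    """Return a date if any accepted format matches, otherwise None."""
    text = verbatim.strip()
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def count_in_year_range(verbatim_dates: Iterable[str], start: int, end: int) -> Tuple[int, int]:
    """Return (records dated within [start, end], records whose date was uninterpretable)."""
    matched = unparsed = 0
    for raw in verbatim_dates:
        parsed = interpret_event_date(raw)
        if parsed is None:
            unparsed += 1
        elif start <= parsed.year <= end:
            matched += 1
    return matched, unparsed


records = ["2004-03-12", "12 March 2004", "2004/07/01", "03/04/2005", "spring 2004", "1999-12-31"]
print(count_in_year_range(records, 2000, 2010))  # (3, 2)
```

Here two of the six records may well belong in the 2000 to 2010 range, but a query on the interpreted date cannot return them; the undercount is invisible unless the unparsed count is reported alongside the answer.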

Which concepts are most important to address with content vocabularies?

How else can vocabulary standards improve the quantity and quality of biodiversity data?

Will internationalization of vocabularies be required?
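If vocabularies are internationalized, one controlled term can carry labels in several languages, and those labels can serve both to interpret verbatim values and to display results. A minimal sketch, assuming a hypothetical label table keyed by ISO 3166-1 alpha-2 country codes; the table contents and function names are illustrative, not a published vocabulary.

```python
import unicodedata
from typing import Dict, Optional

# Hypothetical multilingual labels keyed by ISO 3166-1 alpha-2 code.
COUNTRY_LABELS: Dict[str, Dict[str, str]] = {
    "DE": {"en": "Germany", "de": "Deutschland", "fr": "Allemagne", "es": "Alemania"},
    "CI": {"en": "Côte d'Ivoire", "fr": "Côte d'Ivoire", "es": "Costa de Marfil"},
    "ES": {"en": "Spain", "es": "España", "fr": "Espagne", "de": "Spanien"},
}


def fold(label: str) -> str:
    """Strip accents, case-fold, and trim so 'España' and 'ESPANA' compare equal."""
    decomposed = unicodedata.normalize("NFKD", label)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold().strip()


# Reverse index from every folded label, in every language, to its code.
LABEL_INDEX: Dict[str, str] = {
    fold(label): code
    for code, labels in COUNTRY_LABELS.items()
    for label in labels.values()
}


def interpret_country(verbatim: str) -> Optional[str]:
    """Map a verbatim country name in any listed language to its code."""
    return LABEL_INDEX.get(fold(verbatim))


def display_label(code: str, language: str, fallback: str = "en") -> Optional[str]:
    """Return the label in the requested language, falling back to English."""
    labels = COUNTRY_LABELS.get(code, {})
    return labels.get(language) or labels.get(fallback)


print(interpret_country("Allemagne"))  # DE
print(display_label("CI", "de"))       # Côte d'Ivoire (no German label, English fallback)
```

The same table does double duty: publishers can write country names in their own language, and users can read results in theirs, while queries run against the language-neutral code.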