Building: Grand Hotel Mediterraneo
Room: America del Nord (Theatre I)
Date: 2013-10-28 04:40 PM – 05:00 PM
Last modified: 2013-10-05
Abstract
As a mature organisation, GBIF increasingly focuses its activities on delivering stable, production-quality services. It has enabled universal access, through a single data portal, to more than 400 million organism occurrence records in more than 10,000 datasets from over 500 institutions. Such data, made discoverable and accessible for re-use through the GBIF portal, is intended to support scientific research and to inform policy and decision making. This brings many challenges, not only technological but also concerning data quality: how can a prospective user judge whether to trust the data they download from the GBIF network? This presentation outlines the activities being undertaken by GBIF to meet the challenge of documenting the quality of the data it serves, thus enabling users to make informed decisions on fitness-for-use.
The GBIF portal has already implemented several measures to document data quality. It requires that datasets submitted using the GBIF Integrated Publishing Toolkit (IPT) are described in an accompanying metadata document in Ecological Metadata Language (EML) and, when indexing submitted datasets, it undertakes basic geospatial and taxonomic quality-control checks, e.g., testing whether record coordinates lie within the stated country or marine region/exclusive economic zone, and matching taxon names and higher classification to authoritative sources such as Catalogue of Life / Species 2000. An issue report is generated automatically and made available to the data provider so that any problems can be addressed before the data are made available in the GBIF portal.
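As an illustration of the kind of checks described above, the minimal Python sketch below performs a country/coordinate consistency test and a taxon-name match against the GBIF backbone via the public species-match web service. The record structure, the country_polygons input and the helper functions are assumptions made for illustration only; they are not the portal's actual indexing code.

```python
import requests
from shapely.geometry import Point, shape

# Public GBIF species-match endpoint used for backbone name matching.
GBIF_MATCH_URL = "https://api.gbif.org/v1/species/match"


def check_coordinates(record, country_polygons):
    """Flag records whose coordinates fall outside the stated country.

    `record` is assumed to be a Darwin Core-style dict; `country_polygons`
    maps ISO country codes to GeoJSON geometries (illustrative inputs).
    """
    lat = record.get("decimalLatitude")
    lon = record.get("decimalLongitude")
    country = record.get("countryCode")
    if lat is None or lon is None or country not in country_polygons:
        return ["COORDINATES_OR_COUNTRY_MISSING"]
    polygon = shape(country_polygons[country])
    if not polygon.contains(Point(lon, lat)):
        return ["COUNTRY_COORDINATE_MISMATCH"]
    return []


def check_taxon(record):
    """Match the scientific name against the GBIF backbone taxonomy."""
    response = requests.get(
        GBIF_MATCH_URL,
        params={"name": record.get("scientificName"),
                "kingdom": record.get("kingdom")},
        timeout=10,
    )
    match = response.json()
    if match.get("matchType") in (None, "NONE"):
        return ["TAXON_MATCH_NONE"]
    if match.get("matchType") == "FUZZY":
        return ["TAXON_MATCH_FUZZY"]
    return []


def issue_report(records, country_polygons):
    """Aggregate per-record issues into a simple report for the data publisher."""
    return {
        rec.get("occurrenceID", f"record-{i}"):
            check_coordinates(rec, country_polygons) + check_taxon(rec)
        for i, rec in enumerate(records)
    }
```

The issue flags shown here follow the naming style of GBIF's occurrence interpretation flags, but the exact set of checks and thresholds applied during indexing is determined by the portal's own pipeline.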
However, much remains to be done to document data quality and, to achieve this, the GBIF Work Programme 2014-2016 has prioritised several activities. These include i) developing guidelines and supporting tools to assess and improve metadata completeness, ii) developing portal upgrades to report data quality and fitness-for-use for each dataset and species, iii) exploring possible approaches to endorsement of datasets (e.g., through scientific oversight), iv) engaging expert communities to form fitness-for-use working groups, v) developing metrics and indicators for assessing the relevance of GBIF-mobilised data, and vi) evaluating models for creating and curating reference datasets. Guidance on the allocation of responsibilities within the network will also be critical, with the roles of the various actors, from data custodian through node manager to central portal, clearly defined. GBIF will also address the foundational requirement that all datasets and records have stable identifiers, e.g., to allow the global community to annotate, correct, curate and cite data. Moreover, GBIF cannot act alone in these endeavours: it will seek to collaborate with other major players that share these goals, e.g., in developing a common global taxonomic framework to underpin taxonomic quality.