Missouri Botanical Garden Open Conference Systems, TDWG 2015 ANNUAL CONFERENCE

Towards Automatic Curation of Biodiversity Data: the BLOOM Project
Mélanie Hachet, Thomas Haevermans, Philippe Grandcolas, Frédéric Legendre, Roseli Pellens, Julien Troudet, Visotheary Ung

Last modified: 2015-08-17

Abstract


Current environmental uncertainty raises significant questions about the future of biodiversity and highlights the need to monitor it. However, the sheer magnitude and complexity of biodiversity data make it impractical to use the millions of available occurrence points to analyze the impact of climate change at a large scale or to address macroecological research questions. Beyond size, the main challenge is the extreme heterogeneity of biodiversity occurrence data. Using occurrence points is non-trivial, and the curation process must account for issues such as inaccurate determinations, poor geolocalization, format errors and incorrect projection systems. Because potentially no two occurrence records are alike, it is hardly possible to correct all errors automatically without sophisticated validation processes, most of which require direct human input and manual review of each record.

Our objective is to provide an efficient and easy-to-use tool that improves biodiversity monitoring through an automatic curation procedure. The BLOOM (Biodiversity Linked Organisms Occurrences Metadatasets) project will provide a generic solution for the curation of large biodiversity datasets. To this end, we propose a data validation workflow that performs simple data quality checks (coordinates, localities, outlier points and taxonomy) on millions of occurrence points.
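
To make the kind of check involved more concrete, the following is a minimal sketch of two of the checks named above (coordinates and outlier points). It is not the BLOOM implementation: the `Occurrence` record, the function names, the median-absolute-deviation rule and the threshold `k` are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Occurrence:
    """Hypothetical minimal occurrence record (not the BLOOM data model)."""
    taxon: str
    lat: float  # decimal degrees
    lon: float  # decimal degrees


def coordinate_issues(rec):
    """Return a list of basic coordinate problems found in one record."""
    issues = []
    if not -90 <= rec.lat <= 90:
        issues.append("latitude out of range")
    if not -180 <= rec.lon <= 180:
        issues.append("longitude out of range")
    if rec.lat == 0 and rec.lon == 0:
        # (0, 0) is a common placeholder for missing coordinates.
        issues.append("null island (0, 0)")
    return issues


def _far_from_median(values, k):
    """Indices of values further than k median absolute deviations from the median."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # No spread to compare against; flag nothing on this axis.
        return set()
    return {i for i, v in enumerate(values) if abs(v - center) > k * mad}


def flag_outliers(records, k=5.0):
    """Return sorted indices of records that are outliers in latitude or longitude.

    Assumed rule for this sketch only: a point is flagged when either
    coordinate lies more than k MADs from the median of the records.
    """
    flagged = _far_from_median([r.lat for r in records], k)
    flagged |= _far_from_median([r.lon for r in records], k)
    return sorted(flagged)
```

In practice such checks would be run per taxon, and flagged records would be reported for review rather than deleted, since a distant point may be a genuine occurrence.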