Missouri Botanical Garden Open Conference Systems, TDWG 2014 ANNUAL CONFERENCE

The Spatio-Taxonomic Data Quality API: Rationale, Development and Examples
Javier Otegui, Robert Guralnick, John Wieczorek, David Bloom, Laura Russell

Building: Elmia Congress Centre, Jönköping
Room: Rum 11
Date: 2014-10-30 02:00 PM – 02:15 PM
Last modified: 2014-10-03


At the 2013 TDWG Annual Meeting, our team presented a prototype for the integration of two large biodiversity data initiatives, VertNet and Map of Life, and the initial outcome of that integration. VertNet (http://vertnet.org/) is a network of institutions that share global primary information about the presence of vertebrate species. Map of Life (http://mol.org/) is an online resource that collates and links different sources of biodiversity information (expert range maps, regional checklists, etc.) under a common infrastructure. In our integrative framework, records from VertNet were checked against data sources from Map of Life, which led to the detection of a series of spatio-taxonomic issues. These checks ranged from the most basic completeness tests (whether coordinates were present, for example) to more advanced consistency checks (whether coordinates and country matched and, if not, why) and spatio-taxonomic assessments (e.g. overlaying occurrence points and range map polygons). Since then, we have developed this approach further by generalizing the concept behind the integration and bundling the checks into a web API that performs these assessments on the fly for any given set of data. This API, based on Google Cloud Endpoints and accessible with any current web browser, takes basic geospatial and taxonomic data as input, namely coordinates, country or country code, and scientific name, and performs all the completeness checks, consistency evaluations and quality assessments. It is also designed to be modular, so that more or better assessments can easily be added later. The output of a call to the API is a JSON object with the results of all these assessments, which can be downloaded or integrated into any workflow. As a proof of usability, we integrated calls to this API into the VertNet data portal (http://portal.vertnet.org/).
Each time a user loads the page of a record, the API is called with values taken directly from that record, and the results are parsed and presented in a dedicated tab within the page. We have styled these results so that users can get a quick overview of quality issues as well as detailed reporting. Next steps include integrating this service more broadly into VertNet data quality workflows, both before data publication is completed and after publication via VertNet's record-level issue submission system. The API and its integration into pre- and post-publication workflows can greatly speed up improvements in the fitness for use of point occurrence records.
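
As a rough illustration of the modular design described above, the following Python sketch runs a few completeness and basic validity checks on one record and returns the results as a JSON-ready dictionary. It is not the API's actual code or output format: the registry, the function names and the flag names (hasCoordinates, validCoordinates, etc.) are hypothetical, and the input keys are assumed to follow Darwin Core term names. The checks that need external spatial data (coordinate and country matching, range map overlays) are left out.

```python
import json

# Hypothetical registry: each assessment is a function that takes a record
# and returns a dict of flags, so new checks can be added independently.
CHECKS = []


def register(check):
    """Add a check function to the registry and return it unchanged."""
    CHECKS.append(check)
    return check


@register
def completeness(record):
    """Completeness checks: are the basic fields present?"""
    has_coords = (
        record.get("decimalLatitude") is not None
        and record.get("decimalLongitude") is not None
    )
    return {
        "hasCoordinates": has_coords,
        "hasCountry": bool(record.get("countryCode")),
        "hasScientificName": bool(record.get("scientificName")),
    }


@register
def coordinate_validity(record):
    """Basic validity checks, only evaluated when coordinates are present."""
    lat = record.get("decimalLatitude")
    lon = record.get("decimalLongitude")
    if lat is None or lon is None:
        return {}
    return {
        "validCoordinates": -90 <= lat <= 90 and -180 <= lon <= 180,
        "nonZeroCoordinates": not (lat == 0 and lon == 0),
    }


def assess(record):
    """Run every registered check and merge the flags into one dict."""
    flags = {}
    for check in CHECKS:
        flags.update(check(record))
    return flags


if __name__ == "__main__":
    # Illustrative record, not taken from VertNet.
    record = {
        "decimalLatitude": 57.78,
        "decimalLongitude": 14.16,
        "countryCode": "SE",
        "scientificName": "Vulpes vulpes",
    }
    print(json.dumps(assess(record), indent=2))
```

Keeping each assessment as a separate registered function means that a further check, such as a point-in-range-map test, could be added without changing the code that assembles the JSON result.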