Missouri Botanical Garden Open Conference Systems, TDWG 2013 ANNUAL CONFERENCE

VESpeR - Visual Exploration of Species-referenced Repositories
Martin Graham, Jessie Kennedy

Building: Grand Hotel Mediterraneo
Room: America del Nord (Theatre I)
Date: 2013-10-29 02:30 PM – 02:45 PM
Last modified: 2013-10-05

Abstract


VESpeR is a project funded by the UK Biotechnology and Biological Sciences Research Council (BBSRC) to visualise the data quality of Darwin Core Archive (DWCA) datasets using the client-side HTML5 functionality available in newer standards-compliant web browsers. The aim is to reduce the time and effort biologists need to ascertain the quality of the data they generate or use. Currently, DWCA quality checking is limited to tabular reports of data ‘existence’ and of compliance with DWCA format guidelines, produced by the online DWCA validator and reader. These tools check thoroughly that the data structure can be parsed and that data is present, but not the underlying quality of the data itself.

Using the popular D3 JavaScript library, VESpeR analyses and displays DWCA datasets in three fundamental dimensions (taxonomic, geographic and temporal), with a visualisation dedicated to each. By viewing the composition of a dataset in these dimensions, a prospective user can judge whether it is suitable for the tasks or analyses they have in mind, and a data provider can identify where a dataset they have constructed falls short in data quality, either by lacking necessary data points or by containing data that is obviously incorrect once viewed (a classic example being geographical data with transposed longitude values, such that North American records apparently lie in China). Further visualisations can reveal the taxonomic spread of reference taxonomies, and whether it fits the ‘hollow curve’ pattern, while a simple table shows the presence or absence of certain data types for each record, giving an overall data ‘existence’ profile for the dataset. Selections made in one visualisation are reflected in the others through standard linked interaction, making it possible to discover whether data quality issues are restricted to identifiable sub-portions of the dataset.
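The two simplest checks described above, the per-field ‘existence’ profile and the detection of records that plot outside their expected region, can be sketched in plain JavaScript. This is only an illustrative sketch, not VESpeR's own code: the function names, the sample records and the rough North American bounding box are assumptions made for the example, while the field names are standard Darwin Core terms.

```javascript
// Illustrative sketch only; helper names and sample data are hypothetical.
// Field names are Darwin Core terms.
const PROFILE_FIELDS = ["scientificName", "decimalLatitude", "decimalLongitude", "eventDate"];

// A value counts as present if it is not null/undefined and not blank.
function isPresent(value) {
  return value !== undefined && value !== null && String(value).trim() !== "";
}

// For each field, count how many records carry a value.
function existenceProfile(records, fields) {
  const counts = Object.fromEntries(fields.map((f) => [f, 0]));
  for (const record of records) {
    for (const f of fields) {
      if (isPresent(record[f])) counts[f] += 1;
    }
  }
  return counts;
}

// Return records whose coordinates are non-numeric or fall outside the
// expected bounding box. Records without coordinates are skipped here,
// because the existence profile already reports them.
function outsideExpectedRegion(records, box) {
  return records.filter((r) => {
    if (!isPresent(r.decimalLatitude) || !isPresent(r.decimalLongitude)) return false;
    const lat = Number(r.decimalLatitude);
    const lon = Number(r.decimalLongitude);
    if (Number.isNaN(lat) || Number.isNaN(lon)) return true;
    return lat < box.south || lat > box.north || lon < box.west || lon > box.east;
  });
}

// Hypothetical sample: the second record has lost the minus sign on its longitude.
const sampleRecords = [
  { scientificName: "Quercus alba", decimalLatitude: "38.6", decimalLongitude: "-90.3", eventDate: "2012-05-01" },
  { scientificName: "Quercus rubra", decimalLatitude: "38.6", decimalLongitude: "90.3", eventDate: "" },
  { scientificName: "Acer rubrum", decimalLatitude: "", decimalLongitude: "", eventDate: "2011-09-14" },
];

// Rough, assumed bounding box for a North American dataset.
const northAmerica = { south: 5, north: 85, west: -170, east: -50 };

const profile = existenceProfile(sampleRecords, PROFILE_FIELDS);
const suspects = outsideExpectedRegion(sampleRecords, northAmerica);

console.log(profile);
console.log(suspects.map((r) => r.scientificName));
```

A bounding-box test is a much cruder stand-in for what a map view shows at a glance, but it illustrates the kind of error that a presence-only check cannot catch: the second record has every coordinate field filled in and still lies on the wrong continent.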

VESpeR can handle datasets of almost a million entities client-side within a browser through judicious data filtering, since many of the data types within individual records are not needed to judge the geographic, temporal or taxonomic distributions and extents of a dataset. Future work includes the ability to cross-reference a smaller specimen collection against a larger taxonomy, to check data quality in terms of adherence to reference taxonomies.
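The filtering idea can be illustrated with a small sketch that keeps only the columns needed for the taxonomic, geographic and temporal views while reading a delimited data file. It is an assumption-laden example rather than VESpeR's actual loader: the function name, the choice of retained columns and the sample text are hypothetical, and the naive split on the delimiter does not handle quoted fields.

```javascript
// Illustrative sketch only; not VESpeR's actual loading code.
// Columns assumed sufficient for the three views (Darwin Core terms).
const NEEDED_COLUMNS = ["taxonID", "scientificName", "decimalLatitude", "decimalLongitude", "eventDate"];

// Parse delimited text with a header row, retaining only the wanted columns.
// Note: a plain split() does not cope with quoted or escaped delimiters.
function parseFiltered(text, keep, delimiter = "\t") {
  const lines = text.split(/\r?\n/).filter((line) => line.length > 0);
  if (lines.length === 0) return [];
  const header = lines[0].split(delimiter);
  // Pair each wanted column with its position; skip columns the file lacks.
  const wanted = keep
    .map((name) => ({ name, index: header.indexOf(name) }))
    .filter((col) => col.index !== -1);
  return lines.slice(1).map((line) => {
    const cells = line.split(delimiter);
    const record = {};
    for (const { name, index } of wanted) {
      record[name] = cells[index] ?? ""; // short rows yield empty values
    }
    return record;
  });
}

// Hypothetical two-row file with columns the views do not need.
const sampleText =
  "taxonID\tscientificName\trecordedBy\tdecimalLatitude\tdecimalLongitude\teventDate\toccurrenceRemarks\n" +
  "1\tQuercus alba\tA. Collector\t38.6\t-90.3\t2012-05-01\tnear river\n" +
  "2\tAcer rubrum\tB. Collector\t\t\t2011-09-14\t\n";

const filtered = parseFiltered(sampleText, NEEDED_COLUMNS);
console.log(filtered);
```

Dropping free-text columns such as remarks at parse time means each retained record holds only a handful of short strings, which is one plausible way to keep a large dataset within a browser's memory.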