Missouri Botanical Garden Open Conference Systems, TDWG 2014 ANNUAL CONFERENCE

Tuning the Citizen Science “Instrument” for Gathering Biodiversity Data and Maintaining Data Quality
Robert D. Stevenson, Yurong He, HeeJun Kim, Todd E Suomela

Building: Elmia Congress Centre, Jönköping
Room: Rydbergsalen
Date: 2014-10-30 03:15 PM – 03:30 PM
Last modified: 2014-10-03


Citizen science (CS) is a novel “instrument” that can gather biodiversity data over spatial and temporal scales unavailable to scientists using traditional methods. Recent advances in internet technologies and mobile computing are expanding the opportunities for the public to participate in CS Biodiversity Projects (BPs). Projects such as eBird and REEF are providing data that are proving exceptionally valuable to scientists. Nevertheless, both the scientific community and the public remain skeptical about the quality of CS data because citizens are, by definition, not “certified” scientists. To overcome this hurdle, CS projects use a variety of approaches to improve data quality. We surveyed the CS literature and reviewed project web sites to understand the approaches that BPs have employed. Building on the work of Wiggins et al. (2011, Mechanisms for Data Quality and Validation in Citizen Science), we have constructed a list of mechanisms that can be used to improve data quality. These mechanisms are usefully grouped by when they apply (before, during, and after data collection), and the appropriate choice also depends significantly on the kind of data collected. Data generated by BPs can typically be placed into four categories: A) classifying sounds or images (BatDetective, WhaleFM, Floating Forests), B) gathering specimens (School of Ants, MicrobeNET), C) making digital recordings (Lost LadyBug, iNaturalist, iBats), and D) reporting personal observations (eBird, REEF, Great SunFlower Project). Data quality for classification projects (A) is often controlled and measured by having three to five participants classify the same sample. Categories B and C produce a form of evidence (a specimen or a recording) that others, including experts, can examine. Category D, personal observations, is the most challenging for data quality because it leaves no such evidence and therefore requires the most trust in participants’ contributions.
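The replicate-classification mechanism described for category A can be illustrated with a minimal sketch. It is not taken from any of the projects named above; the function name `consensus_label`, the minimum of three votes, and the 0.6 agreement threshold are assumptions chosen for illustration only. The idea is simply that a sample receives a label only when enough independent participants agree, and that the agreement fraction itself serves as a quality measure.

```python
from collections import Counter


def consensus_label(labels, min_votes=3, min_agreement=0.6):
    """Return (label, agreement) for one sample.

    labels        -- classifications submitted by independent participants
    min_votes     -- hypothetical minimum number of classifications required
    min_agreement -- hypothetical fraction of votes the top label must reach

    Returns (None, agreement) when there are too few votes or no label
    reaches the agreement threshold.
    """
    if len(labels) < min_votes:
        return None, 0.0
    counts = Counter(labels)
    top_label, top_count = counts.most_common(1)[0]
    agreement = top_count / len(labels)
    if agreement < min_agreement:
        return None, agreement
    return top_label, agreement


# Three participants classify the same recording; two of them agree.
label, agreement = consensus_label(["bat", "bat", "noise"])
print(label, round(agreement, 2))
```

Because the assumed threshold is above 0.5, a tied vote can never produce a label; such samples would need further classifications or expert review.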
When our categories of mechanisms are compared to DataONE’s data life cycle steps, it is clear that both domain scientists and information specialists are needed to manage data quality. The choice of mechanisms varies most among projects in the activities that precede data collection (quality assurance plans, mentoring, training, testing). This variability may reflect a larger, essential tension between engaging citizens and keeping them contributing to the project (building the instrument) and improving data quality (having citizens work like scientists to ensure the instrument's quality). Currently there are no guidelines, best practices, or established standards for documenting data quality for biodiversity projects or for citizen science projects generally.