Missouri Botanical Garden Open Conference Systems, TDWG 2011 Annual Conference

Enlisting the Use of Educated Volunteers at a Distance: Or, Why Crowdsourcing and Citizen Science Will NOT Create Nightmare Zombies That Will Destroy Us All
Andrea Thomer, Robert Guralnick

Last modified: 2011-09-12

Abstract


In an era dominated by anthropogenic change, biocollections contain rich time-series data that are critical for documenting current biodiversity, reconstructing how it has changed over time, and investigating the causes of these changes. Unfortunately, the vast majority of biocollections are not digitized, and methods that dramatically increase the rate of digitization are urgently needed. “Crowdsourcing” label transcription to citizen scientists is one promising method, with both tangible and intangible benefits for our community. However, a set of thorny questions remains about how to design these projects so that they deliver the greatest benefit to all parties. Judging from listserv discussions and blog comments, the community seems most interested in answering three of these questions, and we focus on them:

1) What quality of data will come from this endeavor?  Will citizen scientists be able to interpret and adequately transcribe the relatively esoteric species names and localities that are found on handwritten labels?

2) Can we develop tools that help citizen scientists make good decisions and improve the quality of data during the process of transcription?

3) What are the motivations and rewards for citizen scientists who devote time to transcribing natural history collection ledgers and labels? How can these be encoded into applications?


We first present initial data and results from a pilot study documenting how quickly “non-experts” can accumulate information about taxa from web-based resources and then use those resources to digitize specimen labels for taxonomic groups that most “citizens” do not already know. Next, we suggest a simple, novel, higher-level architecture for such projects that integrates existing knowledge and data-quality improvements directly into transcription workflows. Many of these methods are already used piecemeal by existing digitization projects (e.g. Atlas of Living Australia and Old Weather) and could readily be combined into more complete user interfaces. Doing so requires (a) tapping into existing, trusted, and well-vetted data sources (such as the Encyclopedia of Life, the Biodiversity Heritage Library, VertNet, GenBank, and others) and using them to guide the accumulation of new data, and (b) showing a skeptical community the value of extending a dataset’s “chain of custody” to include strangers at a distance. We close by focusing on the “intangible benefits” that citizen science can provide to the community: not only does it allow us to capitalize on existing investments in web resources and data infrastructure, but it also concretely fulfills an oft-made promise of biocollection digitization, namely that it will support education and outreach to a broader audience.
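The abstract does not say how transcriptions would be checked against trusted sources inside the workflow. One minimal sketch, assuming a small hard-coded checklist stands in for a vetted source such as the Encyclopedia of Life or VertNet (no real API is called here), is to compare each volunteer-entered name against that list and flag near misses for review. The function names, the checklist, and the 0.8 similarity cutoff are all hypothetical choices for illustration; fuzzy matching uses Python's standard `difflib` rather than any method described by the authors.

```python
import difflib

# Hypothetical vetted checklist. A real project might draw this from a
# trusted source (e.g. Encyclopedia of Life or VertNet); here it is
# hard-coded purely for illustration.
REFERENCE_NAMES = [
    "Peromyscus maniculatus",
    "Peromyscus leucopus",
    "Microtus pennsylvanicus",
    "Sorex cinereus",
]


def normalize(name):
    """Collapse runs of whitespace and apply binomial capitalization (Genus species)."""
    return " ".join(name.split()).capitalize()


def review_transcription(transcribed, reference=REFERENCE_NAMES, cutoff=0.8):
    """Return (status, suggestions) for a volunteer's transcribed name.

    status is "accepted" for an exact match against the checklist,
    "needs review" when close (possibly misspelled) matches exist,
    and "unverified" when nothing in the checklist resembles the input.
    """
    cleaned = normalize(transcribed)
    if cleaned in reference:
        return "accepted", [cleaned]
    # Similarity-ranked candidates above the (hypothetical) cutoff.
    suggestions = difflib.get_close_matches(cleaned, reference, n=3, cutoff=cutoff)
    if suggestions:
        return "needs review", suggestions
    return "unverified", []
```

For example, `review_transcription("peromyscus   leucopus")` is accepted as "Peromyscus leucopus", while a misspelling such as "Peromyscus maniculatis" is returned as "needs review" with "Peromyscus maniculatus" among the suggestions, which an interface could offer back to the volunteer at the moment of transcription. A name absent from the checklist is marked "unverified" rather than rejected, since a real label may legitimately carry a name the reference list lacks.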