Missouri Botanical Garden Open Conference Systems, TDWG 2011 Annual Conference

Crowd sourcing record transcription to unlock historical species data from natural history collections
Andrew W Hill, Robert Guralnick, Mark W Westneat, Robert Prys-Jones, J Patrick Kociolek, Javier de la Torre

Last modified: 2011-09-15

Abstract


Despite the well-documented value of biocollections for science and society, the ability of researchers and policy makers to use this resource has been limited because a wealth of specimen data remains sequestered within collections institutions in analog formats. By unlocking biocollections from analog formats, we open the door to new research on species responses to climate change, patterns of species invasion, and ecosystem response to human interaction, to name a few. We believe that crowd sourcing offers a quick, accurate, and economical means of digitizing the remaining biocollection data. We have begun a collaborative pilot project among The Natural History Museum in London, the University of Colorado Museum of Natural History, the Field Museum of Natural History, and Vizzuality to transcribe data from natural history collection ledgers and specimen labels through a voluntary crowd-sourced application.

In addition to unlocking new species data, we intend to use the information gathered during transcription to demonstrate how cost effective crowd sourcing can be while remaining highly accurate. We will use a large set of as-yet untranscribed ornithological ledger pages provided by the Natural History Museum, London, as well as a diverse set of specimen label images and ledgers from the Field Museum. Volunteers will play a game in which they are asked to study digital images of collection ledger pages and transcribe the handwritten information. All transcription tasks will be replicated by multiple volunteers. In this way, we can collect information including how frequently volunteers performing replicate transcription tasks record differing results, how long it takes to gather transcriptions, and the overall and unit costs of data transcription. We will compare our measurements with data collected by the University of Colorado covering the transcription efforts of trained staff at the Museum of Natural History.
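
As a rough illustration of the replicate measurements described above, the following Python sketch computes a per-task consensus value and the share of tasks on which volunteers disagreed. It is not the project's actual pipeline: the function names, the task identifiers, the sample transcriptions, and the normalization rule (ignoring case and extra whitespace) are all assumptions made for this example.

```python
from collections import Counter


def normalize(text):
    """Collapse whitespace and case so trivial differences are not counted (assumed rule)."""
    return " ".join(text.split()).lower()


def consensus(transcriptions):
    """Return the most common normalized value and the share of replicates that gave it."""
    counts = Counter(normalize(t) for t in transcriptions)
    value, votes = counts.most_common(1)[0]
    return value, votes / len(transcriptions)


def disagreement_rate(tasks):
    """Fraction of tasks whose replicate transcriptions are not unanimous."""
    if not tasks:
        return 0.0
    disputed = sum(
        1 for replicates in tasks.values()
        if len({normalize(t) for t in replicates}) > 1
    )
    return disputed / len(tasks)


# Hypothetical replicate transcriptions, keyed by an invented task identifier.
tasks = {
    "ledger_p1_row1": ["Turdus merula", "turdus  merula", "Turdus merula"],
    "ledger_p1_row2": ["Parus major", "Parus maior", "Parus major"],
}

for task_id, replicates in tasks.items():
    print(task_id, consensus(replicates))
print("disagreement rate:", disagreement_rate(tasks))
```

A simple majority vote is only one possible way to reconcile replicates. Unit cost and time-to-completion could be tracked alongside these figures by recording a timestamp for each submitted transcription.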

While the data we unlock from the Natural History Museum, London, analog pages will feed directly into other research products from our team related to documenting anthropogenic impacts on biodiversity, we also hope that the comparative data on trained versus crowd-sourced transcription efforts can help us determine the viability of crowd sourcing to unlock the world's remaining biological collections data. As with Old Weather, another project we are involved in, we believe that enthusiastic community members can be relied upon to help us transcribe museum data quickly, cheaply, and with high quality. If successful, we plan to open the project up to any number of museums that can meet a small set of minimum requirements, including high-quality images (of either ledgers or specimen labels) and a commitment to freely sharing the data generated.