Missouri Botanical Garden Open Conference Systems, TDWG 2014 ANNUAL CONFERENCE

Data Discovery and Doer Happiness: Uses for Optical Character Recognition (OCR) Output
Deborah Paul, Andrea Matsunaga, Miao Chen, Jason Best, Sylvia Orli, William Ulate, Reed Beaman

Building: Elmia Congress Centre, Jönköping
Room: Rydbergsalen
Date: 2014-10-27 02:20 PM – 02:40 PM
Last modified: 2014-10-03


Are you planning to digitize the specimen collections at your institution or museum? Are you in the midst of digitization and looking for ways to improve the process? Perhaps you are capturing only minimal, so-called "skeletal" data records in your museum collections digitization workflow. How do you plan to fill in the rest of the data? If you are imaging museum specimen data, such as field notes, note cards, or specimen labels, how do you facilitate rapid, accurate transcription and validation? Is it possible to make data transcription and validation a more engaging task? With OCR output, some open source tools, and perhaps existing minimal data, it is possible to visualize, query, complete, and validate your data records in ways that not only improve data quality but also reveal data you wouldn't think to look for and enhance the work experience for transcribers and quality assurance staff. Building on the work of Drinkwater et al. (2014, http://phytokeys.pensoft.net/articles.php?id=1533), participants at the iDigBio Citscribe Hackathon demonstrated how OCR output, as well as the data already entered in a database, can be used to improve digitization workflows and the user experience.