Missouri Botanical Garden Open Conference Systems, TDWG 2016 ANNUAL CONFERENCE

Large-scale digitization of herbarium specimens: development and usage of an automated, high-throughput conveyor system
Patrick Sweeney, Charles Davis, Paul Morris, Binil Starley

Last modified: 2016-09-29


The billions of specimens housed in natural science collections provide a tremendous source of under-utilized data that are useful for scientific research, conservation, commerce, and education. Digitization and mobilization of specimen data and images promise to increase their utilization. Although natural science collection specimens have been digitized for decades, the vast majority remain undigitized.

If the digitization task is to be completed in the foreseeable future, innovative, high-throughput approaches are needed. To create a data set for the study of global change in New England, we designed and implemented an industrial-scale, conveyor-based digitization system for herbarium specimen sheets. The system implements a variation of an object-to-image-to-data workflow that prioritizes imaging and the capture of collections-level data.

Using our system, we digitized almost 350,000 specimens over a 131-week period. Overall, the average time between successive digitized specimens was about 35 seconds (counting only intervals between images of 30 minutes or less). This rate was in line with our pre-project expectations for our approach. Our throughput rates are comparable with those of similar high-throughput approaches focused on digitizing herbarium sheets and exceed the rates achieved with more conventional approaches. The conveyor apparatus software, database schema, configuration files, hardware list, and conveyor schematics are available for download on GitHub.
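
The interval statistic reported above can be illustrated with a minimal Python sketch. It is not the project's actual analysis code: the function name mean_interval_seconds, the max_gap parameter, and the sample timestamps are all hypothetical, and the sketch simply assumes that gaps longer than 30 minutes between successive images (breaks, overnight periods) are left out of the average.

```python
from datetime import datetime, timedelta


def mean_interval_seconds(timestamps, max_gap=timedelta(minutes=30)):
    """Return the mean gap in seconds between successive timestamps.

    Gaps longer than max_gap are skipped, on the assumption that they
    reflect pauses in work rather than digitization time.
    Returns None when no gap qualifies.
    """
    ordered = sorted(timestamps)
    gaps = [
        (later - earlier).total_seconds()
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier <= max_gap
    ]
    if not gaps:
        return None
    return sum(gaps) / len(gaps)


# Made-up capture times for illustration only (not project data).
start = datetime(2016, 1, 4, 9, 0, 0)
capture_times = [
    start,
    start + timedelta(seconds=30),
    start + timedelta(seconds=70),
    start + timedelta(hours=2),              # long pause: this gap is skipped
    start + timedelta(hours=2, seconds=35),
]

average = mean_interval_seconds(capture_times)
print(f"Mean interval: {average:.1f} s")
```

In this made-up sample the qualifying gaps are 30, 40, and 35 seconds, so the two-hour pause does not inflate the average.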