HelpingScience: Online service for processing label data from digital herbarium specimen sheets
Last modified: 2011-09-10
Abstract
As more herbaria digitize their specimen sheets, services that can process large volumes of specimen data have great appeal. Online services that draw on citizen scientists, volunteers, and crowdsourcing are well suited to processing label data rapidly.
The service www.helpingscience.org was created to provide a practical solution for rapidly processing label data. It is designed to work with all types of herbarium labels, new and old, in English and other languages, so that the international community can volunteer in transforming digital labels into Darwin Core records. By combining cloud computing, Artificial Intelligence (AI), known checklists, citizen scientists, and volunteers, the service can process thousands of labels in parallel and quickly reveal what historical information our herbaria hold.
The service is organized around three main steps, which we call stages: first, locating the labels and determinations on the full specimen image; then identifying the Darwin Core fields on each individual label; and finally keying in (data entry) each of those Darwin Core fields. Each stage reduces the size of the individual task, so that human effort can be parallelized efficiently and distributed computing in the cloud can be put to use. The continued use of AI, together with redundant data entry, helps ensure quality control and produce an accurate Darwin Core record.
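To illustrate how redundant data entry can support quality control, the following is a minimal sketch, not the service's actual implementation. It assumes that several volunteers key in the same Darwin Core field independently and that a value is accepted only when enough of them agree. The function names (`normalize`, `consensus`, `build_record`), the `min_agreement` threshold, and the sample transcriptions are all hypothetical; only the field names (`scientificName`, `eventDate`, `locality`) are real Darwin Core terms.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def normalize(value: str) -> str:
    """Collapse whitespace and case so trivially different entries compare equal."""
    return " ".join(value.split()).casefold()


def consensus(entries: List[str], min_agreement: int = 2) -> Optional[str]:
    """Return the agreed transcription, or None if there is no clear agreement.

    Hypothetical rule: the most common normalized value wins if it has at
    least `min_agreement` votes and is not tied with another value.
    """
    if not entries:
        return None
    ranked = Counter(normalize(e) for e in entries).most_common(2)
    winner, votes = ranked[0]
    if votes < min_agreement:
        return None
    if len(ranked) > 1 and ranked[1][1] == votes:
        return None  # tie between two different values
    # Return the first original spelling that matches the winning form.
    return next(e.strip() for e in entries if normalize(e) == winner)


def build_record(
    transcriptions: Dict[str, List[str]], min_agreement: int = 2
) -> Tuple[Dict[str, str], List[str]]:
    """Assemble a record from per-field entries; list fields needing review."""
    record: Dict[str, str] = {}
    needs_review: List[str] = []
    for field, entries in transcriptions.items():
        value = consensus(entries, min_agreement)
        if value is None:
            needs_review.append(field)
        else:
            record[field] = value
    return record, needs_review


if __name__ == "__main__":
    # Made-up example: three volunteers transcribed each field of one label.
    sample = {
        "scientificName": ["Quercus alba L.", "quercus alba  L.", "Quercus alba"],
        "eventDate": ["1902-06-14", "1902-06-14", "1902-06-14"],
        "locality": ["near Mill Creek", "Mill Crk", "nr. Mill Creek"],
    }
    record, needs_review = build_record(sample)
    print(record)
    print(needs_review)
```

In this sketch, fields without agreement are not guessed; they are returned in a separate list so they could be sent to additional volunteers or checked against a known checklist.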