The use of optical character recognition (OCR) in the digitisation of herbarium specimens
Building: Grand Hotel Mediterraneo
Room: Africa (formerly America del Sud)
Date: 2013-10-29 04:30 PM – 04:45 PM
Last modified: 2013-10-08
Abstract
At the Royal Botanic Garden Edinburgh (RBGE), the use of Optical Character Recognition (OCR) to aid the digitisation process has been investigated. This was tested using a digitisation process with two stages of data entry. Records were initially batch-processed to add data extracted from the OCR text and were then sorted by Collector and/or Country. Additional data were then added using images of the specimens. To investigate whether the OCR data aid the digitisation process, a team of six digitisers completed a series of trials using two different data-entry protocols. A survey was carried out to gather the digitisation staff's opinions of the protocols. In total, 7,200 specimens were processed.
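The OCR-based batch step described in the preceding paragraph can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each record is a specimen ID mapped to its raw OCR text, and that Collector and Country are detected by case-insensitive substring matching against small controlled lists. The vocabularies, record IDs and the functions `find_term` and `batch_by_collector_country` are hypothetical; this is not RBGE's actual workflow.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical controlled vocabularies; a real workflow would draw on
# full gazetteer and collector-name lists.
KNOWN_COUNTRIES = ["China", "Nepal", "Bhutan", "Scotland"]
KNOWN_COLLECTORS = ["Smith", "Brown", "Jones"]


def find_term(ocr_text: str, terms: list[str]) -> Optional[str]:
    """Return the first known term found in the OCR text (case-insensitive), else None."""
    lowered = ocr_text.lower()
    for term in terms:
        if term.lower() in lowered:
            return term
    return None


def batch_by_collector_country(
    records: dict[str, str],
) -> dict[tuple[Optional[str], Optional[str]], list[str]]:
    """Group specimen IDs by the (collector, country) pair extracted from their OCR text.

    Records with no match get None in that slot, so every record lands in some batch.
    """
    batches: dict[tuple[Optional[str], Optional[str]], list[str]] = defaultdict(list)
    for specimen_id, ocr_text in records.items():
        key = (find_term(ocr_text, KNOWN_COLLECTORS), find_term(ocr_text, KNOWN_COUNTRIES))
        batches[key].append(specimen_id)
    return dict(batches)


if __name__ == "__main__":
    # Hypothetical OCR output for four specimen labels.
    example = {
        "E001": "Plants of China. Coll. J. Smith 1234",
        "E002": "Flora of Nepal. Coll. A. Brown 56",
        "E003": "Yunnan, China. J. Smith s.n.",
        "E004": "label illegible",
    }
    for key, ids in batch_by_collector_country(example).items():
        print(key, ids)
```

Grouping on a (Collector, Country) key keeps records with similar label layouts and handwriting together in one batch; records with no match fall into a `(None, None)` batch that would still need full data entry from the specimen images.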
Specimens sorted on the basis of data added from the OCR text were quicker to digitise than an unsorted, random set of specimens. Of the methods tested here, the most efficient used a protocol in which data were entered into a limited set of fields and records were filtered by Collector and Country. The survey and subsequent discussions with the digitisation staff highlighted their preference for working with sorted specimens: within a sorted batch, label layout, locations and handwriting are likely to be similar, so familiarity with the Collector or Country is rapidly established.