Missouri Botanical Garden Open Conference Systems, TDWG 2014 ANNUAL CONFERENCE

Optical character recognition (OCR) in linking entomological labels with field notebook data
Tero Mononen, Riitta Tegelberg, Janne Karppinen, Mira Sääskilahti, Hannu Saarenmaa, Tommi Koskinen, Jyrki Muona

Building: Elmia Congress Centre, Jönköping
Room: Rydbergsalen
Date: 2014-10-27 04:00 PM – 04:20 PM
Last modified: 2014-10-03


The labels of pinned insect specimens are very small, and for practical reasons many collectors have recorded information about their specimens in separate field notebooks. These notebooks are mostly catalogues that list, in running order, sample numbers and detailed information about the specimens. Between the 1930s and the 1960s, amateur entomologist Gunnar Blomqvist collected around 14,000 Coleoptera specimens, mostly from Finland, representing over 2,200 species. Blomqvist's notebooks were fully digitized (imaged, transcribed, and denormalised in an SQL database) by the Finnish Museum of Natural History in 2009. Recently, the entire Blomqvist collection was imaged at Digitarium using an automated imaging line, producing images of individual specimens with their labels. Optical character recognition (OCR) is now being applied to these images to test whether data from the specimen labels can be combined with the specimen data from the notebooks. Only two items need to be read from each label: the year of collection and the field number (i.e. notebook number). Once these can be read reliably by OCR, data such as taxonomic name, collection locality and date, and occasionally some other information can be retrieved from the database of the digitized notebooks. More information about the imaging project for these specimens can be found at http://www.digitarium.fi/en/content/mass-digitisation-pinned-insects and more about the digitization of field notebooks at http://digit.luomus.fi/node/5804.
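The linking step described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual pipeline: the label layout (a four-digit year plus a field number prefixed by "No."), the table and column names, and the sample row are all hypothetical, and the OCR engine itself is left out, so the functions start from text that has already been recognised.

```python
import re
import sqlite3

# Assumed label layout (hypothetical): a four-digit year between 1930 and 1969
# and a field number written as "No. 123". Real labels may differ.
YEAR_RE = re.compile(r"\b(19[3-6]\d)\b")
FIELD_RE = re.compile(r"\bNo\.?\s*(\d+)\b", re.IGNORECASE)


def parse_label(ocr_text):
    """Return (year, field_number) from OCR text, or None if either is missing."""
    year = YEAR_RE.search(ocr_text)
    field = FIELD_RE.search(ocr_text)
    if not (year and field):
        return None
    return int(year.group(1)), int(field.group(1))


def build_demo_db():
    """Create an in-memory stand-in for the notebook database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE notebook ("
        "year INTEGER, field_no INTEGER, taxon TEXT, locality TEXT, "
        "PRIMARY KEY (year, field_no))"
    )
    # Placeholder row for demonstration only; not real collection data.
    conn.execute(
        "INSERT INTO notebook VALUES (?, ?, ?, ?)",
        (1950, 123, "Example taxon", "Example locality"),
    )
    return conn


def lookup(conn, ocr_text):
    """Look up the notebook record matching the year and field number on a label."""
    key = parse_label(ocr_text)
    if key is None:
        return None  # label unreadable; would need manual review
    return conn.execute(
        "SELECT taxon, locality FROM notebook WHERE year = ? AND field_no = ?",
        key,
    ).fetchone()


if __name__ == "__main__":
    conn = build_demo_db()
    print(lookup(conn, "G. Blomqvist 1950 No. 123"))
```

Keying the lookup on the pair (year, field number) reflects the statement that these are the only two values needed from the label. A label where either value cannot be read yields no match rather than a guessed record.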