Missouri Botanical Garden Open Conference Systems, TDWG 2014 ANNUAL CONFERENCE

Capturing Inventory level information about collections as a step in object to image to data workflows
Paul J Morris, James Hanken, David Lowery, Bertram Ludäscher, James A. Macklin, Robert A Morris, Tianhong Song, Patrick Sweeney

Building: Elmia Congress Centre, Jönköping
Room: Rydbergsalen
Date: 2014-10-27 02:00 PM – 02:20 PM
Last modified: 2014-10-03


Natural science collections tend to be highly organized for storage and retrieval, with material sharing the same current identification stored together and often secondarily segregated by geography. The information content associated with the storage system can be exploited to improve digitization efficiency. In object-to-image-to-data workflows, imaging of specimen label data separates physical specimen handling from the capture of data associated with the specimen. If information about the storage of the collection can be carried through the imaging step into data records, it is possible to capture inventory-level information about the collection prior to imaging, and to associate that information with images.

In the New England Vascular Plant (NEVP) Thematic Collections Network (TCN), and in digitization projects in the entomology collections of the Museum of Comparative Zoology (MCZ), an inventory-level pre-capture pass is performed in the collections. The current identification of specimens is printed in machine-readable form, associated with the physical storage units (unit trays or folders), and then processed in the imaging step to create skeletal data records to accompany the image. In both cases, structured data is stored as JSON in a QR Code 2D barcode produced by the MCZ's open source DataShot software, which uses the ZXing library for barcode generation and reading.
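As an illustration of the kind of payload such a label might carry, the following minimal sketch serializes a current identification as compact JSON and parses it back. The field names (family, genus, species, drawer) and the sample values are assumptions made for this example and are not taken from DataShot; rendering the text as a QR Code and reading it back (done with ZXing in the projects described) is outside the sketch.

```python
import json


def encode_tray_label(family, genus, species, drawer=None):
    """Serialize the current identification for one storage unit as compact JSON.

    The key names are hypothetical; the real label schema is not given in the abstract.
    """
    payload = {"family": family, "genus": genus, "species": species}
    if drawer is not None:
        payload["drawer"] = drawer
    # Compact separators keep the text short, which keeps the printed symbol small.
    return json.dumps(payload, separators=(",", ":"), ensure_ascii=False)


def decode_tray_label(text):
    """Parse decoded barcode text; return None if it is not a JSON object."""
    try:
        data = json.loads(text)
    except ValueError:
        return None
    return data if isinstance(data, dict) else None


# Made-up sample values, for illustration only.
label_text = encode_tray_label("Formicidae", "Camponotus", "pennsylvanicus", drawer="116.4")
print(label_text)
print(decode_tray_label(label_text))
```

Returning None for text that is not a JSON object lets a caller distinguish a storage-unit label from other barcodes in the same image, such as a plain catalog number.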

The MCZ entomology projects exploit the organization of the collection into unit trays, which contain material sharing a current identification. A machine-readable label containing the current identification (and often the drawer number) is printed out for each unit tray. During imaging, an individual specimen is removed from the unit tray, its labels are removed, and the specimen and its labels are placed on a jig. The machine-readable label containing the current identification for the unit tray is included in the image, as is a machine-readable label containing the catalog number for the specimen. The imaging step involves only specimen handling and imaging, and no data capture, but machine processing of the image creates skeletal database records that contain the current identification, drawer number, and catalog number.
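The record-creation step described above can be sketched as follows, assuming the barcodes in an image have already been decoded to text by a barcode reader. The function name, record keys, and sample values are hypothetical; the sketch treats any text that parses as a JSON object as the unit tray label and any other text as the specimen's catalog number, which is an assumption about how the two labels might be told apart.

```python
import json

# Hypothetical keys carried on the unit tray label.
TRAY_KEYS = ("family", "genus", "species", "drawer")


def build_skeletal_record(decoded_texts, image_filename):
    """Merge the barcode texts found in one image into a skeletal record."""
    record = {"image": image_filename, "catalogNumber": None}
    for text in decoded_texts:
        try:
            payload = json.loads(text)
        except ValueError:
            payload = None
        if isinstance(payload, dict):
            # Unit tray label: copy over the identification and drawer number.
            for key in TRAY_KEYS:
                if key in payload:
                    record[key] = payload[key]
        else:
            # Anything else is assumed to be the specimen's catalog number.
            record["catalogNumber"] = text.strip()
    return record


# Made-up decoded texts standing in for the two labels in one image.
decoded = [
    '{"genus":"Camponotus","species":"pennsylvanicus","drawer":"116.4"}',
    "MCZ-ENT00012345",
]
print(build_skeletal_record(decoded, "IMG_0001.jpg"))
```

No operator input appears anywhere in the function, which mirrors the point that the imaging step involves no data capture.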

Processing of machine-readable data in the image decouples three activities: capturing the information inherent in the storage system, handling specimens, and creating electronic data records. A similar pass through the collections has been performed in the NEVP TCN, printing the current identification (and often geographic information) for the material in a folder in machine-readable form. During the imaging and handling step, at a primary digitization apparatus (which may not be co-located with the collection), this information is captured with a barcode scanner, as is the barcode/catalog number of the specimen. In addition, locality, collector name, collector number, and collection date are transcribed by an operator of the primary digitization apparatus during the imaging step. This creates a minimal data record of scientific name, town in which the material was collected, and the date it was collected. This information is then encoded in an OA annotation for transfer to consuming systems (Symbiota, and the database of record), which decouples specimen handling and skeletal record creation from the database systems.
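To make the annotation step concrete, the sketch below wraps a minimal record in a JSON-LD-like structure with an Open Annotation target and body. The abstract does not describe the serialization actually used in the NEVP TCN, so the layout, the choice of Darwin Core terms, the function name, and the sample values are all assumptions made for illustration.

```python
import json

# Namespace URIs for the Open Annotation and Darwin Core vocabularies.
OA = "http://www.w3.org/ns/oa#"
DWC = "http://rs.tdwg.org/dwc/terms/"


def minimal_record_annotation(catalog_number, scientific_name, town, event_date):
    """Wrap a minimal data record as an annotation about one specimen.

    The target identifies the specimen; the body carries the asserted data.
    This layout is hypothetical, not the NEVP serialization.
    """
    return {
        "@context": {"oa": OA, "dwc": DWC},
        "@type": "oa:Annotation",
        "oa:hasTarget": {"dwc:catalogNumber": catalog_number},
        "oa:hasBody": {
            "dwc:scientificName": scientific_name,
            "dwc:municipality": town,
            "dwc:eventDate": event_date,
        },
    }


# Made-up sample values, for illustration only.
annotation = minimal_record_annotation("barcode-00000001", "Acer rubrum L.", "Concord", "1902-05-14")
print(json.dumps(annotation, indent=2))
```

Because the annotation is a self-describing document, a consuming system can ingest it without the digitization apparatus needing any knowledge of that system's database schema.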