Missouri Botanical Garden Open Conference Systems, TDWG 2011 Annual Conference

Digitarium's Digitisation Process Used for the Natural History Collection Specimens
Juha Lehtonen, Susanne Heiska, Mika Pajari, Riitta Tegelberg, Hannu Saarenmaa

Last modified: 2011-09-10

Abstract


Digitarium is a joint initiative of the Finnish Museum of Natural History and the University of Eastern Finland. It was established in 2010 as a dedicated facility for large-scale digitisation of collections. Here we give an overview of the digitisation process as defined at Digitarium. The process is driven by custom-made software called JJClient (JJC). It currently consists of six phases, and JJC uses an SSH (Secure Shell) connection to move data between them:

1) The original physical material is received from the customer. A metadata record is created describing the received material and the related agreements.

2) The samples are barcoded. Each sample is assigned a globally unique HTTP URI and a two-dimensional QR code. The ID tag is glued to the paper sheet or attached to the pin of an insect sample.

3) All samples are imaged with a high-quality digital camera. The labels of insect specimens are removed and placed temporarily on a sheet of cardboard for imaging. The TIFF images and an empty Darwin Core XML template are created. JJC uploads the data to the server, which serves it to the next phase over the network. The physical samples are not needed after this phase, which keeps their handling to a minimum.

4) Recording phase: Data from the labels shown in the images is entered into the XML template using JJC. The recorder downloads the images and the XML template from the server and fills in the fields shown in the JJC user interface. At the end of the phase, JJC uploads the data to the server.

5) Validation of the data entry. Additional steps can be included in this phase, such as georeferencing and data filtering (used e.g. for endangered species). Georeferencing is a semi-automatic component integrated in JJC, which uses web services such as GEOLocate. The validation process itself is technically similar to the recording phase: the data is downloaded from the server, updated and uploaded back to the server with JJC. Since recording and validation are both done from the images, the recorder and the validator can work remotely.

6) Archiving and publishing the digital data, and delivering the data and specimens back to the customer. The verified images are automatically stored and published in the Morphbank database, and the XML files in the GBIF IPT database. The Metacat service is being evaluated as the long-term archival solution. All these databases run on Digitarium's server. If publication has not been agreed with the customer, the data is retained for Digitarium's internal use only.
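
As an illustration of phases 2 and 3, the following minimal Python sketch mints a unique HTTP URI for a sample and builds an empty Darwin Core record that a recorder could later fill in. It is not JJC code. The base URI, the wrapper element name "record", the choice of fields and the function names are assumptions made for this example; only the Darwin Core terms namespace and term names are taken from the Darwin Core standard. Encoding the URI as a QR code would require a separate library and is not shown.

```python
import uuid
import xml.etree.ElementTree as ET

# Darwin Core terms namespace (from the Darwin Core standard).
DWC_NS = "http://rs.tdwg.org/dwc/terms/"

# Hypothetical base URI for this sketch; not Digitarium's actual identifier scheme.
BASE_URI = "http://id.example.org/specimen/"

# Assumed subset of Darwin Core terms left empty at imaging time
# and filled in during the recording phase.
EMPTY_FIELDS = [
    "scientificName",
    "recordedBy",
    "eventDate",
    "country",
    "locality",
    "decimalLatitude",
    "decimalLongitude",
]


def mint_specimen_uri(base=BASE_URI):
    """Return a globally unique HTTP URI built from a random UUID."""
    return base + uuid.uuid4().hex


def empty_record(specimen_uri, image_file):
    """Build an XML record holding the identifier, the image name and empty fields."""
    ET.register_namespace("dwc", DWC_NS)
    record = ET.Element("record")  # hypothetical wrapper element
    ET.SubElement(record, f"{{{DWC_NS}}}occurrenceID").text = specimen_uri
    ET.SubElement(record, f"{{{DWC_NS}}}associatedMedia").text = image_file
    for name in EMPTY_FIELDS:
        ET.SubElement(record, f"{{{DWC_NS}}}{name}")  # no text yet
    return record


uri = mint_specimen_uri()
record = empty_record(uri, "specimen_0001.tif")
print(ET.tostring(record, encoding="unicode"))
```

In such a scheme the same URI would appear on the physical ID tag and in the record, so that the image, the data and the specimen stay linked through all later phases.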

The implementation of the process described here is by no means complete. The process is still being tested and refined. Capacity will be scaled up gradually, but it is too early to estimate what level of efficiency will be achieved.

URLs: http://digitarium.fi, http://morphbank.digitarium.fi/, http://ipt.digitarium.fi/