Persistent identifiers for museum specimens in Norway
Dag Endresen, Christian Svindseth

One of the key challenges in maturing biodiversity informatics infrastructures is the integration of data across disparate databases. Implementing persistent and globally unique identifiers (PIDs) for specimens held in natural history collections opens up new opportunities for referring to these physical resources in an interlinked digital context such as the Internet. Different approaches for declaring and resolving such persistent identifiers are now being implemented by the first natural history collections. The catalog numbers assigned to almost all of the specimens in the Natural History Museum at the University of Oslo (NHM-UiO) are unique and identify the specimens unambiguously within the context of the museum. However, in a larger context, such as the information network established by GBIF, catalog numbers are often no longer sufficiently unique. Alternative solutions such as the Darwin Core triplet, composed of an institution code, a collection code, and a catalog number separated by colons, have been explored with varying success. The Darwin Core triplet aims to combine the catalog number with additional codes that encode the local context within which the catalog number is unique. This approach has proven challenging, partly because of the lack of persistent codes or identifiers for the institutions and collections.
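
The triplet construction and its weakness can be illustrated with a minimal sketch. The institution and collection codes below ("MUS-A", "MUS-B", "BOT") and the catalog number are hypothetical values chosen for illustration, not codes used by any real collection, and the function name is likewise an assumption.

```python
def darwin_core_triplet(institution_code, collection_code, catalog_number):
    """Join the three codes with colons, as the Darwin Core triplet is described."""
    return ":".join((institution_code, collection_code, catalog_number))


# Hypothetical example: the same catalog number used at two institutions.
catalog_number = "12345"
triplet_a = darwin_core_triplet("MUS-A", "BOT", catalog_number)
triplet_b = darwin_core_triplet("MUS-B", "BOT", catalog_number)

# The bare catalog numbers collide in an aggregated network; the triplets do not.
collide_without_context = catalog_number == catalog_number
distinct_with_context = triplet_a != triplet_b

# If an institution code changes (hypothetically, "MUS-A" becomes "MUS-A2"),
# the triplet changes too, so it is not persistent for the same specimen.
triplet_after_rename = darwin_core_triplet("MUS-A2", "BOT", catalog_number)
stable_after_rename = triplet_after_rename == triplet_a
```

Here `stable_after_rename` is false, which is the persistence problem noted above: the triplet is only as stable as the codes it is built from.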

NHM-UiO has implemented persistent identifiers (PIDs) using universally unique identifiers (UUIDs) prefixed by a persistent uniform resource locator (PURL): "http://purl.org/nhmuio/id/[UUID]". The identifiers are optimized for machine readability by encoding the HTTP PURL-UUID string as QR codes or Data Matrix barcodes before they are attached to the labels of the physical specimens. The UUID provides the globally unique identity for the physical specimen. The HTTP PURL prefix is part of the identifier and provides the redirection (HTTP 303 "See Other") to the resolver service located at "http://gbif.no/resolver/[UUID]". Using content negotiation, a user or machine can access descriptive information as HTML, comma-separated values (CSV), tab-delimited text, N3/Turtle RDF, and JSON. These formats can also be accessed directly, without content negotiation, by appending a file extension: "http://gbif.no/resolver/[UUID].[html|csv|txt|n3|json]". All Norwegian institutions are invited to use this resolver service at "http://purl.org/gbifnorway/id/[UUID]" when publishing biodiversity data to GBIF. All occurrence records published through GBIF-Norway with appropriate identifiers (mapped to occurrenceID) will be added to the resolver service. After the first year of operation, new persistent identifiers have been assigned and physically attached to more than 250 000 (255 046) specimens held by NHM-UiO.
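
The identifier scheme described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the museum's implementation: the two URL prefixes and the five file extensions come from the text, while the function names, the use of random (version 4) UUIDs, and the media types used for content negotiation are assumptions. The sketch only builds strings and a request object; it does not contact the resolver.

```python
import uuid
from urllib.request import Request

PURL_PREFIX = "http://purl.org/nhmuio/id/"    # prefix quoted in the text
RESOLVER_PREFIX = "http://gbif.no/resolver/"  # resolver location quoted in the text

# File extensions are from the text; the media types are ASSUMED, since the
# text does not say which Accept values the resolver recognizes.
MEDIA_TYPES = {
    "html": "text/html",
    "csv": "text/csv",
    "txt": "text/tab-separated-values",
    "n3": "text/n3",
    "json": "application/json",
}


def mint_pid(specimen_uuid=None):
    """Build a PID string; a random version 4 UUID is an assumption."""
    if specimen_uuid is None:
        specimen_uuid = uuid.uuid4()
    return f"{PURL_PREFIX}{specimen_uuid}"


def extract_uuid(pid):
    """Return the UUID part of a PID, raising ValueError if it is malformed."""
    if not pid.startswith(PURL_PREFIX):
        raise ValueError(f"not a PID with the expected prefix: {pid!r}")
    return uuid.UUID(pid[len(PURL_PREFIX):])


def resolver_url(pid, extension=None):
    """Map a PID to the resolver URL, optionally with a format extension."""
    base = f"{RESOLVER_PREFIX}{extract_uuid(pid)}"
    if extension is None:
        return base
    if extension not in MEDIA_TYPES:
        raise ValueError(f"unsupported extension: {extension!r}")
    return f"{base}.{extension}"


def negotiated_request(pid, extension):
    """Prepare (but do not send) a request for the PID with an Accept header."""
    return Request(pid, headers={"Accept": MEDIA_TYPES[extension]})
```

For example, `resolver_url(mint_pid(), "json")` yields a resolver address ending in ".json", and `negotiated_request(pid, "csv")` prepares a request for the same record that relies on content negotiation instead of a file extension.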