Last modified: 2011-10-12
Abstract
The Biodiversity Heritage Library - Europe project is a three-year project (2009-2012) that integrates digitized content from 28 major European institutions and serves it to the global corpus of BHL.
One of the major risks for content ingested into the BHL-Europe repository is a loss of data integrity. As the content passes through several stations during the ingest process (adding new content to the database and storage), ensuring the integrity of their data was a major concern of content providers.
Biodiversity literature covers more aspects than just bibliographic information; therefore, several standards have been adopted and combined into one common schema. This monolithic / modular approach ensures that data that was linked together stays linked throughout the whole process.
The OLEF schema ("Open Literature Exchange Format", http://www.bhl-europe.eu/bhl-schema/v0.3/) was developed to ensure this. It combines bibliographic metadata (author, title, publication date, publication place, etc., based on Dublin Core elements), IPR metadata, and metadata for the digitized item (scan date, image information and file lists, type of page, etc.).
OLEF can include information on the names of taxa mentioned in publications. It allows taxon names to be “highlighted” for efficient later finding and processing, so that they can be connected directly to the taxon name services (Catalogue of Life, Global Names Architecture, PESI) and their taxonomic concepts can be included for search term amplification and reverse lookup.
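As a rough illustration of what highlighting taxon names could look like, the following minimal Python sketch wraps known names in a tag. It is not the OLEF markup or the BHL-Europe implementation: the `<taxon>` element, the function `highlight_taxa`, and the hard-coded name list are assumptions made for this example. In practice the names would come from a taxon name service.

```python
import re
from xml.sax.saxutils import escape


def highlight_taxa(text, known_names):
    """Wrap each known taxon name found in text in a hypothetical <taxon> element.

    The element name is an assumption for illustration, not taken from OLEF.
    """
    if not known_names:
        return escape(text)
    # Try longer names first so "Quercus robur" is preferred over "Quercus".
    alternatives = sorted(known_names, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in alternatives) + r")\b"
    )
    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the untouched text so the result stays well-formed XML.
        parts.append(escape(text[last:match.start()]))
        parts.append("<taxon>" + escape(match.group(1)) + "</taxon>")
        last = match.end()
    parts.append(escape(text[last:]))
    return "".join(parts)
```

Tagged names can then be collected from the output and sent to a name service for lookup, which is the step that search term amplification would build on.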
An additional challenge within BHL-Europe was the harmonization of metadata provided by a wide range of content holders. As most partners provide their metadata in different formats and through different interfaces, mapping to OLEF is required. A mapping tool (Schema Mapping Tool / SMT, http://bhl.nhm-wien.ac.at/smt/launch.html) was established to integrate mapping functionality into the ingest process of BHL-Europe.
Several standard mappings for MODS, DC, MARC21, ESE, etc. are embedded into the software and can be processed automatically.
The SMT was designed not only to map data to OLEF but also to map between arbitrary input and output schemas. The SMT can even take over actual data processing, as it allows output formats, processing offsets, etc. to be changed.
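The idea of mapping between arbitrary input and output schemas can be sketched as a lookup table applied to a record. The Python below is a simplified illustration and does not reflect how the SMT works internally. The function `apply_mapping`, the table `DC_TO_TARGET`, and the target field names are hypothetical and are not taken from OLEF.

```python
def apply_mapping(record, mapping):
    """Rename the fields of a record according to a source-to-target mapping.

    Returns two dicts: the mapped fields (values collected in lists, since
    several source fields may map to one target) and the fields for which
    no mapping rule exists.
    """
    mapped = {}
    unmapped = {}
    for field, value in record.items():
        target = mapping.get(field)
        if target is None:
            # Keep unmapped fields visible instead of silently dropping them.
            unmapped[field] = value
        else:
            mapped.setdefault(target, []).append(value)
    return mapped, unmapped


# Hypothetical mapping from Dublin Core elements to illustrative target names.
DC_TO_TARGET = {
    "dc:title": "title",
    "dc:creator": "author",
    "dc:date": "publicationDate",
}

record = {
    "dc:title": "Systema Naturae",
    "dc:creator": "Linnaeus, C.",
    "dc:date": "1758",
    "dc:format": "text",
}
mapped, unmapped = apply_mapping(record, DC_TO_TARGET)
```

Keeping the mapping as data rather than code is what makes it possible to swap in a different input or output schema without changing the processing step. Returning the unmapped fields separately makes gaps in a mapping easy to spot.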
The talk will provide an overview of both the OLEF schema and the SMT and its implementation in the BHL-Europe environment.