Project Bio-API
Last modified: 2014-09-29
Abstract
Naturalis Biodiversity Center in the Netherlands holds vast quantities of data, originating from its collection of 37M specimens and its large Science sector. In the past these data were scattered over numerous datasets owned by different organisations that have since merged. Since 2013 Naturalis has been migrating most of its data into a few core data management systems: CRS (Central Registration System) for zoological, fossil and geological specimens, Brahms for botanical specimens, Linnaeus Next Generation for species descriptions and the Media Library for multimedia. Two projects to digitize all Naturalis specimens will add 7M records to these core systems in 2014-2015. These systems form the basis for a Big Data Information and Communication Technology (ICT) infrastructure, whose initial development takes place in 2014 within the Bio-API project. This project will result in two components: the Netherlands Biodiversity Application Programming Interface (NBA) and a BioPortal, an online search engine for biodiversity data that uses the NBA to retrieve Naturalis data. The NBA will be maintained as an open source project and will provide daily updated services to retrieve zoological, botanical, fossil and geological specimen records, species descriptions and multimedia data. It runs in the cloud and is built with Java, using Elasticsearch indexes as its core. It also includes a name resolution service based on the Catalogue of Life. Data exchange with the core data management systems will be based on the TDWG Access to Biological Collection Data (ABCD) standard and on OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) services for incremental harvesting. Data transformations are done with the Naturalis ETL Workbench (ETL = Extract, Transform and Load), initially developed by ETI BioInformatics. The NBA is designed to support multiple exchange formats in the future, such as Darwin Core, ABCD and Audubon Core.
Naturalis also connects its GBIF Integrated Publishing Toolkit instance to the NBA to deliver data to the GBIF network with daily updates. The NBA is planned to be extended in 2015 with access to 70M observations from Dutch private flora and fauna organisations. Naturalis participates in initiatives towards Linked Open Data networks, into which the NBA will be integrated.
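
As an illustration of the incremental OAI-PMH harvesting mentioned above, the following is a minimal sketch of how a harvester might build `ListRecords` requests and follow resumption tokens. It is not the NBA or ETL Workbench implementation. The endpoint `https://example.org/oai` and the metadata prefix `abcd` are hypothetical placeholders; only the verb, the argument names and the XML namespace come from the OAI-PMH 2.0 protocol.

```python
from typing import Optional
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url: str,
                     metadata_prefix: str,
                     from_date: Optional[str] = None,
                     resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request URL.

    `from_date` restricts the harvest to records changed since that date,
    which is what makes the harvest incremental. When continuing a
    paged response, `resumptionToken` is an exclusive argument and must
    be sent without metadataPrefix or from.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


def next_token(response_xml: str) -> Optional[str]:
    """Return the resumption token of a response, or None on the last page."""
    root = ET.fromstring(response_xml)
    token = root.find(f".//{OAI_NS}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()
```

A daily update would then call `list_records_url` with `from_date` set to the date of the previous successful harvest, fetch the page, and keep requesting with the value returned by `next_token` until it yields `None`.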