Missouri Botanical Garden Open Conference Systems, TDWG 2016 ANNUAL CONFERENCE

The Open Biodiversity Knowledge Management System: A Semantic Suite Running on top of the Biodiversity Knowledge Graph
Viktor Senderov, Teodor Georgiev, Donat Agosti, Terry Catapano, Guido Sautter, Éamonn Ó Tuama, Nico Franz, Kiril Simov, Lyubomir Penev

Building: CTEC
Room: Auditorium
Date: 2016-12-05 02:30 PM – 02:45 PM
Last modified: 2016-10-15

Abstract


The Open Biodiversity Knowledge Management System (OBKMS) is a suite of semantic applications and services running on top of the Biodiversity Knowledge Graph, a graph database storing biodiversity and biodiversity-related information. The main purpose of OBKMS is to provide a unified system for interlinking and integrating diverse biodiversity data, such as taxon names, taxon concepts, treatments, specimens, occurrences, gene sequences, and bibliographic information. The graph is serialized as Resource Description Framework (RDF) quadruples, extracted primarily from biodiversity publications. Options for expressing Darwin Core encoded data as RDF for insertion into the graph are explored.
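As a rough illustration of what expressing Darwin Core data as RDF quadruples could look like, the following minimal sketch serializes one flat record as N-Quads lines. It is not the OBKMS implementation: the `example.org` IRIs, the sample record, and the helper names `literal` and `record_to_nquads` are hypothetical, and only the Darwin Core terms namespace and the term names are taken from the Darwin Core standard.

```python
# Darwin Core terms namespace (from the Darwin Core standard).
DWC = "http://rs.tdwg.org/dwc/terms/"


def literal(value):
    """Render a plain string as an N-Quads literal with basic escaping."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def record_to_nquads(subject_iri, record, graph_iri):
    """Turn a flat {Darwin Core term: value} mapping into N-Quads lines.

    Each line has the form: <subject> <predicate> "object" <graph> .
    The fourth element (the graph IRI) is what makes it a quadruple;
    here it is assumed to identify the source publication.
    """
    return [
        f"<{subject_iri}> <{DWC}{term}> {literal(value)} <{graph_iri}> ."
        for term, value in sorted(record.items())
    ]


# Hypothetical occurrence record and IRIs, for illustration only.
occurrence = {
    "scientificName": "Aus bus",
    "catalogNumber": "12345",
    "country": "Bulgaria",
}
quads = record_to_nquads(
    "http://example.org/occurrence/1",
    occurrence,
    "http://example.org/graph/article-1",
)
for quad in quads:
    print(quad)
```

Using the graph position to record the source publication is one possible design choice under these assumptions; it keeps the provenance of every extracted statement queryable alongside the statement itself.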


Biodiversity publications provide a rich source of high-quality data. To convert such data into RDF, we have developed a general semantic model that supports information extraction from both prospectively published and legacy taxonomic literature. We chose a number of ontologies from the publishing and biological domains to incorporate into our model. In addition to using the Darwin Core Filtered Push ontology (http://filteredpush.org/ontologies/FP/2.0/dwcFP.owl), we have worked with Plazi to extend the Treatment Ontology for knowledge representation of current and legacy biodiversity publications. We understand a treatment to contain the informational value of a taxonomic concept and designed the ontology accordingly. Furthermore, the semantic model allows relationships between taxonomic concepts to be expressed using Region Connection Calculus, RCC-5 (https://en.wikipedia.org/wiki/Region_connection_calculus). These relationships (congruence, inclusion, overlap, exclusion) are not usually found in older biodiversity publications, where only nomenclatural relationships are expressed. For new publications, however, we are in the process of modifying Pensoft’s ARPHA Writing Tool (AWT) and the XML schemas to allow authors to enter such information during the authoring process.
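To make the RCC-5 relationships concrete, the sketch below classifies the relationship between two taxonomic concepts. It rests on a simplifying assumption that is not part of the abstract: each concept is modeled as a non-empty set of member identifiers (for example, specimens). The function name `rcc5_relation` and the sample data are hypothetical. Note that RCC-5 distinguishes inclusion from its inverse, which gives five relations in total.

```python
def rcc5_relation(a, b):
    """Return the RCC-5 relation of concept a to concept b.

    Concepts are modeled as non-empty sets of member identifiers;
    this is an illustrative assumption, not the OBKMS data model.
    """
    a, b = frozenset(a), frozenset(b)
    if not a or not b:
        raise ValueError("both concepts need at least one member")
    if a == b:
        return "congruence"
    if a < b:  # a is a proper subset of b
        return "inclusion"
    if a > b:  # a is a proper superset of b
        return "inverse inclusion"
    if a & b:  # some shared members, neither contains the other
        return "overlap"
    return "exclusion"


# Hypothetical concepts: the same name as circumscribed by two authors.
sensu_smith = {"specimen-1", "specimen-2", "specimen-3"}
sensu_jones = {"specimen-2", "specimen-3"}
print(rcc5_relation(sensu_jones, sensu_smith))
```

In practice such relationships would be asserted by authors rather than computed from member sets; the set model only shows what each relation means.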


The system is currently at the prototype stage and incorporates information extracted from Plazi’s TreatmentBank, as well as from the archives of ZooKeys and Biodiversity Data Journal. The system is also designed as a source for generating nanopublications.

The system is intended for different groups of users. Biodiversity scientists can use it, for example, to retrieve all taxonomic information associated with a name. Ecologists can use geographic search to locate taxon information associated with a region on a map. Collection managers can track whether and where their specimen data have been published. Data aggregators can use the system to extend their stores. Biomedical scientists can make use of the links between taxon and gene information.