Missouri Botanical Garden Open Conference Systems, TDWG 2014 ANNUAL CONFERENCE

NoSQL and Linked Data for collection and taxonomic data
Anaïs Grand, Mohamed Berkani

Building: Elmia Congress Centre, Jönköping
Room: Rydbergsalen
Date: 2014-10-28 03:10 PM – 03:30 PM
Last modified: 2014-10-03


Without explicit links between data, collection databases can be difficult to manage and query, for example when searching for a given specimen in natural history collections. We consider the Linked Data approach well suited to linking collection data (information about specimens) to taxonomic reference databases (information about taxa and their scientific names), to reference databases for taxonomic classifications, and to bibliographic databases. Data can be represented as graphs using NoSQL graph technologies such as Neo4j or Oracle Spatial and Graph. Each node of the graph is a data point, linked to another node by an edge of the graph. The nature of each node (i.e., its semantics) must be specified when the node is created. However, the categories of nodes form a flat (i.e., unstructured) list; for this reason, we emphasize the importance of making graphs and ontologies interoperable, in order to add explicit semantics and standards. With this approach, integrating databases does not require converting their various formats into a common format. For instance, the data from the National Museum of Natural History, Paris (MNHN) are available as RDF files using the Darwin Core standard at science.mnhn.fr. We demonstrate that hundreds of data points from the MNHN collections can be linked to data from the taxonomic reference database Taxref (developed at the Service du Patrimoine naturel, SPN MNHN) with Neo4j, and we illustrate some queries. We highlight research and development directions for the management of collection and taxonomic data.
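The linking idea described in the abstract can be sketched with a small in-memory labelled graph in Python. This is only an illustration under stated assumptions, not the authors' Neo4j implementation: the `Graph` class, the `IDENTIFIED_AS` relationship name, the exact-name matching rule, and all sample records are hypothetical. Only the property names `scientificName` and `catalogNumber` are taken from the Darwin Core vocabulary.

```python
from collections import defaultdict


class Graph:
    """A tiny in-memory labelled property graph (illustration only)."""

    def __init__(self):
        self.nodes = {}                 # node id -> {"label": ..., "props": {...}}
        self.edges = defaultdict(list)  # source id -> [(relationship type, target id)]

    def add_node(self, node_id, label, **props):
        # The label states the nature of the node when it is created.
        self.nodes[node_id] = {"label": label, "props": props}

    def add_edge(self, source, rel_type, target):
        self.edges[source].append((rel_type, target))

    def neighbours(self, source, rel_type):
        return [t for (r, t) in self.edges[source] if r == rel_type]


def link_specimens_to_taxa(graph):
    """Add an IDENTIFIED_AS edge wherever a specimen's scientificName
    equals a taxon's scientificName (a deliberately naive, assumed rule)."""
    taxa_by_name = {
        n["props"]["scientificName"]: node_id
        for node_id, n in graph.nodes.items()
        if n["label"] == "Taxon"
    }
    for node_id, n in list(graph.nodes.items()):
        if n["label"] == "Specimen":
            taxon_id = taxa_by_name.get(n["props"].get("scientificName"))
            if taxon_id is not None:
                graph.add_edge(node_id, "IDENTIFIED_AS", taxon_id)


def specimens_of(graph, scientific_name):
    """Return the sorted ids of specimens linked to the named taxon."""
    return sorted(
        node_id
        for node_id, n in graph.nodes.items()
        if n["label"] == "Specimen"
        and any(
            graph.nodes[t]["props"]["scientificName"] == scientific_name
            for t in graph.neighbours(node_id, "IDENTIFIED_AS")
        )
    )


# Hypothetical sample data: two taxa and four specimen records.
g = Graph()
g.add_node("taxon:1", "Taxon", scientificName="Quercus robur")
g.add_node("taxon:2", "Taxon", scientificName="Fagus sylvatica")
g.add_node("spec:A", "Specimen", catalogNumber="A", scientificName="Quercus robur")
g.add_node("spec:B", "Specimen", catalogNumber="B", scientificName="Fagus sylvatica")
g.add_node("spec:C", "Specimen", catalogNumber="C", scientificName="Quercus robur")
g.add_node("spec:D", "Specimen", catalogNumber="D", scientificName="Betula pendula")

link_specimens_to_taxa(g)
print(specimens_of(g, "Quercus robur"))
```

Neither source dataset changes format here: the specimen and taxon records keep their own properties, and the link is an additional edge. In a graph database such as Neo4j the same pattern would be expressed with node labels and a relationship, and the lookup would become a graph query.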