Missouri Botanical Garden Open Conference Systems, TDWG 2015 ANNUAL CONFERENCE

A simple model for large-scale data mobilization across a diverse organisation
Nicky Nicolson

Building: Windsor Hotel
Room: Oak Room
Date: 2015-09-29 04:00 PM – 04:30 PM
Last modified: 2015-08-30

Abstract


The talk presents a simple model for mobilizing biodiversity data from a data-rich, diverse organisation, based on open-source tools compatible with those taught in the Data Carpentry syllabus.

It introduces an open-source toolkit (https://github.com/RBGKew/Reconciliation-and-Matching-Framework; http://data1.kew.org/reconciliation/) for configuring an OpenRefine-compatible (http://openrefine.org/) reconciliation service over any tabular file or structured database. "Reconciliation" is the process of converting a text string representation of a thing into a usable identifier for that thing, e.g. converting the text string "Tahina spectabilis" to "http://ipni.org/urn:lsid:ipni.org:names:77086615-1". Although the toolkit was first developed for scientific name reconciliation, it can be configured to reconcile any entity type (people, specimens, etc.). Micro-components of the tool (for data transformations: https://github.com/RBGKew/String-Transformers) are available as drop-ins for the OpenRefine data cleaning package. This approach is an alternative to existing service development, which has largely been aimed at technical users. The guiding principle is to open data services to a wider range of users by lowering the barrier to entry, so that hands-on scientists and data curators (those who know their data best) can link their data with external sources. Technical choices were made to fit with the approaches taught in the Software Carpentry and Data Carpentry initiatives (http://datacarpentry.org/). The toolkit aids progress towards Tim Berners-Lee’s Linked Open Data principle #4, "Refer to other things using their HTTP URI-based names when publishing data on the Web", and shows how we can build the foundations of the biodiversity knowledge graph.
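As an illustration of what reconciliation involves from a client's point of view, the following Python sketch builds a batch query and picks out matched identifiers from a response. It assumes the general conventions of the OpenRefine reconciliation service API (a `queries` form parameter holding JSON keyed `q0`, `q1`, ..., and per-query `result` lists with `id`, `name`, `score` and `match` fields). The function names and the sample response are hypothetical and are not taken from the toolkit itself; the only real identifier is the one quoted in the abstract.

```python
import json
from urllib.parse import urlencode


def build_reconciliation_request(names):
    """Build a form-encoded body for a batch reconciliation query.

    Assumes the OpenRefine-style convention of a `queries` parameter
    containing a JSON object keyed q0, q1, ...
    """
    queries = {f"q{i}": {"query": name} for i, name in enumerate(names)}
    return urlencode({"queries": json.dumps(queries)})


def best_matches(names, response):
    """Map each input string to the id of its best candidate flagged as a
    match, or to None when the service offered no confident match."""
    resolved = {}
    for i, name in enumerate(names):
        candidates = response.get(f"q{i}", {}).get("result", [])
        confident = [c for c in candidates if c.get("match")]
        if confident:
            top = max(confident, key=lambda c: c.get("score", 0))
            resolved[name] = top["id"]
        else:
            resolved[name] = None
    return resolved


# Hypothetical response, hand-written for illustration only; a real service
# may return identifiers in a different form and additional fields.
sample_response = {
    "q0": {
        "result": [
            {
                "id": "http://ipni.org/urn:lsid:ipni.org:names:77086615-1",
                "name": "Tahina spectabilis",
                "score": 100,
                "match": True,
            }
        ]
    },
    "q1": {"result": []},
}

names = ["Tahina spectabilis", "Not a plant name"]
request_body = build_reconciliation_request(names)
resolved = best_matches(names, sample_response)
```

In practice the request body would be POSTed to a configured reconciliation endpoint, and strings left unresolved (`None` here) would be reviewed by a curator rather than accepted automatically.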