Missouri Botanical Garden Open Conference Systems, TDWG 2014 ANNUAL CONFERENCE

Darwin Core Archive (DwC-A) validation: A New Collaborative Effort
Christian Gendreau, David P. Shorthouse, Tim Robertson, Marie-Élise Lecoq

Building: Elmia Congress Centre, Jönköping
Room: Rum 11
Date: 2014-10-30 11:40 AM – 11:55 AM
Last modified: 2014-10-03


The Darwin Core Archive (DwC-A) has become the most commonly used format for exchanging species occurrence data through the Global Biodiversity Information Facility (GBIF) network. A DwC-A is effectively a collection of well-structured, field-delimited files, similar to comma-separated files, and as such it cannot benefit from automatic validation tools such as an XML schema validator. Furthermore, the DwC standard does not impose strong rules on the content associated with any DwC term. Because of this, a service that validates DwC-As for syntactic and semantic correctness is a critical component of the GBIF network.

A DwC-A validator has been online for some years [1], but it performs only very basic validation. As a result, developers working with DwC-A must write their own code to interpret content further. This is happening in many projects, each interpreting the content differently, and the results of those interpretations are not always available to the original data publisher.

Developers at GBIF, GBIF France, and Canadensys have been working in an informally organized, collaborative project to enhance GBIF's Darwin Core Archive Validator [2]. Our goal is to build a common framework in which shared, easily expressed custom rules and evaluators can be chained and executed at various stages of the data publication lifecycle. The open source dwca-validator will be integrated into projects that require data validation, will have a rejuvenated web page that accepts Darwin Core Archive uploads, and will offer a web service for remote validation with support for customized and internationalized reporting. We believe that by collaborating on a common library, data quality issues will be addressed earlier in the publishing lifecycle, while the original data publisher is still involved, and that sharing more functionality will reduce development costs.
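
The idea of chaining small, shareable rules over the rows of a field-delimited file can be illustrated with a minimal Python sketch. It is not the dwca-validator API: the names Issue, required, numeric_range and validate, the sample data, and the rule set are all hypothetical and chosen only for illustration. The column names are standard Darwin Core terms.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, str]
# A rule inspects one record and returns an error message, or None if it passes.
Rule = Callable[[Record], Optional[str]]


@dataclass
class Issue:
    line: int      # line number in the data file (header is line 1)
    message: str


def required(term: str) -> Rule:
    """Hypothetical rule: the given term must be present and non-empty."""
    def check(record: Record) -> Optional[str]:
        if not (record.get(term) or "").strip():
            return f"{term} is missing or empty"
        return None
    return check


def numeric_range(term: str, low: float, high: float) -> Rule:
    """Hypothetical rule: if the term has a value, it must be a number in [low, high]."""
    def check(record: Record) -> Optional[str]:
        value = (record.get(term) or "").strip()
        if not value:
            return None  # emptiness is left to the `required` rule
        try:
            number = float(value)
        except ValueError:
            return f"{term} is not numeric: {value!r}"
        if not low <= number <= high:
            return f"{term} out of range [{low}, {high}]: {value}"
        return None
    return check


def validate(rows: Iterable[Record], rules: List[Rule]) -> List[Issue]:
    """Apply every rule in the chain to every record and collect the issues."""
    issues: List[Issue] = []
    for line, record in enumerate(rows, start=2):  # data starts on line 2
        for rule in rules:
            message = rule(record)
            if message:
                issues.append(Issue(line, message))
    return issues


# Made-up, tab-delimited sample standing in for a core data file.
SAMPLE = (
    "occurrenceID\tscientificName\tdecimalLatitude\n"
    "occ-1\tPuma concolor\t45.5\n"
    "occ-2\t\t91.2\n"
    "\tAcer rubrum\tnorth\n"
)

# An assumed rule chain; a real project would share and extend such a list.
RULES: List[Rule] = [
    required("occurrenceID"),
    required("scientificName"),
    numeric_range("decimalLatitude", -90, 90),
]

if __name__ == "__main__":
    reader = csv.DictReader(io.StringIO(SAMPLE), delimiter="\t")
    for issue in validate(reader, RULES):
        print(f"line {issue.line}: {issue.message}")
```

Because each rule is an ordinary function, the same chain could in principle run when data is prepared, when it is published, and when it is consumed, which is the kind of reuse across the publication lifecycle described above.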

This presentation will summarise some of the limitations of the existing DwC-A validation services and introduce the open software project being established to address these challenges. The presentation is targeted towards application developers, data publishers and users of data in the DwC-A format.

[1] http://tools.gbif.org/dwca-validator

[2] https://github.com/gbif/dwca-validator