Darwin Core Archives and the Encyclopedia of Life
Building: Grand Hotel Mediterraneo
Room: America del Nord (Theatre I)
Date: 2013-10-30 04:55 PM – 05:05 PM
Last modified: 2013-10-05
Abstract
In the early stages of the Encyclopedia of Life (EOL) project, the only way to contribute large datasets was to prepare an XML file conforming to a custom XML Schema (XSD). This schema reuses elements from many popular schemas, including Dublin Core and Darwin Core, but the exact structure is specific to EOL. It is fairly complex: there are multiple related main types, and some elements have attributes. The main types include Taxa, which can have common names and references, and Media, which can have references and agents (authors, contributors, etc.).
In 2011 it was decided to allow contributions to EOL based on the Darwin Core Archive (DwC-A) format, which was already popular with another large-scale indexer, the Global Biodiversity Information Facility (GBIF). Early in the process of adopting this format for use with EOL, it became apparent that certain complexities in the existing XML schema would need to be flattened or denormalized to fit the 'core plus extensions' model of DwC-A. For example, if Taxon were the core of the archive and Taxon had a Reference extension and a Media extension, it would not be possible to declare that Media uses the same Reference extension, nor could Media have its own Agent extension. More recently, EOL has become interested in accepting several other data types, including measurements of taxa and associations among taxa, which makes it even harder to describe all these entities in the existing DwC-A format. This presentation will describe work EOL has done to explore changes to the DwC-A meta file that would allow relationships among multiple types, for example extensions of extensions. The presentation will be a preface to a discussion about several potential additions to DwC-A, including declaring multi-value delimiters, data types and validations, perhaps leading to a set of recommendations that could make DwC-A more expressive while maintaining backwards compatibility.
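The 'core plus extensions' constraint described above can be illustrated with a minimal Python sketch. It is only a toy model: the `Relation` type and the `split_by_star_schema` function are hypothetical names introduced here, not part of DwC-A or of any EOL or GBIF tool, and the sketch does not read or write a real meta file. It assumes only what the abstract states, namely that an extension can attach to the core type and to nothing else.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    """A desired link between two row types (hypothetical helper type)."""
    parent: str  # the type that owns the records
    child: str   # the type attached to the parent


def split_by_star_schema(
    core: str, relations: list[Relation]
) -> tuple[list[Relation], list[Relation]]:
    """Partition relations into those a single-core archive can declare
    (the parent is the core) and those it cannot (the parent is itself
    an extension)."""
    expressible = [r for r in relations if r.parent == core]
    blocked = [r for r in relations if r.parent != core]
    return expressible, blocked


# The relationships named in the abstract, with Taxon as the core.
WANTED = [
    Relation("Taxon", "Reference"),
    Relation("Taxon", "Media"),
    Relation("Media", "Reference"),  # Media reusing the Reference extension
    Relation("Media", "Agent"),      # Media with its own Agent extension
]

expressible, blocked = split_by_star_schema("Taxon", WANTED)

for r in blocked:
    print(f"cannot declare: {r.child} as an extension of {r.parent}")
```

Under these assumptions the two Media relationships end up in `blocked`, which is the case that 'extensions of extensions' would be meant to cover.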