Missouri Botanical Garden Open Conference Systems, TDWG 2016 ANNUAL CONFERENCE

Darwin Core Documentation: More (would be) Better
Paula F. Zermoglio, David Bloom, John R. Wieczorek, Robert P. Guralnick, Raphael LaFrance, Laura Russell

Building: Computer Science
Room: Computer Science 3
Date: 2016-12-07 11:00 AM – 11:30 AM
Last modified: 2016-10-22


Data publishers and aggregators, such as VertNet and GBIF, share biodiversity data using the Darwin Core standard. Darwin Core provides definitions of terms and makes recommendations about how to populate the corresponding fields. However, data publishers often do not follow these recommendations, which results in highly heterogeneous content. Mapping fields from data sources to Darwin Core can be problematic for a variety of reasons, including misunderstanding of term definitions and inexact correspondence between concepts. As a result, content meant for one or more Darwin Core fields may be left out or placed in the wrong field. Even when the mapping is correct, field content can vary greatly when controlled vocabularies are absent or not adhered to. Together, these problems render data less discoverable and less readily usable than they could be. To close the gap between data availability and discoverability, it is useful, first, to measure the extent to which the community follows Darwin Core recommendations. Second, documentation gaps need to be identified and remedied to improve use of the standard and, ultimately, to promote data use. To address these needs, we have investigated the heterogeneity of the data shared in Darwin Core fields in VertNet, and we provide evidence of the need for better Darwin Core documentation. We have examined the current state of fields that contain information of high value for ecological and evolutionary research, such as taxonomy, sex, and life stage. We also examined fields that the community uses to capture a great variety of types of information: dynamicProperties, occurrenceRemarks, and fieldNotes. We will present an overview of the current content of these fields and give examples of how data on particular specimen traits (namely, length and mass) are shared, how heterogeneous those data are, and how they can be extracted to enhance their discoverability and usability.
Finally, we will urge the community to join efforts to improve Darwin Core documentation and provide recommendations to achieve this goal.
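The two measurements the abstract describes, checking how closely a field follows a controlled vocabulary and pulling length and mass out of free-text fields such as dynamicProperties, can be illustrated with a small sketch. It is not the authors' method. The vocabulary set `SEX_VOCAB`, the regular expression, the unit tables, and the helper names `vocab_adherence` and `extract_traits` are all hypothetical. The sketch also assumes millimetres and grams when no unit is given.

```python
import re

# Hypothetical, illustrative vocabulary for the Darwin Core "sex" term (an assumption).
SEX_VOCAB = {"male", "female", "hermaphrodite", "undetermined"}

def vocab_adherence(values, vocab):
    """Fraction of non-empty values that match the vocabulary (case-insensitive)."""
    cleaned = [v.strip().lower() for v in values if v and v.strip()]
    if not cleaned:
        return 0.0
    return sum(v in vocab for v in cleaned) / len(cleaned)

# Hypothetical pattern: a trait key, an optional ":" or "=", a number, an optional unit.
TRAIT_PATTERN = re.compile(
    r"\b(?P<key>total\s*length|tl|body\s*mass|mass|weight|wt)"
    r"\s*[:=]?\s*"
    r"(?P<value>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>mm|cm|m|mg|kg|g)?\b",
    re.IGNORECASE,
)
LENGTH_KEYS = {"totallength", "tl"}
TO_MM = {"mm": 1.0, "cm": 10.0, "m": 1000.0}
TO_G = {"mg": 0.001, "g": 1.0, "kg": 1000.0}

def extract_traits(text):
    """Extract length (mm) and mass (g) from free text; later matches overwrite earlier ones."""
    results = {}
    for m in TRAIT_PATTERN.finditer(text or ""):
        key = re.sub(r"\s+", "", m.group("key").lower())
        value = float(m.group("value"))
        unit = (m.group("unit") or "").lower()
        if key in LENGTH_KEYS:
            name, table, default = "length_mm", TO_MM, "mm"
        else:
            name, table, default = "mass_g", TO_G, "g"
        if unit and unit not in table:
            continue  # unit does not fit the trait (e.g. "weight 5 mm"); skip it
        results[name] = value * table[unit or default]  # assumed default unit if missing
    return results

if __name__ == "__main__":
    print(vocab_adherence(["male", "Female", "M", "", "unknown"], SEX_VOCAB))
    print(extract_traits("sex=male; total length: 152 mm; weight=23.5g"))
```

Running the script should print `0.5` and `{'length_mm': 152.0, 'mass_g': 23.5}`. Real VertNet content is far more varied than this pattern covers, with ranges, many more abbreviations, and mixed units. That variety is the heterogeneity the abstract describes, so a sketch like this would need to grow considerably before it could be used in practice.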