Missouri Botanical Garden Open Conference Systems, TDWG 2016 ANNUAL CONFERENCE

Creature Features: A semantic toolkit for biodiversity trait data
Rob Penn Guralnick, Ramona Walls, John Wieczorek, John Deck, Paula Zermoglio, David Bloom, Laura Russell, Raphael LaFrance

Building: CTEC
Room: Auditorium
Date: 2016-12-06 11:45 AM – 12:00 PM
Last modified: 2016-10-15


For vast areas of the globe and large parts of the tree of life, data on trait diversity are grossly incomplete. When fully assembled, these trait data form the links between the evolutionary history of organisms, their assembly into communities, and the nature and functioning of ecosystems. Recent efforts to close data gaps have focused on collating trait-by-species databases that provide species-level aggregated values or ranges and almost always lack the direct observations on which those ranges are based. Digitized biocollections records collectively contain an under-utilized trove of trait data measured directly from individuals, but this content remains hidden and highly heterogeneous, impeding discoverability and use. We developed a successful proof of concept that targeted body length and mass data found in digitized records published by VertNet, a thematic biocollections publishing platform, demonstrating that extraction, harmonization, and re-provisioning of specimen-level trait data are possible. We also characterized all the other trait content in VertNet and attempted to align traits broadly to known ontologies. We report on the outcomes of these efforts in this talk. We also discuss critical ways to extend this proof of concept to gather other trait data from multiple taxa and to develop a more complete workflow for effective use of these data in research. We refer to our semantically based toolkit as Creature Features (CF). CF will be a toolkit for assembling trait data from digitized specimen data, with ontologies and semantic tools at its foundation. It is meant to leverage existing efforts in the model organism community, and it will be based on a semantic model and powered by extensible parsers, a backend graph database, and an API. A key aspect of CF will be the ability to collect, store, aggregate, and share data at the individual or specimen level and at higher levels without loss of information.
We discuss the research potential of such a toolkit and how to develop it so that it most effectively leverages popular portals (e.g., VertNet, iDigBio) and software such as R, in order to make data broadly accessible to scientists in biodiversity and other biological domains.
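
The extraction and harmonization step described above can be illustrated with a minimal sketch. This is not the VertNet or Creature Features parser, whose implementation the abstract does not describe. The pattern, the trait names (`body_mass_g`, `total_length_mm`), the key abbreviations, and the unit tables are all hypothetical, chosen only to show how free-text measurements in a specimen record might be pulled out and converted to common units.

```python
import re

# Hypothetical mapping from free-text labels to harmonized trait names.
KEY_TO_TRAIT = {
    "weight": "body_mass_g",
    "wt": "body_mass_g",
    "mass": "body_mass_g",
    "total length": "total_length_mm",
    "tl": "total_length_mm",
    "length": "total_length_mm",
}

# Hypothetical conversion factors to the target unit of each trait.
UNIT_FACTORS = {
    "body_mass_g": {"mg": 0.001, "g": 1.0, "kg": 1000.0},
    "total_length_mm": {"mm": 1.0, "cm": 10.0, "m": 1000.0},
}

# Label, optional ':' or '=', a number, then a unit.
PATTERN = re.compile(
    r"\b(?P<key>total length|tl|length|weight|wt|mass)\s*[:=]?\s*"
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>kg|mg|g|mm|cm|m)\b",
    re.IGNORECASE,
)


def extract_traits(text):
    """Return harmonized trait values found in a free-text record field."""
    traits = {}
    for match in PATTERN.finditer(text):
        trait = KEY_TO_TRAIT[match.group("key").lower()]
        unit = match.group("unit").lower()
        factor = UNIT_FACTORS[trait].get(unit)
        if factor is None:
            # Unit does not fit the trait (e.g. a weight given in mm): skip it.
            continue
        # Keep the first value seen for each trait.
        traits.setdefault(trait, float(match.group("value")) * factor)
    return traits


if __name__ == "__main__":
    print(extract_traits("sex=female ; total length=152 mm ; weight=23.5 g"))
    print(extract_traits("Wt: 1.2 kg; TL 30 cm"))
```

A real workflow would need far broader coverage of labels, units, ranges, and shorthand notations, and would keep a link back to the source specimen record so that values can be aggregated without losing the individual-level observation.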