Missouri Botanical Garden Open Conference Systems, TDWG 2016 ANNUAL CONFERENCE

Streamlining the Flow of Taxon Occurrence Data Between a Manuscript and Biological Databases
Viktor Senderov, Teodor Georgiev, Lyubomir Penev

Building: CTEC
Room: Auditorium
Date: 2016-12-09 09:00 AM – 09:15 AM
Last modified: 2016-10-16


Taxonomic practice dictates that authors cite the occurrences their analysis is based on (materials) in the treatment section of the taxonomic paper. Information on species occurrences may be stored in various biodiversity databases such as GBIF, PlutoF, and iDigBio. Manually entering occurrence records into a taxonomic paper is error-prone and time-consuming. This is why we developed an API (Application Programming Interface)-based material import for Pensoft’s ARPHA Writing Tool, for subsequent submission, peer review and publication in the Biodiversity Data Journal.

Ultimately, for an author of a taxonomic publication there are three use cases: (1) Occurrence records have not been digitized before. In this case, manual entry is always needed. (2) Occurrence data have been deposited at data aggregators and are available online from there. We developed an automated import for this case. (3) The data are available in a structured format, e.g., in a Darwin Core (DwC) compliant Excel spreadsheet. We developed a tool for importing such data directly into a manuscript.
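
As a rough illustration of use case (3), the sketch below reads a small DwC-style table and yields one record per occurrence. It is not the ARPHA importer: it assumes the spreadsheet has been exported to CSV so that only the Python standard library is needed, and the function name `read_dwc_rows`, the `REQUIRED_TERMS` subset, and the sample rows (with made-up taxon names) are all hypothetical.

```python
import csv
import io

# Hypothetical subset of Darwin Core terms a material citation might need.
REQUIRED_TERMS = ("scientificName", "country", "eventDate")


def read_dwc_rows(handle):
    """Yield one dict per occurrence row, keyed by the DwC column header."""
    reader = csv.DictReader(handle)
    missing = [t for t in REQUIRED_TERMS if t not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing Darwin Core columns: {missing}")
    for row in reader:
        # Keep only populated cells so later steps see real values.
        yield {
            term: value.strip()
            for term, value in row.items()
            if term and value and value.strip()
        }


# Made-up sample data standing in for an exported spreadsheet.
SAMPLE = """scientificName,country,eventDate,recordedBy
Aus bus,Bulgaria,2015-06-01,
Aus cus,Greece,2015-07-12,A. Collector
"""

materials = list(read_dwc_rows(io.StringIO(SAMPLE)))
```

Rejecting a file that lacks the expected columns up front, rather than importing partial records, is one possible design choice; a real importer might instead warn and continue.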

In order to import occurrence information from GBIF and iDigBio, we rely on the DwC format. For systems such as BOLD Systems and PlutoF, whose APIs do not use DwC, we developed a mapping between their terms and DwC.
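
A term mapping of this kind can be pictured as a simple lookup table. The following is a minimal sketch only: the source field names on the left of `EXAMPLE_MAPPING` are invented for illustration and are not taken from the BOLD or PlutoF APIs, while the right-hand side uses standard Darwin Core term names. The helper `to_dwc` is likewise hypothetical.

```python
# Hypothetical source-field -> Darwin Core term mapping. The left-hand names
# are illustrative and not taken from the BOLD or PlutoF API documentation.
EXAMPLE_MAPPING = {
    "species_name": "scientificName",
    "collectors": "recordedBy",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
}


def to_dwc(record, mapping):
    """Rename mapped fields to DwC terms; return (dwc_record, unmapped_fields)."""
    dwc, unmapped = {}, {}
    for field, value in record.items():
        if field in mapping:
            dwc[mapping[field]] = value
        else:
            # Keep unmapped fields separate so nothing is silently dropped.
            unmapped[field] = value
    return dwc, unmapped


dwc_record, leftovers = to_dwc(
    {"species_name": "Aus bus", "lat": "42.7", "lon": "23.3", "voucher_note": "pinned"},
    EXAMPLE_MAPPING,
)
```

Returning the unmapped fields alongside the converted record makes gaps in the mapping visible to whoever maintains it.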

Tracking the usage of occurrence records in publications is important for authors and collection managers. Where an occurrence ID is present (a persistent unique identifier of the occurrence), tracking is always possible. However, not all occurrence records have an occurrence ID. In this case a DwC triplet (institution code, collection code, and catalog number) is used. We discuss how well different databases support these two approaches.
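
The fallback from occurrence ID to triplet can be sketched as below. This is an assumption-laden illustration rather than the authors' implementation: the function name `tracking_key` is hypothetical, the colon separator is one common convention for writing a triplet, and the sample values are made up. The field names are the Darwin Core terms `occurrenceID`, `institutionCode`, `collectionCode`, and `catalogNumber`.

```python
TRIPLET_TERMS = ("institutionCode", "collectionCode", "catalogNumber")


def tracking_key(record):
    """Return ("occurrenceID", id) if present, else ("triplet", "inst:coll:cat").

    Returns None when neither identifier can be built from the record.
    """
    occurrence_id = (record.get("occurrenceID") or "").strip()
    if occurrence_id:
        return ("occurrenceID", occurrence_id)
    parts = [(record.get(term) or "").strip() for term in TRIPLET_TERMS]
    if all(parts):
        # Assumption: colon-joined triplet; other separators are also in use.
        return ("triplet", ":".join(parts))
    return None


with_id = tracking_key({"occurrenceID": "urn:example:occ:1"})
with_triplet = tracking_key(
    {"institutionCode": "XYZ", "collectionCode": "ENT", "catalogNumber": "12345"}
)
untrackable = tracking_key({"institutionCode": "XYZ"})
```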

In addition, DNA-based species are often grouped in Operational Taxonomic Units (OTUs). We are able to import all occurrence information from a BOLD OTU identified by a Barcode Index Number (BIN). This streamlines the formal taxonomic description and naming of DNA-based taxa, which is important because the number of dark taxa is rising and a fast way to describe DNA-based taxa is desperately needed.

Our workflows will also act as a curation filter for occurrence data: once data are imported into a manuscript in the publication pipeline, their accuracy is expected to be vetted by authors and reviewers.

Finally, for authors who wish to publish complete occurrence datasets as data papers, we have developed automatic generation of manuscripts from metadata expressed in Ecological Metadata Language (EML).
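
To give a feel for what generating a manuscript from EML involves, the sketch below pulls a title, author names, and abstract out of a tiny EML document. It is a simplified illustration under stated assumptions, not the ARPHA workflow: the function `manuscript_stub` and the returned dictionary layout are hypothetical, the sample metadata is made up, and only a handful of EML elements (`dataset`, `title`, `creator/individualName`, `abstract/para`) are handled.

```python
import xml.etree.ElementTree as ET

# Made-up, minimal EML document; real EML files carry far more metadata.
SAMPLE_EML = """<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1" packageId="example" system="example">
  <dataset>
    <title>Example occurrence dataset</title>
    <creator>
      <individualName><givenName>Ada</givenName><surName>Example</surName></individualName>
    </creator>
    <abstract><para>Made-up abstract text.</para></abstract>
  </dataset>
</eml:eml>"""


def manuscript_stub(eml_text):
    """Extract title, authors, and abstract from EML into a plain dict."""
    dataset = ET.fromstring(eml_text).find("dataset")
    if dataset is None:
        raise ValueError("no <dataset> element found")
    authors = [
        " ".join(filter(None, (name.findtext("givenName"), name.findtext("surName"))))
        for name in dataset.findall("creator/individualName")
    ]
    abstract = " ".join(
        para.text.strip() for para in dataset.findall("abstract/para") if para.text
    )
    return {
        "title": dataset.findtext("title", default="").strip(),
        "authors": authors,
        "abstract": abstract,
    }


stub = manuscript_stub(SAMPLE_EML)
```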