Missouri Botanical Garden Open Conference Systems, TDWG 2015 ANNUAL CONFERENCE

Application of the BARCODE Data Standard: The Barcode of Wildlife Project for Endangered Species
David E. Schindel

Building: Windsor Hotel
Room: Oak Room
Date: 2015-09-28 02:00 PM – 02:15 PM
Last modified: 2015-09-02

Abstract


The Consortium for the Barcode of Life (CBOL) was created in 2004 with a mission to promote DNA barcoding as a global standard for species identifications. This mission started with development of a data standard that would ensure credibility, transparency and reliability, including extensions of the standard from mitochondrial COI for animal barcoding to standard regions for land plants, fungi and protists.

CBOL’s Database Working Group conducted a year-long community consultation that resulted in a data standard (http://www.barcodeoflife.org/sites/default/files/DWG_data_standards-Final.pdf) that was approved in mid-2005 by CBOL’s Executive Board, GenBank, and the International Nucleotide Sequence Database Collaboration (INSDC, made up of GenBank, the European Molecular Biology Laboratory, and the DNA Data Bank of Japan). The standard included requirements for a unique voucher specimen identifier, a formal or provisional species name, information about PCR (polymerase chain reaction) primers and collecting location, and a minimum sequence length and maximum number of ambiguous base calls. Records submitted to INSDC that met these requirements were given the reserved keyword “BARCODE.”
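As an illustration only, the requirements listed above could be checked along the following lines. This is a minimal sketch, not CBOL's or INSDC's validation code: the record layout, the field names, the function check_record, and both numeric thresholds are hypothetical, since the abstract names the limits without giving their values.

```python
from dataclasses import dataclass

# Hypothetical thresholds: the abstract states that a minimum length and a
# maximum number of ambiguous base calls exist, but not what they are.
MIN_LENGTH = 500
MAX_AMBIGUOUS = 5

UNAMBIGUOUS_BASES = set("ACGT")


@dataclass
class BarcodeRecord:
    """Hypothetical container for the fields named in the standard."""
    voucher_id: str
    species_name: str          # formal or provisional name
    primers: tuple             # (forward, reverse) primer names
    collecting_location: str
    sequence: str


def check_record(record, min_length=MIN_LENGTH, max_ambiguous=MAX_AMBIGUOUS):
    """Return the list of requirements the record fails (empty if none)."""
    problems = []
    if not record.voucher_id.strip():
        problems.append("missing voucher specimen identifier")
    if not record.species_name.strip():
        problems.append("missing species name")
    if not record.primers or not all(p.strip() for p in record.primers):
        problems.append("missing PCR primer information")
    if not record.collecting_location.strip():
        problems.append("missing collecting location")

    seq = record.sequence.upper()
    if len(seq) < min_length:
        problems.append("sequence too short")
    # Any character other than A, C, G, T counts as an ambiguous base call.
    ambiguous = sum(1 for base in seq if base not in UNAMBIGUOUS_BASES)
    if ambiguous > max_ambiguous:
        problems.append("too many ambiguous base calls")
    return problems
```

Under these assumptions, a record would qualify for the BARCODE keyword only when check_record returns an empty list.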

Almost 1 million potential BARCODE records were submitted to GenBank by 2011, but government agencies and private companies made very little use of barcoding. Many were interested in barcoding but lacked confidence in what they considered an academic database. In their view, regulatory and forensic use of the DNA barcode reference library called for a more complete data standard and an associated validation system.

GenBank found that compliance with the standard was low, and more than half of the million+ records were suppressed. CBOL has found significant non-compliance among the remaining records, primarily in linking records to voucher IDs. The standard also lacked specifications for sequence quality and standard protocols for sequence trimming.

At the same time, CBOL received Google support for a DNA barcoding initiative (http://barcodeofwildlife.org) devoted to endangered species. Using DNA barcode data in courtroom prosecution required augmentation of the standard in several areas: reliability of taxonomic identifications, use of ‘e-vouchers’ rather than traditional preserved voucher specimens, and designation of chain-of-custody status.

These experiences show how a data standard evolves in response to users with real-world applications. CBOL is conducting research on sequence data quality to support refinement of the data standard and to inform development of an online data validation portal.