Missouri Botanical Garden Open Conference Systems, TDWG 2011 Annual Conference

Biodiversity Data Digitizer - A tool for a richer biodiversity data content digitization
Antonio Mauro Saraiva, Allan Koch Veiga, Etienne Americo Cartolano Junior

Last modified: 2011-10-11

Abstract


The Biodiversity Data Digitizer (BDD) is a web-based information system designed for easy digitization, manipulation, and publication of quality biodiversity data. It stands out by letting users manipulate their data simply and directly, especially data from field observations and small collections.

BDD has a multi-module architecture in which each module deals with a different data domain. Its modules support digitization of data on species occurrences, species, specimen interactions, multimedia resources and bibliographic resources. Each module supports creating (digitizing) records and handling them (update, delete and search). The modules also share and interrelate information with one another. Most modules are based on standards published by TDWG, so the data stored in BDD can be published and shared with other systems via the TDWG Access Protocol for Information Retrieval (TAPIR).

The species occurrence module is based on the Darwin Core (DwC) standard, which is centered on taxa and their occurrence in nature as documented by observations and collections. Occurrence records can be related to records from the multimedia resources and bibliographic resources modules, which are based on the Multimedia Resources Task Group (MRTG) schema and Dublin Core, respectively. The multimedia module allows recording and uploading of still images, video or sounds, for example, and the bibliographic module allows recording of books or articles. Records digitized with the species occurrence module can also be used in the specimen interaction module, which uses the DwC Interaction Extension, developed and used by the IABIN Pollinators Thematic Network (PTN). The species module is based on the Plinian Core and DwC schemas.
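
The relationships between these modules can be pictured with a minimal Python sketch. It is only an illustration, not BDD's data model: the occurrence fields reuse real Darwin Core term names, the media types come from the DCMI Type vocabulary, and `title`/`creator` are Dublin Core elements, but the class names, the `Interaction` fields and all sample values are assumptions invented for this example.

```python
from dataclasses import dataclass, field

@dataclass
class MediaResource:
    identifier: str
    title: str
    media_type: str  # DCMI Type value, e.g. "StillImage", "MovingImage", "Sound"

@dataclass
class BibliographicResource:
    identifier: str
    title: str    # Dublin Core "title"
    creator: str  # Dublin Core "creator"

@dataclass
class Occurrence:
    # Field names below are Darwin Core terms.
    occurrenceID: str
    scientificName: str
    basisOfRecord: str
    eventDate: str
    decimalLatitude: float
    decimalLongitude: float
    # Links to the other modules (hypothetical attribute names).
    media: list = field(default_factory=list)
    references: list = field(default_factory=list)

@dataclass
class Interaction:
    # Hypothetical shape; the real DwC Interaction Extension terms may differ.
    subject: Occurrence
    interaction_type: str
    target: Occurrence

# Invented sample data: a bee observed visiting a flower.
bee = Occurrence("occ-1", "Apis mellifera", "HumanObservation",
                 "2011-03-15", -23.55, -46.73)
plant = Occurrence("occ-2", "Passiflora edulis", "HumanObservation",
                   "2011-03-15", -23.55, -46.73)
bee.media.append(MediaResource("img-1", "Bee on flower", "StillImage"))
bee.references.append(BibliographicResource("ref-1", "Field notes", "Unknown"))
visit = Interaction(subject=bee, interaction_type="visits flowers of", target=plant)

print(visit.subject.scientificName, visit.interaction_type,
      visit.target.scientificName)
```

Keeping the interaction as a separate record that points at two occurrences mirrors the description above, in which occurrence records are digitized once and then reused by the interaction module.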

A distinguishing feature of BDD is the assistance it provides for improving and maintaining data quality. To prevent typing errors, most text fields in all modules offer autocomplete based on a fuzzy matching technique, which retrieves textual data that is orthographically similar to what the user has typed. The same technique is used to retrieve taxon names from the Catalogue of Life (CoL). When the user fills in a taxon name field, all other fields linked to it (kingdom, phylum, etc.) can be filled in automatically with a suggested hierarchy, which completes and enhances the data record and decreases the chance of entry errors. To improve the completeness and accuracy of location data, BDD provides (reverse) georeferencing techniques and an interactive map for obtaining location information from the BioGeomancer, Google Maps and GeoNames data sources.
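
The fuzzy-matching autocomplete and the hierarchy suggestion described above can be illustrated with a minimal Python sketch. It is not BDD's implementation: it relies on the standard-library `difflib` module, and the small in-memory name table, the `suggest` and `fill_hierarchy` functions and the 0.6 similarity cutoff are all assumptions chosen for illustration. In BDD the candidate names would come from stored values or from CoL.

```python
from difflib import get_close_matches

# Hypothetical lookup table standing in for a taxonomic source such as CoL.
TAXA = {
    "Apis mellifera": {"kingdom": "Animalia", "phylum": "Arthropoda",
                       "class": "Insecta", "order": "Hymenoptera",
                       "family": "Apidae"},
    "Bombus terrestris": {"kingdom": "Animalia", "phylum": "Arthropoda",
                          "class": "Insecta", "order": "Hymenoptera",
                          "family": "Apidae"},
    "Passiflora edulis": {"kingdom": "Plantae", "phylum": "Tracheophyta",
                          "class": "Magnoliopsida", "order": "Malpighiales",
                          "family": "Passifloraceae"},
}

def suggest(typed, candidates=tuple(TAXA), limit=3, cutoff=0.6):
    """Return up to `limit` candidates orthographically similar to `typed`."""
    # Compare case-insensitively, then map back to the original spelling.
    by_lower = {name.lower(): name for name in candidates}
    matches = get_close_matches(typed.lower(), list(by_lower),
                                n=limit, cutoff=cutoff)
    return [by_lower[m] for m in matches]

def fill_hierarchy(typed):
    """Suggest a record with the best-matching name and its higher taxa."""
    matches = suggest(typed, limit=1)
    if not matches:
        return None  # nothing similar enough; leave the fields to the user
    name = matches[0]
    return {"scientificName": name, **TAXA[name]}

# A misspelled entry is matched to the closest known name.
print(suggest("Apis melifera"))
print(fill_hierarchy("Apis melifera"))
```

`get_close_matches` ranks candidates by a similarity ratio between 0 and 1, so raising the cutoff makes the suggestions stricter and lowering it makes them more tolerant of typos.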

New modules are being developed with data quality in mind: a module for statistical analysis and visualization of the database content through maps and charts; a module that synchronizes electronic spreadsheets by means of two web services for importing and exporting records; and data cleaning tools for batch detection and correction of errors in the database. Two other modules allow digitization of pollinator monitoring and pollination deficit data, based on field protocols defined in the FAO pollinator project.

BDD grew out of the Pollinator Data Digitizer (PDD), which was developed within the scope of the IABIN PTN. It is based on open source technologies, including the JavaScript, PHP and Java programming languages, the jQuery and Yii frameworks and the PostgreSQL database.