Missouri Botanical Garden Open Conference Systems, TDWG 2011 Annual Conference

Data publishing from the viewpoint of a biodiversity publisher
Lyubomir Penev

Last modified: 2011-09-16

Abstract


The presentation summarizes two years of experience in data publishing in Pensoft’s journals. Currently, four ways of publishing biodiversity data are in use or being tested: (1) supplementary files published along with the respective papers and downloadable from the journals’ websites; (2) data files submitted to data repositories (e.g., Dryad, Pangaea) as independent files or packages of files and then linked to the journal article for which they provide the evidence; (3) data published through data repositories and aggregators but indexed within larger databases (e.g., GenBank and the Global Biodiversity Information Facility, GBIF); and (4) data published in the form of marked-up, structured and machine-readable texts.

The main problems to solve before data publishing becomes a mass practice in biodiversity science are: (1) data citation mechanisms, which are still in a rudimentary form and need to gain acceptance within the community; (2) use of the most liberal open data licenses, because restrictive licenses are an obstacle to sharing and re-use; (3) differing quality and completeness standards for metadata employed by data repositories and publishers; (4) broader adoption of technical standards that allow easy and largely automated sharing and re-use of data and metadata; and finally (5) economic models suitable for sustainable data publishing, which remain unclear to many stakeholders.

An innovative route for publishing occurrence data and taxon checklists has recently been launched by GBIF. It is based on an approved TDWG standard (Darwin Core), enriched metadata descriptions for the published datasets, and the possibility to download both data and metadata in a machine-readable form, the so-called Darwin Core Archive. This route is supported by a specialized tool, the Integrated Publishing Toolkit (IPT). Use of this tool allows the production of so-called “Data Paper” manuscripts that formally describe a dataset’s metadata as a peer-reviewed and citable scholarly publication.
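
To illustrate what "machine-readable" means for a Darwin Core Archive, the following is a minimal sketch in Python, not part of the presented work. It assumes the common archive layout: a zip file containing a meta.xml descriptor and a delimited core data file. The function names, the file name occurrence.txt, and the single sample record are invented for illustration; the eml.xml metadata document that a real archive would carry is left out for brevity.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile

# XML namespace used by the meta.xml descriptor of a Darwin Core Archive.
DWC_TEXT_NS = "{http://rs.tdwg.org/dwc/text/}"


def read_core_records(archive_bytes):
    """Return the core rows of an archive as dicts keyed by short term name."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        root = ET.fromstring(zf.read("meta.xml"))
        core = root.find(DWC_TEXT_NS + "core")
        location = core.find(f"{DWC_TEXT_NS}files/{DWC_TEXT_NS}location").text

        # Map each column index to the last path segment of its term URI.
        columns = {}
        id_element = core.find(DWC_TEXT_NS + "id")
        if id_element is not None:
            columns[int(id_element.get("index"))] = "id"
        for field in core.findall(DWC_TEXT_NS + "field"):
            if field.get("index") is not None:
                name = field.get("term").rsplit("/", 1)[-1]
                columns[int(field.get("index"))] = name

        # The descriptor writes a tab as the two characters backslash + t.
        delimiter = core.get("fieldsTerminatedBy", ",").replace("\\t", "\t")
        header_lines = int(core.get("ignoreHeaderLines", "0"))
        text = zf.read(location).decode(core.get("encoding", "UTF-8"))

    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))[header_lines:]
    return [{name: row[i] for i, name in columns.items()} for row in rows]


def build_sample_archive():
    """Build a tiny in-memory archive; the record below is invented."""
    meta = (
        '<archive xmlns="http://rs.tdwg.org/dwc/text/">'
        '<core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n" '
        'ignoreHeaderLines="1" rowType="http://rs.tdwg.org/dwc/terms/Occurrence">'
        "<files><location>occurrence.txt</location></files>"
        '<id index="0"/>'
        '<field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>'
        '<field index="2" term="http://rs.tdwg.org/dwc/terms/country"/>'
        "</core></archive>"
    )
    data = "id\tscientificName\tcountry\nocc-1\tApis mellifera\tBulgaria\n"
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("meta.xml", meta)
        zf.writestr("occurrence.txt", data)
    return buffer.getvalue()


records = read_core_records(build_sample_archive())
print(records)
```

Because the descriptor names each column by a Darwin Core term, a consumer can interpret the data file without prior knowledge of the dataset, which is what makes largely automated sharing and re-use feasible.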

Within the EU-funded project ViBRANT (www.vbrant.eu), Pensoft Publishers is tasked with developing, testing and implementing a multiple-choice model for publishing biodiversity data that will provide a non-exclusive choice of mechanisms for the publication of data of different kinds and complexity, in cooperation with specialized data repositories and data aggregators.