Missouri Botanical Garden Open Conference Systems, TDWG 2011 Annual Conference

TaxPub: An extension to the NLM/NCBI Journal Archiving DTD
Terry Catapano

Last modified: 2011-09-19

Abstract


TaxPub is an extension of the Journal Archiving and Interchange Tag Suite (JATS) for the encoding of taxonomic publications. JATS is maintained by the U.S. National Center for Biotechnology Information (NCBI) of the U.S. National Library of Medicine (NLM). It has been widely adopted in scientific publishing and serves as the format for all articles in the PubMed Central database. TaxPub aims to provide an XML tag set for the encoding of new taxonomic literature, with a focus on taxonomic descriptions. It extends the JATS Publishing Document Type Definition (DTD) parsimoniously. A few phrase-level elements (e.g., for scientific names and for references to specimens and other material) have been made available throughout the entire DTD alongside the more generic elements provided by the base JATS tag set. Most of the extension, however, occurs in a single section-level element, <tp:taxon-treatment>. TaxPub defines the major components of a description at a high level, with the aim of allowing treatments and the data they contain to be extracted into external repositories and databases. Further semantics can be applied through user-defined or external vocabularies (e.g., Darwin Core). Code for TaxPub is maintained in a web-based SourceForge repository (http://sourceforge.net/projects/taxpub/), and documentation is available at the Species-ID wiki (http://species-id.net/wiki/TaxPub). Pensoft Publishers has tested and adopted TaxPub for its journals ZooKeys and PhytoKeys, which facilitates the journals' acceptance for archiving and display in PubMed Central and also enables export to data aggregators such as Encyclopedia of Life, Plazi, and Species-ID.
Goals for future development of TaxPub include: markup of legacy literature; enabling database-driven publication; further modeling for full compatibility with both the botanical and zoological nomenclatural codes; provision of guidance on usage and application; and development of associated tools such as Extensible Stylesheet Language (XSL) conversion stylesheets and profiling mechanisms (e.g., via Schematron).
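
As an illustration of the extraction use case described above, the following is a minimal sketch of how treatments might be pulled out of a TaxPub-encoded article for loading into an external database. Only the <tp:taxon-treatment> element is named in this abstract. The namespace URI, the child elements (tp:nomenclature, tp:taxon-name, tp:treatment-sec), the sec-type attribute, and the sample taxa are assumptions made for the example and should be checked against the TaxPub DTD and documentation before use.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the "tp" prefix; verify against the TaxPub DTD.
NS = {"tp": "http://www.plazi.org/taxpub"}

# Hypothetical, heavily simplified article fragment. Element names other than
# <tp:taxon-treatment> are illustrative assumptions, and the taxa are made up.
SAMPLE = """\
<article xmlns:tp="http://www.plazi.org/taxpub">
  <body>
    <sec><title>Introduction</title><p>Background text.</p></sec>
    <tp:taxon-treatment>
      <tp:nomenclature>
        <tp:taxon-name>Aus bus</tp:taxon-name>
      </tp:nomenclature>
      <tp:treatment-sec sec-type="description">
        <title>Description</title>
        <p>Body length 5 mm.</p>
      </tp:treatment-sec>
    </tp:taxon-treatment>
    <tp:taxon-treatment>
      <tp:nomenclature>
        <tp:taxon-name>Aus cus</tp:taxon-name>
      </tp:nomenclature>
      <tp:treatment-sec sec-type="distribution">
        <title>Distribution</title>
        <p>Known only from the type locality.</p>
      </tp:treatment-sec>
    </tp:taxon-treatment>
  </body>
</article>
"""


def _text(element):
    """Return the element's text content with whitespace normalized."""
    return " ".join("".join(element.itertext()).split())


def extract_treatments(xml_text):
    """Collect one record per <tp:taxon-treatment> found in the document."""
    root = ET.fromstring(xml_text)
    treatments = []
    for treatment in root.iterfind(".//tp:taxon-treatment", NS):
        name_el = treatment.find("tp:nomenclature/tp:taxon-name", NS)
        name = _text(name_el) if name_el is not None else None
        # Map each treatment section's type to the text of its paragraphs.
        sections = {}
        for sec in treatment.findall("tp:treatment-sec", NS):
            sec_type = sec.get("sec-type", "unknown")
            sections[sec_type] = " ".join(_text(p) for p in sec.findall("p"))
        treatments.append({"name": name, "sections": sections})
    return treatments


if __name__ == "__main__":
    for record in extract_treatments(SAMPLE):
        print(record["name"], record["sections"])
```

Because the treatment is a single section-level element, generic JATS content such as the introduction is skipped without any special handling, which is the practical benefit of concentrating the extension in one place.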