Last modified: 2015-07-30
Abstract
The Biodiversity Heritage Library (BHL: http://www.biodiversitylibrary.org) continues to improve research methodology through the collaborative digitization of taxonomic literature, a commitment to open access, the creation and dissemination of intra-institutional global standards and workflows, and the development and implementation of tools and services for users.
For the first time, a subset of BHL literature and data has been systematically evaluated to understand the taxonomic and bibliographic coverage that specialists might encounter when using the library’s collections for research. Librarians at the Smithsonian applied two assessment methodologies, with pteridologists (researchers of ferns and lycophytes, or club mosses) as a use case, with the aim of developing a replicable process for determining how comprehensively the collection serves subject specialists.
Cross-referencing two publicly available datasets (BHL’s scientific name metadata and the relevant taxonomic names from the Catalogue of Life) provides an estimate of the genus-level distribution of taxonomic names within the BHL collection. Likewise, identifying key bibliographies and comparing the citations they contain with BHL holdings indicates the extent of BHL’s coverage of prominent subject-area literature.
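The genus-level cross-reference can be sketched in Python, the language of the project's script. This is a minimal illustration, not the authors' script: it assumes the BHL name metadata has already been loaded as a list of scientific-name strings and the Catalogue of Life export reduced to a set of genus names, and it treats the first word of each name as its genus. The function names `genus_of` and `genus_distribution` and the `min_hits` threshold for "underrepresented" are hypothetical.

```python
from collections import Counter


def genus_of(name):
    """Return the first word of a scientific name if it looks like a genus, else None."""
    parts = name.split()
    if not parts:
        return None
    word = parts[0]
    # Keep only capitalized alphabetic words; abbreviations such as "A." are skipped.
    if word.isalpha() and word[0].isupper() and word[1:].islower():
        return word
    return None


def genus_distribution(bhl_names, col_genera, min_hits=2):
    """Cross-reference BHL name strings with a Catalogue of Life genus list.

    Returns (counts, absent, underrepresented):
      counts: occurrences in BHL per Catalogue of Life genus
      absent: Catalogue of Life genera with no BHL occurrence
      underrepresented: genera found in BHL fewer than min_hits times
    (min_hits is a hypothetical threshold chosen for illustration.)
    """
    wanted = set(col_genera)
    counts = Counter()
    for name in bhl_names:
        genus = genus_of(name)
        if genus in wanted:
            counts[genus] += 1
    absent = sorted(wanted - set(counts))
    underrepresented = sorted(g for g, n in counts.items() if n < min_hits)
    return counts, absent, underrepresented
```

For example, with the illustrative inputs `["Asplenium nidus", "Isoetes lacustris"]` and `{"Asplenium", "Isoetes", "Lycopodium"}`, the sketch reports Lycopodium as absent. Abbreviated genera are skipped rather than guessed, which undercounts but avoids false matches.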
Together, these methods can identify both gaps in BHL’s key literature for specific genera and genera that are underrepresented in the collection. Public domain content identified as missing from BHL can be checked against BHL member libraries’ holdings and, if available, queued for digitization. Taxonomic analysis can also highlight the strength of BHL’s efforts to acquire permissions for in-copyright content by showing the presence of genera described after 1923, the USA’s copyright cut-off year.
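The bibliographic comparison and the follow-up steps described above can be sketched the same way. The sketch assumes each bibliography citation has been reduced to a title and publication year, that BHL and member-library holdings are available as title lists, and that loose title matching (lowercased, punctuation removed) is good enough; real matching would likely need to handle abbreviations, volumes, and editions. The `Citation` type, the helper names, and the treatment of the cut-off (works published before 1923 counted as US public domain, which was the rule in 2015) are assumptions for illustration.

```python
import re
from dataclasses import dataclass

PD_CUTOFF = 1923  # US copyright cut-off year cited in the text


@dataclass(frozen=True)
class Citation:
    """Hypothetical minimal form of a bibliography entry."""
    title: str
    year: int


def normalize(title):
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(cleaned.split())


def bibliography_gaps(citations, bhl_titles, member_titles):
    """Return (coverage, missing, queue) for a list of Citation objects.

    coverage: fraction of citations found in BHL holdings
    missing: citations not found in BHL
    queue: missing public-domain citations held by a member library
    """
    bhl = {normalize(t) for t in bhl_titles}
    members = {normalize(t) for t in member_titles}
    missing = [c for c in citations if normalize(c.title) not in bhl]
    coverage = 1 - len(missing) / len(citations) if citations else 0.0
    queue = [
        c for c in missing
        if c.year < PD_CUTOFF and normalize(c.title) in members
    ]
    return coverage, missing, queue


def post_cutoff_genera(genus_years, bhl_genera):
    """Genera present in BHL whose description year is 1923 or later."""
    return sorted(
        g for g, year in genus_years.items()
        if year >= PD_CUTOFF and g in bhl_genera
    )
```

Separating `missing` from `queue` keeps the two outputs the text distinguishes: gaps in coverage overall, and the subset that could actually be digitized from a member library's copy without permissions work.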
This level of bibliographic analysis is highly labor-intensive and requires comprehensive bibliographies for the specific taxonomic group under consideration. Future work is needed to determine whether automating some aspects of this process could make it more efficient.
The taxonomic analysis required basic coding skills to work with the large BHL names dataset and to create a script that compares the selected controlled vocabulary against this dataset. The script, written in Python, is now available for additional taxonomic analyses and further iterative development. This kind of taxonomic analysis also requires some subject expertise to define relevant taxonomic groupings and to contextualize the findings. Deeper taxonomic analysis is possible but would be time-intensive and would require the participation of subject experts.
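Because the BHL names dataset is large, one reasonable way to compare it with a controlled vocabulary is to stream it row by row rather than load it into memory. The sketch below is not the authors' script; it assumes a tab-delimited export with a header row and a column of name strings, and the column name `name`, the file name in the usage note, and both helper functions are hypothetical.

```python
import csv
import io
from collections import Counter


def stream_names(handle, column="name"):
    """Yield non-empty name strings one row at a time from a tab-delimited file.

    The default column name is a hypothetical placeholder; adjust it to the
    header actually used by the export being analyzed.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        value = (row.get(column) or "").strip()
        if value:
            yield value


def count_vocabulary_hits(handle, vocabulary, column="name"):
    """Count rows whose first word is a genus in the controlled vocabulary."""
    hits = Counter()
    for name in stream_names(handle, column):
        genus = name.split()[0]
        if genus in vocabulary:
            hits[genus] += 1
    return hits


if __name__ == "__main__":
    # Small in-memory stand-in for the real export file.
    sample = io.StringIO("name\tpage\nAsplenium nidus\t1\nIsoetes lacustris\t2\n")
    print(count_vocabulary_hits(sample, {"Asplenium", "Lycopodium"}))
```

Against a real file, the same function could be called as `count_vocabulary_hits(open("bhl_names.tsv", newline="", encoding="utf-8"), vocabulary)`, where the file name is hypothetical. Memory use then grows with the size of the vocabulary rather than the size of the dataset.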
Ultimately, this taxonomic and bibliographic analysis serves as a pilot BHL collections analysis study, which can be replicated for future species and literature coverage assessments of BHL holdings.