Last modified: 2011-09-16
Abstract
The recently funded NSF Advancing Digitization of Biological Collections (ADBC) Thematic Collections Network (TCN) project aims to digitize ca. 2.3 million North American lichen and bryophyte specimens from over 60 collections, representing well over 90% of the remaining North American specimens from Canada, the United States and Mexico. This involves ca. 80% of the U.S. institutions known to hold at least 500 specimens. Online availability of nearly all North American bryophyte and lichen collections will greatly accelerate knowledge and evaluation of the biodiversity of these organisms by fostering collaborations between professionals and the general public. Lichens and bryophytes share important traits that make them some of the most sensitive indicators of environmental change. The specific goal of this project is to provide high-quality data to address how species distributions change with regard to major environmental events across time and space. Large-scale distribution mapping will support management decisions through identification of biodiversity hotspots, areas of most imminent environmental change, and areas of greatest human impact.
To achieve this goal, we are developing efficient digitization workflows and opportunities for crowdsourcing. A centralized approach to digitization will take advantage of the extensive duplication of specimens across collections. In addition, it will involve the highest degree of automation currently achievable with Optical Character Recognition (OCR) and Natural Language Processing (NLP). Digital images of specimen labels will undergo an OCR processing step and then be passed to customized NLP. NLP will be optimized by grouping similar label layouts (e.g., from one collector), developing lookup tables for frequently used words, and using and expanding existing taxonomic and geographic thesauri. Georeferencing will take advantage of existing programs such as GEOLocate and BioGeomancer. Following these automation steps, the digitized information will be available online for review, adjustment, and even keystroking, if necessary.
This last step of human intervention in the digitization process is open to anyone interested in helping to advance this digital resource. We therefore plan to build a vibrant volunteer community by giving back extensively in the form of local and online lectures, local field events, introductions to specimen determination, etc.
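As an illustration of the automated steps described above, the following is a minimal sketch of how OCR output from one label layout might be normalized with a lookup table and matched against a taxonomic thesaurus. It is not the project's actual workflow: the lookup entries, the taxon list, the field names, and the functions `normalize` and `parse_label` are all hypothetical, and the sample label text is invented.

```python
import re

# Hypothetical lookup table: frequent OCR misreadings and label
# abbreviations mapped to a normalized form.
LOOKUP = {
    "c0unty": "County",
    "co.": "County",
    "elev.": "elevation",
    "leg.": "collected by",
}

# Hypothetical stand-in for a taxonomic thesaurus.
TAXA = {"Cladonia rangiferina", "Sphagnum fuscum"}


def normalize(text):
    """Replace known variants token by token using the lookup table."""
    return " ".join(LOOKUP.get(tok.lower(), tok) for tok in text.split())


def parse_label(ocr_text):
    """Extract a few fields from the OCR text of a single label.

    The patterns assume one simple label layout; a real parser would
    need one rule set per group of similar layouts and would have to
    disambiguate numbers (e.g., an elevation that looks like a year).
    """
    text = normalize(ocr_text)
    record = {"taxon": None, "elevation_m": None, "year": None}

    # Thesaurus match: first known taxon name found in the text.
    for name in TAXA:
        if name.lower() in text.lower():
            record["taxon"] = name
            break

    # Elevation in metres, e.g. "elevation 2100 m".
    match = re.search(r"elevation\s+(\d+)\s*m\b", text)
    if match:
        record["elevation_m"] = int(match.group(1))

    # Four-digit collection year between 1800 and 2099.
    match = re.search(r"\b(1[89]\d\d|20\d\d)\b", text)
    if match:
        record["year"] = int(match.group(1))

    return record


# Invented sample label text, for illustration only.
sample = "Cladonia rangiferina Pima Co. Arizona elev. 2100 m leg. A. Collector 1975"
record = parse_label(sample)
```

Fields left as `None` in such a record would be natural candidates for the online review and keystroking step carried out by volunteers.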