Last modified: 2011-10-11
Abstract
Past and ongoing efforts in the digitization of biological collections and the dramatic acceleration in the production of molecular data provide a tremendous opportunity for innovative research synthesis. However, this increase in available information can also result in redundant, erroneous and ambiguous taxon names caused by misspellings and contaminations. The presence of synonyms further complicates the task of identifying duplicate taxa within and across datasets. Given the ever-increasing size of available datasets, this scrubbing task can no longer be performed manually. Importantly, such flawed taxonomies ultimately lead to incorrect interpretation of scientific data and thus affect scientific progress and policy decisions.
The Taxonomic Name Resolution Service (TNRS) is an online tool to correct and standardize taxonomic names. Given a list of taxon names at the rank of family or below, the TNRS identifies the closest matching name and returns it with corrected spelling and formatting, completed author names, and the updated familial classification. If a name is out-of-date, the name currently accepted by an authority of choice is returned instead.
The TNRS makes use of and extends two existing applications: Dmitry Mozzherin's GNI parser splits taxon names into their components, which are then matched by Tony Rees's TAXAMATCH against a reference database of plant names. Custom code uses taxonomically-informed decision rules to interpret and rank results, enabling the TNRS to select the single most likely match to the name submitted. The database itself is compiled from lists from different authorities, which can be chosen by the users and ranked in order of preference. The current version of the database has been populated with the Missouri Botanical Garden’s Tropicos list of names and with the NCBI taxonomy, but additional sources can be added, including regional lists or clade-specific lists. If a taxon has one or more synonyms, the current accepted name according to the chosen authority is returned, but the user can still access alternative interpretations and select a different name. Cases in which the acceptance is ambiguous or where only the genus is matched are flagged for user review. Importantly, the source of the match is also returned, including a web link. The service can be accessed either through a web interface (http://tnrs.iplantcollaborative.org/) or through an API, which permits the incorporation of a "spellchecking" step into automated analytical workflows. Through the web interface, users can paste a list of up to 5,000 names to be processed immediately, or upload a list of any length and be notified by email upon completion.
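The match-and-rank logic described above can be illustrated with a minimal sketch. It is not the TNRS implementation: Python's difflib stands in for TAXAMATCH, and the reference records, the resolve function, the 0.8 threshold and the tie-breaking rule are all hypothetical, chosen only to show how a fuzzy match, a user-ranked list of sources and a synonym lookup could fit together.

```python
from difflib import SequenceMatcher

# Hypothetical reference records, not real TNRS data. "accepted" is None
# when the name is itself the accepted name, else it holds the accepted name.
REFERENCE = [
    {"name": "Quercus alba", "source": "tropicos", "accepted": None},
    {"name": "Quercus rubra", "source": "tropicos", "accepted": None},
    {"name": "Quercus borealis", "source": "tropicos", "accepted": "Quercus rubra"},
    {"name": "Quercus rubra", "source": "ncbi", "accepted": None},
]


def similarity(a, b):
    """Case-insensitive similarity in [0, 1]; a stand-in for TAXAMATCH."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def resolve(submitted, source_order, threshold=0.8):
    """Return the best match for a submitted name, or None if nothing
    reaches the (assumed) similarity threshold.

    source_order lists the sources to consult, most preferred first.
    """
    candidates = []
    for rec in REFERENCE:
        if rec["source"] not in source_order:
            continue  # the user did not select this source
        score = similarity(submitted, rec["name"])
        if score >= threshold:
            candidates.append((score, rec))
    if not candidates:
        return None
    # Best score first; equal scores are ordered by source preference.
    candidates.sort(key=lambda c: (-c[0], source_order.index(c[1]["source"])))
    score, best = candidates[0]
    return {
        "submitted": submitted,
        "matched": best["name"],
        "accepted": best["accepted"] or best["name"],
        "source": best["source"],
        "score": round(score, 3),
    }


if __name__ == "__main__":
    print(resolve("Quercus rubrra", ["ncbi", "tropicos"]))
    print(resolve("Quercus borealis", ["tropicos"]))
```

In this sketch a misspelled name is matched to its closest reference entry, a synonym is replaced by the name its source marks as accepted, and the source of the match is reported alongside the result. Flagging ambiguous or genus-only matches for review, as the TNRS does, is left out for brevity.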
The chief goal of the TNRS is to provide a way for plant scientists to ensure that new publications include the most up-to-date nomenclature and that taxonomic names are correctly spelled. This permits the proper identification of biological samples and facilitates standardization and consistency. Secondarily, the TNRS greatly simplifies the integration of heterogeneous datasets, thus expanding the potential for research synthesis. It is important to note that although the TNRS currently includes only plant names, its architecture makes it very easy to expand to handle names across different nomenclatural codes.