Missouri Botanical Garden Open Conference Systems, TDWG 2014 ANNUAL CONFERENCE

ETC: From description to matrix and beyond in a web-based toolbox
Thomas Rodenhausen, Hong Cui, Fengqiong Huang, Bertram Ludäscher, James Macklin, Bob Morris, Shizhuo Yu

Building: Elmia Congress Centre, Jönköping
Room: Rum 10
Date: 2014-10-30 03:05 PM – 03:10 PM
Last modified: 2014-10-03


Biologists typically consult a number of information sources to obtain, organize and analyze comparative data about organisms. The scientific names of organisms associated with these resources are often used to broker comparison between them and are generally assumed to be reliable. In practice, the use of these names often reflects the differing opinions or concepts of the authors, as captured in floras, faunas, monographs, and other scholarly publications. The detailed morphological descriptions in these publications, however, contain rich comparative information that can be parsed out and analysed to provide a more quantitative comparison of concepts across resources. The Explorer of Taxon Concepts (ETC) project will provide tools to support knowledge extraction, analysis and visualization to facilitate the comparison of concepts across resources and foster reuse and integration.

We present our current progress on the implementation of the ETC as a web-based toolbox. To facilitate the fine-grained semantic markup of literature containing morphological (taxonomic) descriptions, the natural language processing (NLP) tool CharaParser [1] has been integrated to generate machine-readable XML output. The extracted knowledge comprises morphological characters and their states along with other descriptive information, all organized semantically according to the relevant input term ontology. The user may choose an existing ontology, or enhance or create one collaboratively by interfacing with the Ontological Term Organizer (OTO) [2]. A character matrix can be generated from CharaParser’s output or imported from another source, and then collaboratively reviewed online using a matrix review tool. The tool offers three main views for reviewing and editing the matrix. A spreadsheet view gives the familiar presentation of the matrix and can focus on subsets of taxa by taxonomic structure, character-value filtering or manual user selection. The compact configuration view focuses on the matrix layout by displaying an editable overview of available taxa and characters. The desktop view can display auxiliary information, such as morphological descriptions of taxa or analysis diagrams of characters. The matrix review tool also supports controlled vocabularies and lets users annotate the matrix with colors and comments. The matrix can be analysed using Information Gain methods to produce an interactive identification key [3], which supports polymorphic characters. Other visualizations based on the matrix are also planned. Through collaboration with the Euler project [4], we will support the logical comparison of taxonomies through semi-automated articulation of concepts and tree-based visualization of their alignments.
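
To illustrate how an Information Gain analysis can rank characters when building an identification key, the following is a minimal Python sketch, not the method of [3], which the abstract does not detail. The matrix layout (a dictionary mapping each taxon to its characters and sets of states), the function names `information_gain` and `best_character`, the example taxa, and the treatment of polymorphic characters (a taxon's weight is split evenly across its recorded states) are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def information_gain(matrix, character):
    """Expected reduction in uncertainty (in bits) about the taxon
    after observing the state of `character`.

    Assumptions: taxa are equally likely a priori, and a polymorphic
    taxon is equally likely to show any of its recorded states.
    Taxa with no recorded state for the character are ignored.
    """
    taxa = [t for t, row in matrix.items() if row.get(character)]
    if not taxa:
        return 0.0
    prior = 1.0 / len(taxa)

    # joint[state][taxon] = P(taxon, state)
    joint = defaultdict(dict)
    for taxon in taxa:
        states = matrix[taxon][character]
        for state in states:
            joint[state][taxon] = prior / len(states)

    prior_entropy = math.log2(len(taxa))

    # Conditional entropy H(taxon | state), summed over all states.
    conditional_entropy = 0.0
    for by_taxon in joint.values():
        p_state = sum(by_taxon.values())
        for p in by_taxon.values():
            conditional_entropy -= p * math.log2(p / p_state)

    return prior_entropy - conditional_entropy


def best_character(matrix):
    """Return the character with the highest information gain."""
    characters = sorted({c for row in matrix.values() for c in row})
    return max(characters, key=lambda c: information_gain(matrix, c))


# Hypothetical example matrix; "Taxon B" is polymorphic for petal color.
example_matrix = {
    "Taxon A": {"leaf shape": {"ovate"}, "petal color": {"white"}},
    "Taxon B": {"leaf shape": {"lanceolate"}, "petal color": {"white", "pink"}},
    "Taxon C": {"leaf shape": {"linear"}, "petal color": {"pink"}},
    "Taxon D": {"leaf shape": {"ovate"}, "petal color": {"white"}},
}

if __name__ == "__main__":
    for name in ("leaf shape", "petal color"):
        print(name, round(information_gain(example_matrix, name), 3))
    print("ask first:", best_character(example_matrix))
```

An interactive key built this way would ask about the highest-ranked character first, restrict the matrix to the taxa compatible with the user's answer, and repeat on the remaining taxa. Because taxa lacking a recorded state are skipped, scores for characters with different coverage are only roughly comparable in this sketch.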

The ETC toolbox is available at [5]. The project is open source, for those who wish to contribute. The source code is organized into a set of modular components, which can be found at [6]. The source of the parent project is located at [7].

This material is based upon work supported by the National Science Foundation under Grant No. DBI-1147266. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

