Text Mining for Biodiversity Ontologies
Building: Grand Hotel Mediterraneo
Room: Sala dei Continenti
Date: 2013-11-01 10:11 AM – 10:24 AM
Last modified: 2013-10-07
Abstract
There is a clear need for better semantic representation of biodiversity concepts, to facilitate more effective discovery and re-use of information resources relevant to scientists doing integrative research. To develop general-purpose biodiversity ontologies, however, it is necessary to represent the concepts and relationships that working scientists actually use in their research. Traditional knowledge modeling through ontologies draws on expert knowledge but inevitably favors the particular perspectives of the ontology engineers and of the domain experts who interacted with them. This often leads to ontologies that lack robust coverage of synonymy and that miss relationships among concepts that working scientists would benefit from knowing about. In this presentation we will discuss methods we have developed that apply statistical topic modeling to a large corpus of scientific abstracts. As an exemplar we will discuss a corpus we collected of over 50,000 abstracts related to plant trait research. We performed latent Dirichlet allocation topic modeling on this corpus to discover a set of latent topics, each consisting of terms that commonly co-occur in the abstracts. We match terms in the topics to concept labels in existing plant trait ontologies to reveal gaps, and we examine which terms are commonly associated in natural language discourse about plant traits, to identify relationships that are important to model formally in ontologies.
Our text mining methodology uncovers significant gaps in the content of some popular existing ontologies (such as the Trait Ontology) and shows how, through a workflow involving human interpretation of topic models, we can bootstrap ontologies to have better coverage and richer semantics. Because our methods are based directly on what working scientists communicate about their research, they provide an alternative bottom-up approach to populating and enriching ontologies that complements more traditional knowledge modeling endeavors.
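The term-matching step described in the abstract can be illustrated with a minimal Python sketch. It assumes the topic model has already been fitted and that each topic is available as a list of its top terms. The function names (`normalize`, `find_gaps`, `candidate_relations`), the exact-match rule after lowercasing, and the sample terms and labels are all hypothetical illustrations; they are not the authors' implementation and not real Trait Ontology content.

```python
from itertools import combinations


def normalize(term):
    """Lowercase and collapse whitespace so terms and labels compare fairly."""
    return " ".join(term.lower().split())


def find_gaps(topics, ontology_labels):
    """For each topic, list the top terms that match no ontology label."""
    known = {normalize(label) for label in ontology_labels}
    return {
        topic_id: [t for t in terms if normalize(t) not in known]
        for topic_id, terms in topics.items()
    }


def candidate_relations(topics, ontology_labels):
    """Pairs of ontology-matched terms that share a topic.

    Such pairs are only candidates for a formally modeled relationship;
    a human still has to interpret them.
    """
    known = {normalize(label) for label in ontology_labels}
    pairs = set()
    for terms in topics.values():
        matched = sorted({normalize(t) for t in terms if normalize(t) in known})
        pairs.update(combinations(matched, 2))
    return pairs


# Hypothetical topic output and ontology labels, for illustration only.
topics = {
    0: ["leaf area", "specific leaf area", "photosynthesis", "nitrogen"],
    1: ["seed mass", "dispersal", "germination"],
}
ontology_labels = ["Leaf area", "Seed mass", "Germination"]

gaps = find_gaps(topics, ontology_labels)
relations = candidate_relations(topics, ontology_labels)
print(gaps)
print(relations)
```

Exact string matching is the simplest possible choice here. Synonym handling and the human interpretation of topics that the abstract mentions would have to be layered on top of it.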