SemantEco Annotator for Linked Data Generation and Generalized Semantic Mapping
Building: Grand Hotel Mediterraneo
Room: Sala dei Continenti
Date: 2013-10-30 03:17 PM – 03:26 PM
Last modified: 2013-10-07
Abstract
Many science projects face the challenge of enabling search and discovery of scientific data. Our work on the NSF DataNet program’s DataONE project focuses on enabling this discovery in the domains of biodiversity and earth sciences. Semantic web technologies and linked data (LD) provide a medium for addressing this challenge, but in leveraging them we face two major obstacles: (1) translating tabular data and domain knowledge sources into a linked data format remains difficult with existing tools; and (2) building an IT infrastructure that relies heavily on linked data can be perceived as a risky proposition due to the limited maturity of current LD management tools.
Our recent efforts on the SemantEco Annotator help mitigate both obstacles. To address the first, the Annotator acts as a translator that converts any table-structured information into RDF, leveraging OWL ontologies and vocabularies, so that the enriched RDF data it produces can be used immediately within RDF stores and hosted as LD. To address the second, the Annotator acts as a semantic mapper that enables a user to map column headers to OWL properties and to type column values as OWL classes or datatypes; the mappings themselves are also serialized in a linked data format, RDF. The mappings are described such that they can be applied within the Annotator to translate the tabular information to RDF or, alternatively, serialized further into XML or other formats for use in non-linked data environments, for example to clarify the schema of their data or to enable optimized semantic search. The mapping output thus serves both LD and non-LD IT environments, and gives the architects of non-LD environments the ability to “future proof” their systems and migrate to LD at their own pace.
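As a rough illustration of the column-to-property mapping idea described above, the following Python sketch turns a small CSV table into N-Triples. It is not the Annotator's implementation: the `http://example.org/` IRIs, the `MAPPING` dictionary, the `Observation` class, and the function names are all hypothetical, chosen only to show how a header-to-property mapping with datatypes might drive the translation.

```python
import csv
import io

XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical mapping: column header -> (property IRI, datatype IRI).
# A real mapping would point at properties from an ontology chosen by the user.
MAPPING = {
    "species": ("http://example.org/vocab#speciesName", XSD + "string"),
    "count": ("http://example.org/vocab#organismCount", XSD + "integer"),
}
ROW_CLASS = "http://example.org/vocab#Observation"  # hypothetical class for each row
BASE = "http://example.org/data/row/"               # hypothetical base IRI for row subjects


def escape_literal(value):
    """Escape backslashes, quotes, and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def table_to_ntriples(csv_text, mapping=MAPPING, row_class=ROW_CLASS, base=BASE):
    """Return a list of N-Triples lines, one subject per CSV row."""
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for index, row in enumerate(reader, start=1):
        subject = f"<{base}{index}>"
        # Type every row as an instance of the chosen class.
        triples.append(f"{subject} <{RDF_TYPE}> <{row_class}> .")
        for header, (prop, datatype) in mapping.items():
            value = row.get(header)
            if value is None or value == "":
                continue  # skip empty cells rather than emit empty literals
            literal = f'"{escape_literal(value)}"^^<{datatype}>'
            triples.append(f"{subject} <{prop}> {literal} .")
    return triples


SAMPLE = "species,count\nTurdus migratorius,12\nCorvus corax,\n"

for line in table_to_ntriples(SAMPLE):
    print(line)
```

Because the mapping is plain data kept apart from the translation routine, the same structure could in principle be serialized on its own (as RDF, XML, or another format) and reused outside a linked data environment, which is the separation the paragraph above describes.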
The Annotator is a web application that a user visits in a web browser. The user loads a CSV file and then either uses the ontology selector menu to choose among hard-coded ontologies (e.g., OBO-E, SWEET, ENVO) or enters a URL that resolves to an RDF graph for vocabulary selection. The Annotator provides advanced manipulation features such as column-based translation and aggregation of columns along implicit entity representations. To demonstrate its utility for discovering data in support of biodiversity research, we have successfully used the Annotator to convert eBird data and the eBird taxonomy (which follows the Clements Checklist) into RDF. This RDF is now available in our SemantEco Discovery and Search Portal, alongside water quality data, enabling a researcher to identify potential trends between water quality and organism counts.
The Annotator will continue to expand its capabilities, including automatic mappings directed to a particular graph closed under a predicate/object pair, use of OWL domain and range restriction axioms to guide the user in vocabulary selection decisions, use of OWL class definitions to enable a top-down approach for users modeling their data, and ontology extraction to complement and enable reasoning alongside the generated RDF. We are also architecting a platform for better management of linked data, within which the Annotator plays a vital role.