Missouri Botanical Garden Open Conference Systems, TDWG 2013 ANNUAL CONFERENCE

Semantic matching of interests to annotations with SPARQL queries on reasoned triples
Paul J Morris, James Hanken, Maureen Kelly, David Lowery, Bertram Ludäscher, James A. Macklin, Robert A Morris, Tianhong Song

Building: Grand Hotel Mediterraneo
Room: Sala dei Continenti
Date: 2013-10-30 03:35 PM – 03:44 PM
Last modified: 2013-10-07

Abstract


An implemented use case scenario for FilteredPush network systems describes a researcher who expresses an interest in all assertions made in the network about members of a family and is then notified of annotations that assert identifications of occurrences using names within that family, even when the family name itself is not expressed in the annotation. A generalization of this scenario is matching expressions of interest with annotations whose content does not itself match the interest, but where a reasoner could infer additional information that connects the two. In our implementations of FilteredPush, users express interests as simple key-value expressions, which are used as parameters to construct SPARQL queries that conform to the configuration of the network instance. Annotations are constructed using the World Wide Web Consortium (W3C) community group Open Annotation Ontology, W3C Content in RDF 1.0, dwcFP (an OWL representation of TDWG Darwin Core), and several other ontologies (Bug Ontology Model, FOAF, MARL). Network clients are assisted in formulating consistent annotations for particular domain business operations (e.g., consistent formulation of new identifications) by helper tools that invoke SPARQL rule-based configurations. The SPARQL queries launched by interests are also informed by these configurations. Annotations entering the network are placed into a document store and a triple store. Interests are checked for matching annotations by launching the SPARQL queries derived from the interests against this triple store. To support matching on higher taxonomy or geography, we harvest hierarchical data from sources that are authoritative for the community supported by a network instance.
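The step from key-value interests to SPARQL queries can be pictured with a minimal sketch. It is not the FilteredPush implementation: the `PATTERNS` table stands in for a network instance's configuration, the `ex:` terms and the functions `build_query` and `sparql_literal` are hypothetical names introduced for illustration, and only `oa:hasBody` is taken from the Open Annotation vocabulary.

```python
# Hypothetical prefixes: ex: is a placeholder namespace, not dwcFP.
PREFIXES = (
    "PREFIX oa: <http://www.w3.org/ns/oa#>\n"
    "PREFIX ex: <http://example.org/terms#>\n"
)

# Hypothetical per-network configuration: one graph pattern per interest key.
# The "family" pattern relies on ancestor triples asserted ahead of time.
PATTERNS = {
    "scientificName": "?body ex:scientificName {value} .",
    "family": (
        "?body ex:scientificName ?name . "
        "?taxon ex:name ?name ; ex:hasAncestorName {value} ."
    ),
}


def sparql_literal(text):
    """Quote a Python string as a SPARQL string literal."""
    escaped = (
        text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return '"' + escaped + '"'


def build_query(interest):
    """Build a SELECT query from a dict of key-value interest terms."""
    lines = ["?annotation oa:hasBody ?body ."]
    for key in sorted(interest):
        if key not in PATTERNS:
            raise KeyError("no pattern configured for interest key: " + key)
        lines.append(PATTERNS[key].format(value=sparql_literal(interest[key])))
    body = "\n  ".join(lines)
    return PREFIXES + "SELECT DISTINCT ?annotation WHERE {\n  " + body + "\n}"
```

Calling `build_query({"family": "Fagaceae"})` yields a query string that a client could submit to the triple store. Keeping the patterns in a configuration table, rather than in the user's interest, is what would let the same simple interest be reused across differently configured network instances.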
To avoid reasoning on the fly, upon harvest of these data into the triple store we assert reasoned triples that allow simple query parameters to match a child node present in an annotation (e.g., a scientific name) with an arbitrary parent found in the harvested graphs (e.g., the containing family). Annotations may be generated by human users of client systems (e.g., taxonomists expressing new determinations of specimen data shown in a Symbiota portal) or by software actors (e.g., data quality control actors in a Kepler Kuration workflow present as an analytical capability in the network). Consumers of annotations may respond to them, thereby creating annotation conversations linked by the globally unique identifiers (UUIDs in our current deployments) of the annotations.
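The idea of asserting reasoned triples at harvest time can be illustrated with a small in-memory sketch. It assumes the harvested hierarchy is available as a child-to-parent map and uses plain Python sets in place of a triple store. The names `materialize_ancestors`, `matches`, and `PARENT_OF` are hypothetical, and the few taxa shown are only an illustrative fragment of a hierarchy.

```python
def materialize_ancestors(parent_of):
    """Return every (node, ancestor) pair implied by a child -> parent map."""
    pairs = set()
    for node in parent_of:
        seen = set()  # guards against cycles in malformed harvested data
        current = parent_of.get(node)
        while current is not None and current not in seen:
            seen.add(current)
            pairs.add((node, current))
            current = parent_of.get(current)
    return pairs


def matches(interest_value, annotation_name, ancestor_pairs):
    """True if the annotation's name equals the interest or descends from it."""
    return (
        interest_value == annotation_name
        or (annotation_name, interest_value) in ancestor_pairs
    )


# Illustrative fragment of a harvested taxonomic hierarchy (child -> parent).
PARENT_OF = {
    "Quercus alba": "Quercus",
    "Quercus": "Fagaceae",
    "Fagaceae": "Fagales",
}

# Computed once at harvest time, so later matching is a simple lookup.
REASONED = materialize_ancestors(PARENT_OF)
```

With these pairs in place, `matches("Fagaceae", "Quercus alba", REASONED)` succeeds even though the annotation itself never mentions the family. The trade-off is the usual one for materialization: more stored triples in exchange for queries that need no inference when they run.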