Missouri Botanical Garden Open Conference Systems, TDWG 2016 ANNUAL CONFERENCE

Linking external SQL databases and the Semantic Web: A Pipeline for dynamic web publication with stable URI identifiers for database structural information and content schemes
Dagmar Triebel, Anton Link, Gregor Hagedorn, Andreas Plank, Markus Weiss, David Fichtmueller, Tanja Weibulat, Gerhard Rambold

Last modified: 2016-09-29


In biodiversity research, there are many recognized content standards for data exchange, technical norms, and schemes for structuring data elements and terms. More than 30 relevant collection standards are listed online (http://gfbio.biowikifarm.net/wiki/Data_exchange_standards,_protocols_and_formats_relevant_for_the_collection_data_domain_within_the_GFBio_network). Together they comprise thousands of individual elements with definitions.

Expert teams are needed to structure and manage these data elements, sub-elements, items, and element relations in order to develop new standards or new database models. The work on element definitions, datatype and content descriptions, and the relationships among the various schemes can be done in relational Structured Query Language (SQL) databases with rich clients. Advanced mechanisms are then needed to link these external databases, their structural elements, and their content output with dynamic Semantic Web representations.

For this reason, we implemented a pipeline to create Semantic MediaWiki (SMW) pages with stable Uniform Resource Identifiers (URIs) for the database structural elements of all entity-relation models in the SQL databases of the Diversity Workbench (DWB) (http://diversityworkbench.net/). This makes it possible to cite and persistently reference each of the more than 2,000 DWB elements, as so-called concepts, and each of the 250 tables, as so-called collections. Furthermore, this will facilitate mapping efforts between DWB elements and established TDWG content standards.
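
As a rough illustration of the idea, the following Python sketch reads the structure of a database and derives one page record with a name-based URI per table (a "collection") and per column (a "concept"). It is a minimal sketch under stated assumptions, not the DWB pipeline itself: an in-memory SQLite database stands in for the DWB databases, and the base address `https://example.org/wiki/`, the example table, and the function names are all hypothetical.

```python
import sqlite3
from urllib.parse import quote

# Hypothetical wiki base address; NOT the real DWB or TDWG Terms Wiki address.
BASE_URI = "https://example.org/wiki/"


def list_elements(conn):
    """Return {table_name: [(column_name, declared_type), ...]} for all tables."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    schema = {}
    for table in tables:
        # PRAGMA table_info yields rows of (cid, name, type, notnull, dflt_value, pk).
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [(c[1], c[2]) for c in cols]
    return schema


def stable_uri(*parts):
    """Build a URI from element names only, so it changes only if a name changes."""
    title = "/".join(p.replace(" ", "_") for p in parts)
    return BASE_URI + quote(title, safe="/")


def build_pages(schema):
    """Map each table to a collection record and each column to a concept record."""
    pages = {}
    for table, columns in schema.items():
        collection_uri = stable_uri(table)
        pages[collection_uri] = {
            "kind": "collection",
            "label": table,
            "members": [stable_uri(table, col) for col, _ in columns],
        }
        for col, dtype in columns:
            pages[stable_uri(table, col)] = {
                "kind": "concept",
                "label": col,
                "datatype": dtype,
                "collection": collection_uri,
            }
    return pages


# Illustrative stand-in schema (hypothetical table and column names).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Specimen (SpecimenID INTEGER PRIMARY KEY, AccessionNumber TEXT)")
pages = build_pages(list_elements(conn))
for uri, page in pages.items():
    print(page["kind"], uri)
```

Deriving the URI from the table and column names alone, rather than from internal row identifiers, is one simple way to keep references stable across database rebuilds; whether the DWB pipeline does the same is not stated here.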

A similar pipeline has been established to publish novel content schemes or content standards for data exchange as needed, e.g., for the research project ‘MOD-CO: Towards an integrative and comprehensive standard for meta-omics data of collection objects’ (http://www.mod-co.net; funded by the German Research Foundation). The terms, concepts, or descriptors are managed in DiversityDescriptions. This relational database has a generalized triple-structured design that allows flexible organization of descriptor states, (hierarchical) dependencies, and interrelations between descriptors. Through the pipeline, the database content is integrated into a dynamic web publication, which allows appropriate citation of each individual MOD-CO element by a stable URI as well as community-driven annotation.
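
To make the notion of a triple-structured design more concrete, here is a small Python sketch of one possible reading: every fact about a descriptor is stored as a (subject, predicate, object) triple, so states, dependencies, and other relations share one structure. This is an assumption-laden illustration, not the DiversityDescriptions schema; the class, the predicate names `hasState` and `dependsOn`, and the descriptor names are hypothetical and are not taken from MOD-CO.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


class TripleStore:
    """Minimal in-memory triple store, indexed by subject."""

    def __init__(self):
        self._by_subject = defaultdict(list)

    def add(self, subject, predicate, obj):
        self._by_subject[subject].append(Triple(subject, predicate, obj))

    def objects(self, subject, predicate):
        """Return all objects linked to `subject` via `predicate`, in insertion order."""
        return [t.obj for t in self._by_subject.get(subject, [])
                if t.predicate == predicate]

    def ancestors(self, subject, predicate="dependsOn"):
        """Follow a hierarchical predicate transitively, guarding against cycles."""
        seen = []
        queue = list(self.objects(subject, predicate))
        while queue:
            node = queue.pop(0)
            if node not in seen:
                seen.append(node)
                queue.extend(self.objects(node, predicate))
        return seen


# Hypothetical descriptors, states, and dependencies for illustration only.
store = TripleStore()
store.add("SequencingMethod", "hasState", "amplicon")
store.add("SequencingMethod", "hasState", "shotgun")
store.add("PrimerName", "dependsOn", "SequencingMethod")
store.add("SequencingMethod", "dependsOn", "NucleicAcidExtraction")

print(store.objects("SequencingMethod", "hasState"))
print(store.ancestors("PrimerName"))
```

Because new kinds of relations only require a new predicate value rather than a new table, a structure of this sort can accommodate descriptor states, hierarchies, and cross-references without schema changes.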

Both pipelines build on work by Hagedorn, Endresen, O Tuama, and Plank (2013), who established the TDWG Terms Wiki (http://terms.tdwg.org) on the biowikifarm within the ViBRANT (Virtual Biodiversity Research and Access Network for Taxonomy) project. The pipelines use the same, unmodified SMW templates for the SKOS (Simple Knowledge Organization System) compatible definition of concepts, classes, and collections, and are therefore compatible with other schemes in the TDWG Terms Wiki.
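
For readers unfamiliar with SKOS, the sketch below shows what a SKOS-compatible description of one concept and one collection could look like when serialized as Turtle. It is only an illustration: it does not reproduce the SMW templates used in the TDWG Terms Wiki, the helper functions and example URIs are hypothetical, and only the standard SKOS terms `skos:Concept`, `skos:Collection`, `skos:prefLabel`, `skos:definition`, and `skos:member` are used.

```python
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."


def turtle_literal(text, lang="en"):
    """Quote a string as a language-tagged Turtle literal, escaping \\, \" and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@{lang}'


def concept_to_turtle(uri, label, definition):
    """Describe one element as a skos:Concept (URI is assumed to be a valid IRI)."""
    lines = [
        f"<{uri}> a skos:Concept",
        f"    skos:prefLabel {turtle_literal(label)}",
        f"    skos:definition {turtle_literal(definition)}",
    ]
    return " ;\n".join(lines) + " ."


def collection_to_turtle(uri, label, member_uris):
    """Describe a group of concepts as a skos:Collection; members are optional."""
    lines = [
        f"<{uri}> a skos:Collection",
        f"    skos:prefLabel {turtle_literal(label)}",
    ]
    if member_uris:
        members = ", ".join(f"<{m}>" for m in member_uris)
        lines.append(f"    skos:member {members}")
    return " ;\n".join(lines) + " ."


# Hypothetical example URIs and texts, for illustration only.
concept_uri = "https://example.org/wiki/Specimen/AccessionNumber"
document = "\n\n".join([
    SKOS_PREFIX,
    concept_to_turtle(concept_uri, "AccessionNumber",
                      "Number assigned to a specimen on accession."),
    collection_to_turtle("https://example.org/wiki/Specimen", "Specimen",
                         [concept_uri]),
])
print(document)
```

Expressing tables as collections and columns as concepts in this shared vocabulary is what lets independently maintained schemes be cross-referenced and mapped to one another.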