Management and publication of an integrative and comprehensive scheme for meta-omics data of collection objects (MOD-CO)
Pelin Yilmaz, Anton Link, Tanja Weibulat, Frank Oliver Glöckner, Dagmar Triebel, Gerhard Rambold

With the advent of advanced molecular techniques and methods, a new era has opened for analyzing and characterizing natural history collection specimens, as well as various other kinds of environmental samples, comprising organisms and their parts. Nucleic and amino acid sequence data-based analyses allow the recognition of identity, provenance, composition, and physiological status of a given organism or an organismal assemblage.

The fact that microbial organisms may colonize any substrate makes their composition and related traits valuable markers for various kinds of deposited dead, deep-frozen or living collection objects. It is therefore of the highest importance to elaborate a standard schema and vocabulary for assigning any kind of (meta-)genome, -transcriptome, -proteome, and -metabolome data to reference samples in natural history and living culture collections. Such Meta-Omics Data (MOD), obtained from analyses and linked to environmental samples, are necessary (a) for basic research on biodiversity and functional diversity of microbial communities, as well as (b) in the fields of environmental monitoring and control, biotechnology, and diagnostics.

The aim of the MOD-CO project (http://www.mod-co.net), which is funded by the German Research Foundation (DFG), is to select and categorize relevant descriptors from a wide range of analysis protocols, and to set up a standard that largely avoids structural redundancy. The schema follows a hierarchical design and considers elementary workflow processes, treating each process product as an object. The MOD-CO schema consequently focuses on describing each intermediate that occurs in work and data flows. This means that, for any procedural intermediate, it can describe all relevant steps and traits that have arisen since the preceding intermediate was established (‘retrospective view’). The schema is implemented in the relational database application DiversityDescriptions, a database with a generalized data model. All data elements are organized as descriptors with their dependencies and descriptor trees.
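The ‘retrospective view’ can be illustrated with a minimal Python sketch in which each process product is an object that records the descriptors of the step that produced it, together with a link to the preceding intermediate. The class name, descriptor names, and values are hypothetical illustrations and are not taken from the MOD-CO schema or from DiversityDescriptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Intermediate:
    """One product of a workflow step, described retrospectively (hypothetical model)."""
    name: str
    # Descriptor values recorded for the step that produced this intermediate.
    descriptors: Dict[str, str] = field(default_factory=dict)
    # The preceding intermediate, or None for the starting collection object.
    preceding: Optional["Intermediate"] = None

    def lineage(self) -> List[str]:
        """Return the names from the starting object down to this intermediate."""
        chain: List[str] = []
        node: Optional[Intermediate] = self
        while node is not None:
            chain.append(node.name)
            node = node.preceding
        return list(reversed(chain))


# Illustrative descriptor names and states only; not actual MOD-CO terms.
specimen = Intermediate("collection object", {"preservation": "deep-frozen"})
extract = Intermediate("DNA extract", {"extraction method": "kit-based"}, specimen)
library = Intermediate("sequencing library", {"target": "metagenome"}, extract)

print(library.lineage())
# ['collection object', 'DNA extract', 'sequencing library']
```

Because every intermediate stores only what happened since its predecessor, the full history of a data set is recovered by walking the chain rather than by repeating earlier descriptors, which is one way to read the stated goal of avoiding structural redundancy.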

The schema will be published in two ways: 1) via an http-formatted Wiki webpage, which can be downloaded as a citable document, and 2) via Wiki web pages comprising TDWG term-compliant information. The latter approach will provide stable Uniform Resource Identifiers (URIs) for each descriptor and categorical descriptor state (‘concept’), along with information on descriptor hierarchies and dependencies.
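How one URI per descriptor and per categorical state might be derived can be sketched as follows. The base address, the path layout, and the example labels are assumptions made purely for illustration; the actual URI scheme of the MOD-CO Wiki pages is not specified here.

```python
from typing import Optional
from urllib.parse import quote

# Hypothetical base address; NOT the project's actual URI scheme.
BASE = "https://example.org/mod-co/"


def concept_uri(descriptor: str, state: Optional[str] = None) -> str:
    """Build a URI for a descriptor, or for one of its categorical states."""
    def slug(label: str) -> str:
        # Wiki-style page names: spaces become underscores, the rest is percent-encoded.
        return quote(label.strip().replace(" ", "_"))

    uri = BASE + slug(descriptor)
    if state is not None:
        uri += "/" + slug(state)
    return uri


# Illustrative labels only; not actual MOD-CO descriptors.
print(concept_uri("Sample type"))
# https://example.org/mod-co/Sample_type
print(concept_uri("Sample type", "deep frozen"))
# https://example.org/mod-co/Sample_type/deep_frozen
```

Deriving the identifier deterministically from the descriptor and state labels keeps it stable as long as the labels themselves do not change; a production scheme would more likely use opaque persistent identifiers.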