Missouri Botanical Garden Open Conference Systems, TDWG 2011 Annual Conference

Comparison of performance in different data models for taxonomic databases
Joern Vorwald, Sebastian Rick

Last modified: 2011-10-12

Abstract


Database modeling is a recurring task in software development. During requirements analysis, developers of biological databases repeatedly face the same design challenges, so the resulting designs tend to be similar. The EDAPHOBASE project is establishing a complex GBIF information system for soil organisms. Within the project, an innovative approach to database design is combined with high performance and high flexibility regarding the stored information. Three types of database models,

  1. a generic model,
  2. a classic relational model, and
  3. a NoSQL model

were analysed with respect to the performance of database queries. From the first to the last model, performance increased while flexibility decreased. The differences in performance were not dramatic, however, whereas the loss of flexibility could cause severe problems for the maintenance of the respective information system.

Within the project, a chimaera of the first and second models has been implemented for data input and storage. Most tables are handled in the classic relational way, while observational data are stored in a generic part of the model. For data analysis and the resulting query tools, a second, data-evaluation database based on a mixture of the second and third models is used, because performance is more important there.
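
The combination of a classic relational part and a generic part for observational data might look roughly like the following sketch. It is only an illustration under stated assumptions: the abstract does not describe the actual EDAPHOBASE schema, so all table names, column names, and helper functions here (taxon, observation, observation_value, build_hybrid_db, add_observation, observation_as_dict) are hypothetical, and SQLite is used purely for convenience.

```python
import sqlite3


def build_hybrid_db():
    """Create an in-memory database with a classic part and a generic part.

    Hypothetical schema, not the EDAPHOBASE schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Classic relational part: fixed, typed columns.
        CREATE TABLE taxon (
            id INTEGER PRIMARY KEY,
            scientific_name TEXT NOT NULL
        );
        CREATE TABLE observation (
            id INTEGER PRIMARY KEY,
            taxon_id INTEGER NOT NULL REFERENCES taxon(id)
        );
        -- Generic part: one row per (observation, attribute) pair,
        -- so new attributes need no schema change.
        CREATE TABLE observation_value (
            observation_id INTEGER NOT NULL REFERENCES observation(id),
            attribute TEXT NOT NULL,
            value TEXT NOT NULL,
            PRIMARY KEY (observation_id, attribute)
        );
        """
    )
    return conn


def add_observation(conn, taxon_id, **attributes):
    """Insert an observation and store its attributes generically as text."""
    cur = conn.execute(
        "INSERT INTO observation (taxon_id) VALUES (?)", (taxon_id,)
    )
    obs_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO observation_value VALUES (?, ?, ?)",
        [(obs_id, name, str(value)) for name, value in attributes.items()],
    )
    return obs_id


def observation_as_dict(conn, obs_id):
    """Reassemble the generic rows of one observation into a dict."""
    rows = conn.execute(
        "SELECT attribute, value FROM observation_value"
        " WHERE observation_id = ?",
        (obs_id,),
    )
    return dict(rows.fetchall())


conn = build_hybrid_db()
conn.execute(
    "INSERT INTO taxon (id, scientific_name) VALUES (1, 'Lumbricus terrestris')"
)
obs = add_observation(conn, 1, soil_ph=6.5, depth_cm=10)
print(observation_as_dict(conn, obs))
```

In such a layout, adding a new observational attribute requires no schema change, which is the flexibility the generic model offers. The price is that each observation must be reassembled from several rows, which is one plausible reason to run analyses against a separate, more performance-oriented database.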