Advances in Biodiversity Modeling: Development of Algorithms and Predictive Performance Analysis
Last modified: 2011-09-09
Abstract
Modeling the geographic distribution of species is a technique that has been used in many tasks related to biodiversity conservation. Modeling tools produce niche-based models, that is, models that represent the environmental conditions under which a species can survive over long periods of time. openModeller is an example of a modeling tool that provides several algorithms and functionalities. It was developed by three Brazilian institutions: the Polytechnic School of the University of São Paulo (EPUSP), the Reference Center on Environmental Information (CRIA) and the National Institute for Space Research (INPE). In this context, several Artificial Intelligence (AI) techniques have been studied for modeling species geographic distribution. Two modeling algorithms were developed and integrated into openModeller, one based on Artificial Neural Networks and another based on Maximum Entropy. One of the group's aims is to research possible improvements to the algorithms available in the tool, as well as to study new techniques with potential for the modeling task. Accordingly, two alternative versions of the maximum entropy algorithm were implemented: a parallel version and an adaptive version. In addition, an adaptive approach was evaluated as a possible substitute for one of the parameters, within an analysis of parameter tuning for the maximum entropy algorithm. Furthermore, the group has been studying the predictive performance of the algorithms with the aim of specifying and formalizing a performance analysis method. This method may serve as a reference for addressing several challenges in modeling tasks, such as validating a new algorithm, comparing different modeling techniques, choosing the modeling algorithm best suited to the problem under study and to the available data set, and choosing initial values for the algorithm parameters.
The performance analysis method will be used to validate the algorithms developed by the group, based on Artificial Neural Networks and on Maximum Entropy. This validation will be carried out by comparing them with other algorithms available in openModeller, such as GARP (Genetic Algorithm for Rule-set Production) and SVM (Support Vector Machine). Additionally, a new technique being researched for modeling tasks is the Minimum Description Length (MDL) principle. It has already been used successfully for the clustering problem, and it has some properties that could be useful in modeling. However, the MDL principle has never been employed to model species geographic distribution. Its most interesting property is its ability to automatically avoid overfitting, a classical AI problem, when learning the parameters of a model: MDL prefers the model that yields the shortest combined description of the model itself and of the data encoded with it, so additional parameters are accepted only when they shorten the description of the data by more than they cost to describe. Avoiding overfitting is a fundamental feature of techniques used to make predictions. In a preliminary study, the MDL principle with irregular histograms was used to select the best data set for modeling species geographic distribution. Modeling species geographic distribution is a promising area with several challenges and great potential for development, and, like all predictive tasks, it requires increasingly precise techniques. Advances in the study of algorithms and of performance analysis techniques can contribute to the development of the modeling task.
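The way MDL trades model complexity against fit can be illustrated with a small sketch. This is not the group's implementation and makes several simplifying assumptions: the values are assumed to be rescaled to [0, 1], the histograms are regular (equal-width) rather than the irregular histograms used in the study, and the model cost is the crude two-part approximation of about ½·log₂ n bits per free parameter rather than a refined MDL criterion. The function names `histogram_code_length` and `select_bins` are hypothetical.

```python
import math
from collections import Counter


def histogram_code_length(data, k):
    """Crude two-part MDL code length (bits) of `data`, values in [0, 1],
    under a k-bin equal-width histogram: model bits + data bits."""
    n = len(data)
    # Count points per bin; a value of exactly 1.0 goes to the last bin.
    counts = Counter(min(int(x * k), k - 1) for x in data)
    # Data part: negative log2-likelihood under the histogram density k * n_j / n.
    # The constant cost of the measurement precision is the same for every k,
    # so it is dropped. Empty bins contribute nothing.
    data_bits = -sum(c * math.log2(k * c / n) for c in counts.values())
    # Model part: about (1/2) * log2(n) bits for each of the k - 1 free bin proportions.
    model_bits = (k - 1) / 2 * math.log2(n)
    return model_bits + data_bits


def select_bins(data, max_k=50):
    """Return the bin count in 1..max_k with the shortest total code length
    (ties go to the smaller, simpler histogram)."""
    return min(range(1, max_k + 1), key=lambda k: histogram_code_length(data, k))


if __name__ == "__main__":
    # Evenly spread values: extra bins cost bits but barely improve the fit.
    uniform = [(i + 0.5) / 1000 for i in range(1000)]
    # Values concentrated in [0, 0.1): extra bins pay for themselves.
    clustered = ([(i + 0.5) / 910 * 0.1 for i in range(910)]
                 + [0.1 + (j + 0.5) / 90 * 0.9 for j in range(90)])
    print(select_bins(uniform))        # 1
    print(select_bins(clustered) > 1)  # True
```

No threshold on the number of bins is tuned by hand here: a finer histogram is chosen only when the bits it saves on the data exceed the bits needed to describe its extra parameters, which is the overfitting safeguard described above.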