Providing Statistical Algorithms as-a-Service
Building: Grand Hotel Mediterraneo
Room: Sala dei Continenti
Date: 2013-10-29 03:12 PM – 03:30 PM
Last modified: 2013-10-05
Abstract
In computational statistics, algorithms often have specialized implementations that address very specific problems. Every so often, these algorithms also apply to problems other than the original ones. Interest is growing in modular and pluggable solutions that let scientists repeat and validate each other's experiments and reuse those algorithms in other contexts. Furthermore, such procedures are increasingly expected to be hosted remotely and to “hide” the complexity of the calculations, which remote computational infrastructures manage behind the scenes. For these reasons, the usual solution of supplying modular software libraries that contain algorithm implementations is giving way to Web Services that host such implementations and are accessible through standard protocols. The protocols describing the computational capabilities of these Services are increasingly elaborate, so that modular workflows can rely on them.
We propose a Web Service, named Statistical Manager (SM), that hosts both general-purpose and special-purpose algorithm implementations for statistical computing and data mining, which can be applied to a variety of biological and marine-related problems. SM is distributed “as-a-Service” by the D4Science distributed e-Infrastructure, which supports large-scale resource sharing and distributed computing. SM exploits the heterogeneous resources offered by D4Science to retrieve and store data and to run distributed processes on large datasets. SM provides an environment in which experts can easily manipulate data and run experiments by themselves. Users interact with SM through a Web interface, which is automatically generated from a rigid definition of each algorithm's inputs and outputs. The main benefits over alternative solutions are the pluggable environment, in terms of both algorithms and resources, and the flexibility of the connected interfaces. Furthermore, the hosted procedures are independent of each other, which allows easy integration by third-party workflow management systems.
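The idea of generating the interface from a rigid input/output definition can be illustrated with a minimal Python sketch. It is not SM's actual code or API: the `Parameter` and `AlgorithmSpec` classes, the `build_form` function, the widget names and the example algorithm declaration are all hypothetical, chosen only to show how a strict declaration can drive form generation without any algorithm-specific interface code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Parameter:
    """One declared input or output (hypothetical structure)."""
    name: str
    kind: str          # one of the keys of WIDGETS below
    default: str = ""


@dataclass(frozen=True)
class AlgorithmSpec:
    """Rigid declaration of what an algorithm consumes and produces."""
    name: str
    inputs: List[Parameter]
    outputs: List[Parameter]


# Assumed mapping from declared parameter kinds to form widgets.
WIDGETS = {
    "number": "numeric box",
    "text": "text box",
    "table": "table selector",
}


def build_form(spec: AlgorithmSpec) -> List[Dict[str, str]]:
    """Derive one form field per declared input.

    Undeclared kinds are rejected, which is what makes the
    definition "rigid": the form can only contain known widgets.
    """
    fields = []
    for param in spec.inputs:
        if param.kind not in WIDGETS:
            raise ValueError(f"unsupported parameter kind: {param.kind}")
        fields.append({
            "label": param.name,
            "widget": WIDGETS[param.kind],
            "default": param.default,
        })
    return fields


# Hypothetical declaration of a clustering algorithm.
clustering = AlgorithmSpec(
    name="EXAMPLE_CLUSTERING",
    inputs=[
        Parameter("PointsTable", "table"),
        Parameter("NumberOfClusters", "number", "3"),
    ],
    outputs=[Parameter("ClusteredTable", "table")],
)

form = build_form(clustering)
for form_field in form:
    print(form_field)
```

Under these assumptions, plugging in a new algorithm only requires a new declaration; the form-building code stays unchanged.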
The procedures currently managed by SM address problems ranging from ecological modelling to vessel activity monitoring. In the former case, the algorithms try to model complex phenomena in order, for example, to predict the impact of climate change on biodiversity, help in conservation planning and estimate the geographical distribution of species. Vessel monitoring systems, instead, use data mining to monitor and control fishing activity in the oceans. SM currently supplies 30 algorithm implementations in these contexts, taken from state-of-the-art libraries, which include: real-valued feature clustering, function and climate scenario simulation, niche modelling, model performance evaluation, time series analysis, catch statistics, and analysis of occurrence data about marine species and of information transmitted by vessels. Statistical Manager is currently used in the i-Marine European project and the Web interface is accessible through the related Web portal. We are currently working to make SM compliant with the standard "Open Geospatial Consortium - Web Processing Service" specifications and to integrate it with general-purpose Workflow Management Systems like Taverna.