Integrated Digitized Biocollections (iDigBio) Cyberinfrastructure Status and Futures
Building: Grand Hotel Mediterraneo
Room: Sala dei Continenti
Date: 2013-10-30 11:30 AM – 11:45 AM
Last modified: 2013-10-05
Abstract
Integrated Digitized Biocollections (iDigBio) has been building and deploying a cloud-based cyberinfrastructure customized to support the digitization of non-federal biological and paleontological collections across the USA. The iDigBio cyberinfrastructure, consisting of hardware, software, data, and resources, is designed to support semantically richer and thematically based biological collections data. The data, which include media, can be integrated with other data types (e.g., genomics and morphology) that represent expanded use cases in the biodiversity community. Initial iDigBio efforts focused on three areas: deploying a community portal to engage the community in collaborative creation, sharing, and dissemination of content and activities; building a specimen portal for managing the workflow around collection object information and its ancillary data; and provisioning Virtual Private Server (VPS) platforms that support the computational needs of the Advancing Digitization of Biodiversity Collections (ADBC) community. The community portal is based on well-known web-based tools deployed and maintained by iDigBio (Drupal, MediaWiki, and Redmine) as well as third-party services (AdobeConnect, GitHub, Facebook, Twitter, Google Drive, and Qualtrics) that meet the needs of workshop and working group activities. The specimen portal offers web and application programming interfaces (APIs) backed by core cyberinfrastructure components to store, index, and access digitized data. It is built on distributed object and document stores (Swift and Riak) that maintain three replicas of data to improve availability, even in the presence of certain types of failure. Compared with traditional relational systems, document and object stores offer more flexibility in how data are stored, which facilitates ingestion of richer data. VPS platform provisioning builds upon the integration of machine virtualization with system management, configuration, monitoring, and deployment technologies.
The main current uses of VPS platforms are Thematic Collections Network (TCN) portals and data aggregation systems, tool development environments, and community network nodes. Near-term deliverables include improvements to the specimen portal and the offering of iDigBio-produced virtual appliances, i.e., special-purpose virtual machines. The specimen portal will provide additional search capabilities (disjunction and presence/absence operations), interactive geospatial mapping based on user searches, self-service data ingestion, and data reporting. Development plans include integrating a fully featured triple store or graph database into the iDigBio platform to support queries and rich linking of objects within iDigBio to the broader universe of available data elsewhere. To facilitate the digitization workflows taking place at TCNs and Partners to Existing Networks (PENs), the media ingestion appliance will enable robust intake of media objects (images, vocalizations, videos, etc.) into iDigBio with minimal effort, while the Specify appliance will simplify installation and configuration of a well-known collection management tool, lowering the barrier to adoption, particularly where IT support is minimal.
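The planned search operations mentioned above (disjunction and presence/absence) can be illustrated with a minimal sketch over schema-free records, of the kind a document store might hold. This is not the iDigBio API: the field names (`genus`, `image`), the sample records, and every helper function below are hypothetical and exist only to show what the operations mean.

```python
from typing import Any, Callable, Dict, Iterable, List

# Hypothetical record type: a free-form document, as a document store might hold.
Record = Dict[str, Any]
Predicate = Callable[[Record], bool]


def field_equals(field: str, value: Any) -> Predicate:
    """Match records whose `field` equals `value`."""
    return lambda rec: rec.get(field) == value


def field_present(field: str) -> Predicate:
    """Presence test: the field exists and is neither None nor an empty string."""
    return lambda rec: rec.get(field) not in (None, "")


def field_absent(field: str) -> Predicate:
    """Absence test: the logical negation of field_present."""
    present = field_present(field)
    return lambda rec: not present(rec)


def any_of(*preds: Predicate) -> Predicate:
    """Disjunction (OR): match if at least one predicate matches."""
    return lambda rec: any(p(rec) for p in preds)


def all_of(*preds: Predicate) -> Predicate:
    """Conjunction (AND): match only if every predicate matches."""
    return lambda rec: all(p(rec) for p in preds)


def search(records: Iterable[Record], pred: Predicate) -> List[Record]:
    """Return the records that satisfy the predicate, in input order."""
    return [rec for rec in records if pred(rec)]


# Illustrative sample data only; not real specimen records.
records: List[Record] = [
    {"id": 1, "genus": "Quercus", "image": "q1.jpg"},
    {"id": 2, "genus": "Acer"},
    {"id": 3, "genus": "Pinus", "image": ""},
]

oaks_or_maples = search(
    records, any_of(field_equals("genus", "Quercus"), field_equals("genus", "Acer"))
)
without_media = search(records, field_absent("image"))
```

Because the records have no fixed schema, a presence/absence test has to treat a missing key and an empty value the same way; the sketch makes that choice explicit in `field_present`.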