Missouri Botanical Garden Open Conference Systems, TDWG 2014 ANNUAL CONFERENCE

Shades of Grey: Yet Another Role for BIS-TDWG
Arturo H. Ariño

Building: Elmia Congress Centre, Jönköping
Room: Rydbergsalen
Date: 2014-10-27 09:30 AM – 10:00 AM
Last modified: 2014-10-29


The existence of a “long tail” of biodiversity research data, composed of a multitude of hard-to-reach datasets held by small publishers, has often been postulated as a potential source of much biodiversity information that remains “dark data”: we know the data are there but do not know where. Much of this heap of data could eventually be mobilized and become usable, but there is a distinct risk that such data, once known, may be lost forever (“unknown knowns”) as, e.g., their media become obsolete or the people who might know the metadata disappear.

Still, a vast body of data exists in a variety of “grey” forms, as regarded from the point of view of their usability in biodiversity research, which now generally requires data to be digitally accessible. These are data residing in paper publications or reports that have not (yet) been digitized and may well never be, e.g. very short-run printings, local reports, or mere notebooks never formalized into any form of publication. Such data may be known and locatable (hence not completely “dark”) but are not practically usable until they are mobilized in digital form.

TDWG’s evolving standards are the state-of-the-art semantic tools enabling such mobilization, and can further be used to ascertain the fitness for use of the derived datasets. However, the disproportionate effort required to mobilize undigitized data (as compared to digital datasets) might readily move such grey data into darkness. Incentives for mobilization (in effect, rescue boards for data) may emerge if the semantic corpus already in TDWG is progressively expanded to capture ancillary data types that may appeal to a wider community. As a starting point, perhaps the ecology community could be targeted, and eventually a linkage to the Ecological Metadata Language (EML) could again be considered (cf. Charter of the Observations Task Group) to make it more attractive to extract and digitize, within TDWG standards, biodiversity information currently fading from shades of grey into darkness.