Join the Discussion on Proposed NetCDF Uncertainty Conventions

Scientific data formats such as NetCDF have made great strides in areas like interoperability, scalability, and data compression. By comparison, methods for representing the uncertainty inherent in the values stored in scientific data sets are less robust. A group organized by researchers from the National Research Council of Italy's Institute for Atmospheric Pollution Research is trying to address this issue by creating a set of conventions for the representation of uncertainty values associated with data stored in NetCDF files.

A draft public discussion paper titled NetCDF Uncertainty Conventions (NetCDF-U) 1.0, authored by Lorenzo Bigagli and Stefano Nativi, was submitted to the Open Geospatial Consortium (OGC) in late 2011. That paper introduces the problem in this way:

From a theoretical perspective, it can be said that no dataset is a perfect representation of the reality it purports to represent.

Inevitably errors, arising from the observation process, including the sensor system and subsequent processing, differences in scales of phenomena and the spatial support of the observation mechanism as well as a lack of knowledge about the detailed conversion between the measured quantity and the target variable means that in principle all data should be treated as uncertain.

The most natural representation of an uncertain quantity is in terms of random variables (or fields / functions for spatially and temporally distributed variables), with a probabilistic approach.

However, it must be acknowledged that almost all existing data resources are not treated in this way. Most datasets come simply as a series of values, often without any uncertainty information. If there is uncertainty information, then this is typically contained within the metadata, in a data quality element. This is typically a global (dataset wide) representation of uncertainty, often derived through some form of validation process. Typically, it is a statistical measure of spread, for example the standard deviation of the residuals (data set measured minus 'true' value).
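To make the last point of the excerpt concrete, here is a minimal sketch of the "standard deviation of the residuals" it mentions, using only the Python standard library. The function name `residual_std` and the sample values are hypothetical, chosen for illustration; they do not come from the discussion paper.

```python
import statistics


def residual_std(measured, reference):
    """Sample standard deviation of the residuals (measured minus reference).

    Hypothetical helper for illustration; not part of NetCDF-U.
    """
    if len(measured) != len(reference):
        raise ValueError("measured and reference must have the same length")
    if len(measured) < 2:
        raise ValueError("at least two value pairs are required")
    residuals = [m - r for m, r in zip(measured, reference)]
    return statistics.stdev(residuals)


# Made-up example values: four measurements against a reference of 10.0.
measured = [10.2, 9.8, 10.5, 9.9]
reference = [10.0, 10.0, 10.0, 10.0]
sigma = residual_std(measured, reference)
print(f"dataset-wide uncertainty (1 sigma): {sigma:.3f}")
```

A single number like this describes the whole dataset, which is exactly the "global" representation the excerpt describes; it says nothing about how uncertainty varies from one value to the next.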

The paper goes on to describe an approach to encoding uncertainty information in a NetCDF dataset. (The full paper is available here in PDF format.)

The authors have created an e-mail list intended to encourage broad community input into the OGC discussion paper. The mailing list is hosted by the OGC but is freely available to participants beyond the OGC member institutions. To join the discussion, visit https://lists.opengeospatial.org/mailman/listinfo/netcdf-u.

News@Unidata
News and information from the Unidata Program Center