
Unidata Observation Dataset Conventions

This Convention is deprecated in favor of the CF Conventions for Point Observations. We recommend that you use those as soon as they are officially adopted.

This document describes a Convention for NetCDF (version 3) files for writing Point, Trajectory, Profile, and Station observation data. It uses the NetCDF Attribute Convention for Dataset Discovery. Files that follow this Convention can be read by the NetCDF-Java library and the IDV into the Point, Trajectory, Profile and Station data types of the Common Data Model.

Also see:

Definitions

Point, Trajectory, Station and Profile Observation datasets

An observation is a collection of measurements at one time and location. A Point Observation dataset contains observations which are not necessarily related in space or time. A Station Observation dataset contains time series of observations at named locations called stations. A trajectory is a collection of observations which are connected along a one dimensional track in space, with time increasing monotonically along the track. A Trajectory Observation dataset contains one or more trajectories. A profile is a kind of trajectory in which the observations have the same x,y location, and different z coordinates, and time may change or be the same for all observations in the profile. A Profile Observation dataset contains one or more profiles.

Grouping Variables into Structures

A Structure is a NetCDF Variable that contains other Variables, like a struct in C. Structures are a new part of NetCDF, introduced in the NetCDF-Java library, version 2.2 as part of the Common Data Model, and also implemented in the NetCDF-4 library.

NetCDF version 3 files can have only one real Structure, namely the record Structure that contains all of the Variables in the file that use the record (unlimited) dimension (see Using Records in NetCDF-3 files for a full explanation). However, we can also abstractly create a pseudo-Structure, which is a Structure that contains all the Variables in a file which have the same outer dimension (the first dimension in C and Java, the last dimension in Fortran). For example, the variables

   float temperature( record);
   float humidity( record);
   char name( record, name_len);

can also be modeled as a Structure :

   Structure {
     float temperature;
     float humidity;
     char name( name_len);
   } record( record);

This Convention will use the technique of identifying groups of NetCDF variables through their use of shared dimensions. These groups will effectively be one dimensional Structures with the same name as the defining dimension. Each Structure therefore has a unique index, namely its index in the defining dimension.

Associating Structures

Often we need to associate a list of structures together, and attach them to another structure. For example, we need to identify the list of all observations for a particular station. In this case we call the station the container/parent structure and the observations the contained/child structures. The order of the children is important. Another example is the list of observations for a particular trajectory, especially when there are multiple trajectories in a single file. In that case the order is crucial, since trajectory observations are assumed to be connected along a line in time and space. Given a parent, we want to find its children efficiently, i.e. not have to read the entire file.

There are three ways that you can create parent-children associations in this Convention.

1. A linked list uses the index of the children to create a forward or backward linked list. The parent structure maintains the first or last child index, and the children structures have a next or previous child index. The end of the list is indicated by a child index equal to -1. Note that indices are zero-based. The advantage of linked lists is that each parent can have a variable number of children, which can be stored in any order. The variables containing the child indices are determined in one of two ways:

  1. Child structure variables explicitly named nextChild or prevChild, and parent Structure variables named firstChild or lastChild.
  2. Global attributes firstChild_variable and nextChild_variable or lastChild_variable and prevChild_variable, whose values are the names of the corresponding variables.

Each child structure also keeps its parent index in a variable called parent_index or named by the parent_index_variable attribute. Optionally, each parent keeps track of the number of children in a variable called numChildren or named by the numChildren_variable attribute. Note that indices are zero-based.
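The forward linked list described above can be sketched in Python as follows. This is only an illustration under stated assumptions: plain lists stand in for the firstChild, nextChild and parent_index variables, the sample values are made up, and the cycle check is an extra safeguard that this Convention does not require.

```python
def children_forward(first_child, next_child, parent):
    """Return the child indices of `parent` by following a forward linked list.

    first_child[p] is the index of the first child of parent p.
    next_child[c] is the index of the child after c, or -1 at the end.
    """
    children = []
    seen = set()
    child = first_child[parent]
    while child != -1:  # -1 marks the end of the list
        if child in seen:
            # Safeguard added for this sketch; not part of the Convention.
            raise ValueError(f"cycle in nextChild chain at index {child}")
        seen.add(child)
        children.append(child)
        child = next_child[child]
    return children


# Hypothetical data: 2 stations, 5 observations stored in arrival order.
first_child = [0, 1]             # firstChild(station)
next_child = [2, 3, 4, -1, -1]   # nextChild(record)
parent_index = [0, 1, 0, 1, 0]   # parent_index(record)

station0 = children_forward(first_child, next_child, 0)
station1 = children_forward(first_child, next_child, 1)
```

Only the children of the requested parent are visited, which is the point of the linked list: the reader does not have to scan every observation in the file.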

2. A contiguous list stores the children contiguously in the array, and the parent structures maintain the first child index and the number of children. The ith parent structure contains children between firstChild(i) and firstChild(i) + numChildren(i) - 1 inclusive. Note that indices are zero-based. The advantage of contiguous lists is that each parent can have a variable number of children, and all the children for one parent are stored together on disk for fast access. The variables containing the child indices are determined in one of two ways:

  1. Parent Structure variables explicitly named firstChild and numChildren.
  2. Global attributes firstChild_variable and numChildren_variable, whose values are the names of the corresponding variables.

Each child structure also keeps its parent index in a variable called parent_index or named by the parent_index_variable attribute. Note that indices are zero-based.
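As a minimal sketch of the contiguous list, the child indices of a parent are just a half-open range computed from firstChild and numChildren. The lists and values below are hypothetical stand-ins for the corresponding NetCDF variables.

```python
def children_contiguous(first_child, num_children, parent):
    """Return the child indices of `parent` for a contiguous list.

    The children occupy firstChild(i) .. firstChild(i) + numChildren(i) - 1
    inclusive, which is the half-open range used here.
    """
    start = first_child[parent]
    return range(start, start + num_children[parent])


# Hypothetical data: 2 parents owning 3 and 2 children respectively.
first_child = [0, 3]    # firstChild(parent)
num_children = [3, 2]   # numChildren(parent)

kids0 = list(children_contiguous(first_child, num_children, 0))
kids1 = list(children_contiguous(first_child, num_children, 1))

# The parent_index variable that each child would carry in this layout.
parent_index = [p for p in range(len(first_child))
                for _ in children_contiguous(first_child, num_children, p)]
```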

3. When each parent has the same number of children (or there is a maximum number of children and you don't mind wasting some space), then you can also use a multidimensional structure. The parent structure dimension must be the outermost dimension, and the child structure dimension must be the next outer dimension, e.g. the children structure variables would look like float varName( parentDim, childDim). Concrete examples are given below.
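The space trade-off of a multidimensional structure can be illustrated by padding a ragged set of children into a rectangular (parentDim, childDim) layout. This is a sketch only: the function name is hypothetical, and the fill value used to mark unused slots is an assumption, since this Convention does not say how unused slots are marked.

```python
def to_rectangular(ragged, fill):
    """Pad per-parent child lists to a common length (parentDim x childDim).

    `fill` marks unused slots; how to mark them is an assumption of this
    sketch and is not specified by the Convention.
    """
    max_children = max((len(row) for row in ragged), default=0)
    return [list(row) + [fill] * (max_children - len(row)) for row in ragged]


# Hypothetical data: parent 0 has three children, parent 1 has one.
ragged = [[1.0, 2.0, 3.0], [4.0]]
grid = to_rectangular(ragged, fill=-9999.0)

# Slots that hold no real child and are therefore wasted space.
wasted = sum(value == -9999.0 for row in grid for value in row)
```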

Conventions used by all Observation Datasets

Conventions global attribute

NetCDF files conforming to this specification must add the global attribute:

  :Conventions = "Unidata Observation Dataset v1.0"; 

When following multiple conventions, list them with a comma separator. For example, if you conform to both this Convention and one called MyConventions, you would use:

  :Conventions = "MyConventions, Unidata Observation Dataset v1.0";
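A reader could test for this Convention by splitting the attribute value on commas, roughly as sketched below. The function name is hypothetical, and stripping the whitespace around each entry is an assumption made for this sketch.

```python
CONVENTION_NAME = "Unidata Observation Dataset v1.0"


def follows_convention(conventions_attr, name=CONVENTION_NAME):
    """Return True if the comma-separated Conventions value lists `name`."""
    # Whitespace around each entry is ignored (an assumption of this sketch).
    entries = [part.strip() for part in conventions_attr.split(",")]
    return name in entries


single = follows_convention("Unidata Observation Dataset v1.0")
combined = follows_convention("MyConventions, Unidata Observation Dataset v1.0")
other = follows_convention("MyConventions")
```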

Note: The CF-1.0 Convention for observation data is considered incomplete, and we do not recommend using it at this time.

Identifying the Observations

Use the global attribute

  :observationDimension = "dimName";

to name the observation dimension. If there is no such attribute, then the record (unlimited) dimension will be the observation dimension.

An observation is a collection of measurement values at a single time and location. All Variables with the observation dimension as their outer dimension constitute the observation measurement. The number of observations in the file is then the length of the observation dimension. (When using multidimensional structures, it's more complicated; see examples below.)
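The lookup of the observation dimension and of the observation variables might be sketched like this. Dictionaries stand in for a NetCDF file, all variable names are hypothetical, and the sketch covers only the simple case, not multidimensional structures.

```python
def observation_dimension(global_attrs, unlimited_dim):
    """Name of the observation dimension, defaulting to the record dimension."""
    return global_attrs.get("observationDimension", unlimited_dim)


def observation_variables(var_dims, obs_dim):
    """Names of variables whose outer (first) dimension is `obs_dim`."""
    return [name for name, dims in var_dims.items()
            if dims and dims[0] == obs_dim]


# Hypothetical file layout: variable name -> tuple of dimension names.
var_dims = {
    "temperature": ("record",),
    "name": ("record", "name_len"),
    "station_id": ("station", "id_len"),
    "base_time": (),  # scalar variable
}

obs_dim = observation_dimension({}, unlimited_dim="record")
named_dim = observation_dimension({"observationDimension": "obs"}, "record")
obs_vars = observation_variables(var_dims, obs_dim)
```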

Identifying the Coordinate Variables

Each observation must have a latitude, longitude, and time coordinate value associated with it. An altitude variable is optional. There are four ways to do this:

1. _Coordinate attributes:

  1. The latitude variable must have an attribute named _CoordinateAxisType with value equal to "Lat".
  2. The longitude variable must have an attribute named _CoordinateAxisType with value equal to "Lon".
  3. The altitude variable must have an attribute named _CoordinateAxisType with value equal to "Height".
  4. The time variable must have an attribute named _CoordinateAxisType with value equal to "Time".

2. Explicit variable names:

  1. The latitude variable must be named latitude.
  2. The longitude variable must be named longitude.
  3. The altitude variable must be named altitude or depth.
  4. The time variable must be named time.

3. Global attributes name coordinate variables:

  1. The latitude variable is named by the global attribute latitude_coordinate.
  2. The longitude variable is named by the global attribute longitude_coordinate.
  3. The altitude variable is named by the global attribute zaxis_coordinate.
  4. The time variable is named by the global attribute time_coordinate.

4. CF-1 compatible method:

  1. The latitude variable must have the units attribute "degrees_north".
  2. The longitude variable must have the units attribute "degrees_east".
  3. The altitude variable must have the attribute positive = "up" or "down".
  4. The time variable must have a units attribute that is a udunits date.
  5. At least one observation variable must have an attribute coordinates, whose value is the list of the latitude, longitude, altitude, and time variable names, e.g. varName:coordinates = "latName lonName altName timeName". To be strictly CF-1 compliant, all observation variables must have the coordinates attribute. The name of the time variable in this list is optional if the time variable is a coordinate variable, but for clarity we recommend putting it in the list.
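The first three methods above lend themselves to a simple lookup, sketched here with dictionaries standing in for a file's variables and attributes. Two things are assumptions of this sketch rather than rules of this Convention: the methods are tried in the order they are listed, and the CF-1 compatible method is left out because it needs units parsing. All sample variable names are hypothetical.

```python
AXIS_TYPE = {"lat": "Lat", "lon": "Lon", "alt": "Height", "time": "Time"}
EXPLICIT_NAMES = {"lat": ("latitude",), "lon": ("longitude",),
                  "alt": ("altitude", "depth"), "time": ("time",)}
GLOBAL_ATTR = {"lat": "latitude_coordinate", "lon": "longitude_coordinate",
               "alt": "zaxis_coordinate", "time": "time_coordinate"}


def find_coordinate(kind, var_attrs, global_attrs):
    """Return the name of the coordinate variable of `kind`, or None.

    var_attrs maps variable name -> dict of that variable's attributes.
    The precedence (method 1, then 2, then 3) is an assumption.
    """
    # Method 1: the _CoordinateAxisType attribute.
    for name, attrs in var_attrs.items():
        if attrs.get("_CoordinateAxisType") == AXIS_TYPE[kind]:
            return name
    # Method 2: explicit variable names.
    for name in EXPLICIT_NAMES[kind]:
        if name in var_attrs:
            return name
    # Method 3: a global attribute naming the variable.
    named = global_attrs.get(GLOBAL_ATTR[kind])
    if named in var_attrs:
        return named
    return None


# Hypothetical file mixing the three methods.
var_attrs = {
    "lat": {"_CoordinateAxisType": "Lat"},
    "longitude": {},
    "obs_time": {},
    "temp": {},
}
global_attrs = {"time_coordinate": "obs_time"}

found = {kind: find_coordinate(kind, var_attrs, global_attrs)
         for kind in ("lat", "lon", "alt", "time")}
```

Here the altitude lookup returns None, which is acceptable because the altitude variable is optional.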

In all cases:

  1. The latitude variable must be in decimal degrees north (units "degrees_north").
  2. The longitude variable must be in decimal degrees east (units "degrees_east").
  3. The altitude variable must be in meters, or have a units attribute that is udunits compatible with "meters". It should have the attribute positive = "up" or "down". If missing the positive attribute, "down" will be assumed if the name is depth, and "up" otherwise. Generally it will be assumed to be referenced to mean sea level (msl). Use the long_name attribute to indicate to the user otherwise.
  4. The time variable must have a units attribute that is a udunits date or an ISO 8601 date string.
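The default for a missing positive attribute on the altitude variable can be written down directly. In this sketch the function name is hypothetical, and treating an unrecognized positive value the same as a missing one is an assumption.

```python
def positive_direction(var_name, attrs):
    """Return "up" or "down" for an altitude variable.

    Uses the positive attribute when present; otherwise "down" if the
    variable is named depth, and "up" for any other name.
    """
    positive = attrs.get("positive")
    if positive in ("up", "down"):
        return positive
    # Missing attribute (or, by assumption, an unrecognized value).
    return "down" if var_name == "depth" else "up"


explicit = positive_direction("altitude", {"positive": "down"})
depth_default = positive_direction("depth", {})
altitude_default = positive_direction("altitude", {})
```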

Optional (but recommended) Information

  1. Nominal Time: The time coordinate should be the time of the observation, as exact as possible. Often, there is a nominal time associated with the observation, for example hourly observations have a nominal time on the hour. To indicate this, use an observation variable called time_nominal or named by the time_nominal_variable global attribute.

Other Attributes

The following are the required attributes from the set of data discovery attributes defined by NetCDF Attribute Convention for Dataset Discovery:

Generally, it is recommended that all the attributes defined by that specification be used if possible.

Point Observation datasets

The latitude, longitude, altitude, and time variables must either have the observation dimension as their only dimension, or be a scalar variable, in which case their scalar value applies to all the observations.

A point observation dataset must have the global attribute

 :cdm_datatype = "Point"; 

The latitude, longitude, altitude, and time variables must all be observation variables, i.e. have the observation dimension as their outer dimension.

Station Observation datasets

A station observation dataset must have a global attribute

 :cdm_datatype = "Station"; 

A station observation dataset needs another group of variables to describe the station information. Use the global attribute

  :stationDimension = "dimName";

to name the station dimension. If there is no such attribute, then there must be a dimension named station.

All Variables with the station dimension as their outer dimension are station Variables, containing information about the stations. The latitude, longitude, and altitude variables must all be station variables, i.e. have the station dimension as their outer dimension. In addition, there must exist a station id variable and optionally a station description variable. The station ids must be unique within the file. These can be identified in two ways:

  1. Station variables explicitly named station_id and station_description.
  2. Global attributes station_id_variable and station_description_variable, whose values are the names of the station id and station description variables.

The station dimension will likely never be unlimited, so you may need to estimate the maximum number of stations you need, for instance if you are creating the file from streaming data. In this case, set a variable called number_stations (or name it through the global attribute number_stations_variable) to the number of stations actually used. If that variable is not present, it will be assumed that all stations are valid.
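The rule for how many stations are valid can be sketched as a lookup with a fallback to the full dimension length. Dictionaries stand in for the file's dimensions, global attributes and scalar variables, and the function name and sample sizes are hypothetical.

```python
def valid_station_count(dim_lengths, global_attrs, scalar_values):
    """Number of stations actually used in the file.

    dim_lengths maps dimension name -> length.
    scalar_values maps scalar variable name -> value.
    """
    station_dim = global_attrs.get("stationDimension", "station")
    counter = global_attrs.get("number_stations_variable", "number_stations")
    # Without the counter variable, every station slot is assumed valid.
    return scalar_values.get(counter, dim_lengths[station_dim])


# Hypothetical file with room for 100 stations, 37 of them in use.
dims = {"station": 100, "record": 5000}

used = valid_station_count(dims, {}, {"number_stations": 37})
all_valid = valid_station_count(dims, {}, {})
renamed = valid_station_count(dims, {"number_stations_variable": "nsta"},
                              {"nsta": 12})
```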

The observations must be associated with their corresponding station using linked lists, contiguous lists, or multidimensional structures.

Example Station CDL

Example METAR station file (45 Mb)

Trajectory Observation datasets

If there is only one trajectory in the file, then a Trajectory dataset follows the same rules above as a Point dataset, except that it must have the global attribute

 :cdm_datatype = "Trajectory"; 

If there are multiple trajectories in the same file, then the trajectories are identified through the trajectory dimension. Use the global attribute

  :trajectoryDimension = "dimName";

to name the trajectory dimension. If there is no such attribute, then there must be a dimension named trajectory.

All Variables with the trajectory dimension as their outer dimension are considered trajectory Variables, containing information about the trajectory. The number of trajectories in the file will then be the length of the trajectory dimension.

The latitude, longitude, altitude, and time variables must all be observation variables, i.e. have the observation dimension as their outer dimension. There must also exist a trajectory id variable and optionally a trajectory description variable. The trajectory ids must be unique within the file. These can be identified in two ways:

  1. Trajectory variables explicitly named trajectory_id and trajectory_description.
  2. Global attributes trajectory_id and trajectory_description, whose values are the names of the trajectory id and trajectory description variables.

The observations must be associated with their corresponding trajectory using linked lists, contiguous lists, or multidimensional structures.
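For a file with several trajectories, a reader could pull out one trajectory's time values and confirm that they are in order along the track. The sketch below assumes the contiguous list association, uses plain lists with made-up values in place of the NetCDF variables, and accepts repeated time values as ordered, which is a choice made here and not a rule of this Convention.

```python
def trajectory_times(first_child, num_children, time, trajectory):
    """Time values of one trajectory stored as a contiguous list."""
    start = first_child[trajectory]
    return time[start:start + num_children[trajectory]]


def is_time_ordered(times):
    """True if time never decreases along the track.

    Equal neighbouring values are accepted (an assumption of this sketch).
    """
    return all(a <= b for a, b in zip(times, times[1:]))


# Hypothetical data: 2 trajectories with 3 and 2 observations.
first_child = [0, 3]          # firstChild(trajectory)
num_children = [3, 2]         # numChildren(trajectory)
time = [0, 10, 20, 5, 15]     # time(record)

times0 = trajectory_times(first_child, num_children, time, 0)
times1 = trajectory_times(first_child, num_children, time, 1)
ordered = [is_time_ordered(times0), is_time_ordered(times1)]
```

Note that the time variable as a whole is not sorted in this example; only the values within each trajectory need to be in order.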

Example CDLs:

Profile Observation datasets

If there is only one profile in the file, then a Profile dataset follows the same rules above as a Point dataset, except that it must have the global attribute

 :cdm_datatype = "Profile"; 

If there are multiple profiles in the same file, then the profiles are identified through the profile dimension. Use the global attribute

  :profileDimension = "dimName";

to name the profile dimension. If there is no such attribute, then there must be a dimension named profile.

All Variables with the profile dimension as their outer dimension are considered profile Variables, containing information about the profile. The number of profiles in the file will then be the length of the profile dimension.

If the profiles are at different locations, then the latitude and longitude variables must be profile variables. The altitude variable must be an observation variable (has the observation dimension as its outer dimension). The time variable may be a profile or an observation variable, corresponding to whether the time varies along the profile or not.

If the profiles can be grouped into time series at the same location(s) then you can use a station dimension to group them into "stations". Use a dimension named station (or the global attribute stationDimension = "dimName" to name the station dimension). Create station variables station_id, latitude, and longitude, and optionally station_description. The altitude variable must be an observation variable (has the observation dimension as its outer dimension). The time variable may be a profile or an observation variable, corresponding to whether the time varies along the profile or not.

There must also exist a profile variable named profile_id and optionally one named profile_description (or global attributes profile_id and profile_description, whose values are the names of the profile id and profile description variables). The profile ids must be unique within the file.

The observations must be associated with their corresponding profile using linked lists, contiguous lists, or multidimensional structures.

Example:


Changes:


This document is maintained by John Caron and was last updated on June 20, 2007

 
 