CDM GRIB Collection Processing

Overview

As of CDM version 4.3, GRIB datasets are handled as collections of GRIB files. A GRIB file is a collection of GRIB records. A GRIB dataset is therefore a collection of GRIB records in one or more files. The CDM cannot read GRIB files remotely, e.g. through an OPeNDAP server; they must be local. A THREDDS Data Server (TDS) can, however, make GRIB datasets remotely accessible, e.g. through OPeNDAP.

The CDM can only read GRIB files; it cannot write them. It can, however, rewrite GRIB into netCDF using CF Conventions. Through version 4.3.12, it can only write netCDF-3 format files, which are typically 4-40 times larger than GRIB. As of 4.3.13, the CDM can also write netCDF-4 format, with file sizes comparable to GRIB, typically within a factor of two.

A GRIB collection must follow these homogeneity constraints:

  1. The GRIB records must be either GRIB-1 or GRIB-2; you cannot mix different editions in the same collection.
  2. The GRIB collection should be coherent, e.g. from the same model (you can mix multiple runs from the same model, however). This is because the user does not have access to the metadata in the individual records, but only to global and variable attributes describing the collections of GRIB records.
  3. The GRIB records should all be from the same center and subcenter, since these are used for table lookups. (In principle, one could relax this if all records use only standard WMO entries. The global metadata may be misleading, however.) Different table versions may be mixed in the same collection in GRIB-1.
  4. The GRIB records may have different reference dates. (This was not true in versions before 4.3.)

In addition:

  1. A best practice is that all GRIB records in the collection should use the same Grid Definition (GDS). If there is more than one GDS in the collection, each GDS will be placed in a separate group. This can be a problem for older software that doesn't deal with groups.
  2. Global attributes are taken from a single record, and so may be misleading if these vary within the collection. For example:
    1. The originating center and subcenter.
    2. The master and local table version (GRIB-2).
    3. The generating process type.
    4. The generating and background process name, if known.
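
The edition and center/subcenter constraints above can be checked mechanically before a collection is built. The following is a minimal sketch in Python, not CDM code. It assumes each GRIB record has already been reduced to a hypothetical dictionary with 'edition', 'center' and 'subcenter' keys; the function name and the sample values are illustrative only.

```python
def check_homogeneity(records):
    """Return a list of problems; an empty list means the records satisfy
    the edition and center/subcenter constraints.

    `records` is a hypothetical list of dicts with the keys 'edition',
    'center' and 'subcenter' (this is not a CDM data structure).
    """
    problems = []

    # Constraint 1: GRIB-1 and GRIB-2 records cannot be mixed.
    editions = sorted({r["edition"] for r in records})
    if len(editions) > 1:
        problems.append("mixed GRIB editions: %s" % editions)

    # Constraint 3: one center/subcenter pair, since it drives table lookups.
    centers = sorted({(r["center"], r["subcenter"]) for r in records})
    if len(centers) > 1:
        problems.append("mixed center/subcenter pairs: %s" % centers)

    return problems


# Two runs of the same model: different reference dates are allowed.
records = [
    {"edition": 2, "center": 7, "subcenter": 0, "reference_date": "2012-01-19T12:00Z"},
    {"edition": 2, "center": 7, "subcenter": 0, "reference_date": "2012-01-19T18:00Z"},
]
print(check_homogeneity(records))  # prints []
```

A check like this cannot verify that the collection is coherent in the sense of constraint 2; that still requires knowledge of the model that produced the records.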

Indexing

For each GRIB file, a GRIB index file is written with suffix .gbx9. This file contains everything in the GRIB file except the data. Generally it is 300-1000 times smaller than the original file. Once written, it does not have to be rewritten unless the GRIB file changes, in which case the CDM should detect the change and rewrite the index file. If there is any doubt about that, delete the index file and let it get recreated.

For each GRIB collection, a GRIB collection index file is written with suffix .ncx. This file contains all the metadata and the coordinates for the collection. It is usually fairly small (a few dozen Kbytes to a few Mbytes for a large collection), and once created, makes accessing the GRIB collection very fast. In general it will be updated if needed, but one can always delete it and let it be recreated.

If one opens a single GRIB file in the CDM, a gbx9 and ncx file will be created for that file. If one opens a collection of multiple GRIB files, a gbx9 file is created for each file, and one ncx file is created for the entire collection.
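
The rule above (one gbx9 file per GRIB file, one ncx file per collection) can be sketched as a small helper that lists the index files one would expect to find. This is an illustrative Python sketch, not part of the CDM. It assumes that the suffix is appended to the complete data file name and that a collection index is named after the collection; the function name, file names and collection name are hypothetical.

```python
def expected_index_files(grib_files, collection_name=None):
    """List the index files expected for a set of GRIB files.

    Assumption: the suffix is appended to the whole file name. One .gbx9
    is expected per GRIB file and one .ncx per collection; a single file
    opened on its own acts as its own collection.
    """
    gbx9 = [path + ".gbx9" for path in grib_files]
    if collection_name is None:
        if len(grib_files) != 1:
            raise ValueError("a multi-file collection needs a name")
        collection_name = grib_files[0]
    return gbx9 + [collection_name + ".ncx"]


print(expected_index_files(["gfs_00.grib2"]))
print(expected_index_files(["gfs_00.grib2", "gfs_06.grib2"], "gfs"))
```

A list like this is mainly useful for cleanup: deleting the listed files forces the indexes to be recreated on the next open.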

Both kinds of index files are binary, private formats for the CDM, whose format may change as needed. Your application should not depend in any way on the details of these formats.

GRIB Tables

The use of external tables in GRIB is quite problematic (read here for more details). Nonetheless, GRIB files are in wide use internationally and contain invaluable data. The CDM is a general-purpose GRIB reading library that makes GRIB data available through the CDM/NetCDF API, that is, as multidimensional data arrays and CF-compliant metadata and coordinates.

Because of flaws in the design of GRIB and flaws in actual practice when writing GRIB, any general purpose GRIB reader can only make a best effort in interpreting arbitrary GRIB records. It is therefore necessary, for anything other than casual use, to carefully examine the output of CDM GRIB datasets and compare this against the documentation. In particular, GRIB records may refer to local tables that are missing or incorrect in the CDM, and they may override standard WMO tables without the CDM being able to detect that they are doing so. It is often necessary for users to contact the data producer to obtain the correct tables for the particular dataset they want to read. This is also necessary for other GRIB reading tools like wgrib (NCEP) and gribex (ECMWF).

The CDM has a number of ways to allow you to use new tables or override incorrect ones globally or by dataset. The good news is that if users contribute these fixes back to the CDM, everyone can take advantage of them and the set of "correct" datasets will grow. The WMO has greatly improved the process of using the standard tables, and hopefully GRIB data producers will continue to improve methods for writing GRIB and maintaining local tables.

Opening a GRIB Collection in the CDM

The CDM is used primarily to open single GRIB files, and the TDS is used to manage large and very large collections of files. Here is a summary of the ways that an application might use the CDM to open GRIB files.

Single Data File Mode

Pass the local data file location to any of the standard dataset opening classes:

The GRIB Index (.gbx9) and GRIB Collection index (.ncx) files will be created if needed.

GRIB Collection

You can create an ncx file based on a collection spec using ToolsUI: IOSP/GRIB1(2)/GribCollection. Enter the collection spec and hit Enter. To write the index file, hit the "Write Index" button on the right. Give it a memorable name and hit Save.

Collection Index Mode

If the GRIB Collection index (.ncx) already exists, one can pass that to any of the standard dataset opening classes. In this case, the collection is created from reading the ncx file with no checking against the original data file(s). The original data files are only accessed when data is requested from them.

Mapping a GRIB Collection into Multidimensional Variables

A GRIB file is an unordered collection of GRIB records. A GRIB record consists of a single 2D (x, y) slice of data. The CDM library reads a GRIB file and creates a 2, 3, 4, or 5 dimension Variable (time, ensemble, z, y, x) by finding the records with the same parameter but different time / level / ensemble coordinates. This amounts to guessing the dataset schema and the intent of the data provider, and is unfortunately a bit arbitrary. Most of our testing is against the NCEP operational models from the IDD, and so our choices are influenced by those. Deciding how to group the GRIB records into CDM Variables is one of the main sources of problems. The CDM uses the following GRIB fields to construct a unique variable:

GRIB-1 Variables

The GRIB-1 variable name is:

%paramName[_%level][_layer][_%interval][_%statName]

where:
  %paramName = parameter name from GRIB-1 table 2 (cleaned up). If unknown, use VAR_%d-%d-%d-%d (see below)
  %level = short form of level name from GRIB-1 table 3, if defined.
  _layer = added if it's a vertical layer (literal)
  %interval = time interval name (e.g. "12_hour" or "mixed")
  %statName = name of statistical type if applicable, from GRIB-1 table 5

The GRIB-1 variable id is:

VAR_%d-%d-%d-%d[_L%d][_layer][_I%s][_S%d]

where:
  %d-%d-%d-%d = center-subcenter-tableVersion-paramNo
  L%d = level type  (octet 10 of PDS), if defined.
  _layer = added if it's a vertical layer (literal)
  I%s = interval name (e.g. "12_hour" or "mixed") if a time interval
  S%d = stat type (octet 21 of PDS) if applicable
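
As an illustration of how the GRIB-1 id template expands, here is a minimal Python sketch that formats an id from its parts. It is not the CDM implementation; the function name and the example codes are chosen only for illustration, and each optional part is appended only when its argument is supplied.

```python
def grib1_variable_id(center, subcenter, table_version, param_no,
                      level_type=None, is_layer=False,
                      interval_name=None, stat_type=None):
    """Format an id following the template
    VAR_%d-%d-%d-%d[_L%d][_layer][_I%s][_S%d]."""
    parts = ["VAR_%d-%d-%d-%d" % (center, subcenter, table_version, param_no)]
    if level_type is not None:
        parts.append("_L%d" % level_type)
    if is_layer:
        parts.append("_layer")
    if interval_name is not None:
        parts.append("_I%s" % interval_name)
    if stat_type is not None:
        parts.append("_S%d" % stat_type)
    return "".join(parts)


# Illustrative codes only; real values come from the records' PDS.
print(grib1_variable_id(7, 0, 2, 11, level_type=100))
# prints VAR_7-0-2-11_L100
print(grib1_variable_id(7, 0, 2, 61, level_type=1,
                        interval_name="12_hour", stat_type=4))
# prints VAR_7-0-2-61_L1_I12_hour_S4
```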

GRIB-2 Variables

The GRIB-2 variable name is:

%paramName[_error][_%level][_layer][_%interval][_%statName][_%ensDerivedType][_probability_%probName]

where:
  %paramName = parameter name from GRIB-2 table 4.2 (cleaned up); if unknown, use
               VAR_%d-%d-%d_FROM%d-%d = VAR_discipline-category-paramNo_FROM_center-subcenter
  %level = short form of level name from GRIB-2 table 4.5, if defined.
  _layer = added if it's a vertical layer (literal)
  %interval = time interval name (e.g. "12_hour" or "mixed")
  %statName = name of statistical type if applicable, from GRIB-2 table 4.10
  %ensDerivedType = name of ensemble derived type if applicable, from GRIB-2 table 4.7
  %probName = name of probability type if applicable

The GRIB-2 variable id is:

VAR_%d-%d-%d[_error][_L%d][_layer][_I%s_S%d][_D%d][_Prob_%s]

where:
  VAR_%d-%d-%d = discipline-category-paramNo
  L%d = level type code
  I%s = time interval name (e.g. "12_hour" or "mixed")
  S%d = statistical type code if applicable
  D%d = derived type code if applicable
  Prob_%s = probability name if applicable

See ucar.nc2.grib.grib1.Grib1Rectilyser.cdmVariableHash() and ucar.nc2.grib.grib2.Grib2Rectilyser.cdmVariableHash() for complete details.
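
To make the grouping idea concrete, the following Python sketch builds a simplified GRIB-2 id and uses it to gather 2D records into variables, collecting the unique time and level coordinates of each. It is a sketch under stated assumptions, not the Rectilyser code referenced above: records are hypothetical dictionaries, the _error, _D and _Prob parts of the id are omitted, and all names and values are illustrative.

```python
from collections import defaultdict


def grib2_variable_id(discipline, category, param_no, level_type=None,
                      is_layer=False, interval_name=None, stat_type=None):
    """Simplified form of VAR_%d-%d-%d[_L%d][_layer][_I%s_S%d]."""
    parts = ["VAR_%d-%d-%d" % (discipline, category, param_no)]
    if level_type is not None:
        parts.append("_L%d" % level_type)
    if is_layer:
        parts.append("_layer")
    if interval_name is not None and stat_type is not None:
        parts.append("_I%s_S%d" % (interval_name, stat_type))
    return "".join(parts)


def group_records(records):
    """Group 2D records by variable id and return, for each id, the
    sorted unique time and level coordinates found in its records."""
    variables = defaultdict(lambda: {"times": set(), "levels": set()})
    for rec in records:
        vid = grib2_variable_id(*rec["param"], level_type=rec["level_type"])
        variables[vid]["times"].add(rec["time"])
        variables[vid]["levels"].add(rec["level"])
    return {vid: (sorted(c["times"]), sorted(c["levels"]))
            for vid, c in variables.items()}


# Hypothetical records: one parameter on two level types.
records = [
    {"param": (0, 0, 0), "level_type": 100, "level": 850, "time": 0},
    {"param": (0, 0, 0), "level_type": 100, "level": 500, "time": 0},
    {"param": (0, 0, 0), "level_type": 100, "level": 850, "time": 6},
    {"param": (0, 0, 0), "level_type": 100, "level": 500, "time": 6},
    {"param": (0, 0, 0), "level_type": 103, "level": 2, "time": 0},
]
for vid, (times, levels) in sorted(group_records(records).items()):
    print(vid, times, levels)
```

In this example the four records on level type 100 become one variable with two times and two levels, while the single record on level type 103 becomes a separate variable, even though all five share the same parameter.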

GDS Hashcode

The CDM creates a different group for each different GDS used in the collection. It identifies the GDS by creating a hashcode for it, and then groups on the hashcode. Unfortunately, in some cases, GRIB records have GDSs that differ only in the fifth decimal place of the starting x and/or y coordinate. It is clear that these are minor defects in the writing of the GRIB records. If desired, the user can fix these problems through NcML.

First, one must find the GDS hashcodes by using ToolsUI. In the IOSP/GRIB1(2)/GribCollection tab, enter the file name to show the records in the file. Select the two GDS (at the bottom), right click for the context menu and choose: compare GDS. This will show the differences in the GDS and the corresponding hashcodes. If you verify that they are, in fact, the same GDS, then you can use NcML (or TDS) to fix this problem. As of ToolsUI 4.3.18, there is a button with tooltip "generate gds xml" to help with this. The NcML you need looks like:

<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" location="E:/ncep/NDFD_CONUS_5km_conduit_20120119_1800.grib2">
 <iospParam>
   <gdsHash from="-2121584860" to="28944332"/>
 </iospParam>
</netcdf>

This changes the variables that use GDS hashcode "-2121584860" to use "28944332" instead, which eliminates the spurious group in the resulting ncx file. In order for this to work, you must first delete the ncx file so it will get recreated when the NcML is read. After that, you can open the ncx file directly or the NcML file.
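
The effect of the gdsHash remapping on grouping can be sketched in a few lines of Python. This is only an illustration of the idea, not how the CDM is implemented: records are represented as hypothetical (variable name, GDS hashcode) pairs, and the variable names are invented.

```python
from collections import defaultdict


def group_by_gds(records, gds_hash_map=None):
    """Group records by GDS hashcode, after applying an optional
    from -> to remapping like the NcML gdsHash element."""
    gds_hash_map = gds_hash_map or {}
    groups = defaultdict(list)
    for name, gds_hash in records:
        # Replace a hashcode only if a remapping was given for it.
        groups[gds_hash_map.get(gds_hash, gds_hash)].append(name)
    return dict(groups)


records = [("Temperature", 28944332), ("Wind_speed", -2121584860)]
print(len(group_by_gds(records)))                             # prints 2
print(len(group_by_gds(records, {-2121584860: 28944332})))    # prints 1
```

Without the remapping the two nearly identical GDSs produce two groups; with it, both variables land in the single group keyed by 28944332.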

Time Interval Coordinates

GRIB makes extensive use of time intervals as coordinates. In CF, time interval coordinates use an auxiliary coordinate to describe the intervals, for example a coordinate named time1(30) will have an auxiliary coordinate time1_bounds(30,2) containing the lower and upper bounds of the time interval for each coordinate. Currently, the CDM places all intervals in the same variable (rather than creating separate variables for each interval size). When all intervals have the same size, the interval size is added to the variable name. Otherwise the phrase "mixed_intervals" is added to the variable name.

Generally, the CDM places the coordinate value at the end of the interval, for example the time interval (0,6) will have a coordinate value 6. The CDM looks for unique intervals in constructing the variable. This implies that the coordinate values are not always unique, but the coordinate bounds pair are always unique. Application code needs to understand this to handle this situation correctly, by checking CoordinateAxis1D.isInterval().
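
The following Python sketch illustrates why coordinate values can repeat while bounds stay unique. It is not CDM code; the function name is hypothetical, and sorting by interval end and then start is an assumption made only to give a deterministic order.

```python
def interval_coordinates(intervals):
    """Return (coords, bounds) for the unique (start, end) intervals,
    placing each coordinate value at the end of its interval."""
    # Duplicated intervals collapse to one entry; order is by (end, start).
    bounds = sorted(set(intervals), key=lambda b: (b[1], b[0]))
    coords = [end for (_start, end) in bounds]
    return coords, bounds


coords, bounds = interval_coordinates([(0, 6), (0, 3), (3, 6), (0, 6)])
print(coords)  # prints [3, 6, 6]
print(bounds)  # prints [(0, 3), (0, 6), (3, 6)]
```

Here the coordinate value 6 appears twice, once for (0,6) and once for (3,6), so an application that looks up data by coordinate value alone cannot tell the two apart; it has to use the bounds.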

NCEP GRIB2 model output, at least, has some issues that we are slowly learning how best to deal with. Currently there are two situations which the user can fix in NcML (or the TDS):

  1. One can choose to ignore (0,0) intervals.
  2. One can choose that certain parameters use only selected intervals. This is helpful when the parameter has redundant mixed intervals, which can be derived from the set of intervals of a fixed size. For example, if the 3 hour intervals (0,3), (3,6), (6,9), (9,12) are all present, then the other intervals (0,6), (0,9), (0,12) can be ignored.

One uses the same process as in the "GDS Hashcode" section above to configure this. Here are examples using NcML:

<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" location="E:/ncep/NDFD_CONUS_5km_conduit_20120119_1800.grib2">

 <iospParam>
   <!-- 1 --> <intvFilter excludeZero="true"/>
   <!-- 2 --> <intvFilter intvLength="3">
     <variable id="0-1-8"/>
     <variable id="0-1-10"/>
   </intvFilter>
 </iospParam>
</netcdf>

  1. Exclude intervals that have (0,0) bounds.
  2. Only include the 3 hour intervals for parameters 0-1-8 and 0-1-10, defining the parameter using discipline-category-number (GRIB-2) or center-subcenter-version-param (GRIB-1).
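
The two filters can be read as simple predicates on (start, end) intervals. The Python sketch below shows that reading; it is an interpretation of the description above, not the CDM implementation. Records are hypothetical (parameter id, interval) pairs, and the function and argument names are invented for illustration.

```python
def filter_intervals(records, exclude_zero=False, fixed_length=None):
    """Keep only the records that pass the interval filters.

    records      : list of (param_id, (start, end)) pairs
    exclude_zero : drop (0,0) intervals, like excludeZero="true"
    fixed_length : dict of param_id -> required interval length,
                   like intvLength with nested variable ids
    """
    fixed_length = fixed_length or {}
    kept = []
    for param_id, (start, end) in records:
        if exclude_zero and start == 0 and end == 0:
            continue
        wanted = fixed_length.get(param_id)
        if wanted is not None and end - start != wanted:
            continue
        kept.append((param_id, (start, end)))
    return kept


records = [
    ("0-1-8", (0, 0)), ("0-1-8", (0, 3)), ("0-1-8", (3, 6)),
    ("0-1-8", (0, 6)), ("0-0-0", (0, 6)),
]
kept = filter_intervals(records, exclude_zero=True,
                        fixed_length={"0-1-8": 3, "0-1-10": 3})
print(kept)
# prints [('0-1-8', (0, 3)), ('0-1-8', (3, 6)), ('0-0-0', (0, 6))]
```

Parameter 0-0-0 keeps its (0,6) interval because the length filter applies only to the listed parameter ids.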

This document is maintained by John Caron and was last updated July 2013