As of CDM version 4.3, GRIB datasets are handled as collections of GRIB files. A GRIB file is a collection of GRIB records, so a GRIB dataset is a collection of GRIB records in one or more files. The CDM cannot access these files remotely, e.g. through an OPeNDAP server; they must be local. A THREDDS Data Server (TDS) can, however, make GRIB datasets remotely accessible, e.g. through OPeNDAP.
The CDM can only read GRIB files; it cannot write them. It can, however, rewrite GRIB into netCDF using CF Conventions. As of version 4.3.12, it can only write netCDF-3 format files, which are typically 4-40 times larger than GRIB. As of 4.3.13, it can also write an experimental netCDF-4 format, with file sizes comparable to GRIB.
A GRIB collection must follow these homogeneity constraints:
For each GRIB file, a GRIB index file is written with suffix .gbx9. This file contains everything in the GRIB file except the data. Generally it is 300-1000 times smaller than the original file. Once written, it does not have to be rewritten unless the GRIB file changes, in which case the CDM should detect the change and rewrite the index file. If there is any doubt about that, delete the index file and let it be recreated.
For each GRIB collection, a GRIB collection index file is written with suffix .ncx. This file contains all the metadata and the coordinates for the collection. It is usually fairly small (a few dozen Kbytes to a few Mbytes for a large collection), and once created, makes accessing the GRIB collection very fast. In general it will be updated if needed, but one can always delete it and let it be recreated.
If one opens a single GRIB file in the CDM, a gbx9 and ncx file will be created for that file. If one opens a collection of multiple GRIB files, a gbx9 file is created for each file, and one ncx file is created for the entire collection.
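The advice to delete an index and let it be recreated can be scripted. The following is a minimal Python sketch, not part of the CDM. It assumes that index files sit next to the data file and are named by appending .gbx9 or .ncx to the full file name; both are assumptions to verify against your installation. The helper names `index_candidates` and `remove_indexes` are hypothetical.

```python
from pathlib import Path

INDEX_SUFFIXES = (".gbx9", ".ncx")  # the two index suffixes named in the text


def index_candidates(grib_file):
    """Return the index paths assumed to belong to one GRIB file.

    Assumption: the suffix is appended to the full file name and the index
    lives in the same directory as the data file.
    """
    grib_path = Path(grib_file)
    return [grib_path.with_name(grib_path.name + suffix) for suffix in INDEX_SUFFIXES]


def remove_indexes(grib_file):
    """Delete any existing index files so they are rebuilt on the next open."""
    removed = []
    for index_path in index_candidates(grib_file):
        if index_path.exists():
            index_path.unlink()
            removed.append(index_path)
    return removed
```

The data file itself is never touched; only the files matching the assumed index names are removed.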
Both kinds of index files use binary formats that are private to the CDM. These formats may change as needed, and any such change is transparent to applications.
The use of external tables in GRIB is quite problematic (read here for more details). Nonetheless, GRIB files are in wide use internationally and contain invaluable data. The CDM is a general-purpose GRIB reading library that makes GRIB data available through the CDM/NetCDF API, that is, as multidimensional data arrays and CF-compliant metadata and coordinates.
Because of flaws in the design of GRIB and flaws in actual practice when writing GRIB, any general purpose GRIB reader can only make a best effort in interpreting arbitrary GRIB records. It is therefore necessary, for anything other than casual use, to carefully examine the output of CDM GRIB datasets and compare this against the documentation. In particular, GRIB records may refer to local tables that are missing or incorrect in the CDM, and they may override standard WMO tables without the CDM being able to detect that they are doing so. It is often necessary for users to contact the data producer to obtain the correct tables for the particular dataset they want to read. This is also necessary for other GRIB reading tools like wgrib (NCEP) and gribex (ECMWF).
The CDM has a number of ways to allow you to use new tables or override incorrect ones globally or by dataset. The good news is that if users contribute these fixes back to the CDM, everyone can take advantage of them and the set of "correct" datasets will grow. The WMO has greatly improved the process of using the standard tables, and hopefully GRIB data producers will continue to improve methods for writing GRIB and maintaining local tables.
The CDM is used primarily to open single GRIB files, and the TDS is used to manage large and very large collections of files. Here is a summary of the ways that an application might use the CDM to open GRIB files.
Pass the local data file location to any of the standard dataset opening classes:
The GRIB Index (.gbx9) and GRIB Collection index (.ncx) files will be created if needed.
You can create an ncx file based on a collection spec using ToolsUI: IOSP/GRIB1(2)/GribCollection. Enter the collection spec and hit Enter. To write the index file, hit the "Write Index" button on the right. Give it a memorable name and hit Save.
If the GRIB Collection index (.ncx) already exists, one can pass that to any of the standard dataset opening classes. In this case, the collection is created from reading the ncx file with no checking against the original data file(s). The original data files are only accessed when data is requested from them.
A GRIB file is an unordered collection of GRIB records. A GRIB record consists of a single 2D (x, y) slice of data. The CDM library reads a GRIB file and creates a 2-, 3-, 4-, or 5-dimensional Variable (time, ensemble, z, y, x) by finding the records that have the same parameter but different time / level / ensemble coordinates. This amounts to guessing the dataset schema and the intent of the data provider, and is unfortunately a bit arbitrary. Most of our testing is against the NCEP operational models from the IDD, so the results are influenced by those. Deciding how to group the GRIB records into CDM Variables is one of the main sources of problems. The CDM uses the following GRIB fields to construct a unique variable:
The GRIB-1 variable name is:
%paramName[_%level][_layer][_%interval][_%statName]
where:
  %paramName = parameter name from GRIB-1 table 2 (cleaned up); if unknown, use VAR_%d-%d-%d-%d (see below)
  %level = short form of level name from GRIB-1 table 3, if defined
  _layer = added if it is a vertical layer (literal)
  %interval = time interval name (e.g. "12_hour" or "mixed")
  %statName = name of statistical type if applicable, from GRIB-1 table 5
The GRIB-1 variable id is:
VAR_%d-%d-%d-%d[_L%d][_layer][_I%s][_S%d]
where:
  %d-%d-%d-%d = center-subcenter-tableVersion-paramNo
  L%d = level type (octet 10 of PDS), if defined
  _layer = added if it is a vertical layer (literal)
  I%s = interval name (e.g. "12_hour" or "mixed") if a time interval
  S%d = stat type (octet 21 of PDS) if applicable
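To make the bracketed optional parts of the VAR_%d-%d-%d-%d[_L%d][_layer][_I%s][_S%d] template concrete, here is a small Python sketch that formats such an id. It is an illustration of the template only, not the CDM's code; the function name `grib1_variable_id` and its parameter names are hypothetical, and the numbers in the usage note are arbitrary inputs.

```python
def grib1_variable_id(center, subcenter, table_version, param_no,
                      level_type=None, is_layer=False,
                      interval_name=None, stat_type=None):
    """Format VAR_%d-%d-%d-%d[_L%d][_layer][_I%s][_S%d].

    Each optional part is appended only when the corresponding
    argument is supplied, mirroring the square brackets in the template.
    """
    parts = ["VAR_%d-%d-%d-%d" % (center, subcenter, table_version, param_no)]
    if level_type is not None:
        parts.append("L%d" % level_type)
    if is_layer:
        parts.append("layer")  # literal marker for a vertical layer
    if interval_name is not None:
        parts.append("I%s" % interval_name)
    if stat_type is not None:
        parts.append("S%d" % stat_type)
    return "_".join(parts)
```

For example, `grib1_variable_id(7, 0, 2, 61, level_type=1, interval_name="12_hour", stat_type=4)` returns `"VAR_7-0-2-61_L1_I12_hour_S4"`, and `grib1_variable_id(7, 0, 2, 11)` returns `"VAR_7-0-2-11"`.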
The GRIB-2 variable name is:
%paramName[_error][_%level][_layer][_%interval][_%statName][_%ensDerivedType][_probability_%probName]
where:
  %paramName = parameter name from GRIB-2 table 4.2 (cleaned up); if unknown, use VAR_%d-%d-%d_FROM%d-%d = VAR_discipline-category-paramNo_FROM_center-subcenter
  %level = short form of level name from GRIB-2 table 4.5, if defined
  _layer = added if it is a vertical layer (literal)
  %interval = time interval name (e.g. "12_hour" or "mixed")
  %statName = name of statistical type if applicable, from GRIB-2 table 4.10
  %ensDerivedType = name of ensemble derived type if applicable, from GRIB-2 table 4.7
  %probName = name of probability type if applicable
The GRIB-2 variable id is:
  VAR_%d-%d-%d = discipline-category-paramNo
  L%d = level type code
  I%s = time interval name (e.g. "12_hour" or "mixed")
  S%d = statistical type code if applicable
  D%d = derived type code if applicable
  Prob_%s = probability name if applicable
See ucar.nc2.grib.grib1.Grib1Record.cdmVariableHash() and ucar.nc2.grib.grib2.Grib2Record.cdmVariableHash() for complete details.
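The grouping of unordered records into multidimensional variables can be pictured with a simplified Python sketch. This is not the CDM's algorithm: the `Record` fields and the grouping key (parameter plus level type) are assumptions chosen for illustration, whereas the real key is computed by the cdmVariableHash() methods cited above.

```python
from collections import defaultdict
from typing import NamedTuple


class Record(NamedTuple):
    """One 2D (x, y) slice; field names are illustrative, not GRIB octets."""
    param: str
    level_type: int
    time: int
    level: float


def group_into_variables(records):
    """Group records by (param, level_type) and collect their coordinates."""
    groups = defaultdict(list)
    for record in records:
        groups[(record.param, record.level_type)].append(record)

    variables = {}
    for key, members in groups.items():
        times = sorted({r.time for r in members})
        levels = sorted({r.level for r in members})
        present = {(r.time, r.level) for r in members}
        variables[key] = {
            "times": times,
            "levels": levels,
            # False when some (time, level) slice has no record.
            "complete": len(present) == len(times) * len(levels),
        }
    return variables
```

The order of the input records does not matter, which matches the description of a GRIB file as an unordered collection. A variable whose `complete` flag is false has missing slices in its (time, level) grid.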
The CDM creates a separate group for each distinct GDS used in the collection. It identifies the GDS by creating a hashcode for it, and then groups on the hashcode. Unfortunately, in some cases, GRIB records have GDSs that differ only in the fifth decimal place of the starting x and/or y coordinate. It is clear that these are minor defects in the writing of the GRIB records. If desired, the user can fix these problems through NcML.
First, one must find the GDS hashcodes by using ToolsUI. In the IOSP/GRIB1(2)/GribCollection tab, enter the file name to show the records in the file. Select the GDSs (at the bottom), right-click for the context menu, and choose "compare GDS". This will show the differences between the GDSs and the corresponding hashcodes. If you verify that they are, in fact, the same GDS, then you can use NcML (or TDS) to fix this problem, for example:
<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" location="E:/ncep/NDFD_CONUS_5km_conduit_20120119_1800.grib2">
  <iospParam>
    <gdsHash from="-2121584860" to="28944332"/>
  </iospParam>
</netcdf>
This changes the variables that use GDS hashcode "-2121584860" to use "28944332" instead, which eliminates the spurious group in the resulting ncx file. In order for this to work, you must first delete the ncx file so that it is recreated when the NcML is read. After that, you can open either the ncx file directly or the NcML file.
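The effect of the gdsHash mapping on grouping can be sketched in a few lines of Python. This is a conceptual illustration only, not how the CDM is implemented; `group_by_gds` is a hypothetical helper and the variable names in the usage note are made up. The two hashcodes are the ones from the NcML example.

```python
from collections import defaultdict


def group_by_gds(records, gds_remap=None):
    """Group (gds_hash, variable_name) pairs by GDS hashcode.

    gds_remap maps a hashcode to the hashcode that should replace it,
    playing the role of the from/to attributes of the gdsHash element.
    """
    gds_remap = gds_remap or {}
    groups = defaultdict(list)
    for gds_hash, name in records:
        effective_hash = gds_remap.get(gds_hash, gds_hash)
        groups[effective_hash].append(name)
    return dict(groups)
```

With records `[(28944332, "Temperature"), (-2121584860, "Wind_speed")]`, calling `group_by_gds(records)` yields two groups, while `group_by_gds(records, {-2121584860: 28944332})` yields a single group keyed by 28944332 that holds both variables.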
GRIB makes extensive use of time intervals as coordinates. In CF, time interval coordinates use an auxiliary coordinate to describe the intervals. For example, a coordinate named time1(30) will have an auxiliary coordinate time1_bounds(30,2) containing the lower and upper bounds of the time interval for each coordinate. Currently, the CDM places all intervals in the same variable (rather than creating separate variables for each interval size). When all intervals have the same size, the interval size is added to the variable name. Otherwise the phrase "mixed_intervals" is added to the variable name.
Generally, the CDM places the coordinate value at the end of the interval; for example, the time interval (0,6) will have a coordinate value of 6. The CDM looks for unique intervals in constructing the variable. This implies that the coordinate values are not always unique, but the coordinate bounds pairs are always unique. Application code needs to be aware of this, and should check CoordinateAxis1D.isInterval() to handle interval coordinates correctly.
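The relationship between intervals, coordinate values, and bounds can be illustrated with a short Python sketch. It is not the CDM's code: the function name `interval_coordinates` is hypothetical, the sort order is an arbitrary choice, and the "_hour" unit in the name suffix is an assumption made for the example.

```python
def interval_coordinates(intervals):
    """Derive coordinate values, bounds, and a name suffix from intervals.

    intervals is an iterable of (start, end) pairs. Duplicates are dropped,
    so the bounds pairs are unique even when coordinate values are not.
    """
    bounds = sorted(set(intervals), key=lambda b: (b[1], b[0]))
    coords = [end for _, end in bounds]  # value placed at the interval end
    sizes = {end - start for start, end in bounds}
    if len(sizes) == 1:
        suffix = "%d_hour" % sizes.pop()  # assumption: intervals are in hours
    else:
        suffix = "mixed_intervals"
    return coords, bounds, suffix
```

For the intervals (0,6), (0,3), (3,6) this returns the coordinate values `[3, 6, 6]` with bounds `[(0, 3), (0, 6), (3, 6)]` and the suffix `"mixed_intervals"`: the value 6 appears twice, but each bounds pair appears once. For (0,6), (6,12) it returns `[6, 12]` and the suffix `"6_hour"`.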
NCEP GRIB2 model output, at least, has some issues that we are slowly learning how best to deal with. Currently there are two situations that the user can fix in NcML (or the TDS):
One uses the same process as in the "GDS Hashcode" section above to configure this. Here are examples using NcML:
<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" location="E:/ncep/NDFD_CONUS_5km_conduit_20120119_1800.grib2">
  <iospParam>
1) <intvFilter excludeZero="true"/>
2) <intvFilter intvLength="3">