Re: netCDF / GRIB



Hi Dennis,

> Within NCAR (CGD and ACD, in particular) there
> is a project called the Climate System Modeling
> effort. It is a very large effort within NCAR and will
> likely grow in importance over the years.  This will
> develop a suite of models to attack climate related 
> problems. One issue  is the archival of the data from
> a wide range of models. How best to do it is a [problem] which
> several people have begun to (informally) look at.
> 
> In *MY* opinion there are two possible formats:
> netCDF and GRIB. The latter is a WMO standard for
> meteorological and, I think, oceanographic gridded
> data. From what *I* can see each has some advantages:
> 
> e.g.,
> 
> netCDF is used by a variety of installations
> it is well defined 
> considerable documentation (easy to access)
> it is self describing
> computer transparent

Here are a few additional advantages:

  - there is a standard applications programming interface to netCDF data
    (actually several: C, Fortran, C++)
  - netCDF can handle more than 2 dimensions
  - multiple variables (model parameters) can be stored together in a single
    netCDF file
  - direct access to subsets of the data is possible

> GRIB is slightly more compact and thus a little
>      better for archival.

In some cases GRIB can be much more compact, especially for very
low-resolution data.  For data that requires 9 bits or 17 bits or 33 bits,
netCDF will typically require almost twice as much space even if the data is
packed, since netCDF data types are only available in 8-bit, 16-bit, 32-bit,
and 64-bit sizes.  If you are not using netCDF packed integers to represent
low-resolution floating-point data, then netCDF can require about 4 times
as much space, because data with only 8 bits of resolution would be stored
as 32-bit floats.  For storing bit maps that require only a single bit per
datum, for example a land-sea matrix, GRIB has a "bit map section" that uses
only one bit per datum.  Although it is possible to store a bit map in
netCDF as an opaque array of bytes, the simplest representation would use an
8-bit byte for each 1-bit value.
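
The arithmetic behind these ratios can be sketched in a few lines of Python.
This is only an illustration, not part of any netCDF or GRIB library: the
function names (netcdf_width, overhead, pack, unpack) are made up here, and
the packing scheme is a simple linear scale-and-offset that I am assuming
for the example.

```python
# Integer widths (in bits) named in the message above.
NETCDF_WIDTHS = (8, 16, 32, 64)

def netcdf_width(bits):
    """Return the narrowest available width that can hold `bits` bits."""
    for width in NETCDF_WIDTHS:
        if bits <= width:
            return width
    raise ValueError("no type is wide enough for %d bits" % bits)

def overhead(bits):
    """Ratio of bits stored to bits actually needed for one value."""
    return netcdf_width(bits) / bits

def pack(values, nbits=8):
    """Pack floats into nbits-wide integers (hypothetical linear packing)."""
    lo, hi = min(values), max(values)
    # Fall back to a scale of 1.0 when all values are equal.
    scale = (hi - lo) / (2 ** nbits - 1) or 1.0
    packed = [round((v - lo) / scale) for v in values]
    return packed, scale, lo

def unpack(packed, scale, offset):
    """Recover approximate floats from packed integers."""
    return [p * scale + offset for p in packed]

if __name__ == "__main__":
    for needed in (9, 17, 33):
        print(needed, netcdf_width(needed), round(overhead(needed), 2))
    packed, scale, offset = pack([250.0, 273.15, 310.0])
    print(packed, unpack(packed, scale, offset))
```

For 9, 17, and 33 bits the ratios come out at roughly 1.78, 1.88, and 1.94,
which is where "almost twice as much space" comes from.  The packed values
fit in one byte each instead of the four bytes a 32-bit float would take, at
the cost of a rounding error of at most half the scale factor.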

> it is (sort of) self describing
> computer transparent
> it is the format used by the two largest
>      operational meteorological centers (ECMWF and NMC)
>      THIS IS A HUMUNGOUS ADVANTAGE ... IN MY OPINION
>      It means that the output from climate models
>      and the operational analyses will have the
>      same format.

Yes, this is a significant benefit if you will be comparing ECMWF and NMC
outputs to the CSMP model outputs.  However you should be aware that a
single NMC model run can produce hundreds of different GRIB products, one
for each parameter, level, forecast time, and area of coverage.  The NMC
1.25 x 1.25 thinned WAFS grids number 6864 GRIB products per day.  If you
store these one per file, you have a significant file management problem.
If you are going to store multiple GRIB products per file, you will need an
extra layer of record management to locate particular GRIB products within a
file.  With thousands of products per file, sequential access is not
practical, so some sort of direct access using an index is necessary.  As
far as I know, there is no standard way to do this, so whatever method you
choose might not match the way it is done at ECMWF or NMC.  Perhaps a database
approach that hides the underlying files would provide a simple solution.
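
To make the "extra layer of record management" concrete, here is a minimal
Python sketch of one possible index.  It is an assumption for illustration
only, not how ECMWF or NMC do it: the keys (parameter, level, forecast hour),
the parameter labels, and the byte strings standing in for GRIB products are
all hypothetical, and no real GRIB encoding is involved.

```python
import io

def write_products(stream, products):
    """Append each payload to the stream; return key -> (offset, length)."""
    index = {}
    for key, payload in products:
        index[key] = (stream.tell(), len(payload))
        stream.write(payload)
    return index

def read_product(stream, index, key):
    """Fetch one product by seeking directly to its recorded offset."""
    offset, length = index[key]
    stream.seek(offset)
    return stream.read(length)

# Placeholder payloads; a real archive would hold encoded GRIB products.
products = [
    (("TMP", 500, 12), b"temperature-500mb-12h"),
    (("HGT", 500, 12), b"height-500mb-12h"),
    (("TMP", 850, 24), b"temperature-850mb-24h"),
]

archive = io.BytesIO()          # stands in for a file opened in binary mode
index = write_products(archive, products)

if __name__ == "__main__":
    print(read_product(archive, index, ("HGT", 500, 12)))
```

The index itself would also have to be stored somewhere (in a side file or a
database), and that is exactly the part for which no standard exists.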

> I am not sure of the downsides of each format although
>      documentation for GRIB is not as good or as easily
>      available as netCDF.

The GRIB format is well-documented in freely-accessible network documents
such as John Stackpole's GRIB Edition 1 document.  However there is no good
documentation for program interfaces to read or write GRIB data, as far as I
have been able to determine.  GRIB is an excellent and compact format for
transmitting gridded meteorological data, but I'm not sure anyone would
claim it is an ideal form in which to access such data from programs.  More
software would be needed to make GRIB data access convenient for data
management, display, and analysis programs, or the GRIB data would have to
be converted into a form more convenient for such uses.

Another difference between GRIB and netCDF has to do with extendibility.  To
get a new parameter approved by the WMO and added to the GRIB standard would
be a major undertaking.  You could use the sections of GRIB tables reserved
for local use, but then you have to communicate your extensions to GRIB to
those who want to use your data.  Creating a new netCDF variable for a new
derived quantity does not require any centralized registry: just think of a
name for it and add some appropriate attributes such as units and long_name.
Programs that don't know about the new variable won't notice it's there and
will work as before.  Similarly, new dimensions and attributes may be added
to existing netCDF files without affecting working programs, since the
programs access the data through an interface that insulates them from such
changes.
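
The effect can be illustrated with a toy model in Python.  This is not the
netCDF interface; it is a plain dictionary standing in for a dataset, and the
variable names (temperature, heat_index) and the describe() helper are
hypothetical.  The point is only that a program that looks up variables by
name is unaffected when another variable is added.

```python
# A toy stand-in for a self-describing dataset (not the netCDF API).
dataset = {
    "dimensions": {"time": 4, "lat": 3, "lon": 2},
    "variables": {
        "temperature": {
            "dims": ("time", "lat", "lon"),
            "attrs": {"units": "K", "long_name": "air temperature"},
        },
    },
}

def describe(ds, name):
    """An 'existing program': looks up one variable by name."""
    var = ds["variables"][name]
    shape = tuple(ds["dimensions"][d] for d in var["dims"])
    return name, shape, var["attrs"]["units"]

before = describe(dataset, "temperature")

# Add a new derived quantity: just a name plus some attributes.
dataset["variables"]["heat_index"] = {
    "dims": ("time", "lat", "lon"),
    "attrs": {"units": "K", "long_name": "derived heat index"},
}

after = describe(dataset, "temperature")

if __name__ == "__main__":
    print(before == after)
```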

> What may be a downside to netCDF?

Limitations of netCDF include:

 - performance and size trade-offs to achieve machine-independent data
   (for example access is slow to floating-point arrays on Crays)

 - use of direct access instead of sequential access precludes use
   in simple pipes and filters

 - lack of built-in compression

 - necessity for developing local conventions for variable and dimension
   names, attributes for geo-referencing information, etc.

 - difficulty of developing truly generic netCDF applications, since there 
   are multiple ways to represent the same data in netCDF 
 
 - only one unlimited dimension per file, so representing a variable number
   of variable-length data items requires some indirection

Some of these limitations were deliberate engineering trade-offs, accepted
to get a corresponding benefit.  For example, direct access files can't fit
easily into the "pipes and filters" model of Unix programs, but they permit
efficient access of small subsets of large datasets, which is useful in
visualization applications.  Similarly, including general compression would
put constraints on the ability to write netCDF data in a different order
than it will be read.
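
The direct-access trade-off can be shown with a small Python sketch.  It
assumes a bare row-major array of big-endian 32-bit floats with no header,
which is a simplification chosen for the example and not the actual netCDF
file layout; write_grid and read_row_slice are hypothetical helper names.

```python
import io
import struct

ITEMSIZE = struct.calcsize(">f")   # 4 bytes per 32-bit float

def write_grid(stream, grid):
    """Write a 2-D grid row by row as big-endian 32-bit floats."""
    for row in grid:
        stream.write(struct.pack(">%df" % len(row), *row))

def read_row_slice(stream, ncols, row, col_start, col_stop):
    """Read grid[row][col_start:col_stop] by seeking to its byte offset."""
    stream.seek((row * ncols + col_start) * ITEMSIZE)
    count = col_stop - col_start
    data = stream.read(count * ITEMSIZE)
    return list(struct.unpack(">%df" % count, data))

# A 3 x 4 grid whose values encode their own position (row * 10 + col).
grid = [[float(r * 10 + c) for c in range(4)] for r in range(3)]

buf = io.BytesIO()                 # stands in for a seekable disk file
write_grid(buf, grid)

if __name__ == "__main__":
    print(read_row_slice(buf, 4, 2, 1, 3))
```

Only the requested bytes are read, however large the array is.  The price is
the seek call, which a pipe cannot honor, so a reader fed through a pipe
would have to consume everything before the subset.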

--
Russ Rew                                              UCAR Unidata Program
address@hidden                                        P.O. Box 3000
(303)497-8645                                 Boulder, Colorado 80307-3000