
Re: 961113: data compression for netCDF variables



>To: address@hidden
>From: Tom Umeda <address@hidden>
>Subject: data compression for netCDF variables
>Organization: BAAQMD
>Keywords: 199611132316.AA02067

Hi Tom,

> We have been using netCDF routines for a number of years.
> We have implemented a simple data compression (arbitrary precision)
> feature on top of the netCDF library.
> However, we would like to see a general data compression
> feature incorporated into the library itself.
> Is there any interest in adding a data compression option
> to the netCDF library?

Yes, except that we call it data packing instead of data compression,
because the latter term is often used to describe schemes where random
access to small subsets of large data sets is no longer possible.
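
As a rough illustration of the distinction, here is a minimal sketch of
one common form of packing, in which each value is stored independently
as a 16-bit integer using a scale and an offset.  The struct and
function names are hypothetical, the parameter names are borrowed from
the netCDF scale_factor / add_offset attribute conventions, and this is
not meant as a description of the packing design we have planned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packing parameters; the field names mirror the netCDF
 * scale_factor / add_offset attribute conventions. */
typedef struct {
    double scale_factor;
    double add_offset;
} pack_params;

/* Choose parameters so that [min, max] maps onto -32767..32767. */
pack_params pack_params_for_range(double min, double max)
{
    pack_params p;
    assert(max > min);
    p.scale_factor = (max - min) / 65534.0;
    p.add_offset = (max + min) / 2.0;
    return p;
}

/* Pack one value: scale, clamp to the target range, round to nearest. */
short pack_value(pack_params p, double x)
{
    double q = (x - p.add_offset) / p.scale_factor;
    if (q > 32767.0) q = 32767.0;
    if (q < -32767.0) q = -32767.0;
    return (short)(q >= 0.0 ? q + 0.5 : q - 0.5);
}

/* Unpack one value; the result is within scale_factor/2 of the input. */
double unpack_value(pack_params p, short s)
{
    return s * p.scale_factor + p.add_offset;
}

/* Random access: element i is unpacked without reading its neighbors. */
double unpack_at(pack_params p, const short *packed, size_t i)
{
    return unpack_value(p, packed[i]);
}
```

Because every packed element has the same fixed size, the position of
element i is known in advance, so small subsets can still be read
directly.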

In our Frequently Asked Questions list for netCDF at 

   http://www.unidata.ucar.edu/packages/netcdf/faq.html#compression

we've provided a more extended answer to the question "Are there plans
to add facilities for data compression to netCDF?"

Our packing plans require a new netCDF file format, so we are trying to
be careful to make sure we support backward compatibility with version 1
files and anticipate other requirements of the file format so that we
won't have to change it again.  We have been discussing integrating our
packing plans for netCDF 4 with NCSA's HDF packing by using a common
format for files, but that's still at the very early stages.

One of our users has taken a different approach, integrating netCDF I/O
with Zlib (see http://quest.jpl.nasa.gov/zlib/), to produce a version of
the netCDF library that 

    ... will automatically recognize a compressed netCDF file and
    decompress only the records needed. No modifications to existing
    source code is necessary (but you will have to relink to the
    modified library). To create a compressed netCDF file, the nccreate
    function is called with an additional mode flag set: ncid =
    nccreate("some_file.ncz",NC_CLOBBER | NC_COMPRESS);

    That's it. There are some adjustable parameters that may be modified
    to individual tastes, but generally that is all that needs to be
    done.

One disadvantage of this approach is that it makes updating the data in
a large compressed file fairly inefficient in some cases, since a write
may require rewriting all subsequent blocks in the file.  This may not
be practical unless writes are just appends of new records to the end of
a file.
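
To make the cost concrete, here is a small sketch that assumes a very
simple layout in which compressed blocks are stored back to back.  That
layout and the function names are assumptions for illustration only; I
don't know that znetcdf actually stores its blocks this way.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset at which block k starts, given each block's compressed
 * size, under the assumed back-to-back layout. */
size_t block_offset(const size_t *sizes, size_t k)
{
    size_t off = 0, i;
    for (i = 0; i < k; i++)
        off += sizes[i];
    return off;
}

/* Bytes that must be written when block k is replaced by a block whose
 * compressed size is new_size.  If the size is unchanged, only block k
 * is written; otherwise every later block shifts and is written too. */
size_t bytes_rewritten(const size_t *sizes, size_t nblocks,
                       size_t k, size_t new_size)
{
    size_t total = new_size, i;
    assert(k < nblocks);
    if (new_size != sizes[k]) {
        for (i = k + 1; i < nblocks; i++)
            total += sizes[i];
    }
    return total;
}
```

Under this model, changing the last block (or appending a new one)
costs only the new block itself, while changing an early block whose
compressed size differs costs nearly the whole file.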

The developer of this "znetcdf library" (William Noon) has chosen not to
release or announce this version yet, but I've CC'ed him on this reply
(Hi Bill!) in case he has corrections to my oversimplified extract and
summary of znetcdf, or updates on its availability.

I'd also be very interested in hearing what approach you've taken to
compressing netCDF data.

--Russ

_____________________________________________________________________

Russ Rew                                         UCAR Unidata Program
address@hidden                     http://www.unidata.ucar.edu