
Re: 20000719: GRIB2NetCDF, filesize, offset and scaling



>From: Nils Olav Handegard <address@hidden>
>Subject: GRIB2NetCDF, filesize, offset and scaling
>Organization: ?
>Keywords: 200007190747.e6J7lJT22878

Hi Nils,

> I've managed to implement software to convert NCEP-GRIB 6h reanalysis
> data to daily mean, stored in NetCDF (I've used lats4d and grads).
>
> Most of these data (daily average) are already available in the NetCDF
> format via ftp, but some of them have to be converted.
>
> The problem is that the files produced by NCEP are half the size: they
> use the 'short' representation with 'add_offset' and 'scale_factor', and
> my files use the 'float' representation. Is there anything to do about
> this? Are there any tools for converting this? I didn't find anything in
> the grads/lats package to set this (I'm sure there is, but I don't know
> the tools that well).

Using the 'add_offset' and 'scale_factor' conventions for packing
floating-point data into 16-bit shorts is a common way to save space,
but the netCDF library doesn't provide any automatic conversions that
use these attributes.
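
Because the library leaves this conversion to you, reading a packed
variable means applying the linear transformation yourself after each
read. The following is a minimal C sketch, assuming the packed shorts and
the two attribute values have already been read (for example with the
netCDF-3 calls nc_get_var_short and nc_get_att_double). The function name
unpack_shorts, the fill-code parameter, and the use of NaN for missing
data are illustrative choices, not part of any library; which code a
given file uses for missing data has to be checked in that file.

```c
#include <math.h>    /* NAN */
#include <stddef.h>  /* size_t */

/* Hypothetical helper: unpack n 16-bit values into floats using
 *     unpacked = packed * scale_factor + add_offset
 * Values equal to the caller-supplied fill code are treated as
 * missing and set to NaN. */
void unpack_shorts(const short *packed, size_t n,
                   double scale_factor, double add_offset,
                   short fill, float *out)
{
    for (size_t i = 0; i < n; i++) {
        if (packed[i] == fill)
            out[i] = NAN;  /* missing: no meaningful unpacked value */
        else
            out[i] = (float)(packed[i] * scale_factor + add_offset);
    }
}
```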

Some software I've seen that follows these conventions includes the
FAN library described at

  http://www.unidata.ucar.edu/packages/netcdf/fan_utils.html

which says in the FAN User's Guide:

    Scaling and Unit Conversion

    All netCDF input and output values are transformed by a linear
    equation defined by the attributes add_offset, scale_factor and
    units; together with any unit defined by the -u option mentioned
    above. The output units attribute is defined or modified in some
    situations such as when it is undefined but the corresponding
    input attribute is defined.
    ...

I don't know whether you can just use the FAN utilities directly, or
whether it might be easier to extract the necessary conversion
functions from fanlib, the supporting C library.  In any case, the
software is available from

    ftp://ftp.unidata.ucar.edu/pub/netcdf/contrib/fan.tar.Z

although it's "user-contributed" software that we don't support
directly, and the author, Harvey Davies, is no longer actively
supporting it either.

Another possibility is some other user-contributed software,
nc_float.c: 

    Harry Edmon's interfaces to ncvarget and ncvarput that convert
    (optionally packed) data to floating point, handling missing data
    and units conversions.

available from the catalog of user-contributed software at 

    http://www.unidata.ucar.edu/packages/netcdf/contrib.html

though it's pretty old and may need some updating to netCDF-3.  There
may be other utilities that handle the packing attributes, but I'm not
aware of them.  I've appended an excerpt from an earlier reply that
has some example code for packing data, but you may already know this.

If you find or develop a data access layer that handles this 2:1
packing, please let us know ...


--Russ
_____________________________________________________________________

Russ Rew                                         UCAR Unidata Program
address@hidden                     http://www.unidata.ucar.edu

...
The netCDF library doesn't treat these attributes in any special way, so you
have to use their values for packing before you write values and unpacking
after you read values.  As an example, if you want to pack floating-point
values between 950 and 1050 into 8-bit bytes for a program variable named
`x' that is to be stored into a netCDF variable named x_packed, the
structure of the netCDF file might include a data specification like the
following:

    variables:
        ...
        byte x_packed(n);
                x_packed:scale_factor = 0.3937;
                x_packed:add_offset = 950;
                x_packed:_FillValue = 255;
        ...

where we just use the minimum value, 950, for the offset to keep all packed
values positive, and we compute the scale factor by using

        scale_factor = (Max - Min)/(2^Nbits - 2) 
                     = (1050 - 950) / (256-2)
                     = 0.39370079
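
The scale-factor formula above can be written as a small C function. This
is a sketch under the stated assumptions (2 <= Nbits <= 31 and Max > Min);
the name packing_scale is hypothetical.

```c
/* Scale factor for packing values in [min, max] into nbits-bit unsigned
 * codes 0 .. 2^nbits - 2, leaving the top code 2^nbits - 1 free for
 * missing data:  scale_factor = (max - min) / (2^nbits - 2).
 * Assumes 2 <= nbits <= 31 and max > min. */
double packing_scale(double min, double max, int nbits)
{
    double steps = (double)((1UL << nbits) - 2UL);  /* 2^nbits - 2 */
    return (max - min) / steps;
}
```

For the range in this example, packing_scale(950, 1050, 8) is about
0.3937 and packing_scale(950, 1050, 16) is about 0.0015.  If packed
values are rounded to the nearest code, the worst-case error is half the
scale factor: roughly 0.2 for bytes versus under 0.001 for 16 bits
(subject to the signed/unsigned caveat discussed at the end).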

Now before you store the value x, you pack it with the formula:

        x_packed = (x - add_offset) / scale_factor

and you store x_packed, rounded to the nearest integer (so it will be
between 0 and 254), as a byte instead.  You can use the byte value 255
for a missing value.
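
A minimal C sketch of the packing step, assuming the result is held in an
unsigned char (see the note on C types at the end).  The function name
pack_byte and the is_missing flag are hypothetical, and mapping
out-of-range inputs to the missing code 255 is a design choice of this
sketch rather than part of the convention; clamping to 0 or 254 would be
another option.

```c
/* Hypothetical helper: pack x into a byte code using
 *     x_packed = (x - add_offset) / scale_factor
 * rounded to the nearest integer.  Codes 0..254 hold data; 255 is
 * returned for missing values and for values outside the packable
 * range (the negated range test also catches NaN). */
unsigned char pack_byte(double x, int is_missing,
                        double scale_factor, double add_offset)
{
    double q = (x - add_offset) / scale_factor;

    if (is_missing || !(q >= -0.5 && q < 254.5))
        return 255;
    /* q + 0.5 is non-negative here, so the cast truncates toward zero,
     * which rounds q to the nearest code. */
    return (unsigned char)(q + 0.5);
}
```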

Similarly, when you read the data back in, you can unpack it using the
formula:

        x = x_packed*scale_factor + add_offset
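
A matching C sketch of the unpacking step uses
x = x_packed*scale_factor + add_offset, the exact inverse of the packing
formula x_packed = (x - add_offset) / scale_factor.  The name unpack_byte
and its return-code convention are hypothetical.

```c
/* Hypothetical helper: unpack a byte code into a floating-point value,
 *     x = x_packed * scale_factor + add_offset
 * Returns 0 (and leaves *x untouched) if the code is the missing
 * value 255, otherwise stores the unpacked value in *x and returns 1. */
int unpack_byte(unsigned char packed, double scale_factor,
                double add_offset, double *x)
{
    if (packed == 255)
        return 0;  /* missing value */
    *x = packed * scale_factor + add_offset;
    return 1;
}
```

When packing rounds to the nearest code, a value that is packed and then
unpacked this way differs from the original by at most scale_factor/2.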

If you need more than 8 bits of precision but you still want to store each
value as one netCDF value, you will have to use 16-bit shorts, and then the
formula above will use Nbits = 16 instead of Nbits = 8.

If you are using C, you may have to declare x_packed to be an `unsigned
char' to get these formulas to work out, or change the formulas to assume
signed values.  In Fortran there are no unsigned integers, so change the
formulas to use signed integers instead.
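
One common way to adapt the formulas to signed 16-bit values (the
netCDF-3 short type is signed) is to use the midpoint of the range as
add_offset, so packed values span -32767..32767, and to keep -32768 free
as a missing-value code.  This is also the direction needed to shrink
float files into packed shorts.  The sketch below assumes max > min; the
names packing, signed16_packing, and pack_short are hypothetical, and it
is not a description of how any particular data provider packs its files.

```c
#include <limits.h>  /* SHRT_MIN */

/* Hypothetical container for the two packing attributes. */
typedef struct {
    double scale_factor;
    double add_offset;
} packing;

/* Centre [min, max] on zero so packed values span -32767..32767,
 * leaving SHRT_MIN (-32768) free for missing data.  Assumes max > min. */
packing signed16_packing(double min, double max)
{
    packing p;
    p.add_offset = 0.5 * (min + max);        /* midpoint packs to 0 */
    p.scale_factor = (max - min) / 65534.0;  /* 2^16 - 2 steps */
    return p;
}

/* Round to the nearest short; out-of-range values (and NaN) become
 * the missing code SHRT_MIN. */
short pack_short(double x, packing p)
{
    double q = (x - p.add_offset) / p.scale_factor;

    if (!(q > -32767.5 && q < 32767.5))
        return SHRT_MIN;
    /* Round half away from zero; the cast truncates toward zero. */
    return (short)(q >= 0.0 ? q + 0.5 : q - 0.5);
}
```

Unpacking is unchanged: x = x_packed*scale_factor + add_offset, with
-32768 treated as missing.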