[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: 2.3.2 vs. 2.4 netCDF question



> Keywords: 199604222201.AA01657
> From: address@hidden (John Sheldon)
> Subject: Re: missing_values
> To: address@hidden (Steve Emmerson)
> Date: Mon, 22 Apr 1996 18:03:13 -0400 (EDT)

Hi John,

Thanks for sending back all the requested info on the C90 netcdf problem.

> What is the "default" value for 'missing_value'?  I see one for the
> _FillValue...is it considered the same?

There is no default for "missing_value", since it's only an (older)
convention for an attribute name, and not treated in any special way by the
library.  With version 2, the "_FillValue" attribute was introduced to
permit user specification of the value you get when you read a value that
hasn't been written yet.  Now with 2.4.1, the ncdump/ncgen utilities also
treat "_FillValue" specially, using "_" as a CDL representation for data
with that value.

Here's what the new User's Guide has to say about these in the section on
Attribute Conventions:

 _FillValue 

    The _FillValue attribute specifies the fill value used to pre-fill disk
    space allocated to the variable. Such pre-fill occurs unless nofill mode
    is set using ncsetfill (or NCSFIL for FORTRAN). See section Set Fill
    Mode for Writes: ncsetfill and NCSFIL, for details. The fill value is
    returned when reading values that were never written. If _FillValue is
    defined then it should be scalar and of the same type as the
    variable. The _FillValue is typically outside the valid range and
    therefore treated by generic applications as missing. However it is
    legal for _FillValue to be within the valid range so that it is treated
    like any other valid value. It is not necessary to define your own
    _FillValue attribute for a variable if the default fill value for the
    type of the variable is adequate. However, use of the default fill value
    for data type byte is not recommended. Note that if you change the value
    of this attribute, the changed value applies only to subsequent writes;
    previously written data are not changed. See section Fill Values, for
    more information.
    
 missing_value 
    
    This attribute is not treated in any special way by the library or
    conforming generic applications, but is often useful documentation and
    may be used by specific applications. The missing_value attribute can be
    a scalar or vector containing values indicating missing data. These
    values should all be outside the valid range so that generic
    applications will treat them as missing. See section Fill Values, for
    more information.

> > If you could make the files (or smaller versions of them) available for FTP,
> > that might help.  The ideal would be if you could send us a small program
> > that created files that demonstrated the problem, so we could see if we
> > could duplicate it on any of our machines or on the Cray YMP we have access
> > to.
> 
> OK, I put several things out on our anon-FTP in:
> 
>    ftp://ftp.gfdl.gov/pub/jps/netcdf/debug/C90/
> 
> 
> The following files are subsets (one variable) of the user's data that
> started all this. They were generated on our C90 running Unicos 8:
> 
>       "test05_TEMP_only.nc" - created using netCDF 2.3.2
>       "test06_TEMP_only.nc" - created using netCDF 2.4.1

I've downloaded all the files in this directory, thanks.

The symptoms appear to be a minor difference in the two C90 XDR
implementations used.  I assume the netCDF 2.3.2 version and 2.4.1 version
are not linked with the same XDR library, and that with the 2.4.1 version,
you are using something more highly optimized.  (This could be tested by
calling xdrmem_create() and xdr_float() and then looking at the result of
the encoding.)

> I *was* able to reproduce this problem on our C90 using a test code I
> have that writes out a file containing an eclectic variety of
> variables.  I modified this code so that the first element of the first
> data variable (SST) is a missing_value.  I've run it using both 2.3.2
> and 2.4.1 and the .nc files are
> 
>       "test_ncsetup-232.nc"
>       "test_ncsetup-241.nc"

These binary files differ, so that means the 2.3.2 and 2.4.1 libraries are
producing different XDR representations for the same value.  By looking at
the values, I can see that the different IEEE floating-point values for the
missing_value attribute differ only by one unit in the last place (one bit
of the mantissa).

It's plausible to me that an optimized XDR library might produce a floating
point representation that differed in the least significant bit of the
mantissa from that produced by a less highly optimized XDR library, so I
don't know if you could convince Cray this is a bug in the optimized XDR.

I think the real problem here is the assumption that you can take an
arbitrary native 64-bit floating-point number (9.9999994e+34), convert it to
a 32-bit IEEE representation, convert it back to native 64 bits again, and
expect to get exactly the same 64 bits you started with.  It's not possible
to represent all 64-bit Cray floating-point values in 32-bit IEEE floats.

When we chose the default value for the _FillValue attribute for floats, we
chose a number that could be represented exactly in a mantissa with very few
bits:

   #define FILL_FLOAT   (9.9692099683868690e+36F) /* near 15 * 2^119 */

so it is possible to convert the FILL_FLOAT to IEEE and back without
roundoff or truncation.

> I ncdump-ed each of these using both the 2.3.2 ncdump and the 2.4.1
> ncdump - the CDL files are:
> 
>       "test_ncsetup-232_ncdump-232.cdl"
>       "test_ncsetup-232_ncdump-241.cdl"
>       "test_ncsetup-241_ncdump-232.cdl"
>       "test_ncsetup-241_ncdump-241.cdl"
> 
> 
> For both the libraries, note that the variable SST is defined as
> 
>         float SST(time, lat, lon_odd) ;
>                 SST:long_name = "sea surface temperature" ;
>                 SST:units = "deg_K" ;
>                 SST:valid_range = 0.f, 3457.f ;
>                 SST:_FillValue = 9.96921e+36f ;
>                 SST:missing_value = 9.9999994e+34f ;
> 
> BUT, the first line of the data listing for SST written using 2.3.2 is
> 
>        SST =
>         9.999999419e+34, 1012, 1013, 1014, 1015, 1016, 1017,
> 
> while that for the 2.4.1 version is
> 
>        SST =
>         1.000000041e+35, 1012, 1013, 1014, 1015, 1016, 1017,

The digits beyond the 7th are actually just noise here, because IEEE
floating point can only represent 7 to 8 significant digits of precision.
These values again differ in only one bit as IEEE floats.

>  -------
> 
> NOW, ON THE SGI (IRIX 5.3), I ran the same code.  The .nc files
> produced using the 2.3.2 vs. 2.4.1 libraries are identical to one
> another (compared using the UNIX "cmp").  I've put the .nc file and the
> .cdl file (both completely in a 2.4.1 context) in:
> 
>   ftp://ftp.gfdl.gov/pub/jps/netcdf/debug/sgi/
>
> One interesting thing is that a diff b/w the SGI and C90 renditions :
> 
>   diff ./sgi/test_ncsetup-241.cdl ./C90/test_ncsetup-241_ncdump-241.cdl
> 
> yields quite a few differences:
> 
> 34c34
> <               time:_FillValue = 9.969209968386869e+36 ;
> ---
> >               time:_FillValue = 9.96920996838687e+36 ;
> 45c45
> <               SST:missing_value = 1.e+35f ;
> ---
> >               SST:missing_value = 9.9999994e+34f ;
> 89c89
> <               SST_timeavg_T1:_FillValue = 9.969209968386869e+36 ;
> ---
> >               SST_timeavg_T1:_FillValue = 9.96920996838687e+36 ;
> 93c93
> <               SST_timeavg_T2:_FillValue = 9.969209968386869e+36 ;
> ---
> >               SST_timeavg_T2:_FillValue = 9.96920996838687e+36 ;
> 105c105
> <               SST_timeavg_ITEMDT:_FillValue = 9.969209968386869e+36 ;
> ---
> >               SST_timeavg_ITEMDT:_FillValue = 9.96920996838687e+36 ;
> 113c113
> <               SST_timeavg_TIMEOFITEM:_FillValue = 9.969209968386869e+36 ;
> ---
> >               SST_timeavg_TIMEOFITEM:_FillValue = 9.96920996838687e+36 ;
> 124c124
> <               SST_timeavg_2_T1:_FillValue = 9.969209968386869e+36 ;
> ---
> >               SST_timeavg_2_T1:_FillValue = 9.96920996838687e+36 ;
> 128c128
> <               SST_timeavg_2_T2:_FillValue = 9.969209968386869e+36 ;
> ---
> >               SST_timeavg_2_T2:_FillValue = 9.96920996838687e+36 ;
> 140c140
> <               SST_timeavg_2_ITEMDT:_FillValue = 9.969209968386869e+36 ;
> ---
> >               SST_timeavg_2_ITEMDT:_FillValue = 9.96920996838687e+36 ;
> 148c148
> <               SST_timeavg_2_TIMEOFITEM:_FillValue = 9.969209968386869e+36 ;
> ---
> >               SST_timeavg_2_TIMEOFITEM:_FillValue = 9.96920996838687e+36 ;
> 154,155c154,155
> <               :missing_value = -1.e-29f ;
> <               :Global_float_ADDATT = 123.456f ;
> ---
> >               :missing_value = -9.9999993e-30f ;
> >               :Global_float_ADDATT = 123.45599f ;
> 347c347
> <   25071, 25089, 25108, 25127, 25146, 25165, 25184 ;
> ---
> >   25070, 25089, 25108, 25127, 25146, 25165, 25184 ;

All these differences are on the order of the last bit in a 32-bit IEEE
float.  Also ncdump uses sprintf(3) to print floating-point values, and
different platforms may display the same floating-point value slightly
differently as an ASCII string.  I note that there are really only about 4
or 5 floating-point values that differ by one bit in the SGI and C90 241
versions of the netCDF file.

> The very last difference may well be a consequence of the arithmetic in
> my own packing routine, but the other differences are a little
> unnerving to see when one is looking for true "platform independence".
> Are these sorts of "differences" expected?

Yes, because the Cray has to do arithmetic to convert from its native
floating-point representation to IEEE and back, and thus rounding or
truncation may lead to single-bit differences.  One way around this is to
not test for exact floating-point equality for missing values, but instead
use a slightly fuzzy test, as the 2.4.1 ncdump does in testing for
floating-point fill values:

   #define absval(x)  ( (x) < 0 ? -(x) : (x) )
        if(varp->has_fillval &&
           (absval(*vp.fp - *fillp.fp) <= absval(float_eps * *fillp.fp))) {
              (void) sprintf(sout, FILL_STRING);
        } else {
              (void) sprintf(sout, fmt, *vp.fp);
        }

where float_eps is defined in terms of the machine precision, FLT_EPSILON in
float.h.

> -----------
> 
> Now, let me know if you still need me to extract a piece of code that
> can reproduce this problem.  Since that's always about about as much
> fun as a root canal, and since I've already spent the better part of an
> afternoon assembling the above info, I won't tackle that unless you
> really need me to.

No, that's OK.  But I hope this question comes up again, so I can justify
including some of this in the FAQ :-) ...

--Russ

______________________________________________________________________________

Russ Rew                                           UCAR Unidata Program
address@hidden                              http://www.unidata.ucar.edu