[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: NetCDF should round, not truncate! (fwd)



> Date: Sun, 4 Sep 2005 21:29:49 -0400
> From: Remko Scharroo <address@hidden>
> To: address@hidden
> Cc: GMT Team <address@hidden>
> Subject: NetCDF should round, not truncate!
>
> Dear NetCDF developers,
>
> While improving the NetCDF reading and writing routines of GMT
> (Generic Mapping Tools, http://gmt.soest.hawaii.edu/), we stumbled on
> a serious mishap in NetCDF, one that for all these years of relying
> on the NetCDF package, none of us had ever appreciated.
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Howdy Remko and the GMT Team!
The existing type conversion of netCDF is well-documented. The
documentation can be found here:
http://my.unidata.ucar.edu/content/software/netcdf/docs/netcdf/Type-Conversion.html
This is cross-referenced from the C and Fortran manuals in the
documentation for all the nc_get/put_var[a1sm]_{TYPE} functions, with
the documentation of the final parameter. For example:
http://my.unidata.ucar.edu/content/software/netcdf/docs/netcdf-f77/NF_005fGET_005fVAR1_005f-type.html
So this is not a bug, it's documented netCDF standard
behavior. However, it may be that you have a point in that the way
conversion is done is not ideal.
>
> This mishap concerns the conversion of floats or doubles to integers,
> shorts or bytes. Reading from the Reference Manual:
>
>      int nc_put_var_float(int ncid, int varid, const float out[])
>
>            The  netCDF dataset must be open and in data mode.  The
>            type of the data is specified in the function name, and
>            it  is  converted to the external type of the specified
>            variable, if possible, otherwise an NC_ERANGE error  is
>            returned.
I wonder what version of the docs you are using? We did some fairly
significant documentation upgrades with 3.6.0, so if you are using
docs from earlier versions, you should probably upgrade them. See the
netCDF documentation page to get the docs in a variety of formats:
http://my.unidata.ucar.edu/content/software/netcdf/docs/index.html
Here's the latest version of the docs for this function:
http://my.unidata.ucar.edu/content/software/netcdf/docs/netcdf-c/nc_005fput_005fvar_005f-type.html
 
As you can see, the documentation of the last parameters include a
link to the documented type conversion behavior.
>
>  From this description most users will expect that a float is rounded
> to the nearest integer, otherwise known as normal rounding. However,
> that is not what happens in NetCDF: the float is simply truncated, or
> rounded towards zero. Although the distinction is seemingly
> unimportant, it can lead to very unexpected results:
>
> 1) Very small deviations in storing an integer as a float can lead to
> the conversion to a value 1 different from what is expected.
> Example: 60000 may end up as 59999.999 in a float, being rounded down
> to 59999
> 2) When storing floating point data equally distributed in a range
> including 0 as integers, the number 0 will appear twice as often as
> the other values in the range. This is because all numbers -0.999999
> to 0.999999 are truncated to 0.
> 3) Furthermore, because there is a limit check on the original
> floating point number, the extremes for any given data type rarely
> get used.
> For example, when storing 127.001 as a byte, it would perfectly round
> to 127, but this value gets rejected because of a check against the
> limit 127.
> 4) Truncation in general is not the preferred type for conversion of
> floating point numbers to integers. It will lead inevitably to the
> magnification of rounding errors. For example, (int)(0.6/0.2) will
> likely end of as 2, not 3.
I agree with all of these points. However it is not clear to me that
we can change behavior that has been in place, and documented, for
many years.
I will leave further consideration of changing netCDF-3 conversions to
Russ Rew, here at Unidata, who makes decisions about the direction of
netCDF.
For netCDF-4, it is a different story. Since it is not released, it
would be possible to change this behavior. There are some
implementation complexities, but for the moment the question is: what
do we want netCDF-4 to do with type conversions?
I will raise this question to the netCDF-4 requirements team.
>
> Here's an example of the offending code (from NetCDF-4.0-beta7)
>
> 1    int ncx_put_short_float(void *xp, const float *ip)
> 2    {
> 3        ix_short xx = *ip;
> 4        put_ix_short(xp, &xx);
> 5        if(*ip > X_SHORT_MAX || *ip < X_SHORT_MIN)
> 6            return NC_ERANGE;
> 7        return ENOERR;
> 8    }
>
> Line 3 simply truncates the float *ip to xx. So this would allow
> 32767.1 become 32767, but also 32767.999.
> Line 5 checks the float against the limits 32767 and -32768,
> respectively. This makes 32767.1 be rejected. Only exactly the value
> 32767 would come across.
>
> This is the way we think the code should be written:
>
> 1    int ncx_put_short_float(void *xp, const float *ip)
> 2    {
> 3        ix_short xx = floor (*ip + 0.5);
> 4        put_ix_short(xp, &xx);
> 5        if(*ip > X_SHORT_MAX + 0.5 || *ip < X_SHORT_MIN - 0.5)
> 6            return NC_ERANGE;
> 7        return ENOERR;
> 8    }
>
> ... or something similar. For example, X_SHORT_MAX and X_SHORT_MIN
> could include the extra 0.5. The function floor (*ip + 0.5) makes
> sure the numbers are rounded to the nearest integer.
>
> Is there an overriding reason to truncate numbers in float->integer
> conversion rather than to round them to the nearest integer? We can
> not think of any. Please consider changing this anomaly in all float-
>  >integer conversion, but in ncx_get and ncx_put routines.
Sorry, not quite sure what you are saying about ncx_get and ncx_put here...
Thanks for this very interesting email! As far as netCDF-4 goes, it
will receive very serious consideration. I will leave the netCDF-3
response to Russ.
Thanks again!
Ed Hartnett
-- 
Ed Hartnett  -- address@hidden
Date: Tue, 06 Sep 2005 08:55:06 -0600
In-Reply-To: <address@hidden> (Jennifer
 Oxelson's message of "Mon, 5 Sep 2005 19:48:40 -0600 (MDT)")
Message-ID: <address@hidden>
User-Agent: Gnus/5.1006 (Gnus v5.10.6) Emacs/21.4 (gnu/linux)