Re: automatic type conversion issues: range errors

Hi Ed,
    HDF5 provides the user with a method of catching this sort of error,
using the H5T{get|set}_overflow() routines.  The overflow routine is called
any time a value in the source can't be represented exactly in the
destination, given the datatypes provided.  This allows a user application to
catch range errors and potentially change the value written to the
destination.
    Currently, there isn't a way to pass the information about the fill value
along to the user's callback routine, which looks like it would be necessary to
fulfill the current netCDF-3 functionality.  Additionally, I think it would be
better to have these routines set on an individual data transfer, instead of
globally for the library, so I think we should move them to be get|set routines
on a data transfer property list.
    For now, you can use the current H5Tset_overflow routine to trap any
overflows and set the affected values to zero.  When I've fixed this up, you
will have access to the actual fill value the user set for the dataset.

    Quincey

> In the type conversion process, range errors will require special
> handling to live up to the netcdf-3 standard.
> 
> Netcdf defines a range error as occurring when you try to stuff too
> large (or too small) a number of one type into a more restrictive
> type.
> 
> For example, let's say you have a length 2 array of long:
> 
> long arr[] = {10, 1232134};
> 
> Now you want to write this out as a byte (i.e. signed one byte
> int). The first array element is no problem. The second is too large
> to fit.
> 
> The netcdf answer to this is to write the first array element as
> instructed, then to write a fill value for the second, and return the
> NC_ERANGE error.
> 
> Uniquely (I believe) among netcdf errors, the NC_ERANGE error indicates
> that the operation (i.e. the write of the array) DID take place, but
> that at least one range error was found, with each offending value
> replaced by a fill value.
> 
> Usually, as is the C convention, a netcdf function returning an error
> should not be expected to have completed its operation.
> 
> Quincey, what does HDF do if you try and write a too-large long into a
> signed char? Does it give an error?
> 
> 
From: Ed Hartnett <ed@xxxxxxxxxxxxxxxx>
To: Russ Rew <russ@xxxxxxxxxxxxxxxx>
Cc: netcdf-hdf@xxxxxxxxxxxxxxxx, support-netcdf@xxxxxxxxxxxxxxxx
Date: 23 Oct 2003 11:21:50 -0600
Subject: Re: question for Russ - signed vs. unsigned char and NC_BYTE
Organization: UCAR/Unidata

> We have both functions so that users can read data into either signed
> or unsigned arrays of char without requiring an ugly cast.  If we only
> had nc_get_att_schar() and a user wanted to read NC_BYTE data into an
> array of unsigned char, they would have to use a cast or get a
> compiler complaint.
> 
But as a user looking at a new file, there's no way for me to tell
whether I am dealing with signed or unsigned data? I just have to know
that in advance?

Perhaps, someday, we should consider adding a type to netcdf-4 to
allow us to tell the difference?

We are saying, are we not, that NC_CHAR is to be used exclusively for
text strings? Or should that also be used for unsigned char, leaving
NC_BYTE to mean always signed data?

> In either case the same 8 bits are read into the same location in
> memory, but we have to provide both schar and uchar versions to allow
> the user to treat byte data as either signed or unsigned.  No
> conversion takes place reading/writing a signed or unsigned char in
> memory to or from a byte on disk, so users can still treat NC_BYTE
> data as unsigned char if they want to.  To allow them to do this
> without a cast, we provide the convenience function.

I understand that no conversion takes place.

In terms of checking for range errors, as in going from an int to an
NC_BYTE, my understanding was that I treat it as always signed. Is
that right?

> 
> For the same reason, we provide both nc_get_var_schar() and
> nc_get_var_uchar(), and similarly for the corresponding put_var
> functions.

Yes.