
Re: signedness



Harvey,

> I am in process of doing some of the simpler tasks on my FAN 'to-do' list.
> One I added to the list while over there was support of the 'signedness'
> attribute.
> 
> 'Signedness' is a horrible word (although it passes the standard unix
> 'spell') for a horrible kludge.  And the more I looked into it the
> horribler it became.  :-(

A weak defense: it's arguably better than the alternative, 'signosity'!  :-)

> The user guide entry on 'signedness' (p21) states that bytes default to
> unsigned.  But this is inconsistent with following in netcdf.h:
> 
> #define FILL_BYTE       ((char)-127)            /* Largest Negative value */

You're the first to point out this discrepancy, which I hadn't noticed.

> Incidentally the Largest Negative value is -128 although it may be better not
> to use this (just as it is best not to use min short of -32768 because it
> can be difficult to print, etc.)

I vaguely recollect we were thinking of platforms that use 1's-complement
rather than 2's-complement representations for negative integers; such
machines have a -0 that is represented differently from 0 and have one fewer
negative value as a result.  But now that I think about it, there may not
be any 8-bit byte machines that still use 1's-complement representations.
The only 1's-complement machine I'm familiar with is the Control Data
6x00/7x00, which had 6-bit bytes and 60-bit words.

So if there are no such machines around any more, then you're right and
FILL_BYTE should have been -128.  Similarly, I think the FILL_SHORT should
have been defined as -32768 rather than -32767, and the FILL_LONG should
have been defined as -2147483648 rather than -2147483647, but it would break
things to correct these now.  Incidentally, I don't see any problems
printing these constants, when using the right printf formats.

> My fan system treats bytes as signed.
> 
> All the current fill values are unsuitable for unsigned case. The obvious 
> unsigned fill value is the max e.g. 255 for unsigned byte.   

I agree.

> I thought about using sign of nc_type to represent signedness.  So unsigned
> short would have nc_type of -3.  But this would end up being half-hearted
> kludgy representation of unsigned types.  My feeling is that if unsigned is
> needed then it should be done properly by implementing proper full-blown new
> unsigned types.  Perhaps we could implement signed & unsigned 64-bit integers
> at the same time.

The biggest obstacle is the Fortran interface.  Although there are
(nonstandard but almost universally available) ways to declare bytes and
shorts in Fortran, I don't know of any way (even in Fortran 90) to declare
unsigned bytes or unsigned shorts.  So into what type of variable would a
Fortran program read an array of unsigned shorts?  It's currently not
possible to create a netCDF file from C that you can't access from Fortran,
and I think that is a desirable characteristic of netCDF.

We should add support for 64-bit integers (called long longs or hyperlongs
in the XDR documentation) when the vendor's XDR libraries include the
functions xdr_hyper or xdr_longlong_t.  I'm not sure what the current
availability of these is, except they seem to be available on Solaris 2.4,
but not on SunOS 4.1.4.

> In short term I suggest deprecating 'signedness' attribute in user guide
> (since it will be redundant when proper unsigned types are implemented) &
> stating that meanwhile, bytes default to SIGNED like other integers.  Perhaps
> we should first post an item to the netcdfgroup asking if anyone uses the
> attribute because we are considering deprecating it.

So far, the netCDF C library doesn't care whether bytes are signed or
unsigned, because no library calls ever do any arithmetic on the data.  The
purpose of the attribute was to permit communicating the intent of the data
provider to data consumers or applications.  It would be better to do this
with signed or unsigned types, but then we have to figure out how to handle
the Fortran interface.  Our future plans for packed data (e.g. an array of
10-bit values) will store information in something like an _Offset attribute
that will imply whether the unpacked data is signed or not.

Complicating the issue is the C++ implementation, which already uses

    typedef unsigned char ncbyte;

for data of type NC_BYTE and provides conversions on values for both
variables and attributes with member functions like the following:

    // The following member functions provide conversions from the value
    // type to a desired basic type.  If the value is out of range,
    // the default "fill-value" for the appropriate type is returned.

    virtual ncbyte as_ncbyte( int n ) const;    // nth value as an unsgnd char
    virtual char as_char( int n ) const;        // nth value as char
    virtual short as_short( int n ) const;      // nth value as short
    virtual long as_long( int n ) const;        // nth value as long
    virtual float as_float( int n ) const;      // nth value as floating-point
    virtual double as_double( int n ) const;    // nth value as double
    virtual char* as_string( int n ) const;     // nth value as string

Although the C++ interface was declared experimental and subject to change,
changing its assumption that bytes are unsigned will definitely break some
programs, though perhaps not as many as changing the definition of FILL_BYTE
from -127 to 255.  Of course we urged people in the Users Guide to use their
own fill values appropriate to the data rather than accepting the default
fill values.

The long and short of it :-) is that we have a problem.  Thanks for pointing
this out.

> I intend to continue to ignore the 'signedness' attribute in my FAN code.

The C++ interface and ncdump also ignore the 'signedness' attribute, so I
think it would be OK to deprecate it.  And the XDR underpinnings support
unsigned values for all the integer types.  If there were a way to handle
the unsigned types in Fortran, perhaps with some extra functions that
converted signed integer values to unsigned for the various netCDF types, I
would agree that we should "byte the bullet" and add unsigned types.  In
that case, it would be best if NC_BYTE were interpreted as signed also, even
in the C++ interface, since there would be an NC_UBYTE type for unsigned
bytes.

______________________________________________________________________________

Russ Rew                                           UCAR Unidata Program
address@hidden                              http://www.unidata.ucar.edu