
Re: nclong



> Organization: NCSA
> Keywords: 199401042213.AA10567

Hi Chris,

> Hi, how's it going?  I've been looking at improving the 
> HDF 3.3 port on the Dec Alpha.  Specifically, the difference 
> between the number of bits in an NC_LONG variable in memory
> vs. on disk is a problem for the netCDF/HDF interaction.
> 
> In looking through the code for netcdf.h there is the 
> text:
> 
> #ifdef __alpha
>     typedef long      nclong;         /* when the library is modified to
>                                        * use `nclong' declarations, this
>                                        * will become an `int' */
> #   define NCLONG_DEFINED
> #endif
> #ifndef NCLONG_DEFINED
>     typedef long      nclong;         /* default, compatible type */
> #endif
> 
> So is it the intention that in the future data written to 
> variables / attributes of type NC_LONG should use the 
> C type 'nclong' rather than 'long'?  If so, do you have
> an idea about when you would be modifying the library to
> take this into account?  

Yes, that is our intention, and the changes should be in the next release.
I can't say how soon that version will be released, but I suspect it won't
be until mid-1994.  There may be a minor release before then fixing bugs.
There's still some work involved in changing all the documentation,
examples, and test programs to use nclong instead of long for declarations
of data variables.  We had given a rationale for this in a previous
announcement on the netcdfgroup mailing list:

    You should know, however, that the port of the FORTRAN interface to
    DEC's 64-bit Alpha machine uncovered an ambiguity in the netCDF
    specification.  In order to allow the writing of portable netCDF code,
    we have decided, after considerable discussion, to make a slight change
    to the netCDF specification.  This change will not affect the operation
    or portability of existing or future netCDF code that is not intended
    to run on machines such as DEC's Alpha.  netCDF code for which such a
    machine is a possible platform, however, should be modified or written
    to adhere to the new specification.

    The change is the introduction of a new datatype, which is defined in
    the netCDF header file `netcdf.h'.  This new datatype is `nclong'.
    Maximally portable C code should use this datatype to hold all values of
    type NC_LONG.  For example:

            #include "netcdf.h"
            ...
            int    ncid, lid, status, dimids[NDIM];
            long   start[NDIM], count[NDIM];
            nclong data[SIZE];              /* NB: new datatype */
            ...
            lid    = ncvardef(ncid, "somelongvar", NC_LONG, dimids);
            ...
            status = ncvarput(ncid, lid, start, count, data);
            ...
            status = ncvarget(ncid, lid, start, count, data);

    Note that only variables for NC_LONG values should have type `nclong'.
    Other, traditionally `long' variables (such as the `count' and `start'
    vectors for hyperslab access) should remain as C `long's.

    FORTRAN programmers needn't worry about this change because portable
    FORTRAN code only has INTEGER values to play with (and not, for example,
    INTEGER*4, which is a non-portable datatype).  Thus, no change to the
    FORTRAN netCDF interface specification is required.

    This is the extent of the change.  The rest of this message gives the
    rationale for the change.

    The netCDF implementation and interface assume that a C `long' maps
    naturally into the 32 bit external integer representation of an NC_LONG.
    This is rooted in historical networking code traceable to the BSD
    functions ntohl() and htonl().

    With the introduction of DEC's Alpha machine, we are seeing reasons to
    question this.  The Alpha has 64 bit `long's and 32 bit `int's.  The
    natural and efficient choice for a C datatype which maps to an NC_LONG
    would, therefore, be `int' rather than `long'.  (We have encountered 64
    bit `long's before on the Cray, but there the `int' is also 64 bits, so
    there is no advantage to using different types).

    Furthermore, on the Alpha, the FORTRAN INTEGER type is 32 bits;
    therefore, keeping the C type as `long' for NC_LONG values would add
    costly transformations to the FORTRAN interface on this platform.

    On all platforms known to us (with the exception of the Alpha) the
    typedef for the new `nclong' datatype is

            typedef long nclong;

    and existing netCDF code will run without modification.  On the Alpha,
    however, the `nclong' typedef is

            typedef int nclong;

    We realize that changes to the interface specification are a hassle.
    The alternative in this case is to burden the Alpha platform (and any
    future system which makes similar design decisions) with greater memory
    usage and poor FORTRAN performance.
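
To make the size argument above concrete, here is a minimal sketch (not
the library's actual code) of why a 32-bit in-memory type maps directly
onto the 32-bit external form of an NC_LONG.  The names `my_nclong',
`put_nclongs' and `get_nclongs' are hypothetical, and the big-endian byte
order is assumed from the htonl() convention mentioned above:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for the typedef in netcdf.h: use `int' where it
 * holds at least 32 bits (e.g. the Alpha), otherwise fall back to `long'. */
#if INT_MAX >= 2147483647
typedef int  my_nclong;
#else
typedef long my_nclong;
#endif

/* Encode n values as 32-bit big-endian integers, 4 bytes per value. */
static void put_nclongs(unsigned char *out, const my_nclong *in, size_t n)
{
    size_t i;

    assert(out != NULL && in != NULL);
    for (i = 0; i < n; i++) {
        /* Keep only the low 32 bits of the two's-complement value. */
        unsigned long v = (unsigned long)in[i] & 0xFFFFFFFFUL;

        out[4 * i + 0] = (unsigned char)((v >> 24) & 0xFF);
        out[4 * i + 1] = (unsigned char)((v >> 16) & 0xFF);
        out[4 * i + 2] = (unsigned char)((v >> 8) & 0xFF);
        out[4 * i + 3] = (unsigned char)(v & 0xFF);
    }
}

/* Decode n 32-bit big-endian integers, sign-extending from bit 31. */
static void get_nclongs(my_nclong *out, const unsigned char *in, size_t n)
{
    size_t i;

    assert(out != NULL && in != NULL);
    for (i = 0; i < n; i++) {
        unsigned long v = ((unsigned long)in[4 * i + 0] << 24)
                        | ((unsigned long)in[4 * i + 1] << 16)
                        | ((unsigned long)in[4 * i + 2] << 8)
                        |  (unsigned long)in[4 * i + 3];

        if (v & 0x80000000UL)   /* negative: compute v - 2^32 safely */
            out[i] = (my_nclong)(-(long)(~v & 0xFFFFFFFFUL) - 1L);
        else
            out[i] = (my_nclong)v;
    }
}
```

With a 64-bit `long' as the in-memory type, the same loops would have to
discard (or range-check) the upper 32 bits of every value, which is the
extra work and memory the rationale above is trying to avoid.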

> Life is easier on my end if I can just modify the library
> to use nclong.  If it is something that the Unidata group
> has agreed should / will happen eventually it puts me on
> a much stronger footing.

Yes, you should be able to just use nclong, but be aware that a
recompilation on Alphas will be needed later when the typedef is changed.
By the way, please don't use the other type names like "ncbyte" that were
mistakenly included in the netCDF 2.3.2 release, because the C interface
and C++ interface currently disagree on what an "ncbyte" is; the C typedef
has to be removed from netcdf.h to even compile the C++ stuff.

> BTW, I managed to change the 'brows-o-rama' in Mosaic 2.1
> to be 'scientific data brows-o-rama' but I'm not sure the
> best way to textually display the dimension names.  Things
> are confused by the facts that 1) I think we should still keep
> the dimension sizes even with dimension names and 2) HDF (and 
> CDF when I finally add it) allow unnamed dimensions.  Any 
> thoughts on how to do this in an aesthetically pleasing way 
> would be appreciated.

I agree that the dimension sizes are useful even with the dimension names.
When two dimensions have the same size, using the names is clearer.  One way
to include the dimension names would just be to repeat them every time they
are used, as in

    Dataset Z has rank 4 with dimensions [frtime=9, level=1, lat=73, lon=145]

instead of

    Dataset Z has rank 4 with dimensions [9, 1, 73, 145]

but continue to use the latter form when no dimension names are available.
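
A rough sketch of this first alternative in C follows.  The function
`format_dims' and its argument layout are hypothetical (nothing here
comes from Mosaic or the netCDF library); it falls back to bare sizes
when the name array, or an individual name, is NULL:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Advance `used' by the snprintf result n, clamped so it never passes
 * the last usable position in a buffer of buflen bytes. */
static size_t clamp_add(size_t used, int n, size_t buflen)
{
    if (n < 0)
        return used;
    used += (size_t)n;
    return used < buflen ? used : buflen - 1;
}

/* Write "[name=size, ...]" into buf, or "[size, ...]" when names is
 * NULL.  A NULL entry in names marks a single unnamed dimension.
 * Output is truncated (and still NUL-terminated) if buf is too small. */
static void format_dims(char *buf, size_t buflen, int rank,
                        const char *const *names, const long *sizes)
{
    size_t used = 0;
    int i, n;

    assert(buf != NULL && buflen > 0);
    assert(sizes != NULL || rank == 0);

    buf[0] = '\0';
    used = clamp_add(used, snprintf(buf + used, buflen - used, "["), buflen);
    for (i = 0; i < rank; i++) {
        const char *sep = (i > 0) ? ", " : "";

        if (names != NULL && names[i] != NULL)
            n = snprintf(buf + used, buflen - used, "%s%s=%ld",
                         sep, names[i], sizes[i]);
        else
            n = snprintf(buf + used, buflen - used, "%s%ld", sep, sizes[i]);
        used = clamp_add(used, n, buflen);
    }
    clamp_add(used, snprintf(buf + used, buflen - used, "]"), buflen);
}
```

Called with the names and sizes from the example above, this should
produce the bracketed list shown in the first form, and the second form
when `names' is NULL.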

Alternatively, when dimension names are available you could include an
optional section defining them and just use dimension names after that, as
in:

    Dimensions :

       There are 5 dimensions with the following names and current sizes:

        lat: 73
        lon: 145
        frtime: UNLIMITED, currently 9
        level: 1
        timelen: 20

    Available datasets : 

       Dataset Z has rank 4 with dimensions [frtime, level, lat, lon]

When no dimension names are available the dimension section would not
appear and sizes would be used instead of names.
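
For the second alternative, each line of the dimension section could be
produced by something like the sketch below.  The `dim_info' struct and
`format_dim_line' are hypothetical names for illustration only; a real
browser would fill the struct from whatever inquiry calls it already
uses:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record describing one named dimension. */
struct dim_info {
    const char *name;       /* dimension name, must not be NULL     */
    long        size;       /* current size                         */
    int         unlimited;  /* nonzero for the UNLIMITED dimension  */
};

/* Format one line of the dimension section, e.g. "lat: 73" or
 * "frtime: UNLIMITED, currently 9".  Returns the snprintf result. */
static int format_dim_line(char *buf, size_t buflen, const struct dim_info *d)
{
    assert(buf != NULL && d != NULL && d->name != NULL);

    if (d->unlimited)
        return snprintf(buf, buflen, "%s: UNLIMITED, currently %ld",
                        d->name, d->size);
    return snprintf(buf, buflen, "%s: %ld", d->name, d->size);
}
```

Keeping the UNLIMITED flag in the per-dimension record is what lets this
layout show it, which the inline name=size form has no obvious place for.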

I think the first of these alternatives would be easier to implement, but
would give somewhat less desirable output.  I'm not sure of a good way to
represent whether a dimension is UNLIMITED this way, for example.  The
second method conveys all the information more compactly, but might give
unimportant information undeserved prominence at the beginning of the
brows-o-rama if there were lots of rarely used dimensions, e.g. for string
lengths.

--Russ