
[netCDF #JJL-124364]: Importing data without endianness conversion



Peter,

I've looked at the sample program you sent, and I understand that you
expect the
  res = nc_def_var_endian(ncid, varid, NC_ENDIAN_BIG);
    ...
  res = nc_put_var(ncid, varid, data);
  
function calls not only to define the variable's on-disk representation
of floating-point values as big-endian, but also to have nc_put_var()
interpret the data as big-endian encoded, rather than as native
in-memory values behind the array of char in which data is declared:

  /* 123.456 in big-endian. */
  const char *data = "\x40\x5e\xdd\x2f\x1a\x9f\xbe\x77";

However, that's not how the nc_put_var() function works.  The only
documented uses for nc_put_var() are for writing user-defined data, such
as compound or variable-length types, not for arrays of primitive-type
data.  Ordinarily, to write numeric values to a netCDF variable of type
NC_DOUBLE, you would call

  res = nc_put_var_TYPE(ncid, varid, data);

where TYPE denotes a primitive numeric type represented in native (in-memory)
form, such as uchar, schar, short, int, long, float, double, ushort, uint, 
longlong, or ulonglong.  In each case except for "double", a conversion takes
place from the in-memory native type to big-endian double on the disk.  There
is no way to indicate that the type of the data to be written is other than a
native numeric type.
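
To see what goes wrong, note that on a little-endian host the raw bytes
of your array, when treated as a native double, decode to roughly
6.3e+268 rather than 123.456.  Here's a small self-contained check (plain
C, no netCDF calls; just a sketch of the misinterpretation):

  #include <stdio.h>
  #include <string.h>

  int main(void)
  {
      /* 123.456 encoded as a big-endian IEEE-754 double, as in the
       * sample program. */
      const unsigned char data[8] =
          { 0x40, 0x5e, 0xdd, 0x2f, 0x1a, 0x9f, 0xbe, 0x77 };

      /* nc_put_var() treats the buffer as native doubles; on a
       * little-endian host that misreads the bytes. */
      double misread;
      memcpy(&misread, data, sizeof misread);
      printf("%g\n", misread);   /* about 6.3e+268, not 123.456 */
      return 0;
  }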

HDF5 has a richer type system that includes user-defined primitive types, but
netCDF-4 intentionally doesn't support user-defined primitive types, as
explained here (where they are called "user-defined atomic types"):

  http://www.unidata.ucar.edu/software/netcdf/docs/faq.html#fv15

So, I'm sorry to say, you'll have to either convert from big-endian to native
type in memory first, before you try to write the data, or use HDF5 instead of
netCDF-4.  Converting from big-endian to little-endian is actually fast in C,
and the netCDF library even contains internal functions to do that conversion.
See the swap8b() function in libsrc/ncx.c ...
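
For example, here's a minimal sketch of that conversion, assuming a
little-endian host (swap8() is just an illustrative stand-in for the
library-internal swap8b(), not part of the public netCDF API):

  #include <stdio.h>
  #include <string.h>

  /* Reverse the 8 bytes of a big-endian IEEE-754 double into native
   * little-endian order, then reinterpret them as a double. */
  static double swap8(const unsigned char *be)
  {
      unsigned char le[8];
      for (int i = 0; i < 8; i++)
          le[i] = be[7 - i];
      double d;
      memcpy(&d, le, sizeof d);
      return d;
  }

  int main(void)
  {
      /* 123.456 in big-endian, as in the sample program. */
      const unsigned char data[8] =
          { 0x40, 0x5e, 0xdd, 0x2f, 0x1a, 0x9f, 0xbe, 0x77 };

      printf("%.3f\n", swap8(data));   /* prints 123.456 */
      return 0;
  }

After the swap, the native doubles can be handed to nc_put_var_double()
and the library will produce the big-endian on-disk representation you
asked for with nc_def_var_endian().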

--Russ

  

> On 07/30/2013 10:07 PM, Unidata netCDF Support wrote:
> > Hi Peter,
> >
> > Sorry to have taken so long to respond to your question ...
> >> Is there any way in which dataset can be created from binary big-endian
> >> data on a little-endian host without endianness conversion applied?
> >>
> >> I have data in big-endian, and I would like to import it into a
> >> H5T_IEEE_F64BE dataset as is. Sadly, the function nc_def_var_endian is
> >> not good enough - although it creates a H5T_IEEE_F64BE dataset, the
> >> interpretation of the raw data is still little-endian, and a conversion
> >> is done (leading to incorrect values).
> >
> > Are you reading the data from an HDF5 file, or from a netCDF-4 file?  Is
> > the little endian data marked as little endian in the file you are trying
> > to read?  That is, if you run
> >
> >   ncdump -s -v VAR INPUTFILE
> >
> > where the "-s" is for showing special virtual attributes such as endianness
> > and the "-v VAR" is for looking at a specific variable named VAR in the 
> > input
> > file, do you see the attribute
> >
> >   VAR:_Endianness = "little" ;
> >
> > where, again, VAR is the name of the variable (HDF5 dataset) you're looking 
> > for.
> 
> Actually, the data is read from a binary file (originally from a
> big-endian Fortran program). It is stored in a uint8_t *data variable,
> with every 8 bytes coding one 64-bit floating point number, but the
> order of bytes is not matching the host endianness.
> 
> >> Both nc_put_var and nc_put_vara behave the same in this respect.
> >
> > If this is a bug, we'd like to fix it.  But we have tests for endianness, 
> > and
> > would need a small program that demonstrates this bug, so we could duplicate
> > it here and fix it.  Note that setting the endianness for a netCDF variable
> > only affects its representation on disk when writing values.  It does not
> > affect the way data is decoded and represented in native types when reading.
> > That is always determined by how HDF5 has labelled the data type, as little-
> > endian or big-endian.
> 
> I don't think this is a bug. It's just that the nc_put_* functions
> expect the raw data array to be in native endianness. I hoped that
> there might be a way I could take the big-endian data and save it in a
> big-endian dataset with no conversion needed (at least not at the time
> of writing).
> 
> >> With HDF5 API this is easily achieved by setting the type to
> >> H5T_IEEE_F64BE, but I would prefer to use the netcdf API.
> >
> > --Russ
> >
> > Russ Rew                                         UCAR Unidata Program
> > address@hidden                      http://www.unidata.ucar.edu
> >
> >
> >
> > Ticket Details
> > ===================
> > Ticket ID: JJL-124364
> > Department: Support netCDF
> > Priority: High
> > Status: Closed
> 
> Attached is an example program which demonstrates the problem.
> 
> $ # On a little-endian host.
> $ gcc -o ncendian -Wall ncendian.c -lnetcdf
> $ ./ncendian
> $ h5dump -d data ncendian.nc
> HDF5 "ncendian.nc" {
> DATASET "data" {
>    DATATYPE  H5T_IEEE_F64BE
>    DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
>    DATA {
>    (0): 6.31921e+268
>    }
>    ATTRIBUTE "DIMENSION_LIST" {
>       DATATYPE  H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT }}
>       DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
>       DATA {
>       (0): (DATASET 255 /dim1 )
>       }
>    }
> }
> }
> 
> Expected value is 123.456. Even though nc_def_var_endian sets datatype
> to H5T_IEEE_F64BE, the interpretation of the array data is still
> little-endian. When nc_def_var_endian is not used, the output is the
> same (6.31921e+268), only datatype changes to H5T_IEEE_F64LE.
> 
> Regards,
> 
> Peter
> 
> 
Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu



Ticket Details
===================
Ticket ID: JJL-124364
Department: Support netCDF
Priority: High
Status: Closed