
[netCDF #JJL-124364]: Importing data without endianness conversion


I've looked at the sample program you sent, and understand that you're
expecting the function calls

  res = nc_def_var_endian(ncid, varid, NC_ENDIAN_BIG);
  res = nc_put_var(ncid, varid, data);

not only to define the on-disk representation of floating-point values for
the variable as big-endian, but also to interpret the data passed to
nc_put_var() as big-endian encoded, rather than as the native type of an
array of char, which is how data is declared:

  /* 123.456 in big-endian. */
  const char *data = "\x40\x5e\xdd\x2f\x1a\x9f\xbe\x77";

However, that's not how the nc_put_var() function works.  The only documented
uses for nc_put_var() are for writing user-defined data, such as compound
or variable-length types, not for arrays of primitive type data.  Ordinarily,
to write a numeric value in a netCDF variable of type NC_DOUBLE, you would call

  res = nc_put_var_TYPE(ncid, varid, data);

where TYPE denotes a primitive numeric type represented in native (in-memory)
form, such as uchar, schar, short, int, long, float, double, ushort, uint, 
longlong, or ulonglong.  In each case except for "double", a type conversion
takes place from the in-memory native type to double, and the value is then
stored in big-endian form on the disk.  There
is no way to indicate that the type of the data to be written is other than a
native numeric type.
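
To illustrate why the sample program ends up with 6.31921e+268 instead of
123.456, here is a minimal sketch of what happens when those 8 bytes are
taken as a native double on a little-endian host.  The helper name
decode_f64_le is hypothetical (it is not part of netCDF), and the sketch
assumes that double is an IEEE 754 64-bit type sharing its byte order with
uint64_t, as on common platforms:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not a netCDF function.
 * Decode 8 bytes as an IEEE 754 double, treating p[0] as the least
 * significant byte, which is how a little-endian host reads them.
 * Assumes double is 64-bit IEEE 754 with the same byte order as uint64_t. */
static double decode_f64_le(const unsigned char *p)
{
    uint64_t bits = 0;
    /* Start from the last byte so that p[7] becomes the most significant. */
    for (int i = 7; i >= 0; i--)
        bits = (bits << 8) | p[i];

    double value;
    memcpy(&value, &bits, sizeof value);  /* reinterpret the bit pattern */
    return value;
}
```

Applied to the bytes 40 5e dd 2f 1a 9f be 77 from the sample program, this
yields a value of roughly 6.3e+268, matching the h5dump output rather than
the intended 123.456.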

HDF5 has a richer type system that includes user-defined primitive types, but
netCDF-4 intentionally doesn't support user-defined primitive types, as
explained here (where they are called "user-defined atomic types"):


So, I'm sorry to say, you'll have to either convert from big-endian to native
type in memory first, before you try to write the data, or use HDF5 instead of
netCDF-4.  Converting from big-endian to little-endian is actually fast in C,
and the netCDF library even contains internal functions to do that conversion.
See the swap8b() function in libsrc/ncx.c ...
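
As a rough sketch of the in-memory conversion suggested above (this is not
the library's internal swap8b() code), the hypothetical helpers below decode
big-endian 64-bit floating-point bytes into native doubles regardless of the
host byte order.  The names be64_to_double and be64_array_to_native are
made up for illustration, and the code assumes that double is an IEEE 754
64-bit type sharing its byte order with uint64_t:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not a netCDF function.
 * Decode 8 big-endian bytes (p[0] most significant) into a native double.
 * Assumes double is 64-bit IEEE 754 with the same byte order as uint64_t. */
static double be64_to_double(const unsigned char *p)
{
    uint64_t bits = 0;
    /* Shifting builds the integer by value, so host byte order is irrelevant. */
    for (int i = 0; i < 8; i++)
        bits = (bits << 8) | p[i];

    double value;
    memcpy(&value, &bits, sizeof value);  /* reinterpret the bit pattern */
    return value;
}

/* Convert n big-endian doubles stored in src (8 * n bytes) into dst. */
static void be64_array_to_native(const unsigned char *src, double *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = be64_to_double(src + 8 * i);
}
```

The converted array could then be written with
nc_put_var_double(ncid, varid, dst).  If the variable was defined with
nc_def_var_endian(ncid, varid, NC_ENDIAN_BIG), the values are still stored
as big-endian on disk.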



> On 07/30/2013 10:07 PM, Unidata netCDF Support wrote:
> > Hi Peter,
> >
> > Sorry to have taken so long to respond to your question ...
> >> Is there any way in which dataset can be created from binary big-endian
> >> data on a little-endian host without endianness conversion applied?
> >>
> >> I have data in big-endian, and I would like to import it into a
> >> H5T_IEEE_F64BE dataset as is. Sadly, the function nc_def_var_endian is
> >> not good enough - although it creates a H5T_IEEE_F64BE dataset, the
> >> interpretation of the raw data is still little-endian, and a conversion
> >> is done (leading to incorrect values).
> >
> > Are you reading the data from an HDF5 file, or from a netCDF-4 file?  Is
> > the little endian data marked as little endian in the file you are trying
> > to read?  That is, if you run
> >
> >   ncdump -s -v VAR INPUTFILE
> >
> > where the "-s" is for showing special virtual attributes such as endianness
> > and the "-v VAR" is for looking at a specific variable named VAR in the
> > input file, do you see the attribute
> >
> >   VAR:_Endianness = "little" ;
> >
> > where, again, VAR is the name of the variable (HDF5 dataset) you're
> > looking for.
> Actually, the data is read from a binary file (originally from a
> big-endian Fortran program). It is stored in a uint8_t *data variable,
> with every 8 bytes coding one 64-bit floating point number, but the
> order of bytes is not matching the host endianness.
> >> Both nc_put_var and nc_put_vara behave the same in this respect.
> >
> > If this is a bug, we'd like to fix it.  But we have tests for endianness,
> > and would need a small program that demonstrates this bug, so we could duplicate
> > it here and fix it.  Note that setting the endianness for a netCDF variable
> > only affects its representation on disk when writing values.  It does not
> > affect the way data is decoded and represented in native types when reading.
> > That is always determined by how HDF5 has labelled the data type, as little-
> > endian or big-endian.
> I don't think this is a bug. It's just that the nc_put_* functions
> expect the raw data array to be in native endianness. I hoped that
> there might be a way I could take the big-endian data and save it in a
> big-endian dataset with no conversion needed (at least not at the time
> of writing).
> >> With HDF5 API this is easily achieved by setting the type to
> >> H5T_IEEE_F64BE, but I would prefer to use the netcdf API.
> >
> > --Russ
> >
> > Russ Rew                                         UCAR Unidata Program
> > address@hidden                      http://www.unidata.ucar.edu
> >
> >
> >
> > Ticket Details
> > ===================
> > Ticket ID: JJL-124364
> > Department: Support netCDF
> > Priority: High
> > Status: Closed
> Attached is an example program which demonstrates the problem.
> $ # On a little-endian host.
> $ gcc -o ncendian -Wall ncendian.c -lnetcdf
> $ ./ncendian
> $ h5dump -d data ncendian.nc
> HDF5 "ncendian.nc" {
> DATASET "data" {
> DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
> DATA {
> (0): 6.31921e+268
> }
> DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
> DATA {
> (0): (DATASET 255 /dim1 )
> }
> }
> }
> }
> Expected value is 123.456. Even though nc_def_var_endian sets datatype
> to H5T_IEEE_F64BE, the interpretation of the array data is still
> little-endian. When nc_def_var_endian is not used, the output is the
> same (6.31921e+268), only datatype changes to H5T_IEEE_F64LE.
> Regards,
> Peter
Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu

