
[netCDF #UGI-412728]: Unexpected chunking behavior test-case



Hi Charlie,

> As reported earlier, I get unexpected behavior with the netCDF4
> chunking routines. Below is a test case. Currently using
> netcdf-4.1-beta1-snapshot2009071200 on Ubuntu Jaunty.
> All previous tested netcdf4 versions show similar behavior.
> 
> 1. After defining a record variable but
> not defining any chunking for the record variable, a call to
> nc_inq_var_chunking() returns storage_type=NC_CHUNKED.
> 
> zender@neige:~/c$ gcc -std=c99 -o bug_cnk bug_cnk.c -L/usr/local/lib
> -lnetcdf -lhdf5_hl -lhdf5 -lcurl
> zender@neige:~/c$ ./bug_cnk
> Sorry! Unexpected result, bug_cnk.c, line: 51

This is actually the expected, though not very well documented,
behavior. The documentation of the "storage" parameter for
nc_def_var_chunking() says:

  storage
      If NC_CONTIGUOUS, then contiguous storage is used for this
      variable. Variables with one or more unlimited dimensions cannot
      use contiguous storage. If contiguous storage is turned on, the
      chunksizes parameter is ignored.

I've just added the following paragraph to the introductory
description of the nc_def_var_chunking function, to make this more
prominent:

  Variables that make use of one or more unlimited dimensions,
  compression, or checksums must use chunking.  Such variables are
  created with default chunk sizes of 1 for each unlimited dimension and
  the dimension length for other dimensions, except that if the
  resulting chunks are too large, the default chunk sizes for non-record
  dimensions are reduced.
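
A minimal sketch of that default rule in C, for illustration only. The
function name and its arguments are hypothetical and are not part of the
netCDF API, and the reduction of overly large chunks is left out because
the paragraph above does not say how it is done.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, NOT a netCDF call: fills chunksizes[] with the
 * defaults described above (1 for each unlimited dimension, the full
 * dimension length otherwise). The "too large" reduction is not modeled.
 * is_unlimited[d] is nonzero if dimension d is unlimited;
 * dim_len[d] is the current length of dimension d. */
void sketch_default_chunksizes(int ndims, const int *is_unlimited,
                               const size_t *dim_len, size_t *chunksizes)
{
    int d;

    assert(ndims > 0 && is_unlimited != NULL);
    assert(dim_len != NULL && chunksizes != NULL);

    for (d = 0; d < ndims; d++)
        chunksizes[d] = is_unlimited[d] ? 1 : dim_len[d];
}
```

For a variable with one unlimited dimension followed by a fixed dimension
of length 100, this sketch yields chunk sizes of 1 and 100.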

We actually said in response to one of your earlier reports

  ... When vars are created, they are contiguous by default, if they
  have no compression, checksum, or unlimited dimensions.

but that was sort of buried in more information about handling a
bug.  The preceding statement in the same response

  ... Any variable may be chunked or contiguous without reference to
  the settings of other variables.

is not quite correct; it should instead have said:

  ... Any variable that is not compressed or checksummed and does not
  make use of unlimited dimensions may be chunked or contiguous without
  reference to the settings of other variables.
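
The corrected rule can be restated as a small predicate. This is only a
sketch: the struct and function names are hypothetical, not netCDF types
or calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one variable's settings; not a netCDF type. */
struct sketch_var_settings {
    int n_unlimited_dims; /* number of unlimited dimensions the variable uses */
    int compressed;       /* nonzero if compression is turned on */
    int checksummed;      /* nonzero if checksums are turned on */
};

/* Returns nonzero if the variable must use chunked storage, and 0 if it
 * is free to be either chunked or contiguous. */
int sketch_must_be_chunked(const struct sketch_var_settings *v)
{
    assert(v != NULL);
    return v->n_unlimited_dims > 0 || v->compressed || v->checksummed;
}
```

A record variable (one unlimited dimension, no compression, no checksum)
makes the predicate return nonzero, which matches the storage type
reported in case 1 of the test.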

I think it's a bug that nc_def_var_chunking ignores an attempt to set
contiguous storage for a variable that must use chunking, and the
function should return an error in this case, rather than letting the
bug result in subsequent HDF-level errors.  Assuming Ed agrees, we'll
get this fixed in a subsequent snapshot release and let you know about
it.
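
As a sketch of the behavior proposed here, the check could look roughly
like the following. The enum values, status codes, and function name are
all hypothetical stand-ins, not the actual netCDF constants or the
eventual fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the storage settings and return codes. */
enum sketch_storage { SKETCH_CHUNKED, SKETCH_CONTIGUOUS };
enum sketch_status { SKETCH_OK = 0, SKETCH_ERR_MUST_CHUNK = -1 };

/* must_be_chunked: nonzero if the variable uses an unlimited dimension,
 * compression, or checksums. On success the requested storage is written
 * to *storage; on error *storage is left unchanged. */
int sketch_set_storage(int must_be_chunked, enum sketch_storage requested,
                       enum sketch_storage *storage)
{
    assert(storage != NULL);

    /* Refuse the request up front instead of silently ignoring it. */
    if (must_be_chunked && requested == SKETCH_CONTIGUOUS)
        return SKETCH_ERR_MUST_CHUNK;

    *storage = requested;
    return SKETCH_OK;
}
```

With a check of this kind, the caller sees the failure at the point of
the request rather than later as HDF-level errors.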

--Russ

> 2. Modifying the code by one line and uncommenting line 47,
> which explicitly sets the variable to unchunked by calling
> nc_def_var_chunking() with storage type = NC_CONTIGUOUS,
> leads to numerous HDF complaints followed by failure of the
> nc_put_var_int() call.
> 
> zender@neige:~/c$ gcc -std=c99 -o bug_cnk bug_cnk.c -L/usr/local/lib
> -lnetcdf -lhdf5_hl -lhdf5 -lcurl
> 
> zender@neige:~/c$ ./bug_cnk
> HDF5-DIAG: Error detected in HDF5 (1.8.3-snap2) thread 0:
> #000: H5Ddeprec.c line 170 in H5Dcreate1(): unable to create dataset
> major: Dataset
> minor: Unable to initialize object
> #001: H5Dint.c line 430 in H5D_create_named(): unable to create and
> link to dataset
> major: Dataset
> minor: Unable to initialize object
> #002: H5L.c line 1639 in H5L_link_object(): unable to create new link
> to object
> major: Links
> minor: Unable to initialize object
> #003: H5L.c line 1862 in H5L_create_real(): can't insert link
> major: Symbol table
> minor: Unable to insert object
> #004: H5Gtraverse.c line 877 in H5G_traverse(): internal path
> traversal failed
> major: Symbol table
> minor: Object not found
> #005: H5Gtraverse.c line 703 in H5G_traverse_real(): traversal
> operator failed
> major: Symbol table
> minor: Callback failed
> #006: H5L.c line 1685 in H5L_link_cb(): unable to create object
> major: Object header
> minor: Unable to initialize object
> #007: H5O.c line 2596 in H5O_obj_create(): unable to open object
> major: Object header
> minor: Can't open object
> #008: H5Doh.c line 293 in H5O_dset_create(): unable to create dataset
> major: Dataset
> minor: Unable to initialize object
> #009: H5Dint.c line 1141 in H5D_create(): unable to initialize layout
> information
> major: Dataset
> minor: Unable to initialize object
> #010: H5Dcontig.c line 406 in H5D_contig_construct(): extendible
> contiguous non-external dataset
> major: Dataset
> minor: Feature is unsupported
> Sorry! Unexpected result, bug_cnk.c, line: 48
> 
> Are either/both of these the expected behavior?
> I thought all variables, including record variables, were
> NC_CONTIGUOUS until/unless explicitly set otherwise.
> Both these tests run to completion when
> working on a non-record dimension/variable (as you may
> verify by commenting out line 43 and uncommenting line 44).
> 
> Probably I just don't understand the intended chunking conventions.
> But I did read the documentation carefully. Any insights appreciated.
> 
> Thanks,
> Charlie
> 
> ***********************************************************************
> /* First line of bug_cnk.c */
> 
> /* Purpose: Demonstrate netCDF4 chunking behavior */
> 
> /* Usage:
> cd ~/c;./bug_cnk
> cd ~/c;gcc -std=c99 -o bug_cnk bug_cnk.c -L/usr/local/lib -lnetcdf
> -lhdf5_hl -lhdf5 -lcurl */
> 
> #include <stdio.h>
> #include <netcdf.h> /* netCDF definitions and C library */
> 
> /* Glue code */
> #define FILE_NAME "./bug_cnk.nc"
> int ncid; // [id] netCDF file ID
> 
> #define ERR do { \
> fflush(stdout); /* Make sure our stdout is synced with stderr. */ \
> err++; \
> fprintf(stderr, "Sorry! Unexpected result, %s, line: %d\n", \
> __FILE__, __LINE__); \
> } while (0)
> int err=0; // global
> 
> int main(){
> /* Code from netCDF4 C Users Manual p. 90 */
> #define NDIMS6 1
> #define DIM6_NAME "D5"
> #define VAR_NAME6 "V5"
> #define DIM6_LEN 100
> int dimids[NDIMS6], dimids_in[NDIMS6];
> int varid;
> int ndims, nvars, natts, unlimdimid;
> nc_type xtype_in;
> char name_in[NC_MAX_NAME + 1];
> int data[DIM6_LEN], data_in[DIM6_LEN];
> size_t chunksize_in[NDIMS6];
> int storage_in;
> int i, d;
> for (i = 0; i < DIM6_LEN; i++)
> data[i] = i;
> /* Create a netcdf-4 file with one dim and one var. */
> if (nc_create(FILE_NAME, NC_NETCDF4, &ncid)) ERR;
> if (nc_def_dim(ncid, DIM6_NAME, NC_UNLIMITED, &dimids[0])) ERR;
> // if (nc_def_dim(ncid, DIM6_NAME, DIM6_LEN, &dimids[0])) ERR;
> if (dimids[0] != 0) ERR;
> if (nc_def_var(ncid, VAR_NAME6, NC_INT, NDIMS6, dimids, &varid)) ERR;
> //if (nc_def_var_chunking(ncid, varid, NC_CONTIGUOUS, NULL)) ERR;
> if (nc_put_var_int(ncid, varid, data)) ERR;
> /* Check stuff. */
> if (nc_inq_var_chunking(ncid, varid, &storage_in, chunksize_in)) ERR;
> if (storage_in != NC_CONTIGUOUS) ERR;
> }
> ************************************************************************

Russ Rew                                         UCAR Unidata Program
address@hidden                     http://www.unidata.ucar.edu



Ticket Details
===================
Ticket ID: UGI-412728
Department: Support netCDF
Priority: Normal
Status: Closed