
[netCDF #WWQ-664381]: netcdf 4.0 filesize for large arrays



Hi Morten,

Sorry it has taken so long to respond to your question, but I think
I have a solution.

It looks like you have uncovered a problem with the default chunk sizes
used in netCDF-4, and the temporary workaround is to set variable chunk
sizes explicitly until we have the default chunk sizes fixed in the
upcoming release.  Setting the chunksizes to what will be the defaults
not only makes your file size what you expect, it also greatly improves
the time it takes to write your file.

Specifically, you could use the new "-s" option of ncdump to output the
chunk sizes for your variable (this is only available in the latest beta
or daily snapshot release):

  $ ncdump -s -h mwp-bug.nc
  netcdf mwp-bug {
  dimensions:
        x = 702 ;
        y = 1201 ;
        z = 1301 ;
  variables:
        int var(x, y, z) ;
                var:_Storage = "chunked" ;
                var:_ChunkSizes = 351, 600, 650 ;
                var:_Endianness = "little" ;

  // global attributes:
                :_Format = "netCDF-4" ;
  }

shows that by default, your 3D variable was tiled into
351 x 600 x 650 size tiles, so instead of 8 tiles of
data (2 x 2 x 2, as 351 x 601 x 651 would have given you),
you got 18 tiles of data (2 x 3 x 3)!  The last tile along
each of the y and z dimensions is only one element wide,
so it is almost entirely empty.
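
To make the tile arithmetic above concrete, here is a minimal sketch
in C of how the tile counts can be checked.  The helper names
chunks_along and total_chunks are made up for illustration and are
not part of the netCDF library; the only assumption is that a
dimension of length n split into tiles of length c needs ceil(n / c)
tiles.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a netCDF call): number of tiles needed to
 * cover dim_len elements with tiles of chunk_len elements, i.e. the
 * ceiling of dim_len / chunk_len. */
static size_t chunks_along(size_t dim_len, size_t chunk_len)
{
    assert(chunk_len > 0);
    return (dim_len + chunk_len - 1) / chunk_len;
}

/* Hypothetical helper: total number of tiles for a variable with
 * `rank` dimensions, as the product of the per-dimension counts. */
static size_t total_chunks(const size_t *dims, const size_t *chunks,
                           size_t rank)
{
    size_t total = 1;
    for (size_t i = 0; i < rank; i++)
        total *= chunks_along(dims[i], chunks[i]);
    return total;
}
```

Called with dims = {702, 1201, 1301}, this gives 18 for the tile
sizes {351, 600, 650} and 8 for {351, 601, 651}.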

A temporary fix for this bug is to explicitly set the 
chunk sizes, which right now requires a call from the C
library, as the C++ library hasn't got this call yet:

    static size_t var_chunks[3] = {351, 601, 651};
    ...
    /* Set chunk sizes to something better than the defaults;
       the third argument, 0, selects chunked storage */
    stat = nc_def_var_chunking(ncid, var_id, 0, var_chunks);
    check_err(stat,__LINE__,__FILE__);

When I did this on your example, the resulting file was the
expected size, and it took much less time to write (4 minutes
instead of 10 minutes).

Thanks for pointing out this important bug!  Fixing this should
result in better performance for everyone using netCDF-4 with
large variables ...

--Russ

Russ Rew                                         UCAR Unidata Program
address@hidden                     http://www.unidata.ucar.edu



Ticket Details
===================
Ticket ID: WWQ-664381
Department: Support netCDF
Priority: High
Status: Closed