
[netCDF #WWQ-664381]: netcdf 4.0 filesize for large arrays



Morten,

After thinking more about the problem you uncovered with default chunk sizes
in netCDF-4, I wondered why ordinary contiguous storage wasn't used for that
case.  I don't think there is any advantage to using chunked storage for
variables that use only fixed-size dimensions and use no compression,
checksums, or any other "filters" that require chunked storage.  There is a
large advantage to chunked storage if the data will be read in a different
order than the order in which it was written, for example reading slices
along the x or y dimension after writing slices along the z dimension.
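
To make the access-order point concrete, here is a rough sketch (not taken
from the netCDF library) that counts how many separate contiguous runs a
one-index-thick slice needs under contiguous row-major storage, and how many
chunks the same slice touches under chunked storage.  It assumes the last
dimension varies fastest, as in the netCDF C interface; the function names
are made up for this illustration, and which axis corresponds to x, y, or z
is not stated in this message.

```python
from math import prod


def ceil_div(a, b):
    """Integer ceiling division, avoiding floating point."""
    return -(-a // b)


def contiguous_runs(shape, fixed_axis):
    """Return (runs, elements_per_run) for a slice that fixes one index
    along fixed_axis and spans every other axis, in row-major layout."""
    runs = prod(shape[:fixed_axis])            # one run per leading index combo
    run_length = prod(shape[fixed_axis + 1:])  # trailing axes are adjacent
    return runs, run_length


def chunks_touched(shape, chunks, fixed_axis):
    """Return how many chunks intersect the same slice."""
    return prod(
        ceil_div(dim, chunk)
        for axis, (dim, chunk) in enumerate(zip(shape, chunks))
        if axis != fixed_axis
    )


shape = (702, 1201, 1301)   # variable dimensions from the example
chunks = (351, 601, 651)    # one of the chunk shapes timed in this message

for axis in range(len(shape)):
    runs, length = contiguous_runs(shape, axis)
    touched = chunks_touched(shape, chunks, axis)
    print(f"axis {axis}: {runs} run(s) of {length} element(s); "
          f"{touched} chunk(s) touched")
```

Under these assumptions a slice along the slowest axis is a single run, while
a slice along the fastest axis breaks into 843102 single-element runs in
contiguous storage.  With chunks this large, every slice still touches 4 of
the 8 chunks, so the benefit for reading would depend on choosing smaller
chunks matched to the read pattern.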

But there is also a price in performance if chunked storage is used instead
of contiguous storage.  For example, here are the times for writing the
netCDF file in your example, with one 3D int variable dimensioned for
702 x 1201 x 1301:

  Using contiguous storage:
  26.36user 28.57system 3:05.05elapsed 29%CPU (0avgtext+0avgdata 0maxresident)k
  72inputs+16912048outputs (0major+109737minor)pagefaults 0swaps
  ls -l mwp-cont.nc
  -rw-r--r-- 1 russ ustaff 4387508754 Jan 24 06:55 mwp-cont.nc

  Using chunk sizes 351 x 601 x 651:
  46.10user 41.00system 3:53.36elapsed 37%CPU (0avgtext+0avgdata 0maxresident)k
  5760inputs+17089216outputs (34major+1182343minor)pagefaults 0swaps
  ls -l mwp-cs.nc
  -rw-r--r-- 1 russ ustaff 4394540323 Jan 24 07:01 mwp-cs.nc

  Using the currently buggy default chunk sizes of 351 x 600 x 650:
  56.23user 75.91system 6:23.57elapsed 34%CPU (0avgtext+0avgdata 0maxresident)k
  208496inputs+32052344outputs (34major+2515730minor)pagefaults 0swaps
  ls -l mwp-bug.nc
  -rw-r--r-- 1 russ ustaff 9856089091 Jan 24 07:08 mwp-bug.nc
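
The file sizes above can be checked with some simple arithmetic.  The sketch
below assumes that every chunk is allocated at full size, including the
partly filled chunks along the edges of the variable, and that an int
occupies 4 bytes; the function names are made up for this illustration.
With chunk sizes of 351 x 600 x 650, the 1201 and 1301 dimensions each need
a third chunk to hold one leftover index, which gives 18 chunks instead of 8.

```python
from math import prod


def ceil_div(a, b):
    """Integer ceiling division, avoiding floating point."""
    return -(-a // b)


def chunked_data_bytes(shape, chunks, elem_size):
    """Bytes of raw data if every chunk, including edge chunks,
    is stored at its full size (an assumption of this sketch)."""
    n_chunks = prod(ceil_div(dim, chunk) for dim, chunk in zip(shape, chunks))
    return n_chunks * prod(chunks) * elem_size


shape = (702, 1201, 1301)
INT_BYTES = 4  # size of a netCDF int

# (estimated data bytes, file size reported by ls -l in this message)
cases = {
    "contiguous": (prod(shape) * INT_BYTES, 4387508754),
    "351 x 601 x 651": (chunked_data_bytes(shape, (351, 601, 651), INT_BYTES),
                        4394540323),
    "351 x 600 x 650": (chunked_data_bytes(shape, (351, 600, 650), INT_BYTES),
                        9856089091),
}

for name, (estimated, reported) in cases.items():
    print(f"{name}: estimated {estimated} bytes, "
          f"reported {reported} bytes, "
          f"difference {reported - estimated} bytes")
```

Under these assumptions the estimates come within a few kilobytes of the
reported sizes in all three cases, which suggests the extra 5.4 GB in
mwp-bug.nc is mostly padding in nearly empty edge chunks.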

I think we should consider making the default storage layout contiguous
instead of chunked for fixed-size variables when compression is not used,
such as in your example.

--Russ




Russ Rew                                         UCAR Unidata Program
address@hidden                     http://www.unidata.ucar.edu



Ticket Details
===================
Ticket ID: WWQ-664381
Department: Support netCDF
Priority: High
Status: Closed