
[netCDF #TSI-527912]: nccopy advice - rechunking very large files



> On the subject of compression:
> The compression has finished for 3 different rates -d 0/5/9,
> and here are the results:

You may already be aware of this, but just to make sure: the
compression level corresponding to -d0 is *no* compression.  So
it might be useful to compare -d1, the lowest and supposedly
fastest level of compression, with -d5 and -d9.  In my experience,
for a lot of large floating-point data, -d1 is a little bit faster
than the higher levels, and they compress only a little bit better.
So I usually just use -d1 for compression, as the time it saves
is usually worth the small amount of extra data volume.
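
A rough sketch of such a comparison, assuming nccopy is on the PATH.
The function name compare_levels, the NCCOPY override variable, and the
output naming scheme are made up for illustration.  It passes only -d,
so it measures compression alone; for a like-for-like test, the
rechunking options used elsewhere in this thread would have to be added.

```shell
# Hypothetical helper: copy one input file at several deflate levels and
# report elapsed wall-clock seconds and output size for each level.
# NCCOPY may be set to another command (e.g. for a dry run);
# it defaults to nccopy.
compare_levels() {
    src=$1
    shift
    for level in "$@"; do
        dst="${src%.nc4}-d${level}.nc4"
        start=$(date +%s)
        "${NCCOPY:-nccopy}" -d"$level" "$src" "$dst" || return 1
        end=$(date +%s)
        # Arithmetic expansion strips any padding wc adds on some systems.
        size=$(( $(wc -c < "$dst") ))
        echo "d$level $((end - start))s $size bytes"
    done
}

# Example call (file name is a placeholder):
#   compare_levels tmp.nc4 1 5 9
```

Timing with whole seconds is coarse, but for runs that take tens of
minutes that should be good enough.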

I used -d0 in the example I ran to explicitly specify that the
output was to be uncompressed.  I thought that would be somewhat
faster than compressing the output chunks as they were written
to disk, and it was significantly faster:

Writing uncompressed output took 35:24.38 (minutes:seconds) elapsed:

  $ nccopy -ctime/98128,x/8,y/6 -e 102000 -m 40M -h 40G -d0 tmp.nc4 tmp-rechunked.nc4
  $ ls -l tmp-rechunked.nc4
  -rw-rw-r-- 1 russ ustaff 38970737448 Oct  7 12:36 tmp-rechunked.nc4
  
whereas compressing the output at level 1 (by default, nccopy
compresses the output at the same level as the input) took
52:29.25 (minutes:seconds) elapsed:

  $ nccopy -w -ctime/98128,x/8,y/6 -e 102000 -m 40M -h 40G tmp.nc4 tmp-rechunked.nc4
  $ ls -l tmp-rechunked.nc4
  -rw-rw-r-- 1 russ ustaff 10951640022 Oct  7 18:55 tmp-rechunked.nc4

So in this case it looks like -d1 did pretty well, because the size of the
original compressed file (which used -d1 level compression) was only

  $ ls -l tmp.nc4
  -rw-rw-r-- 1 russ ustaff 10143354510 Oct  4 16:45 tmp.nc4
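
For a quick side-by-side of the sizes reported above, the ratios can be
worked out with awk.  This is only arithmetic on the byte counts from
the ls listings; the ratio helper name is made up for illustration.

```shell
# Hypothetical helper: print a/b to two decimal places.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Uncompressed rechunked output vs. level-1 compressed rechunked output
ratio 38970737448 10951640022    # prints 3.56
# Level-1 compressed rechunked output vs. the original level-1 input
ratio 10951640022 10143354510    # prints 1.08
```

So level 1 shrinks the rechunked data by a factor of about 3.6, and the
compressed rechunked file is about 8% larger than the original.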

So I'm puzzled why the -d5 and -d9 results were so much larger than the
-d1 result.  If anything, I'd expect them to be a little smaller.
But maybe your -d5 and -d9 runs used output chunks 1/4 the size,
with only 98128/4 along the time dimension?
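
One way to check this would be to look at the storage attributes that
ncdump reports with its -s option, which include _ChunkSizes and
_DeflateLevel for each variable.  A minimal sketch follows; the
storage_attrs helper name and the file name in the usage comment are
placeholders, not anything from this thread.

```shell
# Hypothetical filter: keep only the chunking and deflate attributes
# from header output produced by `ncdump -h -s`.
storage_attrs() {
    grep -E '_ChunkSizes|_DeflateLevel'
}

# Usage (file name is a placeholder):
#   ncdump -h -s tmp-rechunked-d5.nc4 | storage_attrs
```

Comparing the output for the -d1, -d5 and -d9 files would show whether
they were all written with the same chunk shapes.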

--Russ

Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu



Ticket Details
===================
Ticket ID: TSI-527912
Department: Support netCDF
Priority: Normal
Status: Closed

