Hi Joe,

> Thank you for that detailed answer!

No problem.  I'm at a conference all this week, so I can't provide much
detail until next week.  However, I'm very interested in chunking issues,
and would like to get back to finding a good solution for your use case.

> So the chunk cache is only used to cache output chunks?  Then it makes
> sense that it doesn't need much.

Right, nccopy reads the input one chunk at a time (if it's chunked) and
writes to the output, spreading the values among as many output chunks as
needed.  So if there's insufficient cache to hold all the output chunks
that get values from one input chunk, it will have to write the same
output chunk multiple times, once each time that chunk gets kicked out of
the chunk cache to make room for other chunks.  (A rough sketch of the
arithmetic for your example file is included after the quoted message
below.)

> So in the example file you looked at, rechunked from 1x360x720 to
> 60x60x60, nccopy will read in the first 60 time slices in order to
> create the first output chunk, and then restart and read them again
> (hopefully from OS disk cache next time) to create the next output
> chunk?

No, I think it works as described above, only reading each input chunk
once.  It might work faster in some cases to read input values multiple
times in order to write output chunks fewer times, but nccopy isn't smart
enough to know how to do that yet.

> My real data files are 100 times larger than the one I sent you (~7GB
> compressed, ~70GB decompressed).  I can easily rechunk them to fairly
> square chunks, which takes 20 minutes or so, but with very large chunk
> sizes on the time dimension (say 4000x1x1) it takes hours.  Currently
> my strategy has been to do it in several steps, which seems to help.
> Is that a reasonable approach for large files, or am I doing something
> wrong if it's that slow?

I'm very interested in knowing about an example that works faster by
rechunking in several nccopy steps.  I've suspected that might sometimes
be a better strategy, but haven't figured out a good example that
demonstrates it.  (A purely hypothetical two-step sketch follows at the
end of this message, below your quoted lines.)

--Russ

> I've been rechunking compressed files, and it seems to be CPU bound.
> If there's no input chunk cache (except the OS disk cache), that makes
> sense, since it would need to decompress the same chunks over and over.
>
> Thanks and have a nice weekend!
>
> - Joe
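For concreteness, the kind of multi-step rechunking Joe describes might
look something like the two commands below.  This is only a sketch under
assumptions: the file names and the intermediate chunk shape
'time/1000,lat/10,lon/10' are made up for illustration, the -h values are
arbitrary, and nothing here was timed or tested in this exchange; only the
final 4000x1x1 target shape comes from Joe's message.  The idea, if it
helps at all, would be to reduce at each step how many output chunks each
input chunk has to spread its values across.

  $ nccopy -c 'time/1000,lat/10,lon/10' -h 1g input.nc intermediate.nc
  $ nccopy -c 'time/4000,lat/1,lon/1' -h 1g intermediate.nc final.nc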
> ________________________________________
> From: Unidata netCDF Support address@hidden
> Sent: 03 April 2014 19:25
> To: Joe Siltberg
> Cc: address@hidden
> Subject: [netCDF #ACI-624328]: Experiences from rechunking with nccopy
>
> Hi Joe,
>
> > I've been doing some rechunking of large files with nccopy, and have
> > encountered two potential issues.
> >
> > The first is that the -h switch seems to work only up to around 1.7
> > GB.  Specifying -h 2G or 5G doesn't seem to affect memory usage of
> > the process, or its performance.  This was tested with 4.3.1.1 on 64
> > bit CentOS, and also with your prebuilt 64 bit Windows binaries (also
> > 4.3.1.1).
>
> What you're seeing is that the performance isn't limited by the chunk
> cache size, it's limited by the actual I/O.  Although you requested 5
> GBytes of chunk cache, the library doesn't malloc that much, but only
> mallocs what it needs, up to that amount.  So it doesn't need more than
> 1.7 GBytes for the output file chunk cache in your case.
>
> Here's a demonstration using the example file you sent, where you want
> to rechunk the variable Tair:
>
>   dimensions:
>           lon = 720 ;
>           lat = 360 ;
>           time = UNLIMITED ; // (365 currently)
>   ...
>           double Tair(time, lat, lon) ;
>
> from 365 x 360 x 720 to 60 x 60 x 60.  I renamed your input file
> "htest.nc" and timed each of the following commands after purging the
> OS disk buffers, so the data is actually read from the disk rather
> than from OS cache.  If I run (and time)
>
>   $ ./purge; time nccopy -c 'time/60,lat/60,lon/60' -m 5g -h 5g htest.nc tmp.nc
>   real  0m28.33s
>
> But I get essentially the same time if I only reserve enough chunk
> cache for one output chunk, 60*60*60*8 bytes:
>
>   $ ./purge; time nccopy -c 'time/60,lat/60,lon/60' -m 5g -h 1.728m htest.nc tmp.nc
>   real  0m27.89s
>
> And I don't actually need to reserve any memory at all for the input
> buffer, because all it needs is enough for one chunk of the input, and
> it allocates that much by default:
>
>   $ ./purge; time nccopy -c 'time/60,lat/60,lon/60' -h 1.728m htest.nc tmp.nc
>   real  0m24.86s
>
> But if I don't have enough chunk cache to hold at least one chunk of
> the output file, it takes *much* longer:
>
>   $ ./purge; time nccopy -c 'time/60,lat/60,lon/60' -h 1.727m htest.nc tmp.nc
>   real  16m20.36s
>
> Your results will vary with more variables and more times.  For
> example, with 2 variables of the size of Tair, you need to hold two
> chunks of output (about 3.5m), but in that case it turns out to be
> good to specify a larger input file buffer:
>
>   $ ./purge; time nccopy -c 'time/60,lat/60,lon/60' -m 100m -h 3.5m htest2.nc tmp.nc
>   real  2m49.04s
>
> Incidentally, I get better performance on the small example by using
> the "-w" option to use a diskless write and keep the output in memory
> until the output is closed, not using -h at all:
>
>   $ ./purge; time nccopy -c 'time/60,lat/60,lon/60' -m 100m -w htest2.nc tmp.nc
>   real  2m13.14s
>
> --Russ
>
> Russ Rew                                         UCAR Unidata Program
> address@hidden                        http://www.unidata.ucar.edu
>
>
> Ticket Details
> ===================
> Ticket ID: ACI-624328
> Department: Support netCDF
> Priority: Normal
> Status: Closed

Russ Rew                                           UCAR Unidata Program
address@hidden                          http://www.unidata.ucar.edu


Ticket Details
===================
Ticket ID: ACI-624328
Department: Support netCDF
Priority: Normal
Status: Closed
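To put rough numbers on the chunk-cache sizes in the quoted timings above
(simple arithmetic, not anything reported by nccopy): a 60 x 60 x 60 chunk
of doubles is 1,728,000 bytes, which is where the 1.728m figure comes
from, and two such chunks, for the two-variable case, come to about 3.5m.
If the input is chunked as one 1 x 360 x 720 time slice per chunk, as in
Joe's description, then each input chunk spreads its values over 72
output chunks, so holding all of them in cache at once would take roughly
124 MB:

  $ echo $((60 * 60 * 60 * 8))        # one 60x60x60 chunk of doubles
  1728000
  $ echo $((2 * 60 * 60 * 60 * 8))    # two output chunks (two Tair-sized variables)
  3456000
  $ echo $((360 / 60 * 720 / 60))     # output chunks touched by one 1x360x720 input chunk
  72
  $ echo $((72 * 60 * 60 * 60 * 8))   # cache needed to hold all 72 at once
  124416000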
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.