Hi Joe,

> I've been doing some rechunking of large files with nccopy, and have
> encountered two potential issues.
>
> The first is that the -h switch seems to work only up to around 1.7
> GB. Specifying -h 2G or 5G doesn't seem to affect memory usage of
> the process, or its performance. This was tested with 4.3.1.1 on 64
> bit CentOS, and also with your prebuilt 64 bit Windows binaries (also
> 4.3.1.1).

What you're seeing is that performance isn't limited by the chunk cache size; it's limited by the actual I/O. Although you requested 5 GBytes of chunk cache, the library doesn't malloc that much. It only mallocs what it needs, up to that amount, and in your case it doesn't need more than 1.7 GBytes for the output file's chunk cache.

Here's a demonstration using the example file you sent, in which you want to rechunk the variable Tair from 365 x 360 x 720 to 60 x 60 x 60:

    dimensions:
        lon = 720 ;
        lat = 360 ;
        time = UNLIMITED ; // (365 currently)
    ...
        double Tair(time, lat, lon) ;

I renamed your input file "htest.nc" and timed each of the following commands after purging the OS disk buffers, so the data is actually read from disk rather than from the OS cache.
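The cache sizes used in these timing runs follow from simple chunk arithmetic. As a hedged sketch in plain POSIX shell (the helper names chunk_bytes, chunks_along and min_cache_bytes are hypothetical, not nccopy options), assuming every rechunked variable has the same element size and output chunk shape as Tair: one output chunk is 60*60*60 doubles of 8 bytes each, and the chunk cache needs room for one such chunk per variable.

```shell
#!/bin/sh
# Sketch of the cache-sizing arithmetic for nccopy rechunking.
# Hypothetical helpers, not part of nccopy. Assumes every rechunked
# variable has the same element size and output chunk shape.

# chunk_bytes ELEMENT_SIZE CHUNK_LEN...
# Bytes in one chunk: bytes per element times the product of chunk lengths.
chunk_bytes() {
    total=$1
    shift
    for len in "$@"; do
        total=$((total * len))
    done
    echo "$total"
}

# chunks_along DIM_LEN CHUNK_LEN
# Chunks needed along one dimension (ceiling division; the last may be partial).
chunks_along() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# min_cache_bytes NVARS ELEMENT_SIZE CHUNK_LEN...
# Smallest chunk cache holding one output chunk for each of NVARS variables.
min_cache_bytes() {
    nvars=$1
    shift
    echo $(( nvars * $(chunk_bytes "$@") ))
}

# Tair: double (8 bytes), rechunked from 365 x 360 x 720 to 60 x 60 x 60.
chunk_bytes 8 60 60 60            # prints 1728000 (passed as -h 1.728m)
echo "$(chunks_along 365 60) x $(chunks_along 360 60) x $(chunks_along 720 60) chunks"  # prints 7 x 6 x 12 chunks
min_cache_bytes 2 8 60 60 60      # prints 3456000 (about 3.5m for two variables)
```

The 1,728,000-byte figure is the one-output-chunk threshold that separates the -h 1.728m and -h 1.727m runs. The 7 x 6 x 12 count reflects that 365 time steps do not divide evenly by 60, so the last time chunk is only partly filled.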
If I run (and time)

    $ ./purge; time nccopy -c 'time/60,lat/60,lon/60' -m 5g -h 5g htest.nc tmp.nc
    real    0m28.33s

I get essentially the same time if I reserve only enough chunk cache for one output chunk, 60*60*60*8 bytes:

    $ ./purge; time nccopy -c 'time/60,lat/60,lon/60' -m 5g -h 1.728m htest.nc tmp.nc
    real    0m27.89s

And I don't actually need to reserve any memory at all for the input buffer, because all it needs is enough for one chunk of the input, and nccopy allocates that much by default:

    $ ./purge; time nccopy -c 'time/60,lat/60,lon/60' -h 1.728m htest.nc tmp.nc
    real    0m24.86s

But if the chunk cache is too small to hold even one chunk of the output file, it takes *much* longer:

    $ ./purge; time nccopy -c 'time/60,lat/60,lon/60' -h 1.727m htest.nc tmp.nc
    real    16m20.36s

Your results will vary with more variables and more time steps. For example, with 2 variables the size of Tair, you need to hold two output chunks (about 3.5m), and in that case it turns out to pay off to specify a larger input buffer (-m) as well:

    $ ./purge; time nccopy -c 'time/60,lat/60,lon/60' -m 100m -h 3.5m htest2.nc tmp.nc
    real    2m49.04s

Incidentally, on the small example I get better performance by not using -h at all and instead using the "-w" option, which does a diskless write and keeps the output in memory until the output file is closed:

    $ ./purge; time nccopy -c 'time/60,lat/60,lon/60' -m 100m -w htest2.nc tmp.nc
    real    2m13.14s

--Russ

Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu



Ticket Details
===================
Ticket ID: ACI-624328
Department: Support netCDF
Priority: Normal
Status: Closed
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.