
[netCDF #TSI-527912]: nccopy advice - rechunking very large files



Hi Dan,

This is just a short followup on using nccopy to rechunk files.

I'm assuming the goal is to allow fast access to all the data for a point
or small region for all 98128 times (each originally stored in a separate
chunk) without having to access 98128 distinct disk blocks.  This goal can
certainly be achieved by rechunking with data for all times in each chunk,
but that can require a lot of memory, because all the output chunks must be
kept in memory throughout the rechunking.

If you can accept making a few disk accesses instead of just one to get
data for all the times for a point or small region, then the rechunking can
be done faster and with a lot less memory.  For example, if you measure and
conclude that using only 4 disk accesses instead of 98128 suffices for the
use case you have in mind, then rechunking to chunks with length 98128/4 =
24532 along the time axis means you only need enough memory for 1/4 of the
output file, and the rechunking can still be done in about 30 minutes on a
desktop machine.  Here's what it took on my Linux desktop, reserving only
10 GB of memory for the chunk cache:

  $ /usr/bin/time nccopy -ctime/24532,x/16,y/12 -e 102000 -m 40M -h 10G -d0 tmp.nc4 tmp-rechunked.nc4
  1264.99user 175.39system 31:34.06elapsed 76%CPU (0avgtext+0avgdata 12299388maxresident)k
  18554864inputs+77738408outputs (22856major+12001463minor)pagefaults 0swaps
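
The arithmetic behind that chunk shape can be sketched in a few lines of
Python.  This is only an illustration: the function names are made up for
this example, and the 4 bytes per value is an assumption (the variable's
actual type isn't stated here), so adjust it for your data.

```python
import math


def time_chunk_length(n_times: int, n_reads: int) -> int:
    """Chunk length along time so a full time series spans n_reads chunks."""
    return math.ceil(n_times / n_reads)


def chunk_bytes(chunk_shape, bytes_per_value: int = 4) -> int:
    """Uncompressed size of one chunk in bytes.

    bytes_per_value=4 is an assumption (e.g. 32-bit floats).
    """
    return math.prod(chunk_shape) * bytes_per_value


n_times = 98128                          # number of time steps in the file
length = time_chunk_length(n_times, 4)   # accept 4 disk reads per query
shape = (length, 16, 12)                 # time, x, y as in the nccopy command

print(length)                            # 24532
print(math.ceil(n_times / length))       # 4 chunks along the time axis
print(chunk_bytes(shape))                # 18840576 bytes, about 18 MiB
```

Calling time_chunk_length with a larger number of reads gives the shorter
time chunks you would pass to nccopy's -c option if you decide more reads
per query are acceptable.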

Interactive access with 4 disk reads per query would probably seem just as
fast as with one disk access per query.  Similarly, accepting a number
larger than 4 might be a good compromise between access time and the
processing time needed to rechunk the data ...

--Russ

Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu



Ticket Details
===================
Ticket ID: TSI-527912
Department: Support netCDF
Priority: Normal
Status: Closed

