Hi Henri,

> I have a 200GB uncompressed NetCDF file with 5 variables (+lat,lon,time) of
> ECMWF ERA-Interim data like this:
>
>   dimensions(sizes): lon(480), lat(241), time(99351)
>
> I need to access all time instants of the data, one gridpoint at a time.
> Unfortunately the data is organized inefficiently for this, and retrieving
> one slice takes 10 minutes or so. I have tried to rechunk the data with this
> command:
>
>   nccopy -w -c time/99351,lat/1,lon/1 all5.nc all5_T.nc
>
> but the processing has taken 9 days already (I have allocated 1 CPU and 250GB
> of memory to it). Is there some way to estimate how it's doing and how long
> this will take? I ran the same command with a test file of only 9 grid
> points, and estimated that if the process scaled perfectly, the full data
> would be finished in 2 days.
>
> Alternatively, is there some smarter way to do this? I suppose I should have
> done this in smaller pieces, but I'd hate to kill the process now if it's
> close to finishing.

You might want to read these blog posts, if you haven't already:

  http://www.unidata.ucar.edu/blogs/developer/en/entry/chunking_data_why_it_matters
  http://www.unidata.ucar.edu/blogs/developer/en/entry/chunking_data_choosing_shapes

You haven't mentioned whether the 200 GB source file is a netCDF-4 classic
model format using compression. That might make a difference, as you may be
spending an enormous amount of time uncompressing the same source chunks over
and over again, due to using a small chunk cache.

Even if the source data is not compressed, you probably need to specify use of
a chunk cache to make sure the same source data doesn't need to be reread from
the disk repeatedly for each of the 480x241 points.

And I would advise using a different shape for the output chunks, something
more like

  time/10000,lat/10,lon/20

so that you can get the data for one point with 10 disk accesses instead of 1,
probably still fast enough.
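
As a rough check of the arithmetic behind the two chunk shapes, here is a
small Python sketch. The dimension sizes come from the question above; the
4-byte value size and the names chunk_stats and BYTES_PER_VALUE are
assumptions made only for illustration, since the variable type is not stated.

```python
import math

# Dimension sizes quoted in the original question.
dims = {"time": 99351, "lat": 241, "lon": 480}

# Assumption: 4-byte values; the actual variable type is not stated.
BYTES_PER_VALUE = 4


def chunk_stats(chunks):
    """Summarise a chunk shape given as {dimension name: chunk length}."""
    # Number of chunks needed to span each dimension (last one may be partial).
    per_dim = {d: math.ceil(dims[d] / chunks[d]) for d in dims}
    return {
        # Chunks touched when reading the full time series of one grid point.
        "reads_per_point": per_dim["time"],
        # Grid points whose data share a single chunk.
        "points_per_chunk": chunks["lat"] * chunks["lon"],
        # Uncompressed size of one full chunk.
        "chunk_bytes": math.prod(chunks.values()) * BYTES_PER_VALUE,
        # Total chunks written per variable.
        "chunks_per_variable": math.prod(per_dim.values()),
    }


original = chunk_stats({"time": 99351, "lat": 1, "lon": 1})
suggested = chunk_stats({"time": 10000, "lat": 10, "lon": 20})
print("original: ", original)
print("suggested:", suggested)
```

Under those assumptions the suggested shape needs 10 chunk reads per grid
point, holds 200 grid points per chunk, and writes 6000 chunks of 8,000,000
bytes per variable, compared with 115680 much smaller chunks for the
time/99351,lat/1,lon/1 shape.
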
Also, such a shape would store data for 200 adjacent points together in 1
chunk, so if it's cached, nearby queries will be very fast after the first.

I would also advise just giving up on the current nccopy, which may well take
a year to finish! Spend a little time experimenting with using some of the
advanced nccopy options, such as -w, -m, -h, and -e, which could make a
significant difference in rechunking time:

  http://www.unidata.ucar.edu/netcdf/docs/nccopy-man-1.html

What works best is platform-specific, but you may be able to get something
close to optimum by timing with smaller examples. I'd be interested in knowing
what turns out to be practical!

--Russ

Russ Rew
UCAR Unidata Program
address@hidden
http://www.unidata.ucar.edu

Ticket Details
===================
Ticket ID: UAU-670796
Department: Support netCDF
Priority: Normal
Status: Closed

NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.