
[Support #CUV-251255]: Nccopy extremly slow / hangs



Hi Mark,

I've created another Jira issue for this problem, which you can track here:

  https://www.unidata.ucar.edu/jira/browse/NCF-85

Currently, by manually specifying a 4GB chunk cache for nccopy, it's
possible to rechunk for time series access in under 10 minutes.  This
is from the original 17.5GB netCDF-3 classic file with time being a
record dimension (but works just as well if time is a fixed-size
dimension) to a 203MB compressed netCDF-4 classic model file, using
made-up data:

   ./nccopy -d1 -k4 -m 6G -c time/1698,latitude/7,longitude/6 classic.nc reshaped.nc
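
To get a feel for the numbers involved, here is a rough sketch of the
arithmetic behind the chunk shape in the command above.  It assumes 4-byte
values and reads "4GB" as 4 GiB; neither is stated in this thread, and the
function names are made up for illustration.

```python
def chunk_bytes(chunk_shape, element_size):
    """Uncompressed size in bytes of one chunk with the given shape."""
    n_values = 1
    for length in chunk_shape:
        n_values *= length
    return n_values * element_size


def chunks_in_cache(cache_bytes, chunk_shape, element_size):
    """Number of whole uncompressed chunks that fit in cache_bytes."""
    return cache_bytes // chunk_bytes(chunk_shape, element_size)


# Chunk shape from the -c option: time/1698, latitude/7, longitude/6.
CHUNK_SHAPE = (1698, 7, 6)
ELEMENT_SIZE = 4             # assumption: 4-byte values
CACHE_BYTES = 4 * 1024**3    # assumption: "4GB" read as 4 GiB

if __name__ == "__main__":
    print("bytes per chunk:", chunk_bytes(CHUNK_SHAPE, ELEMENT_SIZE))
    print("chunks that fit:", chunks_in_cache(CACHE_BYTES, CHUNK_SHAPE, ELEMENT_SIZE))
```

Changing ELEMENT_SIZE or CACHE_BYTES shows how quickly the number of chunks
that can stay in memory shrinks for larger values or smaller caches.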

With real data that's less compressible I expect your rechunking will
take somewhat longer, but I can't estimate how much.

Although eventually I'd like to modify nccopy to figure out a good
size for the chunk cache without need for user guidance, for now I'm
adding an option so you'll be able to specify the size of chunk cache
to use, so you can experiment with what works best for your data.

I'll let you know when a modified nccopy.c is available with the
changes to try out.  I'd be interested in whether you get similar
results to the above with real data.

--Russ  

> > Thanks for filling me in - finding a bug is not such a bad thing! :-) I'm 
> > currently away on holiday for the next week or two, so I won't be able to 
> > try it for a while. But please let me know how things progress with the 
> > rest of it.
> 
> Your example revealed another bug in the way the netCDF-4 library chooses 
> default chunk sizes for variables.  You may have noticed that if you use 
> nccopy on your original classic format file to copy your 17GB file to a 
> netCDF-4 file without compression, the result is 24GB.  This was entirely due 
> to bad default chunksizes chosen for the latitude and longitude dimensions.  
> When I fixed the code to select better defaults, the netCDF-4 file is 
> essentially the same size, 17 GB, with the big variable chunked using 
> chunksizes 809 and 798 for lat and lon instead of 382 and 376.  See the 
> following Jira entry for details:
> 
> https://www.unidata.ucar.edu/jira/browse/NCF-81
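
One way chunk sizes can inflate an uncompressed file is through partly
filled chunks at the edge of each dimension, since chunked storage is
allocated in whole chunks.  The sketch below only illustrates that
arithmetic.  The latitude and longitude lengths are hypothetical (the
thread does not give them), and the Jira entry above remains the
authoritative description of the actual bug.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def allocated_ratio(dim_lengths, chunk_lengths):
    """Cells allocated in whole chunks divided by cells in the variable."""
    allocated = 1
    actual = 1
    for dim, chunk in zip(dim_lengths, chunk_lengths):
        allocated *= ceil_div(dim, chunk) * chunk
        actual *= dim
    return allocated / actual


# Hypothetical latitude/longitude lengths, for illustration only.
DIMS = (1617, 1596)

if __name__ == "__main__":
    for chunks in [(382, 376), (809, 798)]:
        print(chunks, round(allocated_ratio(DIMS, chunks), 3))
```

With these made-up lengths, the smaller chunk sizes leave a large part of
the last chunk along each dimension empty, while the larger ones fit almost
exactly.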
> 
> This would have affected the time taken to rechunk the data somewhat, but 
> still doesn't explain why it took so long or hung.  I'm still working on that 
> ...
> --Russ
> 
> > Mark
> >
> >
> > ________________________________________
> > From: Unidata netCDF Support [address@hidden]
> > Sent: 21 June 2011 19:21
> > To: Mark Payne
> > Cc: address@hidden
> > Subject: [Support #CUV-251255]: Nccopy extremly slow / hangs
> >
> > Mark,
> >
> > > Thanks for the reply. I attach here a copy of ncdump -c foo.nc.
> > >
> > > I've been playing some more, and have been able to make some progress by 
> > > working in stages:
> > >
> > > nccopy -u -k3 -d0 foo.nc intermediate.nc
> > > nccopy -c time/1,longitude/6,latitude/7 intermediate.nc intermediate2.nc
> > > nccopy -c time/1698,longitude/6,latitude/7 intermediate2.nc intermediate3.nc
> >
> > In looking into your problem, I discovered an embarrassing bug in
> > nccopy.  For netCDF-4 input files, it was ignoring the -d option.  This
> > is now fixed in the snapshot, but unfortunately didn't make the 4.1.3
> > release last Friday.  The fix is fairly simple, just deleting a couple of
> > lines in ncdump/nccopy.c, which you can get from
> >
> > http://svn.unidata.ucar.edu/repos/netcdf/trunk/ncdump/nccopy.c
> >
> > I don't think this fix helps your problem, but it may clear up questions
> > about why nccopy doesn't seem to be working as documented on netCDF-4
> > input files when parameters are specified to change the compression or
> > chunking in the output copy.
> >
> > I'm now digging into how to solve your problem, and I'm pretty sure it will
> > involve providing better variable chunk cache parameters for both the input
> > and output variable.  This is a very good example for seeing how using the
> > default chunk cache settings doesn't perform very well on this kind of large
> > multi-dimensional transpose operation.  I'll let you know when I have some
> > progress to report ...
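
To illustrate why this kind of transpose stresses the chunk cache: when the
input is stored one time step at a time and the output chunks span all time
steps, every input time slice contributes to every output chunk.  The sketch
below counts those output chunks and the memory needed to hold them all at
once.  The latitude and longitude lengths and the 4-byte value size are
hypothetical, not taken from the thread, and the function names are made up.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def output_chunks_per_time_slice(nlat, nlon, chunk_lat, chunk_lon):
    """Output chunks touched by one full latitude x longitude time slice."""
    return ceil_div(nlat, chunk_lat) * ceil_div(nlon, chunk_lon)


def working_set_bytes(ntime, nlat, nlon, chunk_lat, chunk_lon, element_size):
    """Bytes needed to hold every touched output chunk, uncompressed."""
    one_chunk = ntime * chunk_lat * chunk_lon * element_size
    return output_chunks_per_time_slice(nlat, nlon, chunk_lat, chunk_lon) * one_chunk


# time/1698, latitude/7 and longitude/6 come from the nccopy -c option in
# this thread; NLAT, NLON and ELEMENT_SIZE are hypothetical.
NTIME, CHUNK_LAT, CHUNK_LON = 1698, 7, 6
NLAT, NLON, ELEMENT_SIZE = 1617, 1596, 4

if __name__ == "__main__":
    print(output_chunks_per_time_slice(NLAT, NLON, CHUNK_LAT, CHUNK_LON))
    print(working_set_bytes(NTIME, NLAT, NLON, CHUNK_LAT, CHUNK_LON, ELEMENT_SIZE))
```

If the cache holds far fewer chunks than one time slice touches, chunks are
likely to be evicted and revisited many times over the course of the copy.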
> >
> > --Russ
> >
> > > It's the last line that it stalls on. I have been able to get nccopy to 
> > > run to completion using
> > >
> > > nccopy -c time/2 intermediate2.nc intermediate3.nc
> > >
> > > but time/1698 fails - and even takes the rest of the machine with it 
> > > sometimes! :-(
> >
> > Russ Rew                                         UCAR Unidata Program
> > address@hidden                      http://www.unidata.ucar.edu
> >
> >
> >
> > Ticket Details
> > ===================
> > Ticket ID: CUV-251255
> > Department: Support netCDF
> > Priority: Normal
> > Status: Closed
> >
> >
> 

Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu



Ticket Details
===================
Ticket ID: CUV-251255
Department: Support netCDF
Priority: Normal
Status: Closed