
[Support #CUV-251255]: Nccopy extremely slow / hangs



Mark,

> That sounds like a very sensible approach - and the improvements that you 
> have made seem to justify the effort. Just to clarify, you are 
> essentially going to create a new command line option which allows 
> specification of the chunking cache - but this will be different to the 
> memory cache option, e.g. the -m 6G below? What is the difference between the 
> two? I assume that there must be some form of optimum memory usage, e.g. given 
> 8GB of RAM, how do you distribute the cache between the two?

Yes, good question.  I intended to create a new command line option,
different from the copy buffer option -m, just for the HDF5 global
chunk cache.  This would be used for holding uncompressed chunks of
the output file, which get compressed when they are ejected from the
cache and uncompressed when they are read into the cache.  There may
also be one chunk from the input file in the chunk cache, if the input
is chunked.  That's because nccopy currently reads the input a chunk
at a time (or however much will fit in the -m memory buffer if the
input is not chunked) before copying it to the output file.
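
For reference, the per-variable chunk cache is controlled through the
standard netCDF-4 call nc_set_var_chunk_cache().  Here is a minimal
sketch of the kind of call nccopy would make for each chunked output
variable (the 64 MiB figure, the variable name, and the helper name are
made-up examples, not what nccopy actually does):

  #include <stddef.h>
  #include <netcdf.h>

  /* Give one output variable a chunk cache of cache_bytes bytes.
   * The last two arguments are the number of chunk slots and the
   * preemption policy (between 0.0 and 1.0). */
  static int
  set_output_cache(int out_ncid, const char *varname, size_t cache_bytes)
  {
      int varid, stat;

      stat = nc_inq_varid(out_ncid, varname, &varid);
      if (stat != NC_NOERR)
          return stat;
      return nc_set_var_chunk_cache(out_ncid, varid, cache_bytes,
                                    1009 /* slots */, 0.75f);
  }

  /* e.g. set_output_cache(out_ncid, "tas", (size_t)64 * 1024 * 1024); */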

I agree it would be better to just overload the -m option to cover
both uses of memory.  nccopy should figure out the best chunk cache
size, tell the HDF5 library to use that, then allocate the rest of
what was specified with -m to be used for the copy buffer.

But to do that, I have to first figure out how to make nccopy figure
out the best chunk cache size.  I'm not sure yet how to do that.  It
seems to depend on both how the input is chunked and the desired
chunking for the output, as well as the shapes of chunks in both the
input and output.  For example, if a set of N input chunks exactly
covers a set of M output chunks, then the best strategy would seem to
be to read in the N input chunks and then write out the corresponding
M output chunks, repeating for the whole file, in which case the ideal
chunk cache would only need to hold N input chunks of one shape plus M
output chunks of a different shape.
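
To make that concrete, the size of one chunk is easy to compute from
the chunking information already in the file; the hard part is choosing
N and M.  A hypothetical helper (not part of nccopy, names made up)
that returns the size in bytes of one chunk of a variable might look
like this:

  #include <stddef.h>
  #include <netcdf.h>

  /* Bytes in one chunk of a variable, or 0 if the variable is
   * contiguous (or on any error). */
  static size_t
  chunk_bytes(int ncid, int varid)
  {
      int ndims, storage, d;
      nc_type xtype;
      size_t chunksizes[NC_MAX_VAR_DIMS], typesize, nbytes = 1;

      if (nc_inq_varndims(ncid, varid, &ndims) != NC_NOERR ||
          nc_inq_vartype(ncid, varid, &xtype) != NC_NOERR ||
          nc_inq_type(ncid, xtype, NULL, &typesize) != NC_NOERR ||
          nc_inq_var_chunking(ncid, varid, &storage, chunksizes) != NC_NOERR ||
          storage != NC_CHUNKED)
          return 0;

      for (d = 0; d < ndims; d++)
          nbytes *= chunksizes[d];
      return nbytes * typesize;
  }

The cache for that case would then need to be at least
N * chunk_bytes(in_ncid, in_varid) + M * chunk_bytes(out_ncid, out_varid),
though I haven't verified that this estimate is right in general.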

For now I can judge whether something is working better by manually
allocating memory space separately to the copy buffer and the chunk
cache, so you should probably treat the chunk cache option
specification as a temporary kludge that may go away before it gets
documented.  If I punt and can't figure out a reliable algorithm, I
may just use the -m space this way:

  - if input chunked, use size of one input chunk for copy buffer and
    rest for chunk cache
  - if input not chunked, use half for copy buffer and half for chunk
    cache
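
A rough sketch in C of that split, assuming m_bytes is whatever was
given with -m and in_chunk_bytes is the size of one input chunk (0 if
the input is not chunked); the names are made up:

  #include <stddef.h>

  struct mem_split { size_t copy_buf; size_t chunk_cache; };

  /* Split the -m memory budget between the copy buffer and the
   * HDF5 chunk cache, per the two cases above. */
  static struct mem_split
  split_m_option(size_t m_bytes, size_t in_chunk_bytes)
  {
      struct mem_split s;
      if (in_chunk_bytes > 0 && in_chunk_bytes < m_bytes) {
          s.copy_buf = in_chunk_bytes;              /* one input chunk */
          s.chunk_cache = m_bytes - in_chunk_bytes; /* rest for the chunk cache */
      } else {
          s.copy_buf = m_bytes / 2;                 /* not chunked: split evenly */
          s.chunk_cache = m_bytes - s.copy_buf;
      }
      return s;
  }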

And of course the hardest part of the problem is choosing what letter
to use for the chunk cache option, what with -c and -m already taken
(I'll probably use "-h", because "chunk cache" has 2 "h"'s :-) ) ...

If you have any better ideas, I'm open to suggestions!

--Russ

> Hi Mark,
> 
> I've created another Jira issue for this problem, which you can track here:
> 
> https://www.unidata.ucar.edu/jira/browse/NCF-85
> 
> Currently by manually adding a specification to use 4GB of chunk cache
> to nccopy, it's possible to rechunk for time series access in under 10
> minutes.  This is from the original 17.5GB netCDF-3 classic file with
> time being a record dimension (but works just as well if time is a
> fixed-size dimension) to a 203MB compressed netCDF-4 classic model file,
> using made-up data:
> 
> ./nccopy -d1 -k4 -m 6G -c time/1698,latitude/7,longitude/6 classic.nc reshaped.nc
> 
> With real data that's less compressible I expect your rechunking will
> take somewhat longer, but I can't estimate how much.
> 
> Although eventually I'd like to modify nccopy to figure out a good
> size for the chunk cache without need for user guidance, for now I'm
> adding an option so you'll be able to specify the size of chunk cache
> to use, so you can experiment with what works best for your data.
> 
> I'll let you know when a modified nccopy.c is available with the
> changes to try out.  I'd be interested in whether you get similar
> results to the above with real data.
> 
> --Russ
> 
> > > Thanks for filling me in - finding a bug is not such a bad thing! :-) I'm 
> > > currently away on holiday the next week or two, so I won't be able to try 
> > > it for a while. But please let me know how things progress with the rest 
> > > of it.
> >
> > Your example revealed another bug in the way the netCDF-4 library chooses 
> > default chunk sizes for variables.  You may have noticed that if you use 
> > nccopy to copy your original 17GB classic format file to a 
> > netCDF-4 file without compression, the result is 24GB.  This was entirely 
> > due to bad default chunksizes chosen for the latitude and longitude 
> > dimensions.  When I fixed the code to select better defaults, the netCDF-4 
> > file is essentially the same size, 17 GB, with the big variable chunked 
> > using chunksizes 809 and 798 for lat and lon instead of 382 and 376.  See 
> > the following Jira entry for details:
> >
> > https://www.unidata.ucar.edu/jira/browse/NCF-81
> >
> > This would have affected the time taken to rechunk the data somewhat, but 
> > still doesn't explain why it took so long or hung.  I'm still working on 
> > that ...
> > --Russ
> >
> > > Mark
> > >
> > >
> > > ________________________________________
> > > From: Unidata netCDF Support [address@hidden]
> > > Sent: 21 June 2011 19:21
> > > To: Mark Payne
> > > Cc: address@hidden
> > > Subject: [Support #CUV-251255]: Nccopy extremely slow / hangs
> > >
> > > Mark,
> > >
> > > > Thanks for the reply. I attach here a copy of ncdump -c foo.nc.
> > > >
> > > > I've been playing some more, and have been able to make some progress 
> > > > by working in stages:
> > > >
> > > > nccopy -u -k3 -d0 foo.nc intermediate.nc
> > > > nccopy -c time/1,longitude/6,latitude/7 intermediate.nc intermediate2.nc
> > > > nccopy -c time/1698,longitude/6,latitude/7 intermediate2.nc intermediate3.nc
> > >
> > > In looking into your problem, I discovered an embarrassing bug in
> > > nccopy.  For netCDF-4 input files, it was ignoring the -d option.  This
> > > is now fixed in the snapshot, but unfortunately didn't make the 4.1.3
> > > release last Friday.  The fix is fairly simple, just deleting a couple of
> > > lines in ncdump/nccopy.c, which you can get from
> > >
> > > http://svn.unidata.ucar.edu/repos/netcdf/trunk/ncdump/nccopy.c
> > >
> > > I don't think this fix helps your problem, but it may clear up questions
> > > about why nccopy doesn't seem to be working as documented on netCDF-4
> > > input files when parameters are specified to change the compression or
> > > chunking in the output copy.
> > >
> > > I'm now digging into how to solve your problem, and I'm pretty sure it will
> > > involve providing better variable chunk cache parameters for both the input
> > > and output variables.  This is a very good example of how the default
> > > chunk cache settings don't perform very well on this kind of large
> > > multi-dimensional transpose operation.  I'll let you know when I have some
> > > progress to report ...
> > >
> > > --Russ
> > >
> > > > It's the last line that it stalls on. I have been able to get nccopy to 
> > > > run to completion using
> > > >
> > > > nccopy -c time/2 intermediate2.nc intermediate3.nc
> > > >
> > > > but time/1698 fails - and even takes the rest of the machine with it 
> > > > sometimes! :-(
> > >
> > > Russ Rew                                         UCAR Unidata Program
> > > address@hidden                      http://www.unidata.ucar.edu
> > >
> > >
> > >
> > > Ticket Details
> > > ===================
> > > Ticket ID: CUV-251255
> > > Department: Support netCDF
> > > Priority: Normal
> > > Status: Closed
> > >
> > >
> >
> > Russ Rew                                         UCAR Unidata Program
> > address@hidden                      http://www.unidata.ucar.edu
> >
> >
> 
> Russ Rew                                         UCAR Unidata Program
> address@hidden                      http://www.unidata.ucar.edu
> 
> 
> 
> Ticket Details
> ===================
> Ticket ID: CUV-251255
> Department: Support netCDF
> Priority: Normal
> Status: Closed
> 
> 

Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu



Ticket Details
===================
Ticket ID: CUV-251255
Department: Support netCDF
Priority: Normal
Status: Closed