
[netCDF #UAU-670796]: Rechunking of a huge NetCDF file



Hi Henri,

Sorry to have been unresponsive to your questions; I've been away for
about a week, and it will take some time to work through the backlog
of support questions.  I can answer a couple of your questions now,
but the rest will take some time ...

> > However, your example may be a good test case for the "-r" flag as
> > well as the "-w" flag, if you have enough memory for both an input and
> > output copy of the file in memory.  That would read the entire file
> > into memory, write a transposed version in memory, and then write that
> > out when the file is closed.
> 
> The “-r” flag didn’t make a difference for a small test file, but I’ll have 
> to try it with a bigger one.

I never found a case in which the "-r" flag saved time, but was hoping
it might work for your extreme rechunking case.

> > Your example is an important use case for nccopy to do well, so I'd
> > like to see if we can help make it work in a practical amount of time.
> > For that, nccopy needs one more feature (we're running out of flags!)
> > to provide verbose output that allows you to monitor progress and how
> > long it takes.  I've just added this to our issues tracking system,
> > but I'm afraid it won't get implemented right away:
> >
> >  https://bugtracking.unidata.ucar.edu/browse/NCF-285
> 
> Excellent. Why not a more verbose flag like “--verbose”? :)

The "-v" flag is already in use for providing a list of variables to be
copied.  Currently, ncdump, ncgen, and nccopy only support single letter
options, but we may have to change that as we run out of letters.

> > For now, I suggest just adding a print statement to the
> > ncdump/nccopy.c source to copy_var_data().
> 
> Good, this seems to work (had to declare the varname as well), although not 
> very useful with the small test file.
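
If recompiling nccopy is inconvenient, another way to watch progress is
to do the copy variable-by-variable from netCDF4-python and print a
message as each copy starts.  This is only a rough sketch, not the
nccopy code path: the file names are placeholders, and you would pass a
chunksizes argument to createVariable() to get the chunking you want.

  import netCDF4

  src = netCDF4.Dataset('input.nc', 'r')
  dst = netCDF4.Dataset('output.nc', 'w', format='NETCDF4')

  # copy the dimensions, keeping the unlimited dimension unlimited
  for name, dim in src.dimensions.items():
      dst.createDimension(name, None if dim.isunlimited() else len(dim))

  # copy one variable at a time, printing a progress message for each
  for name, var in src.variables.items():
      print('copying variable', name)
      fill = getattr(var, '_FillValue', None)
      # add chunksizes=[...] here to control the chunking of the copy
      out = dst.createVariable(name, var.dtype, var.dimensions,
                               fill_value=fill)
      out.setncatts({a: var.getncattr(a) for a in var.ncattrs()
                     if a != '_FillValue'})
      out[...] = var[...]     # reads the whole variable into memory

  dst.close()
  src.close()

Like ncpdq, this only has to hold one variable in memory at a time, so
it may still be practical when the whole file doesn't fit.
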
> 
> >
> >> The original data is in classic NetCDF3 format.
> >>
> >> As per your suggestion, I tried different chunk sizes as well, but
> >> it seems nccopy crashes for any lat/lon chunk length bigger than 3,
> >> the size of those dimensions:
> >>
> >> nccopy -c time/10000,lat/2,lon/4 input.nc output.nc
> >> NetCDF: HDF error
> >> Location: file nccopy.c; line 1437
> >>
> >> netcdf library version 4.3.0 of Jan 17 2014 13:23:48
> >>
> >> The same happens on 3 different platforms (same NetCDF version), and
> >> with netcdf library version 4.2.1.1.
> >
> > That's a new bug that will require a way for us to reproduce it to fix
> > it.  The fact that it says "HDF error" is puzzling; maybe there's an
> > HDF rule about chunk lengths that I'm not aware of.  Again, this is an
> > important use case, so we should do whatever is required to fix it.
> 
> I put my small (18MB) test file here:
> 
> http://www.atm.helsinki.fi/~vuolleko/misc/small.nc
> 
> This file is the result of several operations (like merge and mergetime) with 
> CDO, so it’s of course possible that my problems originate from there.

Thanks, I downloaded the file and see the problem.  In small.nc, lon
is a dimension of size 3, but you are requesting rechunking along that
dimension with lon/4, a chunk size larger than the fixed dimension
size.  That's something we should be checking for in nccopy.

As a workaround, if you change this to lon/3, nccopy completes
without error.  Checking for this should be an easy fix, which will be
in the next release.
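
Until that check is in nccopy, you can do it yourself before building
the -c argument.  Here's a small netCDF4-python sketch (the requested
chunk lengths are the ones from your failing command) that clamps each
one to the corresponding dimension size:

  import netCDF4

  requested = {'time': 10000, 'lat': 2, 'lon': 4}

  nc = netCDF4.Dataset('small.nc', 'r')
  clamped = {name: min(length, len(nc.dimensions[name]))
             for name, length in requested.items()}
  nc.close()

  # for small.nc this should print "-c time/10000,lat/2,lon/3",
  # since the lon dimension only has 3 entries
  print('-c ' + ','.join('%s/%d' % item for item in clamped.items()))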

> >> I just came across the ncpdq utility, but apparently it’s unrelated
> >> to chunking and doesn’t provide huge performance benefits?
> >
> > It's also unrelated to Unidata, as it's a utility that's part of the
> > NCO (netCDF Operators) package developed and maintained by Charlie
> > Zender and his group at Univ. of California at Irvine.  It would
> > provide the same performance benefits, and might be a better way to
> > go, as it only has to fit each variable in memory, rather than all the
> > variables at once.  If you choose to try it for this problem, I'd be
> > very interested in knowing the results!
> 
> Ok, here’s something so far, done locally on my MacBook Air (SSD), starting 
> with the test file in classic NetCDF3 format:
> 
> ***********************
> nccopy -k 3 small.nc small_k3.nc
> 
> nccopy -w -c time/99351,lat/1,lon/1 small.nc small_T.nc  10.27s user 0.06s 
> system 99% cpu 10.358 total
> 
> nccopy -w -c time/99351,lat/1,lon/1 small_k3.nc small_k3_T.nc  19.28s user 
> 1.24s system 99% cpu 20.560 total
> 
> ncpdq -a time,lon,lat small_k3.nc small_k3_p.nc  10.24s user 5.15s system 99% 
> cpu 15.507 total
> 
> ls -lh
> total 305832
> -rw-------  1 vuolleko  ATKK\hyad-all    18M Jan 30 16:58 small.nc
> -rw-------  1 vuolleko  ATKK\hyad-all    18M Jan 30 17:02 small_T.nc
> -rw-------  1 vuolleko  ATKK\hyad-all    48M Jan 30 16:59 small_k3.nc
> -rw-------  1 vuolleko  ATKK\hyad-all    18M Jan 30 17:03 small_k3_T.nc
> -rw-------  1 vuolleko  ATKK\hyad-all    48M Jan 30 17:04 small_k3_p.nc
> ***********************
> 
> The differences in size are interesting: the default chunking for netCDF4 
> (small_k3) is quite bad for storage. Anyway, ncpdq does appear faster here, 
> given that it’s operating on a bigger file.
> 
> Then some timings using netcdf4-python:
> 
> ***********************
> In [1]: import netCDF4
> In [2]: f1=netCDF4.Dataset('small.nc','r')
> In [3]: f2=netCDF4.Dataset('small_k3.nc','r')
> In [4]: f3=netCDF4.Dataset('small_T.nc','r')
> In [5]: f4=netCDF4.Dataset('small_k3_T.nc','r')
> In [6]: f5=netCDF4.Dataset('small_k3_p.nc','r')
> In [7]: timeit temp=f1.variables['temp'][:,1,1]
> 10 loops, best of 3: 20.9 ms per loop
> 
> In [8]: timeit temp=f2.variables['temp'][:,1,1]
> 1 loops, best of 3: 1.16 s per loop
> 
> In [9]: timeit temp=f3.variables['temp'][:,1,1]
> 1000 loops, best of 3: 1.71 ms per loop
> 
> In [10]: timeit temp=f4.variables['temp'][:,1,1]
> 1000 loops, best of 3: 1.7 ms per loop
> 
> In [11]: timeit temp=f5.variables['temp'][:,1,1]
> 1 loops, best of 3: 1.12 s per loop
> ***********************
> 
> From here it’s clear that the properly rechunked files (*_T.nc) are way 
> faster than the others, and that the reordering by ncpdq is useless for 
> this purpose. I found similar observations here:
> 
> http://stackoverflow.com/questions/19936432/faster-reading-of-time-series-from-netcdf
> 
> But why is there such a huge decrease in reading performance when a file is 
> converted to netCDF4?

Small chunks cause lots of overhead in HDF5, but I'm not sure whether
that's the problem.  I'll have to look at this more closely and
respond when I've had a chance to see what's going on.
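
One thing worth checking in the meantime is what chunk shapes the
netCDF-4 files actually ended up with.  From netCDF4-python,
Variable.chunking() returns 'contiguous' for the classic-format file
and a list of chunk lengths (one per dimension) for the netCDF-4 files;
the file and variable names below are the ones from your timings:

  import netCDF4

  for fname in ('small.nc', 'small_k3.nc', 'small_T.nc', 'small_k3_p.nc'):
      nc = netCDF4.Dataset(fname, 'r')
      temp = nc.variables['temp']
      # 'contiguous' for the netCDF-3 layout, otherwise chunk lengths
      print(fname, temp.chunking())
      nc.close()

If the default chunks in small_k3.nc turn out to be short along the
time dimension, that would explain both the slow [:,1,1] reads (HDF5
has to visit one chunk per time step) and the larger file size, but
I'll confirm once I've had a closer look.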

--Russ

Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu



Ticket Details
===================
Ticket ID: UAU-670796
Department: Support netCDF
Priority: High
Status: Closed