
[netCDF #AWT-862217]: nccopy chunking argument



Mark,

> Thanks for the reply. I've attached a script and a cdl file here
> that can (hopefully) reproduce the problem on your machine.

Thanks for doing that, it made it possible to reproduce the problems
pretty quickly here. 

> The script takes the cdl file and uses ncgen to generate a netcdf
> file from it called regen.nc. The file is initially tiny (16K or so)
> but specifies storage worth about 40GB. The variables in this file
> are chunked as date/1,lat/720,lon/1440.
> 
> Then there are two nccopy commands. The first command attempts to do
> the rechunking where all three dimensions are specified (date, lon,
> lat), i.e.
> 
> ./nccopy -m 2G -h 2G -e 12000 -c date/1,lon/30,lat/30 regen.nc specify_date_rechunking.nc
> 
> This runs to completion on my machine (8GB RAM) in about 10 minutes
> and produces a 40GB file called specify_date_rechunking.nc, where
> the variables are chunked as date/1,lon/30,lat/30.
> 
> The second command is identical apart from the fact that the date
> rechunking is not explicitly specified, i.e.
> 
> ./nccopy -m 2G -h 2G -e 12000 -c lon/30,lat/30 regen.nc unspecified_date_rechunking.nc
> 
> This command fails to run on my machine - the memory usage of the
> nccopy process explodes and at some point the OS kills it. However,
> the file that is produced, unspecified_date_rechunking.nc, is
> readable with ncdump - when you do this, you can see that the
> variables are chunked as date/5186,lon/30,lat/30 i.e. nccopy has set
> the date dimension chunking to occupy the full variable size, rather
> than sticking to the current chunk size (which may explain the
> memory requirements).
> 
> So, the question is, is it desired behaviour that nccopy changes the
> chunking of dimensions that are not specified in the re-chunking
> argument?

No, that's neither the desired nor the documented behavior; it was a
bug.  I've fixed it and verified that the fix works for your example,
which now finishes in about 16 minutes on my machine:

  $ /usr/bin/time nccopy -m 2G -h 2G -e 12000 -c lon/30,lat/30 regen.nc unspecified_date_rechunking.nc
  243.79user 138.55system 16:00.66elapsed 39%CPU (0avgtext+0avgdata 2076912maxresident)k
  8840inputs+86178344outputs (52major+519408minor)pagefaults 0swaps
  $ ncdump -s -h unspecified_date_rechunking.nc | grep _ChunkSizes
                date:_ChunkSizes = 1 ;
                CHL1_mean:_ChunkSizes = 1, 30, 30 ;
                CHL1_flags:_ChunkSizes = 1, 30, 30 ;
                CHL1_error:_ChunkSizes = 1, 30, 30 ;
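
For anyone who wants to check the chunking in a script rather than by
eye, here is a minimal Python sketch that parses the _ChunkSizes lines
from "ncdump -s -h" output like the lines above.  The function name
parse_chunk_sizes and the regular expression are illustrative
assumptions, not part of ncdump or nccopy, and the pattern only
handles plain alphanumeric variable names.

```python
import re

# Sample text copied from the ncdump output shown above.
SAMPLE = """\
                date:_ChunkSizes = 1 ;
                CHL1_mean:_ChunkSizes = 1, 30, 30 ;
                CHL1_flags:_ChunkSizes = 1, 30, 30 ;
                CHL1_error:_ChunkSizes = 1, 30, 30 ;
"""

# Matches lines of the form "  name:_ChunkSizes = 1, 30, 30 ;".
# Assumption: variable names consist of letters, digits and underscores.
_CHUNK_RE = re.compile(
    r"^\s*(\w+):_ChunkSizes\s*=\s*([\d,\s]+?)\s*;", re.MULTILINE
)


def parse_chunk_sizes(ncdump_text):
    """Return a dict mapping variable name -> tuple of chunk lengths."""
    result = {}
    for name, sizes in _CHUNK_RE.findall(ncdump_text):
        result[name] = tuple(int(s) for s in sizes.split(","))
    return result


if __name__ == "__main__":
    for name, chunks in parse_chunk_sizes(SAMPLE).items():
        print(name, chunks)
```

In practice the text would come from running ncdump on the output
file; a date chunk length other than 1 in the result would indicate
the unfixed behavior.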

The fix was pretty small, but I may not get it into the snapshot
today: I also have to write a test case that verifies the fix, add
the necessary test files and auxiliary changes to Makefiles and such,
and I'm out of the office for a week starting tomorrow.  In the
meantime, here's a patch you can apply locally to ncdump/nccopy.c to
fix the bug:

$ svn diff 
Index: nccopy.c
===================================================================
--- nccopy.c    (revision 1747)
+++ nccopy.c    (working copy)
@@ -774,9 +774,10 @@
                /* Copy all netCDF-4 specific variable properties such as
                 * chunking, endianness, deflation, checksumming, fill, etc. */
                NC_CHECK(copy_var_specials(igrp, varid, ogrp, o_varid));
+           } else {
+               /* Set chunking if specified in command line option */
+               NC_CHECK(set_var_chunked(ogrp, o_varid));
            }
-           /* Set chunking if specified in command line option */
-           NC_CHECK(set_var_chunked(ogrp, o_varid));
            /* Set compression if specified in command line option */
            NC_CHECK(set_var_compressed(ogrp, o_varid));
        }

> That's the first issue. The second issue is why does command two not
> run to completion? However, maybe we should take the first issue
> first, so as to avoid potential confusion....

I don't know the answer to that.  Before the bug fix, it took over my
machine too, and it became so unresponsive that attaching to the
process to debug it was impractical.  A guess is that it was just
thrashing with huge chunk sizes and too little memory, and that it
would have eventually finished (maybe after a week or a month :-) ).
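
A rough back-of-the-envelope sketch of why chunks of date/5186 are so
much more expensive than date/1, using the chunk shapes quoted in this
thread.  The 4-byte element size is an assumption (the thread does not
give the variable types), the helper names are made up for
illustration, and the count assumes output chunks line up with input
chunk boundaries.  It only illustrates the guess above; it is not a
diagnosis.

```python
from math import ceil, prod

IN_CHUNK = (1, 720, 1440)   # date, lat, lon chunking of regen.nc
BUGGY_OUT = (5186, 30, 30)  # chunking nccopy chose before the fix
FIXED_OUT = (1, 30, 30)     # chunking after the fix
ELEM_BYTES = 4              # assumption: 4-byte values


def chunk_bytes(chunk, elem_bytes=ELEM_BYTES):
    """Size in bytes of one uncompressed chunk."""
    return prod(chunk) * elem_bytes


def chunks_touched(out_chunk, in_chunk):
    """Input chunks overlapped by one output chunk (aligned case)."""
    return prod(ceil(o / i) for o, i in zip(out_chunk, in_chunk))


if __name__ == "__main__":
    for label, out in (("before fix", BUGGY_OUT), ("after fix", FIXED_OUT)):
        print(label,
              "bytes per output chunk:", chunk_bytes(out),
              "input chunks needed:", chunks_touched(out, IN_CHUNK))
```

Under these assumptions each pre-fix output chunk holds 5186 * 30 * 30
values (about 18.7 million bytes) and cannot be completed until all
5186 date slices of the input have been read, whereas each post-fix
chunk is 3600 bytes and depends on a single input chunk.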

Anyway, thanks for the bug report that helped get this fixed!

--Russ

Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu



Ticket Details
===================
Ticket ID: AWT-862217
Department: Support netCDF
Priority: Normal
Status: Closed

