
[netCDF #PBW-682100]: nccopy deflation vs API deflation



Nick,

I meant to include a link to the issue-tracker report on that nccopy bug, which
gives more details about it, including the fact that the fix also corrected the
use of -c chunking options for netCDF-4 files:

  https://www.unidata.ucar.edu/jira/browse/NCF-79

Your bug report reminded me that this was a fairly serious bug that we should
highlight in the upcoming 4.2 release notes.  I'm also adding it to the list of
known problems with the 4.1.3 release:

  http://www.unidata.ucar.edu/software/netcdf/docs/known_problems.html

Thanks again for reporting this and providing sample files!

--Russ

> > I've put some sample data files at:
> > http://www2.epcc.ed.ac.uk/~njohnso1/netcdf/
> >
> > validation2_nocomp.nc is from my (serial) code with no deflation.
> > validation2_apicomp.nc is from the same code with deflation enabled.
> > validation2_nccopyd9.nc is validation2_nocomp.nc after it's been processed with nccopy -d9 -s.
> > validation2_mycopyd9.nc is validation2_nocomp.nc after it's been processed with my own test compressor, which copies the data to a new file where the variable has deflation enabled.
> 
> Thanks for those samples.  I tried to reproduce the problem here, and
> discovered that the compression works fine with the current 4.2-rc1 release
> but failed with the 4.1.3 netCDF release, so it was evidently a bug fixed in
> the interim.  Then I found that *I* had fixed the bug but forgotten about it:
> 
> http://www.unidata.ucar.edu/support/help/MailArchives/netcdf/msg10259.html
> 
> so sorry for not noticing this earlier.  Anyway, the compression with
> nccopy -d9 now works fine on your file.  If you don't want to get and build
> the 4.2-rc1 release candidate just to get this bug fix, you should be able to
> get the nccopy.c mentioned in the above support email, replace your current
> nccopy.c with it, and rebuild.
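One way to confirm that a rebuilt nccopy actually applied deflation is to look at the special virtual attributes that `ncdump -s` reports for netCDF-4 files, such as `_DeflateLevel`, `_Shuffle` and `_ChunkSizes`. The sketch below is an illustration, not part of the original exchange: `deflate_levels` is a hypothetical helper that filters `ncdump -hs` output read on standard input and prints one `<variable> <level>` line for each variable reported with a `_DeflateLevel`. It assumes the `var:_DeflateLevel = N ;` line format; variables that are not listed were presumably stored without deflation.

```shell
#!/bin/sh
# Hypothetical helper (not part of netCDF): read `ncdump -hs` output on stdin
# and print "<variable> <deflate level>" for each variable that ncdump reports
# with a _DeflateLevel special attribute. Assumes lines of the form
#     varname:_DeflateLevel = 9 ;
deflate_levels() {
    sed -n 's/^[[:space:]]*\([^:[:space:]]*\):_DeflateLevel = \([0-9][0-9]*\) ;.*$/\1 \2/p'
}

# Example usage (requires the netCDF command-line utilities):
#   ncdump -hs validation2_nccopyd9.nc | deflate_levels
```

Comparing this output for validation2_apicomp.nc and validation2_nccopyd9.nc would show directly whether nccopy set the same per-variable deflation as the API calls did.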
> 
> Just out of curiosity, I also tried other compression levels with nccopy and
> found that -d 5 worked better for compressing your sample file than any other
> level, though that may not be the case with other data:
> 
> support$ for i in 1 2 3 4 5 6 7 8 9; do
> nccopy -d $i validation2_nocomp.nc validation2_nccopy_d$i.nc; ls -l validation2_nccopy_d$i.nc
> done
> -rw-rw-r-- 1 russ ustaff 1226706 Feb 26 10:06 validation2_nccopy_d1.nc
> -rw-rw-r-- 1 russ ustaff 1217933 Feb 26 10:06 validation2_nccopy_d2.nc
> -rw-rw-r-- 1 russ ustaff 1208676 Feb 26 10:06 validation2_nccopy_d3.nc
> -rw-rw-r-- 1 russ ustaff 1184699 Feb 26 10:06 validation2_nccopy_d4.nc
> -rw-rw-r-- 1 russ ustaff 1013395 Feb 26 10:06 validation2_nccopy_d5.nc
> -rw-rw-r-- 1 russ ustaff 1017296 Feb 26 10:06 validation2_nccopy_d6.nc
> -rw-rw-r-- 1 russ ustaff 1026669 Feb 26 10:06 validation2_nccopy_d7.nc
> -rw-rw-r-- 1 russ ustaff 1046570 Feb 26 10:06 validation2_nccopy_d8.nc
> -rw-rw-r-- 1 russ ustaff 1049421 Feb 26 10:06 validation2_nccopy_d9.nc
> 
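The level sweep above can also be summarised automatically instead of by reading the `ls -l` listing. This is a minimal sketch, not from the original thread: `smallest_file` is a hypothetical POSIX shell function that takes a list of files (for example the `validation2_nccopy_d$i.nc` outputs of the loop) and prints the byte count and name of the smallest one, using `wc -c`. When two files have the same size, the first one listed is kept.

```shell
#!/bin/sh
# Hypothetical helper: print "<bytes> <name>" for the smallest of the given files.
smallest_file() {
    best=""
    best_size=""
    for f in "$@"; do
        # wc -c pads its output with spaces on some systems; strip them.
        size=$(wc -c < "$f" | tr -d ' ')
        # Strict -lt keeps the first file when sizes tie.
        if [ -z "$best_size" ] || [ "$size" -lt "$best_size" ]; then
            best=$f
            best_size=$size
        fi
    done
    if [ -n "$best" ]; then
        printf '%s %s\n' "$best_size" "$best"
    fi
}

# Example usage after running the nccopy loop:
#   smallest_file validation2_nccopy_d*.nc
```

For the files listed above, this would report the -d 5 output (1013395 bytes).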
> --Russ
> 
> > On 22/02/12 05:51, Unidata netCDF Support wrote:
> > > Hi Nick,
> > >
> > >> I am running some tests with a code I am converting from using a flat
> > >> file to netcdf/hdf5. I am using the parallel MPIIO access mode, so I am
> > >> unable to use the deflation calls via the API. I thought I would use
> > >> nccopy -d9 to post-process my files, compressing them to get some space
> > >> saving whilst still retaining the ability to do a parallel read in other
> > >> related codes.
> > >>
> > >> However, I find that I get quite poor compression using nccopy, much
> > >> worse than I get if I use the API call. In some cases, nccopy -d9 gives
> > >> little or no compression whilst using the API gives me 4-5x compression.
> > >>
> > >> Is this something you would expect or am I missing something critical
> > >> in this case?
> > >
> > > No, you should expect exactly the same compression using nccopy as with
> > > the API calls.  nccopy calls the API for each variable in the file with
> > > whatever compression level you specify.  The API calls are somewhat more
> > > flexible, in that you can specify a different level of compression (or no
> > > compression) for each variable separately, but if you use the same
> > > compression for every variable, there should be no difference.
> > >
> > > If you are seeing something different, it sounds like a bug.  Can you
> > > provide a sample file that we could use to reproduce the problem and
> > > diagnose the cause?
> > >
> > > --Russ
> > >
> > > Russ Rew                                         UCAR Unidata Program
> > > address@hidden                      http://www.unidata.ucar.edu
> > >
> > >
> > >
> > > Ticket Details
> > > ===================
> > > Ticket ID: PBW-682100
> > > Department: Support netCDF
> > > Priority: Normal
> > > Status: Closed
> > >
> > >
> >
> >
> >
> 
> 