
[netCDF #UZM-707371]: Possible problem with zlib on OS X



> Hello,
>
> I have a quick question about netCDF4 on OS X. Specifically, the
> performance of reading netCDF4 files. This appears to be extremely
> slow; in the following (approximate) timings, I am using a GHRSST
> format AMSRE SST file in netCDF3 format (amsre.nc3) and the same file
> converted manually to a netCDF4 file (with zlib on, shuffle on and
> chunking off), called amsre.nc4.
>
> I have BOTH the nc3 and nc4 libraries and utilities installed (so I
> call them ncdump3 / ncdump4....)
>
> Time to ncdump3 amsre.nc3 -> about 1 second
> Time to ncdump4 amsre.nc4 (with zlib off) -> about 2 seconds
> Time to ncdump4 amsre.nc4 (with zlib on) -> 4 MINUTES!!!!!!
> Time to gzip amsre.nc3 -> about 1 second
>
> Clearly, there is a problem with the zlib implementation on my version
> of netCDF4 (4.0.3-beta). It takes about 300 times longer to read the
> netCDF4 file with the ncdump4, than to convert it to netCDF3 and read
> it from there!
>
> I am using netcdf3 (3.6.3), apple OS X 10.5.6 (intel)
>
> I don't doubt that I have just missed something, but just in case, I
> thought I should report this.
>
> Do you have any suggestions?
>
> Many thanks
>
> Dave Poulter
>
> David J. S. Poulter, Satellite Oceanographer
> National Oceanography Centre, Southampton
> European Way, Southampton, SO14 3ZH, UK
> Tel: +44(0)23 80596107
> E-mail: address@hidden
>

Howdy David!

I believe you should be getting much better performance.

First, what version of zlib are you using? You can get 1.2.3 here:
ftp://unidata.ucar.edu/pub/netcdf/netcdf-4/zlib-1.2.3.tar.gz. Did you install
zlib, or are you using the system one?

One thing you have missed (probably because we don't make it clear enough in the
docs) is that chunking is in use: it is turned on automatically if you
use deflate. If you don't specify chunk sizes, default ones are selected for
you. If you are dealing with very large variables, the defaults are poor. This
has been fixed recently in netCDF-4 code, and getting the daily snapshot will
get you that fix. (You can also test this by explicitly setting the chunksizes
with a call to nc_def_var_chunking after you define the variable.)
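
As a rough illustration of setting chunk sizes by hand, here is a minimal
sketch in C. The helper pick_chunk_shape is hypothetical (it is not part of
netCDF and is not the library's default algorithm): it starts from one chunk
covering the whole variable and halves the longest chunk dimension until a
chunk fits a byte budget. The budget, the dimension lengths, and the names
ncid and varid are assumptions for illustration only. The commented-out calls
use the signatures found in current netCDF-C releases; an older beta may
differ, so check netcdf.h in your installation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of netCDF): fill chunks[] with a chunk
 * shape whose size is at most max_bytes, where possible.
 * Returns the resulting chunk size in bytes. */
size_t pick_chunk_shape(const size_t *dims, size_t ndims,
                        size_t elem_size, size_t max_bytes,
                        size_t *chunks)
{
    assert(elem_size > 0 && max_bytes > 0);
    if (ndims == 0)
        return elem_size;           /* scalar variable: nothing to chunk */

    /* Start with a single chunk spanning the whole variable. */
    size_t bytes = elem_size;
    for (size_t i = 0; i < ndims; i++) {
        chunks[i] = dims[i] > 0 ? dims[i] : 1;  /* guard zero-length dims */
        bytes *= chunks[i];
    }

    /* Halve the longest chunk dimension until the chunk fits the budget. */
    while (bytes > max_bytes) {
        size_t longest = 0;
        for (size_t i = 1; i < ndims; i++)
            if (chunks[i] > chunks[longest])
                longest = i;
        if (chunks[longest] == 1)
            break;                  /* cannot shrink any further */
        chunks[longest] = (chunks[longest] + 1) / 2;

        bytes = elem_size;
        for (size_t i = 0; i < ndims; i++)
            bytes *= chunks[i];
    }
    return bytes;
}

/* Intended use, in define mode, right after nc_def_var() has returned varid
 * (requires #include <netcdf.h> and linking against the netCDF library):
 *
 *     size_t dims[3] = {1, 2160, 4096};   // assumed time, lat, lon lengths
 *     size_t chunks[3];
 *     pick_chunk_shape(dims, 3, sizeof(short), 1048576, chunks);
 *     nc_def_var_chunking(ncid, varid, NC_CHUNKED, chunks);
 *     nc_def_var_deflate(ncid, varid, 1, 1, 1);  // shuffle on, deflate level 1
 */
```

With the assumed 1 x 2160 x 4096 variable of 2-byte values and a 1 MiB budget,
the helper settles on chunks of 1 x 540 x 512 (552960 bytes each). Whether
that shape suits a given file depends on how the data is read back, so treat
it as a starting point for timing tests rather than a recommendation.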

We (the netCDF and HDF5 teams) have recently been working on performance
issues, and several important changes have been made to both netCDF-4 and HDF5.
I suggest you grab the netCDF-4 daily snapshot.

You can get the daily snapshot here:
ftp://ftp.unidata.ucar.edu/pub/netcdf/snapshot/netcdf-4-daily.tar.gz

In HDF5 an important performance improvement has also been made, and that is
available in their 1.8.2-post6 release. If changing the chunksizes and getting
the latest netcdf-4 release doesn't help, try upgrading to this HDF5 release to
get their latest fixes.

If you can send me a CDL header dump of your netCDF-4 data file, that would be
helpful. What deflate level are you using?

Please let me know if this doesn't help.

Thanks,

Ed

Ticket Details
===================
Ticket ID: UZM-707371
Department: Support netCDF
Priority: Normal
Status: Open