Re: [netcdfgroup] slow reads in 4.4.1.1 vs 4.1.3 for some files

  • To: Dave Allured - NOAA Affiliate <dave.allured@xxxxxxxx>
  • Subject: Re: [netcdfgroup] slow reads in 4.4.1.1 vs 4.1.3 for some files
  • From: "Simon (Vsevolod) Ilyushchenko" <simonf@xxxxxxxxxx>
  • Date: Thu, 15 Dec 2016 08:48:44 -0800
On Wed, Dec 14, 2016 at 9:13 PM, Dave Allured - NOAA Affiliate <dave.allured@xxxxxxxx> wrote:

> Simon, thanks for the small test file.  That helped.
>
> I reproduced slow reads with netcdf-C 4.4.1 and HDF5 1.8.x on both Mac OS
> and Linux.  The important thing is that I used ncdump as my test reader,
> like you did.  When I substituted a simple fortran program to read the full
> 1.8 Mb data array, read time dropped to under 0.4 seconds.  Program is
> attached.
>
> So I think you have a read caching failure, due to interaction between
> the ncdump read pattern and your chunking scheme.  I think the chunking
> scheme is perfectly reasonable.  However, ncdump may be reading values
> across full rows, meaning that it potentially jumps between several file
> chunks for each data row.  This should work fine, *unless read chunk
> caching is not working right* for one reason or another.  Inadequate chunk
> cache size might be the cause.
>
> netcdf qualityFlags {
> dimensions:
>   columns = 4865 ;
>   rows = 3682 ;
> variables:
>   uint quality_flags(rows, columns) ;
>     quality_flags:_ChunkSizes = 891, 1177 ;
>     quality_flags:_DeflateLevel = 2 ;
>     :_Format = "netCDF-4" ;
>
> A sampling tool found that ncdump was spending more than 96% of its time
> inside an HDF5 chunk reader with decompression.  Every time an HDF5 chunk
> is physically read from disk, the *entire* chunk must be decompressed, even
> to access a single value.  You can see why chunk caching is important.
>
> IMO ncdump should not be sluggish for a small test file, and your
> complaint about the change in behavior is valid.  But look in ncdump and
> chunk caching for the possible cause, not in the underlying libraries.  I
> need to leave it to those with a better understanding of chunk caching for
> the final explanation and bug fix.
>
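The chunk-cache explanation quoted above can be checked with a little arithmetic on the header. One 891 x 1177 chunk of 4-byte uint values is 4,194,828 bytes, slightly more than 4 MiB (4,194,304 bytes), and each data row crosses 5 chunks along the columns dimension. The Python sketch below is a toy model, not HDF5's actual cache. It assumes a plain LRU cache that holds whole chunks and never keeps a chunk larger than itself (HDF5's real cache is a hash table with a preemption policy). It also assumes the reader fetches one full row at a time, as suggested above for ncdump. The 4 MiB and 32 MiB cache sizes are hypothetical values chosen for illustration, not confirmed defaults of netCDF 4.1.3 or 4.4.1.1.

```python
import math
from collections import OrderedDict

# Sizes copied from the CDL header quoted above.
ROWS, COLUMNS = 3682, 4865
CHUNK_ROWS, CHUNK_COLS = 891, 1177
ITEM_BYTES = 4  # netCDF-4 "uint" is a 4-byte unsigned integer

CHUNK_BYTES = CHUNK_ROWS * CHUNK_COLS * ITEM_BYTES  # 4,194,828 bytes
CHUNKS_PER_ROW = math.ceil(COLUMNS / CHUNK_COLS)    # chunks crossed by one data row
CHUNK_BANDS = math.ceil(ROWS / CHUNK_ROWS)          # bands of chunks along "rows"


def row_by_row():
    """Yield chunk indices touched when each data row is read in full (assumed ncdump-like)."""
    for r in range(ROWS):
        for c in range(CHUNKS_PER_ROW):
            yield (r // CHUNK_ROWS, c)


def whole_array():
    """Yield chunk indices touched by one read of the entire array."""
    for b in range(CHUNK_BANDS):
        for c in range(CHUNKS_PER_ROW):
            yield (b, c)


def count_decompressions(accesses, cache_bytes):
    """Count chunk decompressions under a toy LRU cache that stores whole chunks.

    Simplifying assumption: a chunk larger than the whole cache is never kept,
    so it is decompressed again on every access.
    """
    capacity = cache_bytes // CHUNK_BYTES  # number of whole chunks that fit
    cache = OrderedDict()
    decompressions = 0
    for key in accesses:
        if key in cache:
            cache.move_to_end(key)  # hit: mark as most recently used
            continue
        decompressions += 1  # miss: read the chunk and decompress all of it
        if capacity > 0:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used chunk
    return decompressions


if __name__ == "__main__":
    MIB = 2**20
    for label, pattern in (("row by row", row_by_row), ("whole array", whole_array)):
        for cache_mib in (4, 32):  # hypothetical cache sizes
            n = count_decompressions(pattern(), cache_mib * MIB)
            print(f"{label:11s} cache={cache_mib:2d} MiB decompressions={n:6d} "
                  f"decompressed={n * CHUNK_BYTES / 2**30:.1f} GiB")
```

Under these assumptions, row-by-row reading with a 4 MiB cache decompresses a chunk on every one of the 3682 x 5 = 18,410 chunk accesses, whereas a 32 MiB cache (room for 7 whole chunks) or a single whole-array read, like the Fortran test program, decompresses each of the 25 chunks once. A cache with room for fewer than one band of 5 chunks (about 20 MiB) does no better than no cache at all, because LRU eviction always discards the chunk that the next row needs first.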

Dave, thanks for looking into this.

I originally noticed this problem not with ncdump but with GDAL, so the
issue is not limited to the ncdump tool. It's possible that GDAL is doing
something wrong here. Do you have suggestions for debugging it? I can loop
in our GDAL maintainer, who knows the GDAL side very well.
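One way to check whether GDAL hits the same problem is to record the hyperslab requests it makes and count how many chunk visits they imply, for example by logging the start/count arguments that reach netCDF-C's nc_get_vara* calls under a debugger. The helper below is a hypothetical sketch, not part of netCDF or GDAL. Given a list of logged (start, count) pairs for the 2-D quality_flags variable and the 891 x 1177 chunk shape from the header, it reports how many chunk visits the requests cause and how many distinct chunks they cover. The example logs in the demo are made up to contrast row-sized reads with a single whole-array read.

```python
from collections import Counter
from itertools import product

CHUNK_SHAPE = (891, 1177)  # _ChunkSizes from the header of the test file


def chunks_for_request(start, count, chunk_shape=CHUNK_SHAPE):
    """Return the chunk indices overlapped by one hyperslab read (start, count)."""
    spans = []
    for s, n, c in zip(start, count, chunk_shape):
        first, last = s // c, (s + n - 1) // c  # first and last chunk along this dimension
        spans.append(range(first, last + 1))
    return list(product(*spans))


def summarize(requests, chunk_shape=CHUNK_SHAPE):
    """Tally chunk visits for a list of logged (start, count) hyperslab requests."""
    visits = Counter()
    for start, count in requests:
        visits.update(chunks_for_request(start, count, chunk_shape))
    return {
        "requests": len(requests),
        "chunk_visits": sum(visits.values()),
        "distinct_chunks": len(visits),
    }


if __name__ == "__main__":
    # Hypothetical logs: one full-row read per call vs. a single whole-array read.
    per_row = [((r, 0), (1, 4865)) for r in range(3682)]
    whole = [((0, 0), (3682, 4865))]
    print(summarize(per_row))
    print(summarize(whole))
```

When chunk_visits is much larger than distinct_chunks, as in the row-by-row case (18,410 visits to 25 chunks), read speed depends on the chunk cache holding a full band of 5 chunks (about 20 MiB); if it cannot, every visit decompresses a whole chunk again. In that case a useful experiment is to enlarge the variable's cache with nc_set_var_chunk_cache (or the default for subsequently opened files with nc_set_chunk_cache) before reading, and see whether the 4.4.1.1 timings return to the 4.1.3 level.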

Simon