Re: [netcdfgroup] slow reads in vs 4.1.3 for some files

The interactions between two independent caches can cause
problems. I should look at the netcdf cache and see how it interacts
with the hdf5 cache.
=Dennis Heimbigner

On 12/15/2016 6:03 PM, Dave Allured - NOAA Affiliate wrote:
On Thu, Dec 15, 2016 at 4:46 PM, Chris Barker <chris.barker@xxxxxxxx
<mailto:chris.barker@xxxxxxxx>> wrote:

    On Thu, Dec 15, 2016 at 1:00 PM, dmh@xxxxxxxx <mailto:dmh@xxxxxxxx>
    <dmh@xxxxxxxx <mailto:dmh@xxxxxxxx>> wrote:

        1. Adding this feature to ncdump also requires adding
           it to the netcdf-c library API. But providing some means
           for client programs to pass thru parameter settings to the
           hdf5 lib seems like a good idea.

    Absolutely! That would be very helpful.


This may be premature.  The netCDF API already has its own chunk cache,
with at least two functions for adjusting its tuning parameters (for
example nc_set_chunk_cache for the default used when files are opened,
and nc_set_var_chunk_cache for a single variable).  It seems to me that
the netCDF facility would probably handle the current ncdump and GDAL
cases nicely, though I have not tested it.  Please see this relevant
documentation:

Simon, you might want to ask your GDAL maintainer to give this a try.
If it works, it should be simple and robust.  I would suggest increasing
the per-variable chunk cache size so that it holds at least 5
uncompressed chunks, and probably more.  Five is the number of chunks
that span a single row in this particular file (4865 columns divided by
1177 columns per chunk, rounded up).  This advice presumes that your
typical read pattern is similar to ncdump's, which I speculate reads
across single whole rows first, as I said earlier.

  columns = 4865 ;
  rows = 3682 ;
  uint quality_flags(rows, columns) ;
    quality_flags:_ChunkSizes = 891, 1177 ;

891 x 1177 x 4 bytes per uint ~= 4.2 Mbytes per uncompressed chunk
5 chunks x 4.2 Mbytes ~= 21 Mbytes

Note that this is likely to be a little larger than the default chunk
cache size in the current netCDF-C library, which would explain some of
the slow reads.
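As a rough check on that arithmetic, here is a small Python sketch.  It
makes no netCDF calls, and the function names are made up for
illustration.  It computes how many chunks a full row crosses and how
many bytes of cache are needed to hold all of them uncompressed,
assuming the 2-D layout shown above and ignoring any per-chunk overhead
in the cache itself.

```python
import math

def chunks_per_row(n_columns, chunk_columns):
    """How many chunks one full row crosses along the column dimension."""
    return math.ceil(n_columns / chunk_columns)

def row_cache_bytes(n_columns, chunk_shape, bytes_per_value):
    """Bytes needed to keep every uncompressed chunk touched by one full row."""
    chunk_rows, chunk_columns = chunk_shape
    chunk_bytes = chunk_rows * chunk_columns * bytes_per_value
    return chunks_per_row(n_columns, chunk_columns) * chunk_bytes

# quality_flags: columns = 4865, _ChunkSizes = 891, 1177, uint = 4 bytes
print(chunks_per_row(4865, 1177))                 # 5
print(row_cache_bytes(4865, (891, 1177), 4))      # 20974140 (~21 Mbytes)
```

A number like this is what would go in the size argument of
nc_set_var_chunk_cache(ncid, varid, size, nelems, preemption) in the C
API; suitable nelems and preemption values are a separate judgment call.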

You might also consider rechunking such data sets to a smaller chunk
size.  Nccopy and ncks can do that.  The best chunk shape depends on
your anticipated spatial read patterns, so give that a little thought.
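When comparing chunk shapes before rechunking, it can help to count how
many chunks a given read touches.  The sketch below is a hypothetical
helper in plain Python (no netCDF calls) for a hyperslab given as
start/count per dimension.  It compares the file's current 891 x 1177
chunks with a hypothetical row-shaped 1 x 4865 chunking.

```python
def chunks_touched(start, count, chunk_shape):
    """Number of chunks a hyperslab read intersects.

    start, count and chunk_shape are per-dimension sequences, in the same
    order as the variable's dimensions (here: rows, columns).
    """
    total = 1
    for s, n, k in zip(start, count, chunk_shape):
        first = s // k                 # index of first chunk along this axis
        last = (s + n - 1) // k        # index of last chunk along this axis
        total *= last - first + 1
    return total

ROW = ((0, 0), (1, 4865))       # one full row
COLUMN = ((0, 0), (3682, 1))    # one full column

for shape in [(891, 1177), (1, 4865)]:   # current chunking, hypothetical row chunking
    print(shape, chunks_touched(*ROW, shape), chunks_touched(*COLUMN, shape))
# (891, 1177) 5 5
# (1, 4865) 1 3682
```

Row-shaped chunks make single-row reads cheap but full-column reads very
expensive, which is why the choice should follow the read pattern.  With
nccopy, such a shape would be requested through its -c option (for
example -c rows/1,columns/4865); check the nccopy documentation for
your version.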

You might also consider reading the entire grid in a single get_vara
call to the netCDF API.  That is what my fast Fortran test program did.
A naive reader that, for example, loops over single rows may thrash the
cache, decompressing the same chunks over and over, which could be
avoided.
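To see why a row-by-row loop can behave so badly, here is a simplified
model: a plain LRU cache that holds a fixed number of decompressed
chunks.  That is an assumption for illustration only; HDF5's real chunk
cache uses a hash table and a preemption policy, so actual behavior will
differ.  The hypothetical function counts how many chunk decompressions
a loop over single rows incurs.

```python
import math
from collections import OrderedDict

def decompressions_row_by_row(n_rows, n_cols, chunk_shape, cache_slots):
    """Count chunk decompressions for a loop that reads one full row per call.

    Simplified model: an LRU cache holding at most `cache_slots` decompressed
    chunks. This is NOT HDF5's actual cache policy, just an illustration.
    """
    chunk_rows, chunk_cols = chunk_shape
    chunks_across = math.ceil(n_cols / chunk_cols)
    cache = OrderedDict()          # chunk key -> placeholder, oldest first
    misses = 0
    for r in range(n_rows):
        band = r // chunk_rows     # which row of chunks this data row falls in
        for c in range(chunks_across):
            key = (band, c)
            if key in cache:
                cache.move_to_end(key)   # hit: mark as most recently used
                continue
            misses += 1                  # miss: chunk must be read and decompressed
            if cache_slots > 0:
                cache[key] = None
                if len(cache) > cache_slots:
                    cache.popitem(last=False)  # evict least recently used
    return misses

rows, cols, chunks = 3682, 4865, (891, 1177)
print(decompressions_row_by_row(rows, cols, chunks, cache_slots=5))  # 25
print(decompressions_row_by_row(rows, cols, chunks, cache_slots=4))  # 18410
```

In this model, with room for all 5 chunks of a row band, each of the 25
chunks is decompressed once, the same count as one whole-grid read.
With room for only 4, every row read misses on all 5 of its chunks, and
the count jumps to 18410 (3682 rows x 5 chunks).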

