Re: [netcdfgroup] Strided reads slow

In thinking about this, the only partial solution
I can see at the moment is to do internally in the library
what you appear to already be doing, namely
reading contiguous chunks and applying the stride yourself.
This would work for small strides (say < 8?)
but would require allocating an internal buffer
of size 8 * element-size * n, where n is
the number of strided elements fetched at one time.
It would keep the external interface simple while
providing some speed-up, but it does not
really solve the underlying problem: reading
individual elements from a netcdf-3 or netcdf-4/HDF5 file
is slow.
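
For illustration, here is a minimal sketch of that block-and-stride idea,
done on the client side rather than inside the library: read modest
contiguous blocks with nc_get_vara_float() and pick out every stride-th
element in memory, so the strided read is served by a handful of bulk reads
instead of one read per element. The file name "test.nc", the variable name
"var", and the sizes are assumptions for illustration only.

#include <stdio.h>
#include <stdlib.h>
#include <netcdf.h>

#define CHECK(e) do { int _s = (e); if (_s != NC_NOERR) { \
    fprintf(stderr, "netCDF: %s\n", nc_strerror(_s)); exit(1); } } while (0)

int main(void)
{
    const size_t nelems = 1000000;  /* length of the 1-D variable (assumed)  */
    const size_t stride = 2;        /* keep 1 of every 2 elements            */
    const size_t nblock = 4096;     /* output elements fetched per bulk read */
    int ncid, varid;

    CHECK(nc_open("test.nc", NC_NOWRITE, &ncid));
    CHECK(nc_inq_varid(ncid, "var", &varid));

    size_t nout = (nelems + stride - 1) / stride;          /* strided result size */
    float *out = malloc(nout * sizeof(float));             /* final strided data  */
    float *buf = malloc(nblock * stride * sizeof(float));  /* contiguous scratch  */

    for (size_t done = 0; done < nout; done += nblock) {
        size_t nthis = (nout - done < nblock) ? nout - done : nblock;
        size_t start[1] = { done * stride };
        /* Read just far enough to cover the last wanted element of this block;
           this never runs past the end of the variable. */
        size_t count[1] = { (nthis - 1) * stride + 1 };
        CHECK(nc_get_vara_float(ncid, varid, start, count, buf));
        for (size_t i = 0; i < nthis; i++)
            out[done + i] = buf[i * stride];
    }

    CHECK(nc_close(ncid));
    free(buf);
    free(out);
    return 0;
}

The scratch buffer here is stride * nblock floats, i.e. the same kind of
8 * element-size * n allocation described above; compile and link against
the netCDF C library (e.g. -lnetcdf).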
=Dennis Heimbigner
 Unidata


Peglar, Patrick wrote:
Hi

I just thought I'd ask the world in general whether other people are having 
trouble with this.

I was contacted about an internal support issue by someone getting very slow 
read performance from large netCDF-4 files.
He was doing "strided" access to a variable (i.e. reading 1 of every N points).
I produced a simple C API test case, which reads all of a 1M-float array in 
about 2 ms, but takes nearly 4 seconds to load every other point (stride=2).
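
The real test case is not reproduced here, but a minimal sketch along the
same lines is below: it times a full contiguous read against a stride-2 read
of the same 1M-float variable. The file name "test.nc" and the variable name
"var" are placeholders, and error handling is reduced to a macro.

#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <netcdf.h>

#define CHECK(e) do { int _s = (e); if (_s != NC_NOERR) { \
    fprintf(stderr, "netCDF: %s\n", nc_strerror(_s)); exit(1); } } while (0)

/* Wall-clock time in seconds (POSIX clock_gettime). */
static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void)
{
    const size_t nelems = 1000000;
    int ncid, varid;
    float *buf = malloc(nelems * sizeof(float));

    CHECK(nc_open("test.nc", NC_NOWRITE, &ncid));
    CHECK(nc_inq_varid(ncid, "var", &varid));

    /* Full contiguous read of the whole variable. */
    size_t start[1] = {0}, count[1] = {nelems};
    double t0 = now_sec();
    CHECK(nc_get_vara_float(ncid, varid, start, count, buf));
    printf("full read:     %.4f s\n", now_sec() - t0);

    /* Strided read: every other element. */
    size_t scount[1] = {nelems / 2};
    ptrdiff_t stride[1] = {2};
    t0 = now_sec();
    CHECK(nc_get_vars_float(ncid, varid, start, scount, stride, buf));
    printf("stride-2 read: %.4f s\n", now_sec() - t0);

    CHECK(nc_close(ncid));
    free(buf);
    return 0;
}

The roughly 2 ms vs. 4 s difference quoted above is the gap between the two
timed calls in a sketch like this.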

This has already been discussed with the dev team, who replied variously...
   -----Original Message-----
   From: Unidata netCDF Support [mailto:support-netcdf@xxxxxxxxxxxxxxxx]
   Sent: 09 August 2013 21:57
   To: Peglar, Patrick
   Cc: support-netcdf@xxxxxxxxxxxxxxxx
   Subject: [netCDF #ZFB-587742]: Reading variable with strides very slow

   Patrick,

   This turns out to be a known problem with HDF5 performance:

     
http://mail.lists.hdfgroup.org/pipermail/hdf-forum_lists.hdfgroup.org/2012-November/006195.html

   --Russ

(from older discussions...)
   > > > Patrick-
   > > >
   > > > Vars in netcdf is inherently slow
   > > > (when stride > 1) because it cannot
   > > > easily make use of bulk read operations.
   > > > So the library must read element by element
   > > > from the underlying disk storage. This has
   > > > a noticeable effect on performance. This is not
   > > > easy to fix because it must do the read using only
   > > > the memory that is passed to it by the client.
   > > >
   > > > For netcdf versions before 4.3.0 (including 4.1.3)
   > > > there was an additional factor. For historical
   > > > reasons, vars was implemented in terms of varm
   > > > so there was some additional overhead.
   > > >
   > > > If you upgrade to 4.3.0, you will see some performance
   > > > improvement but not, probably, enough to solve your problem.
   > > >
   > > > Sorry I do not have better news.
   > > > =Dennis Heimbigner
   > > >  Unidata
   > >
   > > On the netcdf-3 vs netcdf-4 issue I can at the moment
   > > only speculate. As a rule, reading small quantities of data
   > > with netcdf-4 is always slower than netcdf-3 because the
   > > underlying HDF5 file format is based on b-trees rather than the
   > > linear disk layout of netcdf-3. Since vars reads a single
   > > element at a time, that overhead can, I suspect, be significant.
   > > I am, however, surprised that it is as large as you show.
   > >
   > > =Dennis Heimbigner
   > >  Unidata
   > >
   > In this case, no b-trees are involved, because the data storage is
   > contiguous, not chunked (according to ncdump -h -s).  So I'm
   > surprised how slow the strided netCDF access is, and suspect there
   > might be a performance bug in how netCDF-4 uses the HDF5 API for
   > strided access.

   Russ Rew                                         UCAR Unidata Program
   russ@xxxxxxxxxxxxxxxx                      http://www.unidata.ucar.edu
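
To make it concrete what the strided access mentioned above looks like at the
HDF5 level, here is a minimal sketch that selects every other element with a
single H5Sselect_hyperslab() call (stride = 2) and reads it with one
H5Dread(). This is only an illustration of the HDF5 calls involved, not a
claim about what netCDF-4 does internally; the dataset name "var" and the
sizes are assumptions, and error checking is omitted for brevity.

#include <stdlib.h>
#include <hdf5.h>

int main(void)
{
    const hsize_t nelems = 1000000;     /* length of the 1-D dataset (assumed) */
    const hsize_t nout   = nelems / 2;  /* stride-2 selection size             */

    /* A netCDF-4 file is an HDF5 file, so it can be opened directly. */
    hid_t file = H5Fopen("test.nc", H5F_ACC_RDONLY, H5P_DEFAULT);
    hid_t dset = H5Dopen2(file, "var", H5P_DEFAULT);

    /* Select every other element of the file dataspace in a single call. */
    hid_t fspace = H5Dget_space(dset);
    hsize_t start[1] = {0}, stride[1] = {2}, count[1] = {nout};
    H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, stride, count, NULL);

    /* Contiguous memory dataspace to receive the selected elements. */
    hsize_t mdims[1] = {nout};
    hid_t mspace = H5Screate_simple(1, mdims, NULL);

    float *buf = malloc(nout * sizeof(float));
    H5Dread(dset, H5T_NATIVE_FLOAT, mspace, fspace, H5P_DEFAULT, buf);

    free(buf);
    H5Sclose(mspace);
    H5Sclose(fspace);
    H5Dclose(dset);
    H5Fclose(file);
    return 0;
}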


Our original use case is constrained by memory limitations.
Workarounds are obviously possible, but they are all a bit awkward.

It is not yet clear that the HDF5 issue alone explains the magnitude of the 
slowdown, so I think there may still be more to learn about this.

The question is whether this really needs addressing
-- is anyone else having serious problems with this?

Regards
Patrick
--
Patrick Peglar  AVD Team Software Engineer
Analysis, Visualisation and Data Team  http://www-avd/
Tel: +44 (0)1392 88 5748
Email: patrick.peglar@xxxxxxxxxxxxxxxx
Met Office  Fitzroy Road  Exeter  EX1 3PB  
web: www.metoffice.gov.uk




