Re: [netcdfgroup] read performance slow compared to netCDF on other systems

Is this writing a netcdf-3 file or a netcdf-4 file?
=Dennis Heimbigner
 Unidata

On 11/10/2016 2:01 PM, Liam Forbes wrote:
Hello! We are installing netCDF 4.4.1 w/ HDF5 1.8.17 on our new Intel
based cluster. We've noticed the read performance on this cluster using
ncks is extremely slow compared to a couple other systems. For example,
parsing a file on our Lustre 2.1 based filesystem takes less than 8 seconds
on our Cray XK6-200m. Parsing the same file on the same filesystem on
our new cluster is taking 30+ seconds, with most of that time apparently
spent reading in the file.

Cray (hostname fish):
fish1:lforbes$ time ncks test.nc out.nc

real    0m4.804s
user    0m3.180s
sys    0m1.300s

Cluster (hostname chinook):
n0:loforbes$ time ncks mod.nc out.nc

real    0m32.435s
user    0m29.240s
sys    0m1.936s

As part of trying to figure out what's going on, I strace'ed the process
on both systems. One thing that jumps out at me is that the process
running on a compute node on our new cluster is executing a _lot_ more
brk() calls to allocate additional memory than on a login node of our
Cray, at least 8 times as many in one test comparison (strace output
files are available). I'm not sure if this means anything, or how I can
impact this behaviour.
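
(For what it's worth, strace can tally the syscalls itself, which makes the cross-machine comparison easier than counting lines by hand; a sketch, using the filenames from the examples above:)

```shell
# -c prints a per-syscall summary (count, time, errors) instead of a
# raw trace; -f follows forked children; -o sends the table to a file.
strace -c -f -o ncks_summary.txt ncks test.nc out.nc

# Or tally brk() calls in an existing raw strace log:
grep -c 'brk(' strace_output.txt
```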

I've tried recompiling NetCDF on our new cluster a variety of ways,
stripping out features like szip and enabling others like MMAP, but none
of the changes have impacted the performance.

Based on what I've seen while googling and reading through the mailing
list archives, I've also tried using `ncks --fix_rec_dmn` to generate a
new version of the input file (which is just over 650 MB) with a fixed,
rather than unlimited, time dimension.
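
(The conversion step was roughly the following; `time` is the record dimension shown in the listings below:)

```shell
# Rewrite the file with the unlimited 'time' record dimension made
# fixed-size (NCO's ncks --fix_rec_dmn takes the dimension name).
ncks --fix_rec_dmn time test.nc mod.nc

# Verify the dimension is now fixed (no UNLIMITED marker):
ncdump -h mod.nc | grep 'time ='
```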

chinook01:loforbes$ ncdump -k test.nc
netCDF-4
chinook01:loforbes$ ncdump -k mod.nc
netCDF-4
chinook01:loforbes$ ncdump -s test.nc | head
netcdf test {
dimensions:
    time = UNLIMITED ; // (21 currently)
    nv = 2 ;
    x = 352 ;
    y = 608 ;
    nv4 = 4 ;
variables:
    double time(time) ;
        time:units = "seconds since 1-1-1" ;
chinook01:loforbes$ ncdump -s mod.nc | head
netcdf mod {
dimensions:
    time = 21 ;
    y = 608 ;
    x = 352 ;
    nv4 = 4 ;
    nv = 2 ;
variables:
    float basal_mass_balance_average(time, y, x) ;
        basal_mass_balance_average:units = "kg m-2 year-1" ;

This also didn't seem to make a difference.
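
(One more detail that may be relevant: the `ncdump -s` listings above are cut off by `head` before the per-variable storage attributes. Those special attributes, which `ncdump -s` appends for netCDF-4 files, describe the chunking and compression layout and can be pulled out like this:)

```shell
# ncdump -s adds virtual 'special' attributes (_Storage, _ChunkSizes,
# _DeflateLevel, _Shuffle, ...) describing the on-disk HDF5 layout.
ncdump -s mod.nc | grep -E '_(Storage|ChunkSizes|DeflateLevel|Shuffle)'
```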

Unfortunately, as the cluster administrator, my NetCDF knowledge is very
limited. The test file was provided by the researcher reporting this
problem. What he is experiencing is a significant application slowdown,
because this issue occurs at every time step when he reads and writes files.
It more than doubles the run time, making our new cluster unusable to
him. I don't think anything is necessarily "broken" with NetCDF, but I'm
not sure what further diagnostics to attempt or if there are other
changes to the input file I and the researcher should try. Any help
would be appreciated. Thank you.

--
Regards,
-liam

-There are uncountably more irrational fears than rational ones. -P. Dolan
Liam Forbes  loforbes@xxxxxxxxxx  ph: 907-450-8618 fax: 907-450-8601
UAF Research Computing Systems Senior HPC Engineer  LPIC1, CISSP





