[netCDF #VUP-836503]: Netcdf slow performance with large block size.

Hi Si,

> We are running into another severe netCDF performance problem while
> preparing for our new Yellowstone machine.
> As you know, the new Glade file system will have a 2M or 4M block
> size, and it looks like this really raises some performance issues.
> Here is a typical case our CESM group writes/uses:
> status = nf90_inq_varid(ncid, "time", varid)
> status = nf90_get_var(ncid, varid, data)
> The nf90_get_var call is very slow and inefficient when "time"
> is "unlimited".
> In my test case, "time" is an unlimited variable with 248 values:
> time = UNLIMITED ; // (248 currently)
> I used "strace" on the executable and noticed that "lseek" and "read"
> are called 248+ times (once per time step),
> because it is an unlimited variable. If it is not an unlimited
> variable, "lseek" and "read" are called only twice.
> This is not a big deal when the block size of the file system is
> small (e.g. 4k), but it takes significantly longer when the block
> size is large (2M or 4M).
> My question is: Is it possible to modify nf90_get_var or related
> code in netCDF so that it doesn't need to
> do so many "lseek" and "read" calls, to improve the performance? My
> feeling is that maybe you could do most of the work in memory? Just my
> guess. You must have better ideas and suggestions.

This is the same performance problem reported in an earlier support
ticket, for which support responses are here:


and the NCO forum thread here:


There is an additional workaround now, if you're willing to test the
netCDF snapshot release, soon to be version 4.2.1, which I list as
number 6 below:

1.  Data writer: Don't use the unlimited dimension if not needed.

2.  Data writer: Make sure the record size for each variable is at
    least as large as the disk block size where it will be read.
3.  Data Reader: Convert record-oriented data to use only fixed size
    dimensions before using it in processing.  There's an NCO operator
    for this:
       ncks --fix_rec_dmn in.nc out.nc
    or you can use nccopy (version 4.2, or the current snapshot for -w):
       nccopy -u in.nc out.nc
    or faster, if there is enough memory for the output file:
       nccopy -w -u in.nc out.nc

4.  If processing multiple record variables, read input a record at a
    time instead of a variable at a time, processing all the record
    variables after each record has been read.  This has already been
    done for nccopy and for some NCO operators.

5.  Convert record-oriented netCDF-3 data to netCDF-4 classic model
    files (or regular netCDF-4 files), using chunk sizes that are less
    than or equal to a small multiple of the disk block size. The
    nccopy utility may be used for this purpose.

6.  Use the new "NC_DISKLESS" option when opening the record-oriented
    file, assuming you have enough memory to hold the file.  This will
    read the whole file into memory on open, after which reads will be
    fast and will not depend on the disk block size.  This is in the
    current snapshot release and will be in version 4.2.1.
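
Item 6 is a C-library option: as of 4.2.1 it is requested by OR-ing NC_DISKLESS into the mode passed to nc_open (for example NC_NOWRITE | NC_DISKLESS); check the documentation for your installed version. The idea behind it can be sketched without the netCDF library at all. The C fragment below is a minimal illustration under stated assumptions, not netCDF's implementation: it reads an entire file into memory with one fread, then gathers one value per record from the buffer at the strided offsets used for record variables. The function names, and the assumption that each value is an 8-byte big-endian double at a known begin offset and record size, are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read the whole open file into a newly malloc'd buffer with a single fread,
   so later accesses never go back to the disk. Returns NULL on failure;
   the caller frees the buffer. */
unsigned char *load_whole_file(FILE *fp, size_t *size_out)
{
    assert(fp != NULL && size_out != NULL);
    if (fseek(fp, 0, SEEK_END) != 0)
        return NULL;
    long len = ftell(fp);
    if (len < 0 || fseek(fp, 0, SEEK_SET) != 0)
        return NULL;
    unsigned char *buf = malloc(len > 0 ? (size_t)len : 1);
    if (buf == NULL)
        return NULL;
    if (fread(buf, 1, (size_t)len, fp) != (size_t)len) {
        free(buf);
        return NULL;
    }
    *size_out = (size_t)len;
    return buf;
}

/* Decode an 8-byte big-endian IEEE 754 double (netCDF-3 stores data
   big-endian); assumes doubles and 64-bit integers share byte order. */
static double be_double(const unsigned char *p)
{
    uint64_t bits = 0;
    for (int i = 0; i < 8; i++)
        bits = (bits << 8) | p[i];
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Gather one double per record from the in-memory image, where value r is
   at begin + r * recsize (hypothetical layout parameters). Returns 0 on
   success, -1 if a value would lie past the end of the buffer. */
int gather_record_doubles(const unsigned char *buf, size_t size, size_t begin,
                          size_t recsize, size_t nrecs, double *out)
{
    for (size_t r = 0; r < nrecs; r++) {
        size_t off = begin + r * recsize;
        if (off > size || size - off < 8)
            return -1;
        out[r] = be_double(buf + off);
    }
    return 0;
}
```

Once the image is in memory, gathering all 248 time values costs no further lseek or read calls, whatever the file system's block size; the price is holding the whole file in memory.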

Note that we haven't incorporated the NC_DISKLESS functionality into
the Fortran APIs yet, but this should be relatively easy after we have
released the 4.2.1 C netCDF library.

> I created a test case for you, which can be reached at
> /glade/home/siliu/DAV/NCTest.
> The directory includes:
> 1) Two netCDF data files: unlimitedD.nc and limitedD.nc
> The only difference is that time is an unlimited dimension
> in the first one and a fixed-size dimension in the second.
> ncdump -h will show it.
> 2) unlimited.f90 and limited.f90
> Two Fortran programs that call netcdf function nf90_get_var and print
> out the time cost.
> 3) unlimited.exe and limited.exe
> The executable files built from unlimited.f90 and limited.f90
> I am trying to get rpath set for you so you can run the executable
> directly.
> In case you can not run them directly, the environment settings and
> compiler commands can be found inside the file runme-4.2.
> 4) limited_512k.trace  unlimited_512k.trace
> The trace output from the "strace" command. It shows that "lseek" and
> "read" are called for each time step in the unlimited case.
> Please let me know if you have any questions I may answer.

The main question is the use case in which the performance problem
arises, since the best workaround depends on it:

If the reading program currently accesses record variables one at a
time but needs to access all or most of the record variables, it may
be worth modifying the reading program to access data a record at a
time instead of a variable at a time, if you want to stick with the
netCDF-3 format.
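
A rough way to see why the access order matters is to count block reads under a simplified model, which is an assumption rather than a description of the netCDF-3 I/O layer: each record holds every record variable's slab back to back, and the reader keeps only one file-system block cached. The hypothetical C functions below count reads for variable-at-a-time and record-at-a-time traversal of the same layout.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model (assumption): nvars record variables, each contributing
   varsize bytes to every record, stored back to back within each record;
   the reader caches exactly one file-system block at a time. */
typedef struct {
    size_t nvars;   /* number of record variables */
    size_t varsize; /* bytes per variable per record */
    size_t nrecs;   /* records along the unlimited dimension */
    size_t blksize; /* file-system block size in bytes */
} rec_layout;

/* Access variable var in record rec, counting a read whenever a needed
   block differs from the single cached block. */
static void touch_slab(const rec_layout *L, size_t var, size_t rec,
                       long long *cached, size_t *reads)
{
    size_t off = rec * L->nvars * L->varsize + var * L->varsize;
    size_t first = off / L->blksize;
    size_t last = (off + L->varsize - 1) / L->blksize;
    for (size_t b = first; b <= last; b++) {
        if ((long long)b != *cached) {
            (*reads)++;
            *cached = (long long)b;
        }
    }
}

/* Outer loop over variables, inner loop over records. */
size_t reads_variable_at_a_time(const rec_layout *L)
{
    assert(L->varsize > 0 && L->blksize > 0);
    long long cached = -1;
    size_t reads = 0;
    for (size_t v = 0; v < L->nvars; v++)
        for (size_t r = 0; r < L->nrecs; r++)
            touch_slab(L, v, r, &cached, &reads);
    return reads;
}

/* Outer loop over records, inner loop over variables. */
size_t reads_record_at_a_time(const rec_layout *L)
{
    assert(L->varsize > 0 && L->blksize > 0);
    long long cached = -1;
    size_t reads = 0;
    for (size_t r = 0; r < L->nrecs; r++)
        for (size_t v = 0; v < L->nvars; v++)
            touch_slab(L, v, r, &cached, &reads);
    return reads;
}
```

For example, with 4 record variables of 1024 bytes each, 8 records, and 4096-byte blocks (so one record per block), the model counts 32 block reads for variable-at-a-time access and 8 for record-at-a-time access.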

If the file is written once but read many times, with variables
extracted from it one at a time, then it makes sense to convert the
file to not use record variables, for the convenience of readers.  This
also requires only the netCDF-3 format.

If the file is written once but potentially read many times, sometimes
accessing all the data in a single record and other times accessing
all the data in a single variable, then it makes sense to convert the
file to netCDF-4 classic chunked, which can support either kind of
access efficiently.  This requires that the reading programs be linked
against the netCDF-4 library so they can handle chunked data.
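
Workaround 5 suggests chunk sizes no larger than a small multiple of the disk block size. One hypothetical way to apply that when choosing chunk shapes for a variable dimensioned (time, ny, nx) is sketched below in C: keep the full ny x nx grid in each chunk and pick how many time steps fit within the byte budget. This heuristic is only an illustration; it is not how nccopy or the netCDF-4 library choose default chunk sizes, and it does not consider other access patterns such as reading a long time series at a single grid point.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical heuristic: for a variable shaped (time, ny, nx), return how many
   time steps to store per chunk so a chunk of full ny x nx slabs stays within
   `multiple` file-system blocks. Returns 1 when a single time step already
   meets or exceeds the budget; splitting ny or nx is left to the caller. */
size_t time_steps_per_chunk(size_t ny, size_t nx, size_t elemsize,
                            size_t blksize, size_t multiple)
{
    assert(ny > 0 && nx > 0 && elemsize > 0 && blksize > 0 && multiple > 0);
    size_t slab = ny * nx * elemsize;   /* bytes in one time step */
    size_t budget = blksize * multiple; /* upper bound on bytes per chunk */
    if (slab >= budget)
        return 1;
    return budget / slab;
}
```

For a 96 x 144 grid of 4-byte floats (55296 bytes per time step) and 4 MiB blocks, time_steps_per_chunk(96, 144, 4, 4u << 20, 1) returns 75.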

If the file is written once and single variables are read from it once
or only a few times, it may make sense to modify the writing program
to write the data without using an unlimited dimension, or as a
netCDF-4 classic format file.

I don't think there's any reason to duplicate the timings you have
observed, as the cause is clear, and the possible workarounds I know
about are listed above.  There is no netCDF-3 library modification
that can cure this problem; it is inherent in the netCDF-3 format and
the trade-offs involved in implementing an unlimited dimension versus
fixed-size dimensions.  The netCDF-4 format uses a different set of
engineering trade-offs involving a more complicated format and API
that provides better performance in some cases and worse in others.
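
The layout trade-off can be made concrete with a simplified model (an assumption; the netCDF-3 library's actual buffering is more involved). In the classic format, value r of a record variable sits at begin + r * recsize, where recsize covers one record's worth of every record variable, while a fixed-size variable's values are contiguous. The hypothetical C functions below count the distinct file-system blocks touched in each case; if each block costs one read of blksize bytes, the bytes transferred are that count times the block size.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model (assumption): count the distinct file-system blocks
   touched when reading one elemsize-byte value from each of nrecs records.
   Value r of a record variable lives at begin + r * recsize. */
size_t blocks_for_record_var(size_t begin, size_t recsize, size_t elemsize,
                             size_t nrecs, size_t blksize)
{
    assert(elemsize > 0 && blksize > 0);
    size_t count = 0;
    size_t next_unread = 0; /* offsets only grow, so lower blocks are already counted */
    for (size_t r = 0; r < nrecs; r++) {
        size_t off = begin + r * recsize;
        size_t first = off / blksize;
        size_t last = (off + elemsize - 1) / blksize;
        if (first < next_unread)
            first = next_unread;
        if (first <= last) {
            count += last - first + 1;
            next_unread = last + 1;
        }
    }
    return count;
}

/* Same count for a fixed-size variable whose nvals values are contiguous. */
size_t blocks_for_fixed_var(size_t begin, size_t elemsize, size_t nvals,
                            size_t blksize)
{
    assert(elemsize > 0 && nvals > 0 && blksize > 0);
    size_t first = begin / blksize;
    size_t last = (begin + nvals * elemsize - 1) / blksize;
    return last - first + 1;
}
```

With a hypothetical record size of 8 MiB, reading 248 8-byte time values touches 248 blocks whether the block size is 4 KiB or 4 MiB, so the model's bytes transferred grow from 992 KiB to 992 MiB; the same 248 values stored contiguously in a fixed-size variable starting at a block boundary fit in one block.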

I plan to write a user document on the performance issues involved in
this problem with similar recommendations to above for data providers,
developers of data access software, and users puzzled about why some
access patterns are slow.


Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu

Ticket Details
Ticket ID: VUP-836503
Department: Support netCDF
Priority: Normal
Status: Closed

NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.