Re: 20020805: netCDF file I/O performance (actually, mostly O)



>To: address@hidden
>From: Paul van Delst <address@hidden>
>Subject: netCDF file I/O performance (actually, mostly O)
>Organization: CIMSS@NOAA/NCEP/EMC
>Keywords: 200208052011.g75KBcK27744 netCDF I/O

Hi Paul,

> I don't know if this is the correct email address to ask about this
> (rather than the netcdfgroup list). If you think it's more appropriate
> for the netcdfgroup list, please let me know and I'll send it there.

No, I think this is the most appropriate address to try first.  If we
can't help, feel free to ask the larger mailing list.

> I have been using netCDF for all my (sometimes quite large) datafile
> input and output (v3.5 running on an SGI Origin under Irix 6.5). I have
> noticed that whenever I write out a chunk of data, the wait time on the
> machine just shoots up. I have created the files with the dimensions in
> "fortran" index order and output the data in the same order. To try and
> improve the file writing performance I now open/create the files in
> "share" mode and call nf90_sync() at the end of my write function (as
> opposed to "readwrite" mode with no sync call). There was a perceptible
> improvement but the file output still takes an inordinately long time
> (switching to regular old binary output makes the file write almost
> instantaneous).

I'm not sure what you mean by using "fortran order" in creating the
files and for output of the data.  When you write a chunk of data to a
netCDF file using one of the array interfaces, such as
NF_PUT_VARA_REAL or NF_PUT_VAR_REAL, you provide an array address and
the library determines what order to use in writing the values to
disk.  For the Fortran interface, this is with the first dimension
varying fastest, but it's really just the order in which the values
occur in memory, so long contiguous blocks of values are written.
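
To make "first dimension varying fastest" concrete, here is a small
sketch in Python (not netCDF code) that computes where a 1-based
Fortran index lands in memory.  The function name and the 4x3 array
shape are made up for illustration.

```python
def fortran_offset(index, shape):
    """Element offset of a 1-based Fortran index in an array stored
    with the first dimension varying fastest."""
    offset = 0
    stride = 1
    for i, n in zip(index, shape):
        if not 1 <= i <= n:
            raise IndexError(f"index {i} outside 1..{n}")
        offset += (i - 1) * stride
        stride *= n  # the next dimension steps over everything before it
    return offset


shape = (4, 3)  # hypothetical REAL A(4,3)

# Moving along the first index steps through adjacent elements.
column = [fortran_offset((i, 2), shape) for i in range(1, 5)]
# Moving along the second index jumps by the column length each time.
row = [fortran_offset((2, j), shape) for j in range(1, 4)]

print(column)  # [4, 5, 6, 7]
print(row)     # [1, 5, 9]
```

A whole column is one contiguous block, while a row touches one value
in each column.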

If you are using NF_PUT_VARA_REAL to write one or more "columns" of an
array at once and your column lengths are relatively large, this
should be relatively efficient.  It will be much less efficient if you
are instead writing individual values with NF_PUT_VAR1_REAL calls,
writing out a subsampled set of values with NF_PUT_VARS_REAL calls and
a stride other than 1, or using some nontrivial mapping with
NF_PUT_VARM_REAL calls.
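
As a rough way to see the difference between these access patterns,
the following Python sketch counts how many separate contiguous runs
of values a request covers.  It is only arithmetic, not a netCDF call,
and it assumes a fixed-size variable laid out as one contiguous block
with the first dimension varying fastest; the function name and the
1000x500 shape are hypothetical.

```python
from math import prod


def contiguous_runs(shape, count, stride=None):
    """Return (number_of_runs, values_per_run) for a slab request.

    shape, count and stride are given in Fortran order, fastest
    dimension first.  The start corner does not affect the result.
    """
    if stride is None:
        stride = (1,) * len(shape)
    run_len = 1
    for n, c, s in zip(shape, count, stride):
        if c > 1 and s != 1:
            break  # strided: values in this dimension are not adjacent
        run_len *= c
        if c != n:
            break  # partial extent: the next dimension cannot join the run
    return prod(count) // run_len, run_len


shape = (1000, 500)  # hypothetical variable, 1000 values per column

print(contiguous_runs(shape, (1000, 500)))         # (1, 500000)   whole array
print(contiguous_runs(shape, (1000, 10)))          # (1, 10000)    ten columns
print(contiguous_runs(shape, (1, 500)))            # (500, 1)      one row
print(contiguous_runs(shape, (500, 500), (2, 1)))  # (250000, 1)   stride 2
```

Fewer, longer runs correspond to the whole-array and column cases;
the row and strided cases break down into many single-value pieces.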

Are you writing to an NFS-mounted disk?  There are still some cases
where an NFS-3 server will be slow for an NFS-2 client without some
tuning at the file system level.  Try writing to a local disk and see
if that makes a big difference.  If so, then I would check your NFS
mount parameters.  Here was an example where this made a big
difference for netCDF writes:

  http://www.unidata.ucar.edu/cgi-bin/mfs/70/4533
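
If it helps, the local-versus-NFS comparison can be done without
netCDF at all.  Here is a minimal Python sketch that times a plain
binary write into a given directory; the function name, sizes, and any
directory you pass in are assumptions to adapt to your system.

```python
import os
import tempfile
import time


def time_write(directory, nbytes=8 * 1024 * 1024, block=1024 * 1024):
    """Seconds taken to write and fsync about nbytes (rounded up to
    whole blocks) into a scratch file created in directory."""
    data = b"\0" * block
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            written = 0
            while written < nbytes:
                f.write(data)
                written += block
            f.flush()
            os.fsync(f.fileno())  # force the data out to the file system
        return time.perf_counter() - start
    finally:
        os.remove(path)  # always clean up the scratch file


# Runs against the system temp directory; repeat with a directory on
# the NFS-mounted disk and compare the two timings.
elapsed = time_write(tempfile.gettempdir(), nbytes=1024 * 1024)
print(f"{elapsed:.4f} s")
```

A large gap between the two directories would point at the file
system rather than at the netCDF library.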

Your strategy of using "share" mode to open/create files should
actually be slower than not using the "share" mode flag.  That's
because the buffering that's possible when access is not shared speeds
up the I/O, in general, assuming you have enough memory.
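
The effect of buffering can be illustrated by analogy with ordinary
buffered streams.  The Python sketch below is not how the netCDF
library implements "share" mode; it only counts how many times the
underlying device is written to with and without a buffer in front of
it.  The class and function names are made up.

```python
import io


class CountingSink(io.RawIOBase):
    """Raw stream that discards data and counts write requests."""

    def __init__(self):
        super().__init__()
        self.calls = 0

    def writable(self):
        return True

    def write(self, b):
        self.calls += 1
        return len(b)


def raw_writes(n_values, buffer_size):
    """Number of raw writes needed to output n_values 4-byte values.
    A buffer_size of 0 means every value goes straight to the sink."""
    sink = CountingSink()
    value = b"\x00\x00\x00\x00"
    if buffer_size == 0:
        for _ in range(n_values):
            sink.write(value)
    else:
        out = io.BufferedWriter(sink, buffer_size=buffer_size)
        for _ in range(n_values):
            out.write(value)
        out.flush()  # push whatever is still held in the buffer
    return sink.calls


unbuffered = raw_writes(10000, 0)
buffered = raw_writes(10000, 8192)
print(unbuffered, buffered)
```

With the buffer in place, the 10000 small writes reach the sink in a
handful of large requests instead of one request per value.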

In summary, my recommendations would be:
 
  - don't write large arrays 1 or 2 values at a time; use the whole
    array or slice interfaces

  - compare writing to a local disk instead of an NFS-mounted one, to
    see if it makes a large difference, indicating a need for NFS tuning

> Does anyone there have any information or pointers on how to open/write
> netCDF files in as efficient a manner as possible? Right now anything
> would help -  the other users are starting to gang up on me for slowing
> the entire machine down :o(

If the above recommendations don't help, could you provide a small
example that we could use to try to reproduce the slowness you are
seeing, to see if it's platform-specific, or has to do with how you
are accessing the data?

--Russ

_____________________________________________________________________

Russ Rew                                         UCAR Unidata Program
address@hidden                     http://www.unidata.ucar.edu