
Re: 20040407: NetCDF I/O Problem



>To: address@hidden
>From: "Kevin W. Thomas" <address@hidden>
>Subject: NetCDF Problem
>Organization: OU/CAPS
>Keywords: 200404071606.i37G6MCT022491 netCDF I/O performance

Hi Kevin,

> We've run into a netCDF problem that is causing us nightmares.  We have
> in-house code that we've developed that writes input files to WRF.  It writes
> out a variety of 1D, 2D, and 3D variables.  I/O is always fine, until something
> happens.  The "something" is like a light switch.  I/O performance
> deteriorates quite badly.  95% of the file is written in about 5 minutes.
> The last 5% may take half an hour.  "Truss" shows lots of seeks/reads/writes
> going on.  We're talking files on the order of 1.2 GB.  We're running netCDF
> 3.5.1.  The machine that we're using is an IBM Regatta, with a 64-bit binary.
> I've seen this slowness on other machines, including PSC's RACHEL, so I don't
> think this is unique to the machine that I'm using.
> 
> From what I've seen in the Unidata web pages for netCDF, NFS could be an
> issue.  NFS is used on all systems.  Due to the size of the files, using
> local disk isn't an option.
> 
> If there is anyone who can help us, it would be greatly appreciated.  We're
> going to be running WRF forecasts for the SPC.  This slowness is a serious
> problem, as it delays the start of the forecast runs.

We would like to help, but we don't have much experience with IBM
Regattas.  Here are a few questions about this problem:

 - Is it easily reproducible, that is, does it always happen in the
   same place for the same inputs or does it seem to depend on other
   factors (load on computer, order in which things execute, phase of
   the moon, etc.)?

 - Do you know what netCDF function is being called for output when
   you see the slowdown?

 - Is there any other clue to what the "something" is that suddenly
   causes performance to deteriorate?

 - Are other kinds of I/O similarly slow when the netCDF I/O slows
   down?  Can you run some sort of I/O benchmark at the same time that
   would help tell whether it's an NFS problem or a netCDF problem?
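
As a rough illustration of the kind of I/O benchmark meant in the last
question, here is a minimal sketch in Python that uses no netCDF calls
at all.  It writes fixed-size blocks to a scratch file and times each
one.  The function name probe_write_times, the block size, and the
block count are all made up for this example.  The idea would be to
point it at the same NFS file system while the netCDF job is running:
if these plain writes also slow down sharply when the netCDF output
does, that points toward NFS rather than netCDF.

```python
import os
import time


def probe_write_times(path, chunk_bytes=1 << 20, n_chunks=16):
    """Write n_chunks blocks of chunk_bytes to path.

    Returns a list of (start_time, seconds) pairs, one per block, where
    start_time is the wall-clock time at which the block write began.
    Hypothetical helper for illustration; not part of netCDF.
    """
    block = b"\0" * chunk_bytes
    samples = []
    with open(path, "wb") as f:
        for _ in range(n_chunks):
            wall = time.time()
            start = time.perf_counter()
            f.write(block)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the block to the file system
            samples.append((wall, time.perf_counter() - start))
    return samples
```

Comparing the start times of any slow blocks with the time at which the
netCDF writes slow down would show whether the two coincide.  The
scratch file should be removed afterwards.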

It would help to be able to duplicate the problem here to diagnose
what's going on.  That would involve writing a small program that
reliably demonstrates the performance problem, which sounds like it
will be difficult.
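
One thing that might make such a program easier to build is to record
how long each write call takes and then look for the point where the
"light switch" flips.  The sketch below is only an illustration: the
name first_slowdown, the factor of 10, and the warm-up count are
arbitrary assumptions, and the durations would have to come from timing
code wrapped around your own output calls.

```python
from statistics import median


def first_slowdown(durations, factor=10.0, warmup=5):
    """Return the index of the first call that is much slower than before.

    A call counts as slow when it takes more than `factor` times the
    median of all earlier calls.  The first `warmup` calls are used only
    to establish the baseline.  Returns None if no such call is found.
    """
    for i in range(warmup, len(durations)):
        baseline = median(durations[:i])
        if baseline > 0 and durations[i] > factor * baseline:
            return i
    return None
```

Knowing the index of the first slow call would tell you which variable
and which write was in progress at that moment, which bears on the
first three questions above.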

We also might be able to find WRF developers at NCAR who have seen
something similar, although, as far as I know, nothing like this has
been reported to us.

--Russ

_____________________________________________________________________

Russ Rew                                         UCAR Unidata Program
address@hidden          http://www.unidata.ucar.edu/staff/russ