
Re: 970605: netcdf and ffio on Cray



>To: address@hidden
>From: Elizabeth Hayes <address@hidden>
>Subject: netcdf and ffio (fwd)
>Organization: Cray
>Keywords: 199706051836.MAA15954

Hi Elizabeth,

Sorry to have taken so long to respond to your support email, but it
seems to have "slipped through the cracks" during a busy period, and I
just noticed it had not been answered yet.

> I have received assistance from Steve Luzmoor
> on this problem, but wondered if y'all could
> also help, as Steve is on vacation for ten days.

Well, he's back now, so maybe he can provide help faster than we did :-).

One thing that would help us is to know what version of netCDF you are
using.  Since we just released a new version 3.3.1 a couple of days ago,
and it fixes a few Cray problems (though just in the Fortran interface,
as far as I know), it's possible that might be a better version from
which to start.  We were able to get 3.3.1 to pass the extensive test
suite executed when "make test" is invoked, using a T90 account at
GFDL.  See the notes in the INSTALL file about this.

I'll forward this to Glenn Davis, in case he knows more about the
problem ...

> Steve's suggestions were:
> 1)  Isolate whether the problem is ffio or not by
> changing the NETCDF_FFIOSPEC setting to cachea or bufa rather
> than eie.sds.
> Result:  No netcdf errors, normal completion, but still
> large locked i/o times.
> Conclusion:  The problem doesn't occur when files are
> placed in bufa, cache, or cachea, but does occur
> using eie.mem or eie.sds as i/o layers.
> 
> 2)  set ncopts =  NC_VERBOSE;
> this setting has already been set in the m3io layer
> 
> 3)  Use a debugger.
> Result:  Totalview gives different netcdf errors depending
> on where the breakpoints are set.  Setting breakpoints
> in ncvarpt in jackets.c somehow interferes with writing
> the header information to the file, which then results in
> the error
>     >>> WARNING in subroutine OPNFIL3 <<<
>      Error opening file CLD_CRO_2D_G1
>       EQ256:>> ./CLD_CRO_2D_G1
>      netCDF error number   19
> 
> I believe the file was created by the totalview executable
> but the header wasn't written out, so the program doesn't 
> think it is a netcdf file.
> 
> If I run in totalview without setting breakpoints I get
> the error listed below. 
> 
> Forwarded message:
> > From eah Tue Jun  3 15:45:56 1997
> > Subject: netcdf and ffio
> > To: luzmoor (Steve Luzmoor)
> > Date: Tue, 3 Jun 1997 15:45:57 -0500 (CDT)
> > 
> > Hi Steve,
> > 
> > Your name appeared in connection with netcdf 2.4, and ffio
> > optimization in the netcdf problem archive.
> > 
> > I am working on a problem that was caused when the
> > /tmp disk on a T90 was changed from DD42s to ND40s.
> > The users were experiencing large locked i/o wait
> > times due to small read/write operations.
> > 
> > setenv NETCDF_FFIOSPEC eie.sds.blocks.diag:184
> > The suggestions listed under the web document
> > http://www.gfdl.gov/~jps/txt/README.IO_Optimization.txt
> > did not work.
> > 
> > setenv NETCDF_FFIOSPEC eie.sds.blocks.diag:184
> > INTEGER PUTENV
> > I = PUTENV('NETCDF_FFIOSPEC=eie.sds.blocks.diag:184')
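For comparison, a C program can make the same per-process setting as the Fortran PUTENV call above by using POSIX setenv(). This is a minimal sketch, assuming (as with PUTENV) that the variable only has to be in the process environment before the netCDF library creates or opens the file it should affect; the helper name set_ffio_spec is hypothetical and not part of the netCDF API.

```c
#define _POSIX_C_SOURCE 200112L  /* expose setenv() under strict C modes */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: set NETCDF_FFIOSPEC for this process only.
   Assumption: it must run before the netCDF create/open call it is
   meant to affect. Returns 0 on success, -1 on failure. */
int set_ffio_spec(const char *spec)
{
    const char *now;

    /* overwrite = 1 replaces any value inherited from the shell's setenv */
    if (setenv("NETCDF_FFIOSPEC", spec, 1) != 0)
        return -1;

    /* Confirm that later getenv() calls will see the new value. */
    now = getenv("NETCDF_FFIOSPEC");
    return (now != NULL && strcmp(now, spec) == 0) ? 0 : -1;
}
```

For example, set_ffio_spec("cachea") would repeat Steve's first experiment without changing the shell environment; calling it after a file is already open presumably has no effect on that file.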
> > 
> > The errors I obtained were always similar to the following:
> >      >>> WARNING in subroutine OPNFIL3 <<<
> >      Error opening file MET_CRO_3D_G0
> >       EQ256:>> /tmp/hayes_sesarm/maqsip/sesarm/input/SMRAQ_KF_mc3_g0
> >      netCDF error number   -1
> >  
> >  
> >  
> >      >>--->> WARNING in subroutine INCONVERT:INTERP3
> >      Could not open MET_CRO_3D_G0
> >      Date and time 13:00:00 July 7, 1995    (1995188:130000)
> >  
> >  
> >      *** ERROR ABORT in subroutine INCONVERT
> >      Could not interpolate DENS from MET_CRO_3D_G0
> >  M3ERR:  DTBUF 13:00:00 July 7, 1995
> >      Date and time 13:00:00 July 7, 1995    (1995188:130000)
> > 
> > The files would open, and the dimensions, variables, and attributes
> > were written.  When the files were taken out of define mode
> > and put into data mode, and data were written to them,
> > the files seemed to get corrupted.
> > 
> > I tried using the latest eag_ffio library, and using the
> > following to specify which files were put on the sds.
> > setenv FF_IO_DEFAULTS "eie.sds.blocks.diag.nolistio:184:-20mw:6:1,event.summary"
> > setenv FF_IO_OPTS "*mk1* (set.oflags_set+=2.skip|=0x200000 |event | eie) *mk3* (set.oflags_set+=2.skip|=0x200000 |event | eie) *ck3* (set.oflags_set+=2.skip|=0x200000 |event | eie) *cm3* (set.oflags_set+=2.skip|=0x200000 |event | eie) *cw2* (set.oflags_set+=2.skip|=0x200000 |event | eie) *cd2* (set.oflags_set+=2.skip|=0x200000 |event | eie)"
> > setenv FF_IO_OPEN_DIAGS 1
> > setenv FF_IO_LOGFILE /flyer/cri/a/hayes/maqsip/sesarm/rel_KF/problem.ffio.part.cc3
> > 
> > I was able to restructure the code which made calls to the netcdf
> > library routines to get some of the files to work on the sds using ffio. 
> > These routines are part of the EDSS/Models-3 Air Quality Modeling
> > System developed by Carlie Coats at MCNC for the EPA.
> > 
> > Files that I have been able to put on the sds using ffio contained a few
> > variables with one unlimited dimension and other variables with fixed
> > dimensions.  I added a call to set nofill mode after defining all the
> > dimensions, attributes, and variables, and before taking the file out of
> > define mode and putting it into data mode.  A count variable with one
> > unlimited dimension was then initialized using ncvpt.
> > 
> > Files made up only of variables with an unlimited dimension are not
> > affected by setting the nofill option.  Calls to synchronize the file
> > to disk also don't seem to help.
> > 
> > Would you help me try to find the source of the error?  I am just learning
> > about netcdf, and ffio.
> > 
> > Many Thanks, Liz Hayes
> 
> 
> Note:  For the files that I am able to put on the sds thanks to the
> nofill modifications in the create-file routines, the locked i/o wait
> time was significantly shorter.  I would appreciate any suggestions on
> how to change the create-file routines for files that contain only
> variables with an unlimited dimension.


--Russ

_____________________________________________________________________

Russ Rew                                         UCAR Unidata Program
address@hidden                     http://www.unidata.ucar.edu