
Re: 20040212: NFerror at nf_enddef



>To: <address@hidden>
>From: "Lars Grabow" <address@hidden>
>Subject: NFerror at nf_enddef
>Organization: University of Wisconsin
>Keywords: 200402121454.i1CEsZp1023507 netCDF Beowulf cluster

Hi Lars,

> my research group is running parallel code on a Linux RedHat 9
> Beowulf cluster (48 nodes). The Fortran code uses netCDF files as
> input and output and is running (mostly) without problems on several
> different hardware platforms. Just occasionally, some jobs are
> crashing on the Linux cluster and exiting with the following error
> message:
> 
> ---------------------------------------
> netCDF library flagged error
> * netCDF error number:            5
> * netCDF error message = 
> Input/output error
> Error trace :
> subroutine nfsetmode
> last error message =NFerror at nf_enddef
> abort_calc: stopped by netCDFinterface -> local_error_handler
> clexit: exiting the program
> PAR: msexit: Message-passing time=          0.010 CPU seconds
> PAR: msexit halting Master
> ----------------------------------------
> I'm using netCDF 3.5.0, compiled from source, and <make test> ends
> successfully. I could imagine that a hardware/network/communication
> problem is the cause here. Another possibility is that there might
> be simultaneous access to the netCDF file. Do you have any
> suggestions for how I could identify the problem further, or can you
> point me to other possible causes?

I suspect this error is caused by simultaneous accesses to the netCDF
file.  Our netCDF library only permits one writer at a time.  It is
possible to open a file for writing from multiple processes or
threads, but since the library is not designed for this use, errors or
file corruption are likely to result.  One writer with multiple
readers does work, but it requires either opening the file with the
NF_SHARE flag or making frequent calls to the nf_sync() function (by
the writer after writing, and by each reader before reading).
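
As a rough illustration of the one-writer, multiple-reader case, a
reader using the Fortran 77 interface might look something like the
sketch below.  The file name 'shared.nc' and the handle_err helper are
placeholders made up for this example, not anything taken from your
code, and ior() is assumed to be available in your compiler (it is a
common Fortran 77 extension).

```fortran
      program share_reader
      implicit none
      include 'netcdf.inc'
      integer ncid, status

C     Open read-only with NF_SHARE so that reads are less likely to
C     be served from stale library buffers while a writer is active.
      status = nf_open('shared.nc', ior(NF_NOWRITE, NF_SHARE), ncid)
      if (status .ne. NF_NOERR) call handle_err(status)

C     Synchronize with the file on disk before reading, to pick up
C     whatever the single writer has flushed so far.
      status = nf_sync(ncid)
      if (status .ne. NF_NOERR) call handle_err(status)

C     ... nf_inq_* and nf_get_* calls would go here ...

      status = nf_close(ncid)
      if (status .ne. NF_NOERR) call handle_err(status)
      end

C     Hypothetical error handler: print the library message and stop.
      subroutine handle_err(status)
      implicit none
      include 'netcdf.inc'
      integer status
      print *, nf_strerror(status)
      stop 'netCDF error'
      end
```

The single writer would open the file with ior(NF_WRITE, NF_SHARE)
instead and call nf_sync() after each batch of writes.  None of this
makes concurrent writers safe; it only covers the one-writer case.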

If you need parallel or concurrent access by multiple writers to the
same netCDF file, consider using the Parallel netCDF package
(pnetcdf) developed by researchers at Northwestern University and
Argonne National Laboratory:

  http://www-unix.mcs.anl.gov/parallel-netcdf/

--Russ

_____________________________________________________________________

Russ Rew                                         UCAR Unidata Program
address@hidden          http://www.unidata.ucar.edu/staff/russ