
Re: "David Borg-Breen (2052/x6816)": netCDF problem with Solaris 2.4



> Date:    Fri, 14 Jul 1995 17:40:34 -0700
> From:    "David Borg-Breen (2052/x6816)" <address@hidden>
> To:      address@hidden, address@hidden
> Subject: netCDF problem with Solaris 2.4
>
> We are continuing to have netCDF problems with our SparcServer 1000
> running Solaris 2.4 when used with nfs-mounted file-systems.
>
> The symptom is that zeros are being written to a netcdf file instead
> of non-zero data when the file-system is mounted using nfs.
>
> Mike Schmidt gave me your patch history a month ago and our patch
> levels now match yours.
>
> A couple of other people from PMEL have communicated with you about
> the problem.  I understand one of the recommendations was to add calls
> to "ncsync".  This was done and it greatly reduced the frequency of
> errors for our most critical application when the files are exported
> from a Unix file system.  Unfortunately we also have users accessing files
> exported from a VMS system using TGV's Multinet NFS software and we continue
> to experience a large number of errors with the Multinet NFS files.
>
> I just ran a test on directories mounted both from another Sun/Solaris
> system as well as the VMS system.
>
>   Sun/Solaris file-server:    1 error in 350 tests
>   VMS file-server:            about 50% error rate
>
> Do you know of any fixes or special patches to remedy problems accessing
> Multinet-served files?
>
> If not, we would like to ask TGV about the problem but aren't sure
> exactly what to ask.  Can you help us out with this information?
>
>  - What does the "ncsync" routine do?

A netcdf file can be thought of as containing a header and
the data. When a netcdf file is opened, the information in the header
is brought into memory. It is written out on 'ncclose' (or on 'ncendef' for a
newly created file). If the netcdf file has an "unlimited" (aka "record")
dimension, data writes may change information kept in the in-memory
header, such as the current maximum value of the record dimension. Until
that header is written out, processes which are attempting to read the
file will not be aware of the additional data. So, in a file opened for
write, 'ncsync' is called to cause the changed header information to be
written out immediately. If the file is opened read-only, 'ncsync' causes
the header information to be reread from the file. Note that netcdf does
not support having multiple processes modify a file at the same time, even
if 'ncsync' is used, and it gives no warning when this happens.
I suspect this may be your problem.
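
To make the writer/reader behaviour above concrete, here is a small toy
model in plain POSIX C. It is not netCDF code and does not use the netCDF
file layout: the "header" is just a record count at the start of the file,
and every name in it (toyfile, toy_create, toy_open_readonly, toy_append,
toy_sync) is hypothetical, invented for this illustration.

```c
/* Toy model only: NOT the netCDF format or API. */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

typedef struct {
    int fd;
    int writable;
    uint32_t numrecs;            /* in-memory copy of the "header" */
} toyfile;

static int toy_write_header(toyfile *f) {
    if (lseek(f->fd, 0, SEEK_SET) < 0) return -1;
    return write(f->fd, &f->numrecs, sizeof f->numrecs)
               == (ssize_t)sizeof f->numrecs ? 0 : -1;
}

static int toy_read_header(toyfile *f) {
    if (lseek(f->fd, 0, SEEK_SET) < 0) return -1;
    return read(f->fd, &f->numrecs, sizeof f->numrecs)
               == (ssize_t)sizeof f->numrecs ? 0 : -1;
}

/* Create a new file; the empty header goes to disk right away. */
int toy_create(toyfile *f, const char *path) {
    f->fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (f->fd < 0) return -1;
    f->writable = 1;
    f->numrecs = 0;
    return toy_write_header(f);
}

/* Open for reading; the header is brought into memory once. */
int toy_open_readonly(toyfile *f, const char *path) {
    f->fd = open(path, O_RDONLY);
    if (f->fd < 0) return -1;
    f->writable = 0;
    return toy_read_header(f);
}

/* Append one record. Only the in-memory record count changes. */
int toy_append(toyfile *f, int32_t value) {
    off_t off = (off_t)sizeof f->numrecs
              + (off_t)f->numrecs * (off_t)sizeof value;
    assert(f->writable);
    if (lseek(f->fd, off, SEEK_SET) < 0) return -1;
    if (write(f->fd, &value, sizeof value) != (ssize_t)sizeof value)
        return -1;
    f->numrecs++;
    return 0;
}

/* Writer: push the header out. Reader: reread it from the file. */
int toy_sync(toyfile *f) {
    return f->writable ? toy_write_header(f) : toy_read_header(f);
}
```

In this model a reader that calls toy_sync() before the writer has done so
still sees the old record count, even though the record itself is already
in the file. Two writers would each overwrite the other's record count,
which is the kind of conflict that syncing alone cannot prevent.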



>  - Can you tell us how netCDF does its writes?  What I/O or file
>    routines does it call?

There is a compile-time choice here.

The default i/o strategy is implemented in libsrc/xdrposix.c.
The library maintains a "page" (8192 bytes) in memory, which it has gotten
using lseek() and read(). If the page is modified, it is tagged as such.
When there is a request to access data which is not on the current page,
the current page is written out (if modified) using lseek() and write(),
and the page containing the desired data is read.
Since the header info is typically not on the same page as the data last
accessed, calling 'ncsync()' would typically trigger this write.
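
As a rough illustration of that single-page scheme (a sketch written for
this note, not the code in xdrposix.c), the following keeps one 8192-byte
page in memory and only touches the file when a different page is needed
or when a flush is requested. The names pagebuf, pb_create, pb_flush,
pb_write and pb_read are hypothetical, and the sketch assumes a regular
file and non-negative offsets.

```c
/* Sketch of a one-page write-back buffer; not taken from xdrposix.c. */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define PB_PAGE_SIZE 8192

typedef struct {
    int fd;
    off_t page_start;                  /* file offset of the page, -1 if none */
    size_t used;                       /* number of valid bytes in buf */
    int dirty;                         /* page modified since last write? */
    unsigned char buf[PB_PAGE_SIZE];
} pagebuf;

int pb_create(pagebuf *pb, const char *path) {
    pb->fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    pb->page_start = -1;
    pb->used = 0;
    pb->dirty = 0;
    return pb->fd < 0 ? -1 : 0;
}

/* Write the current page back with lseek() + write() if it was modified. */
int pb_flush(pagebuf *pb) {
    if (!pb->dirty) return 0;
    if (lseek(pb->fd, pb->page_start, SEEK_SET) < 0) return -1;
    if (write(pb->fd, pb->buf, pb->used) != (ssize_t)pb->used) return -1;
    pb->dirty = 0;
    return 0;
}

/* Make the page containing `offset` current, flushing the old one first. */
static int pb_seek_page(pagebuf *pb, off_t offset) {
    off_t start = offset - offset % PB_PAGE_SIZE;
    ssize_t n;
    assert(offset >= 0);
    if (start == pb->page_start) return 0;
    if (pb_flush(pb) < 0) return -1;
    pb->page_start = -1;
    pb->used = 0;
    memset(pb->buf, 0, PB_PAGE_SIZE);
    if (lseek(pb->fd, start, SEEK_SET) < 0) return -1;
    n = read(pb->fd, pb->buf, PB_PAGE_SIZE);
    if (n < 0) return -1;
    pb->page_start = start;
    pb->used = (size_t)n;
    return 0;
}

int pb_write(pagebuf *pb, off_t offset, const void *data, size_t len) {
    const unsigned char *src = data;
    while (len > 0) {
        size_t in_page, n;
        if (pb_seek_page(pb, offset) < 0) return -1;
        in_page = (size_t)(offset - pb->page_start);
        n = PB_PAGE_SIZE - in_page;
        if (n > len) n = len;
        memcpy(pb->buf + in_page, src, n);
        if (in_page + n > pb->used) pb->used = in_page + n;
        pb->dirty = 1;                 /* tag the page as modified */
        offset += (off_t)n; src += n; len -= n;
    }
    return 0;
}

int pb_read(pagebuf *pb, off_t offset, void *out, size_t len) {
    unsigned char *dst = out;
    while (len > 0) {
        size_t in_page, n;
        if (pb_seek_page(pb, offset) < 0) return -1;
        in_page = (size_t)(offset - pb->page_start);
        n = PB_PAGE_SIZE - in_page;
        if (n > len) n = len;
        if (in_page + n > pb->used) return -1;   /* past end of file */
        memcpy(dst, pb->buf + in_page, n);
        offset += (off_t)n; dst += n; len -= n;
    }
    return 0;
}
```

The point to notice is that a modified page reaches the file server only
when another page is requested or pb_flush() is called, so data written by
the application can sit in memory for some time before any write() is
issued.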

The alternate strategy is to use xdr_stdio, which uses stdio for i/o;
see libsrc/xdrstdio.c and the xdr(3) man pages.

> Any details or other relevant information you can provide would
> be appreciated.

I'm not sure which i/o strategy is used on the VMS system.
If it is xdrposix.c, you might try changing BIOBUFSIZ in xdrposix.c to
something smaller, like 1024.

> Thanks very much.  -Dave Borg-Breen
>
> ----------------------------------------------------------------------
>   David Borg-Breen                     Internet: address@hidden
>   NOAA/PMEL                            Phone: (206) 526-6816
>   7600 Sand Point Way NE, Bldg. 3      Fax:   (206) 526-6815
>   Seattle, WA 98115
> ----------------------------------------------------------------------