
Re: Problem on SGI



>To: address@hidden
>From: Matthew Bettencourt <address@hidden>
>Subject: Re: 20030903: Problem on SGI 
>Organization: 
>Keywords: C++, sync, bug, SGI, file exists

Matt,

> This code works fine on AIX/OSF/Linux/SUN and only gives me beefs with
> the SGI.  Now, here is a theory that I came up with.  I am storing
> netcdf vars/files in linked lists, stuff like list<NcVar> and whatnot.
> Now, the way that different compilers handle these STL calls varies, or
> at least can vary greatly.  I am thinking that some of the netCDF C++
> copy/create/delete constructs may be broken and this is what is
> corrupting the data on the SGI.  That is just a theory I have, but like
> many of my theories, it is often wrong :(...
> 
> Now, if I comment out the sync it dies in a put_rec call....

You may have answered this before, but I can't find it: Do you have
only one process writing a file while multiple other processes are
reading it?  You should not have multiple processes or threads trying
to write to the same file concurrently; netCDF doesn't support that.

Another question: Is your use of the sync() method purely for having a
writer indicate that there is now more data, without changing the
schema information like number of variables or attributes, or is the
writer process also adding variables, dimensions, or attributes to a
file and then invoking the NcFile::sync() method?  The latter use of
sync() is definitely not well-tested, so I would not be surprised if
there were bugs for that sort of use.

Also, after the netCDF C++ interface was written back in 1996, we
added a new way of handling syncing writes and reads to the C
interface, by use of the NC_SHARE flag when a file is opened.  It's
possible you could use this flag as well and let the syncing be
handled inside the C library (called by the C++ library) instead of
calling the sync() method yourself.  Here's a description of the use
of NC_SHARE:

  http://www.unidata.ucar.edu/packages/netcdf/guidec/guidec-10.html#HEADING10-322

Currently there is no use of NC_SHARE in the C++ interface, but I
think it would be relatively easy to add this to the NcFile
constructor by adding another parameter

  NcFile(const char* path, FileMode = ReadOnly, ShareMode = UnShared)

and providing ShareMode=Shared would make sure the call to the
underlying C nc_open() function used the NC_SHARE flag.
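
As a rough sketch of that idea, the extra parameter could simply be
folded into the mode argument handed to nc_open().  Everything here is
an assumption for illustration: ShareMode and open_flags() do not exist
in the C++ interface, only two file modes are shown, and the k*
constants are stand-ins for the NC_NOWRITE, NC_WRITE and NC_SHARE
macros that real code would take from <netcdf.h>.

```cpp
// Sketch only: ShareMode and open_flags() are hypothetical, not part of
// the netCDF C++ interface.  The k* constants stand in for the C
// library's NC_NOWRITE, NC_WRITE and NC_SHARE macros from <netcdf.h>.
enum FileMode { ReadOnly, Write };
enum ShareMode { UnShared, Shared };

const int kNoWrite = 0x0000;  // stand-in for NC_NOWRITE
const int kWrite   = 0x0001;  // stand-in for NC_WRITE
const int kShare   = 0x0800;  // stand-in for NC_SHARE

// Build the mode argument that the constructor would pass to nc_open().
int open_flags(FileMode fmode, ShareMode smode = UnShared)
{
    int flags = (fmode == Write) ? kWrite : kNoWrite;
    if (smode == Shared)
        flags |= kShare;  // request shared access from the C library
    return flags;
}

// Inside a constructor the result would be used roughly as:
//   int ncid;
//   int status = nc_open(path, open_flags(fmode, smode), &ncid);
```

Because the default is UnShared, existing callers that pass only a
path and a file mode would get the same flags as before.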

Finally, Steve Emmerson has created a couple of C programs for testing
the nc_sync() functions in the C interface, in files "nc_sync.c" and
"nc_sync_child.c".  Code in the first file creates a netCDF file and
executes the program in the second file via popen().  It then modifies
the netCDF file and signals the child program via the pipe to test the
nc_sync() function.  A new makefile target "sync_test" executes the
test.  It would take a little effort to modify these for a C++ sync()
test, but it sounds like it might be worth it.  If we could reproduce
the SGI problem using these, it would be a lot easier to track down.
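
The parent/child arrangement described above can be sketched with
plain POSIX calls.  This is not the code from nc_sync.c: notify_child()
is a hypothetical helper, the netCDF calls appear only as a comment,
and the child command is whatever program would reopen or re-sync the
file on its side.

```cpp
#include <stdio.h>     // popen, pclose, fputs, fflush (popen/pclose are POSIX)
#include <sys/wait.h>  // WIFEXITED, WEXITSTATUS

// Sketch only: notify_child() is a hypothetical helper, not code from
// nc_sync.c.  It starts `child_cmd`, sends one line down the pipe, and
// returns the child's exit status (or -1 on failure).
int notify_child(const char* child_cmd, const char* message)
{
    FILE* to_child = popen(child_cmd, "w");
    if (to_child == NULL)
        return -1;

    // A real test would modify the netCDF file and call sync() (or
    // nc_sync() in C) here, before telling the child to look again.
    bool sent = (fputs(message, to_child) >= 0);
    sent = (fflush(to_child) == 0) && sent;

    int status = pclose(to_child);  // waits for the child to finish
    if (!sent || status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

A child that exits non-zero when it does not see the expected data
gives the makefile target a simple pass/fail result.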

If I sent you the C programs, would you be able to compile and test
those, just to make sure the C nc_sync() function is working correctly
on your SGI platform?  We have an SGI here too, so I could test it,
but it will be hard to get to it before tomorrow.

> Also, I don't know where it is getting the File exists warning message??

From the C layer, the message 

  netCDF file exists && NC_NOCLOBBER

is returned by nc_strerror for the netCDF error return code
NC_EEXIST.  If that's not exactly the message you are getting, then I
don't know where it's coming from either.
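
In the C interface, NC_EEXIST is what nc_create() returns when it is
called with NC_NOCLOBBER and the file is already there.  The sketch
below shows the analogous create-only-if-absent pattern using POSIX
open(); create_noclobber() is a hypothetical helper for illustration,
not netCDF code, and it makes no claim about how the library
implements the check.

```cpp
#include <cerrno>
#include <fcntl.h>   // open, O_WRONLY, O_CREAT, O_EXCL
#include <unistd.h>  // close

// Sketch only: create_noclobber() is a hypothetical helper illustrating
// create-if-absent semantics with plain POSIX calls, not netCDF code.
// Returns 0 on success, or the errno value (e.g. EEXIST) on failure.
int create_noclobber(const char* path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd == -1)
        return errno;  // EEXIST if the file is already there
    close(fd);
    return 0;
}
```

So one thing worth checking is whether the program creates a file in a
no-clobber mode at a path left over from an earlier run.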

--Russ