
Re: Problem on SGI



>To: address@hidden
>From: Matthew Bettencourt <address@hidden>
>Subject: Re: 20030903: Problem on SGI 
>Organization: 
>Keywords: C++, sync, bug, SGI, File exists

Matt,

> Sorry this took so long, but Friday was not a fun day for me :>  Just to 
> let you know, I ran the sync_test routine and it worked w/o a problem.

I'm sorry to hear that; it would have made debugging the problem
simpler if this small test of netCDF synchronization had failed.  We
still have no way to reproduce the problem here, and reproducing it is
what we need in order to determine whether it is a netCDF bug and to
fix it.

> as for
> --------------
> "File exists" is a system error corresponding to
> 
>      #define  EEXIST  17      /* File exists                          */
> 
> in /usr/include/sys/errno.h.  It could occur from calling open() with
> O_CREAT and O_EXCL set, but if this occurred within the netCDF
> library, I would think you would get the corresponding netCDF error
> above instead.
> ------------
> the only ways I create a new NcFile are the following.
> 
> 
> rs->ncfp[j] = new NcFile(name.c_str(),NcFile::Write);
> ncfiles[_proc_id][i] = new NcFile(filename.c_str(),NcFile::Replace);
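For reference, a rough sketch of why neither of the two NcFile modes above would be expected to produce EEXIST from inside the library. It assumes, rather than quotes, the netCDF-3 POSIX I/O behavior: opening an existing file for writing passes no O_CREAT, Replace (NC_CLOBBER) passes O_CREAT|O_TRUNC, and only New (NC_NOCLOBBER) adds O_EXCL. The FileMode enum and the assumed_open_flags() and can_fail_with_eexist() helpers are hypothetical, not part of the netCDF headers.

```cpp
#include <fcntl.h>

// Hypothetical mirror of the legacy NcFile::FileMode values;
// this is NOT the real netCDF header.
enum class FileMode { ReadOnly, Write, Replace, New };

// ASSUMED mapping from file mode to the open(2) flags a netCDF-3 style
// POSIX I/O layer would pass. Only New (no-clobber) requests O_EXCL,
// the one combination that makes open() fail with EEXIST.
int assumed_open_flags(FileMode mode) {
    switch (mode) {
        case FileMode::ReadOnly: return O_RDONLY;
        case FileMode::Write:    return O_RDWR;                      // existing file only
        case FileMode::Replace:  return O_RDWR | O_CREAT | O_TRUNC;  // clobber
        case FileMode::New:      return O_RDWR | O_CREAT | O_EXCL;   // no clobber
    }
    return -1;  // unreachable for valid enum values
}

// True if open() with this mode's flags could fail with EEXIST.
bool can_fail_with_eexist(FileMode mode) {
    int flags = assumed_open_flags(mode);
    return (flags & O_CREAT) && (flags & O_EXCL);
}
```

Under these assumptions, Write and Replace both return false, which is consistent with the "File exists" message not coming from an exclusive create inside the library.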

It's possible the "File exists" error is really just a result of
clobbering errno (or wherever the state of the last system error is
stored in multithreaded code) with an errant pointer or out-of-bounds
array index, in which case the error message you see would not indicate
anything important.
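A minimal POSIX sketch of the errno point, assuming a Linux/glibc-like system: an exclusive create of a file that already exists sets errno to EEXIST, and because later successful calls are not required to reset errno, a stale "File exists" can linger and be reported by code that never checked a return value. The helpers create_exclusive() and stale_errno_after_success() are hypothetical illustrations, not netCDF functions.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Try to create `path` exclusively. Returns 0 on success, otherwise the
// errno value captured immediately after the failing open() call.
int create_exclusive(const char* path) {
    int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0644);
    if (fd < 0) {
        return errno;  // EEXIST ("File exists") if path is already there
    }
    close(fd);
    return 0;
}

// errno is only meaningful right after a call that reports failure.
// POSIX does not require successful calls to reset it, so on typical
// systems a stale EEXIST survives the later, successful open()/close().
int stale_errno_after_success(const char* path) {
    errno = 0;
    int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0644);  // fails if path exists
    if (fd >= 0) close(fd);
    int fd2 = open(path, O_RDONLY);  // succeeds; errno typically left untouched
    if (fd2 >= 0) close(fd2);
    return errno;
}
```

The safe pattern is to read errno only immediately after a call whose return value says it failed.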

> I don't know the status on this issue; I am gonna try one more thing on 
> this end.  I may not have been completely locking out all processes 
> reading from files (My thread structure on this code is pretty pasta-ish 
> IYKWIM).  Do you have any ideas from a netCDF side of things??

No, sorry.  No one else has reported similar problems, and our sync
test code seems to perform as intended.  If you could shrink the code
that demonstrates the error into something we could run on our
dual-CPU IRIX system to reproduce it, we might be able to make some
progress, but it would have to be easy to verify that the code does
not assume the netCDF library is thread-safe.
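One way to make it obvious that a test program does not rely on netCDF being thread-safe is to funnel every library call, reads included, through a single process-wide lock. The sketch below assumes that approach; fake_lib, with_library_lock() and run_workers() are hypothetical names, and the counter merely stands in for real netCDF calls such as writes and sync().

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for non-thread-safe library state (think of a
// netCDF write followed by a sync). This is NOT the netCDF API.
namespace fake_lib {
long records_written = 0;  // shared and deliberately unsynchronized
void write_record() { ++records_written; }
}  // namespace fake_lib

// One process-wide mutex that guards *every* call into the library,
// reads as well as writes, so no two threads are ever inside it at once.
std::mutex library_mutex;

template <typename F>
auto with_library_lock(F&& f) -> decltype(f()) {
    std::lock_guard<std::mutex> guard(library_mutex);
    return f();
}

// Run `threads` workers, each making `iters` guarded library calls,
// and return the final record count.
long run_workers(int threads, int iters) {
    fake_lib::records_written = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([iters] {
            for (int i = 0; i < iters; ++i) {
                with_library_lock([] { fake_lib::write_record(); });
            }
        });
    }
    for (auto& w : workers) w.join();
    return fake_lib::records_written;
}
```

With every call serialized this way, the count always comes out to threads times iters; removing the lock is what would expose a data race in a non-thread-safe library.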

My earlier alarm on looking at the NcVar::sync() code and thinking it
was clearly in error was misplaced.  The fact that it works fine on
other platforms indicates that there is nothing obviously wrong with
the sync() methods.

--Russ

_____________________________________________________________________

Russ Rew                                         UCAR Unidata Program
address@hidden                      http://my.unidata.ucar.edu