Re: netcdf question



> From address@hidden Wed Jan 26 07:30:33 1994
> Keywords: 199401262331.AA04453
> Date: Wed, 26 Jan 94 15:30:33 PST
> From: address@hidden (Art Mirin)
> To: address@hidden
> Subject: netcdf question
> Cc: address@hidden
> 
> Hello,      01/26/94
>  
> I am working with several others here at Livermore on a couple of large codes
> that run on parallel processors and that use netCDF.  One or more of my
> colleagues (John Bolstad, Mike Wehner) may have queried you in the past 
> regarding netCDF issues.  I wish to address the following area.
>  
> We have variables that are partitioned between the various processors,
> and in some cases we have each processor write its portion of the 
> variable to the netCDF file.  I have been finding that we get spotty
> results.  I am not sure whether we have implemented netCDF incorrectly,
> have done something else wrong in the code, or what.  I do find that when
> we stick to one processor we are okay.  With more than one processor 
> I have seen things like (a) double infinity values in variables that
> were written by the process that created the netCDF file (using double
> infinity as a fill value); in this particular case the variable of
> concern was written only from the processor that created the file,
> but the file had other variables that were written from all
> processors; (b) zeros in portions of variables corresponding
> to certain processors (I think 0 was the fill value in this case), 
> (c) error when a processor tried to open a netCDF file that another 
> processor had created; the file was already there, as we had a synchronization
> message; we obtained an error something like "cannot open .....  19",
> and looking up 19 in netcdf.inc indicated not a netcdf file.
>  
> Our code is quite large so I'm not asking for specific help at this point.
> But perhaps you have general comments regarding possible limitations on
> netCDF in the parallel context or something that we could be doing to
> ameliorate the situation.
>  
> Thanks,
> Art Mirin


I don't know enough about the specifics of your system to be much
help. However, my first guess is that some memory regions used for
buffering and keeping track of the "current" page are not properly
protected or synchronized in a multiprocessor context.

The current netcdf implementation does not work very well for
"shared access" by multiple processes, even in a single-processor
environment. It supports single-writer, multiple-reader access
via the ncsync() call. So, if what you are doing is separate
ncopen() calls in distinct processes, all bets are off. In
this case, you would have separate buffers and file descriptors in each
process, and any data synchronization would go via the file system.
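
To make the "separate buffers" point concrete, here is a minimal sketch
that uses plain C stdio rather than netCDF itself. It assumes a POSIX-like
system; the function names and the scratch file path are hypothetical.
Data written through one handle stays in that handle's private buffer and
is invisible through a second, independent handle until it is flushed.
fflush() plays a role loosely analogous to ncsync() here.

```c
#include <stdio.h>

/* Size of the file as seen through a fresh, independent handle. */
static long visible_size(const char *path)
{
    FILE *r = fopen(path, "rb");
    long n;

    if (r == NULL)
        return -1;
    if (fseek(r, 0L, SEEK_END) != 0) {
        fclose(r);
        return -1;
    }
    n = ftell(r);
    fclose(r);
    return n;
}

/*
 * Hypothetical demo: write 5 bytes through a fully buffered handle and
 * report how many bytes a second handle sees before and after the flush.
 * Returns 0 on success, -1 on failure.
 */
int demo_private_buffers(const char *path, long *before, long *after)
{
    FILE *w = fopen(path, "wb");

    if (w == NULL)
        return -1;
    /* Full buffering: small writes stay in the writer's own memory. */
    if (setvbuf(w, NULL, _IOFBF, 4096) != 0) {
        fclose(w);
        return -1;
    }
    fputs("hello", w);              /* buffered, not yet in the file */
    *before = visible_size(path);   /* the other handle sees none of it */
    fflush(w);                      /* push the buffer to the file system */
    *after = visible_size(path);    /* now the data is visible */
    fclose(w);
    remove(path);
    return 0;
}
```

With several writers, each holding its own buffer, whichever one flushes
a given region last wins, and that can leave fill values or stale data
where another writer's portion should be.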

A better move would be to ncopen() in the main thread and then
share the memory and single file descriptor among the various threads
of execution.

In any case, the file you need to look at is libsrc/xdrposix.c.
(If your system has a "parallel" stdio package, you might alternatively
use libsrc/xdrstdio.c. These are two alternative implementations of
the low layer of netcdf, and after reading over them you should have
a good idea about what is going on.)
In xdrposix.c, you would probably want to share the "biobuf" and
put some locking around accesses to the data there.
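
As a rough illustration of "share the buffer and lock around it", here is
a sketch using POSIX threads. It is not the xdrposix.c code: the page_buf
structure, the in-memory backing array standing in for the file, and all
function names are assumptions made for the example. One shared single-page
buffer tracks a "current" page, and every access that may swap or modify
that page happens under a mutex, so each thread can write its own portion
safely.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 64
#define NPAGES    7
#define FILE_SIZE (PAGE_SIZE * NPAGES)
#define NTHREADS  4
#define SLAB      100   /* bytes per thread; deliberately not page-aligned */

/* Hypothetical stand-in for the file on disk. */
static unsigned char backing[FILE_SIZE];

/* Hypothetical shared one-page buffer with its own lock. */
struct page_buf {
    pthread_mutex_t lock;
    int page;                       /* page currently held, or -1 */
    int dirty;                      /* nonzero if data[] has unsaved changes */
    unsigned char data[PAGE_SIZE];
};

static struct page_buf shared = { PTHREAD_MUTEX_INITIALIZER, -1, 0, {0} };

/* Write the current page back. Caller must hold the lock. */
static void flush_locked(struct page_buf *b)
{
    if (b->page >= 0 && b->dirty) {
        memcpy(backing + b->page * PAGE_SIZE, b->data, PAGE_SIZE);
        b->dirty = 0;
    }
}

/* Copy n bytes to offset off, swapping pages as needed, all under the lock. */
static void buf_write(struct page_buf *b, size_t off,
                      const unsigned char *src, size_t n)
{
    assert(off + n <= (size_t)FILE_SIZE);
    pthread_mutex_lock(&b->lock);
    while (n > 0) {
        int page = (int)(off / PAGE_SIZE);
        size_t in_page = off % PAGE_SIZE;
        size_t chunk = PAGE_SIZE - in_page;

        if (chunk > n)
            chunk = n;
        if (page != b->page) {      /* switch the "current" page */
            flush_locked(b);
            memcpy(b->data, backing + page * PAGE_SIZE, PAGE_SIZE);
            b->page = page;
        }
        memcpy(b->data + in_page, src, chunk);
        b->dirty = 1;
        off += chunk;
        src += chunk;
        n -= chunk;
    }
    pthread_mutex_unlock(&b->lock);
}

static void buf_sync(struct page_buf *b)
{
    pthread_mutex_lock(&b->lock);
    flush_locked(b);
    pthread_mutex_unlock(&b->lock);
}

/* Each thread writes its own slab in small pieces, so threads interleave. */
static void *worker(void *arg)
{
    int id = (int)(intptr_t)arg;
    unsigned char piece[10];

    memset(piece, 'A' + id, sizeof piece);
    for (size_t i = 0; i < SLAB; i += sizeof piece)
        buf_write(&shared, (size_t)id * SLAB + i, piece, sizeof piece);
    return NULL;
}

/* Returns 0 if every thread's slab arrived intact, -1 otherwise. */
int run_partitioned_write(void)
{
    pthread_t tid[NTHREADS];
    int started = 0;

    memset(backing, 0, sizeof backing);
    shared.page = -1;
    shared.dirty = 0;
    for (int i = 0; i < NTHREADS; i++) {
        if (pthread_create(&tid[i], NULL, worker, (void *)(intptr_t)i) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    buf_sync(&shared);
    if (started != NTHREADS)
        return -1;
    for (int id = 0; id < NTHREADS; id++)
        for (int j = 0; j < SLAB; j++)
            if (backing[id * SLAB + j] != 'A' + id)
                return -1;
    return 0;
}
```

A single coarse lock serializes all I/O through the buffer, which is the
simplest way to get correct results; whether that is fast enough on a given
parallel machine is a separate question.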

Hope this helps.

-glenn

Glenn P. Davis                address@hidden
UCAR / Unidata
PO Box 3000                   3300 Mitchell Lane, Suite 170
Boulder, CO 80307-3000        Boulder, CO  80301

(303) 497 8643