Re: netcdf problem with the keyword unlimited



>To: address@hidden
>From: "Alan Dawes" <address@hidden>
>Subject: Re: 20020219: netcdf problem with the keyword unlimited
>Organization: AWE
>Keywords: 

Hi Alan,

> I am now home and I have come across another issue with 
> the UNLIMITED keyword.
> 
> Let me explain. I have two parallel fortran codes which 
> use a parallel communications library (developed here) 
> for communications and the I/O. When it was written the 
> netcdf library was used for the I/O. To get parallelism,
> one processor gathers the data from the surrounding processors
> using MPI and writes it. Obviously, this part cannot
> really be called parallel, but it works. So in a nutshell,
> the code calls the parallel communications library which 
> in turn calls the netcdf library to do the I/O from a single
> processor.
> 
> However, when I replaced the fixed Z nodal definitions with
> the UNLIMITED keyword, rubbish was written onto the dump...
> but not everywhere...just in specific areas. Therefore, the
> data has not been shifted or rotated etc. It appears not to
> be random. If I run my codes on a single processor it still
> causes problems.
> 
> I have tried both netcdf-3.4 and 3.5, to no effect.
> 
> If I totally remove the parallel library and call raw netcdf 
> directly there is NO PROBLEM!
> 
> Personally, I cannot explain it...hence the message. 
> 
> 1/ Can the netcdf library work using MPI when the
>    UNLIMITED keyword is specified? 
> 
>    Has this been tested?

Yes, the netCDF library can be used in applications that also use
MPI, but netCDF has not been adapted to use MPI, so it is not parallel.
Specifically, I know of some ocean modelers who are using MPI and
netCDF (and finding netCDF a bottleneck because it's not parallel).
Whether the UNLIMITED dimension is used or not should not be relevant.

You are calling NF90_CLOSE() on your dataset before exiting, right?
That's necessary to make sure the size of the unlimited dimension
gets written to disk correctly.
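
Here's a minimal sketch of the sequence I mean, in Fortran 90 (the
file, dimension, and variable names are just placeholders, and check
is our own error handler, not part of the netCDF API):

  program write_records
    use netcdf
    implicit none
    integer :: ncid, z_dimid, t_dimid, varid
    real    :: slice(10)

    slice = 1.0
    call check( nf90_create("example.nc", NF90_CLOBBER, ncid) )
    call check( nf90_def_dim(ncid, "z",    10,             z_dimid) )
    call check( nf90_def_dim(ncid, "time", NF90_UNLIMITED, t_dimid) )
    call check( nf90_def_var(ncid, "field", NF90_FLOAT, &
                             (/ z_dimid, t_dimid /), varid) )
    call check( nf90_enddef(ncid) )

    ! Write record 1 along the unlimited dimension.
    call check( nf90_put_var(ncid, varid, slice, &
                             start=(/ 1, 1 /), count=(/ 10, 1 /)) )

    ! Without this call, the record count in the header may never
    ! reach the disk, and a later reader sees missing records.
    call check( nf90_close(ncid) )

  contains
    subroutine check(status)
      integer, intent(in) :: status
      if (status /= NF90_NOERR) then
        print *, trim(nf90_strerror(status))
        stop
      end if
    end subroutine check
  end program write_records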

It's possible what you are seeing is a bug in the MPI part of your
program that causes something to be overwritten.

A difference between the UNLIMITED dimension and other dimensions is
that the size of the UNLIMITED dimension changes as you write more
data, so you should make explicit calls (nc_sync, nf_sync, nf90_sync,
depending on which interface you're using) to synchronize the value in
the in-memory header with the value on disk.  And if another process
is reading the size of the unlimited dimension, it must also call
nc_sync before the read, to make sure it's getting the value on disk
instead of the old value in its in-memory copy.
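
The discipline looks something like this (a sketch only: ncid_w and
ncid_r stand for dataset ids held by a writing and a reading process,
and check is the placeholder error handler from the sketch above):

  subroutine sync_example(ncid_w, ncid_r)
    use netcdf
    implicit none
    integer, intent(in) :: ncid_w, ncid_r
    integer :: udimid, nrecs

    ! Writer: flush the in-memory header, including the current
    ! record count, out to disk after writing a record.
    call check( nf90_sync(ncid_w) )

    ! Reader: refresh the in-memory header from disk first, so the
    ! inquiry below returns the current length of the unlimited
    ! dimension rather than a stale cached value.
    call check( nf90_sync(ncid_r) )
    call check( nf90_inquire(ncid_r, unlimitedDimId=udimid) )
    call check( nf90_inquire_dimension(ncid_r, udimid, len=nrecs) )
  end subroutine sync_example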

The UNLIMITED dimension is supposed to be a "high-water mark" of the
number of records; it never decreases.  You can write data into a
high-numbered record before you write data into lower-numbered records,
so depending on how the fill-value option is set, you may get either
garbage or fill values in the intervening unwritten records.
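
For instance, continuing the first sketch (same placeholder names):

  ! Writing record 5 before records 2-4 grows the unlimited
  ! dimension to 5; records 2-4 then read back as fill values (or
  ! garbage, if fill mode has been turned off with nf90_set_fill).
  call check( nf90_put_var(ncid, varid, slice, &
                           start=(/ 1, 5 /), count=(/ 10, 1 /)) )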

> 2/ Can POINTER array definitions be passed to the 
>    netcdf functions or do they have to be fixed or
>    allocatable storage?

It sounds like you're using the Fortran 90 interface.  We didn't write
that and aren't familiar enough with Fortran 90 to know whether
there's any reason you couldn't use POINTER array definitions just
like fixed or allocatable storage.  If you can send us a small example
that demonstrates it doesn't work, we'll try to find out why and fix
the problem or document the restriction.
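
Something along these lines would be the kind of small test we could
run (all names hypothetical):

  program pointer_test
    use netcdf
    implicit none
    real, pointer :: p(:)
    integer :: ncid, dimid, varid

    allocate(p(10))
    p = 1.0
    call check( nf90_create("ptr_test.nc", NF90_CLOBBER, ncid) )
    call check( nf90_def_dim(ncid, "x", 10, dimid) )
    call check( nf90_def_var(ncid, "v", NF90_FLOAT, (/ dimid /), varid) )
    call check( nf90_enddef(ncid) )
    ! A POINTER array passed where the interface expects an array;
    ! if this fails, it would pin down the restriction.
    call check( nf90_put_var(ncid, varid, p) )
    call check( nf90_close(ncid) )
    deallocate(p)

  contains
    subroutine check(status)
      integer, intent(in) :: status
      if (status /= NF90_NOERR) then
        print *, trim(nf90_strerror(status))
        stop
      end if
    end subroutine check
  end program pointer_test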

> 3/ What other issues may affect the way netcdf 
>    writes UNLIMITED data?

If multiple variables use the unlimited dimension, the data is written
to disk in a different order than it would be if there were no
UNLIMITED dimension: the data for all the record variables is
interleaved, one record slice at a time, instead of all the data for
each variable being written in a single contiguous block, as is the
case for non-record variables.
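
To make the difference concrete (hypothetical names again): with two
record variables defined like this,

  call check( nf90_def_dim(ncid, "time", NF90_UNLIMITED, t_dimid) )
  call check( nf90_def_var(ncid, "temp", NF90_FLOAT, &
                           (/ z_dimid, t_dimid /), temp_id) )
  call check( nf90_def_var(ncid, "pres", NF90_FLOAT, &
                           (/ z_dimid, t_dimid /), pres_id) )

the file holds record 1 of "temp", then record 1 of "pres", then
record 2 of "temp", then record 2 of "pres", and so on.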

See the "File Structure and Performance" chapter of the User's Guide
for more information:

 http://www.unidata.ucar.edu/packages/netcdf/f90/Documentation/guide.book.pdf

I don't know if we have any test platforms with an MPI library here,
but if you can construct a small example that fails and send it to us,
we could try to duplicate the problem.

--Russ

_____________________________________________________________________

Russ Rew                                         UCAR Unidata Program
address@hidden                     http://www.unidata.ucar.edu