
Re: netCDF parallel on T3E



Harsh:

It's not completely clear to me what your question is,
but I think you are wondering about what Steve L. says
here.

>> At this point I don't have much to add to the info I gave you yesterday.
>> Basically, unless something got broken along the way, the changes I made
>> for parallel I/O in netCDF for the T3D/T3E in version 2.4 have been
>> preserved by Unidata in the current 3.4 release.

I just had a look at the netcdf-2.4.3 xdrffio.c file and compared it to
the ffio.c in the current release. In the old file, there are
sequences of code like

#ifdef _CRAYMPP
        ...
        par_io_init(...);
        ...
        par_open(...);
#else
        ...
        ffopens();
#endif

I.e., defining the macro _CRAYMPP
turns xdrffio.c into something one would call xdrpario.c.

These sequences are not in the current ffio.c.
If I recall correctly, the first cut of ffio.c included them,
but then we tried to compile on a machine which defined _CRAYMPP
but didn't have the pario library in an obvious place.
So, we cut them out using 'unifdef'. In retrospect
it might have been better to just change the macro
from _CRAYMPP to something like USE_PARIO. But then
we would have this body of untested code in the distribution.
I hope Steve isn't offended; the decision was made mostly
on a "what can we support at this time" kind of basis.

It looks like it would be straightforward (almost mechanical)
to put them back in, using the old xdrffio.c as a model. (Steve did
the more difficult "thinking" work already.)
However, I would suggest making a copy of ffio.c to something
like pario.c and putting the modifications there, without ifdefs.
This way, one could decide which ncio implementation to use
at link time rather than compile time.

From what Steve says in the rest of the letter,
I'm not sure you want to use functions whose names are
par_xxx() anyway. What does the "global I/O" interface
look like? Doesn't this take the form of ffio calls?

>> Of course, you'll need
>> to use global I/O rather than the older "par_io" library to do the
>> actual I/O.

Given any platform specific I/O system which is capable of
random access, it is straightforward to write an ncio
implementation which uses that I/O system. A competent C
programmer can do it in less than a day. The interface is
defined in ncio.h. There are two implementations (buffered and
unbuffered) in posixio.c, another in ffio.c, and a contributed
mmapio.c (in pub/netcdf/contrib at our ftp site),
which can be used as examples. (The buffered version in posixio.c has
gotten unreadable at this point, I'm sorry to say.)

A brief outline of the ncio interface follows.

There are 2 public 'constructors':
        ncio_create()
and
        ncio_open().
The first creates a new file and the second opens an existing one.

There is a public 'destructor',
        ncio_close()
which closes the descriptor and calls the internal function ncio.free()
to free any allocated resources.

The 'constructors' return a data structure which includes
4 other 'member functions'

        ncio.get() - converts a file region specified by an offset
                and extent to a memory pointer. The region may be
                locked until the corresponding call to rel().

        ncio.rel() - releases the region specified by offset.

        ncio.move() - Copy one region to another without making anything
                available to higher layers. May be just implemented in
                terms of get() and rel(), or may be tricky to be efficient.
                Only used by nc_enddef() after redefinition.
and
        ncio.sync() - Flush any buffers to disk. May be a no-op
                if I/O is unbuffered.
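
To make the shape of such an interface concrete, here is a minimal sketch
of a structure with get/rel/move/sync/free members, backed by a plain
memory buffer instead of a file. All names here (toy_io, toy_io_create,
toy_io_close, mem_*) and all signatures are invented for illustration;
they are not the declarations in ncio.h, which remains the reference.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an ncio-like object; NOT the real ncio.h. */
typedef struct toy_io toy_io;
struct toy_io {
    int  (*get)(toy_io *io, size_t offset, size_t extent, void **vpp);
    int  (*rel)(toy_io *io, size_t offset);
    int  (*move)(toy_io *io, size_t to, size_t from, size_t nbytes);
    int  (*sync)(toy_io *io);
    void (*free)(toy_io *io);
    unsigned char *buf;   /* the whole "file", held in memory */
    size_t size;
};

/* get: map (offset, extent) to a pointer, or EINVAL if out of range. */
static int mem_get(toy_io *io, size_t offset, size_t extent, void **vpp)
{
    if (offset > io->size || extent > io->size - offset)
        return EINVAL;
    *vpp = io->buf + offset;
    return 0;
}

/* rel: nothing to unlock or write back for a memory buffer. */
static int mem_rel(toy_io *io, size_t offset)
{
    (void)io; (void)offset;
    return 0;
}

/* move: written only in terms of get() and rel(). */
static int mem_move(toy_io *io, size_t to, size_t from, size_t nbytes)
{
    void *src, *dst;
    int status = io->get(io, from, nbytes, &src);
    if (status != 0)
        return status;
    status = io->get(io, to, nbytes, &dst);
    if (status != 0) {
        (void)io->rel(io, from);
        return status;
    }
    memmove(dst, src, nbytes);   /* regions may overlap */
    (void)io->rel(io, to);
    return io->rel(io, from);
}

/* sync: a no-op here, since nothing is buffered separately. */
static int mem_sync(toy_io *io) { (void)io; return 0; }

static void mem_free(toy_io *io)
{
    free(io->buf);
    io->buf = NULL;
    io->size = 0;
}

/* 'Constructor': returns 0 or an errno.h value. */
int toy_io_create(size_t size, toy_io **iopp)
{
    toy_io *io;
    assert(iopp != NULL);
    io = malloc(sizeof *io);
    if (io == NULL)
        return ENOMEM;
    io->buf = calloc(size ? size : 1, 1);
    if (io->buf == NULL) {
        free(io);
        return ENOMEM;
    }
    io->size = size;
    io->get = mem_get;   io->rel = mem_rel;
    io->move = mem_move; io->sync = mem_sync;
    io->free = mem_free;
    *iopp = io;
    return 0;
}

/* 'Destructor': flush, then release resources via the free member. */
int toy_io_close(toy_io *io)
{
    int status;
    assert(io != NULL);
    status = io->sync(io);
    io->free(io);
    free(io);
    return status;
}
```

A parallel I/O version would keep the same five entry points and replace
only the bodies, which is why a separate source file chosen at link time
is attractive.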

The interactions between layers and error semantics are more clearly defined
for ncio than they were for the older xdr stream based system. The functions
all return the system error indication (errno.h) or 0 for no error.
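
As an illustration of that return convention, here is a small sketch of a
low-level read helper which returns 0 on success and the errno.h value
on failure. It assumes a POSIX system with pread(); the function name
toy_read_region and its parameters are hypothetical and do not come from
the netcdf sources.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read up to 'extent' bytes starting at 'offset' into 'buf'.
 * Returns 0 on success (including a short read at end of file) or an
 * errno.h value on failure. The number of bytes actually read is
 * stored in *nreadp in either case.
 */
int toy_read_region(int fd, off_t offset, void *buf, size_t extent,
                    size_t *nreadp)
{
    size_t done = 0;

    assert(buf != NULL && nreadp != NULL);
    while (done < extent) {
        ssize_t n = pread(fd, (char *)buf + done, extent - done,
                          offset + (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted, retry */
            *nreadp = done;
            return errno;        /* pass the system error up unchanged */
        }
        if (n == 0)
            break;               /* end of file */
        done += (size_t)n;
    }
    *nreadp = done;
    return 0;
}
```

The caller only ever tests the return value against 0, so each layer can
hand the same status straight up to the one above it.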

The sizehint parameter to ncio_open() and ncio_create() is a contract between
the upper layers and ncio. It is a negotiated value: a suggested value
is passed in by reference, and may be modified upon return. The upper layers
use the returned value as a maximum extent for calls to ncio.get().
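
A sketch of how such a negotiation might look, assuming an
implementation that prefers whole blocks within fixed limits. The
constants TOY_BLOCK and TOY_MAX and both function names are made up for
this example; a real implementation would pick whatever suits its I/O
system.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BLOCK 512u            /* hypothetical preferred block size */
#define TOY_MAX   (64u * 1024u)   /* hypothetical largest buffer */

/*
 * Lower layer: take the caller's suggestion by reference and replace
 * it with the value this implementation is willing to honor, i.e. a
 * multiple of TOY_BLOCK between TOY_BLOCK and TOY_MAX.
 */
void toy_negotiate_sizehint(size_t *sizehintp)
{
    size_t hint;

    assert(sizehintp != NULL);
    hint = *sizehintp;
    if (hint < TOY_BLOCK)
        hint = TOY_BLOCK;
    if (hint > TOY_MAX)
        hint = TOY_MAX;
    /* round up to a whole number of blocks */
    hint = (hint + TOY_BLOCK - 1) / TOY_BLOCK * TOY_BLOCK;
    *sizehintp = hint;
}

/*
 * Upper layer: number of get() calls needed to cover 'total' bytes
 * when no single request may exceed the negotiated sizehint.
 */
size_t toy_count_chunks(size_t total, size_t sizehint)
{
    assert(sizehint > 0);
    return (total + sizehint - 1) / sizehint;
}
```

For example, a suggestion of 1000 comes back as 1024, and the upper
layer would then cover a 5000 byte region with 5 requests.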

In the netcdf distribution, there is a test program, 't_ncio.c',
which can be used to unit test an ncio implementation. The program
is script driven, so a variety of access patterns can be tested
by feeding it different scripts.

This should get you started.
In the next couple of days, I'll try to come up with more
complete documentation for this interface.

-glenn