Re: netCDF 2.4.3 prefill on Cray address@hidden



> To: address@hidden
> From: address@hidden (John Sheldon)
> Subject: netCDF 2.4.3 prefill on Cray
> Organization: Princeton/GFDL
> Keywords: 199611160003.AA04557

> I finally got around to testing out 2.4.3 on our Cray IEEE T90.
> Unfortunately, I did not see any improvement in the speed of pre-fill.

(Joking tone of voice) As far as I know, performance improvement for
prefill on Cray T90 wasn't in the specs for netcdf-2.4.3,
so I don't know why anyone would be expecting it. As I mention at the end,
those sorts of improvements are in netcdf-3.

Without doing any profiling, I would guess that the 2.4.3 performance
difference between fill and no_fill is due to the fact that the
"source" array for fill data is very small,
and the prefill loop isn't as smart as the varput loop.
The prefill loop is something like:

for(ii = current_number_of_records;
        ii < current_number_of_records + new_records; ii++)
{
        NCfillrecord(..., ii);
}

The function NCfillrecord(..., recnum)
steps through the "record" variables (in your example case, just the one)
and calls xdr_NC_fill() to fill the space for that particular variable.
The function xdr_NC_fill() just loops thru a single record's data, converting
from a small (in your case 2 value) source array.

In contrast, for a varput(), you are handing it a source array with all the
values, so the record fill is completely parallel?
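
To make the contrast concrete, here is a minimal sketch in plain C. It is not
the netCDF source: fill_from_small_source() and put_from_full_source() are
hypothetical names, and the XDR conversion is stood in for by a plain
assignment or memcpy(). It only illustrates the shape of the two loops as I
described them above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the prefill path: each element of the record
 * is produced one at a time from a tiny source array of nsrc values. */
void fill_from_small_source(float *dst, size_t n,
                            const float *src, size_t nsrc)
{
    assert(nsrc > 0);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i % nsrc];   /* one small step per element */
}

/* Hypothetical stand-in for the varput path: the caller supplies all n
 * values, so the whole record can be moved in one bulk operation. */
void put_from_full_source(float *dst, const float *src, size_t n)
{
    memcpy(dst, src, n * sizeof *dst);
}
```

Both routines leave the same bytes in dst when the full source holds only
fill values; the difference is how much work is done per element.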

All is not lost, however. It looks to me like someone ("SWANSON") has done some
work on improving this for other Cray architectures; see cdf.c: xdr_NC_fill().
It looks like he was being conservative in his changes here. I believe
if you change line 426 of cdf.c:

#if !defined(_CRAY) || defined(_CRAYMPP) || defined(_CRAYIEEE)

to

#if !defined(_CRAY)

your architecture would benefit from this work. (In your case, the call to
xdr_floats() would end up being a copy rather than a conversion.)
Give that a try and see if things improve. Note that it is really a bit
of a hack: it only fixes 'float' values.

In netcdf-3, all architectures and all types benefit from this sort of thing.
There is a compile time tuning parameter, NFILL (in putget.c), which
controls the space vs time tradeoff of fill buffer size vs looping for fill.
NFILL is defined as NC_PG_CHUNK/sizeof(double). NC_PG_CHUNK is another compile
time tuning parameter, defined in nc.h, which controls the same sort of
tradeoff for all i/o. On systems with lots o' memory, NC_PG_CHUNK could be
increased well beyond its default value of 16384.
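
As a rough illustration of that tradeoff (a sketch only, not the netcdf-3
code), the routine below fills an array through a fixed-size buffer.
FILL_CHUNK, FILL_NELEMS and chunked_fill() are hypothetical names; the buffer
size just mirrors the NC_PG_CHUNK/sizeof(double) expression above, using the
16384 default.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FILL_CHUNK  16384                            /* bytes; hypothetical knob */
#define FILL_NELEMS (FILL_CHUNK / sizeof(double))    /* elements per fill buffer */

/* Fill n doubles at dst with fill_value by copying from a prefilled
 * buffer of FILL_NELEMS elements. Returns the number of bulk copies
 * made, so the effect of the buffer size is easy to observe. */
size_t chunked_fill(double *dst, size_t n, double fill_value)
{
    static double buf[FILL_NELEMS];
    size_t copies = 0;

    assert(dst != NULL || n == 0);

    /* Prepare the fill buffer once. */
    for (size_t i = 0; i < FILL_NELEMS; i++)
        buf[i] = fill_value;

    /* Copy it out as many times as needed; the last copy may be short. */
    while (n > 0) {
        size_t chunk = n < FILL_NELEMS ? n : FILL_NELEMS;
        memcpy(dst, buf, chunk * sizeof *dst);
        dst += chunk;
        n -= chunk;
        copies++;
    }
    return copies;
}
```

Assuming an 8-byte double, the buffer holds 2048 values, so filling 5000
values takes three copies. A larger chunk means fewer passes through the loop
at the cost of a bigger buffer.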

Hope this helps.

-glenn