Re: 20030730: Netcdf performance problem on NEC SX6



>To: address@hidden
>From: Mathis Rosenhauer <address@hidden>
>Subject: Netcdf performance problem on NEC SX6
>Organization: Deutsches Klimarechenzentrum
>Keywords: 200307301224.h6UCO8Ld019417 netCDF NEC SX6 performance

Hi Mathis,

I hope you don't mind that I'm also CC:ing Gottfried Necker on this
reply, since he has recently run into a similar problem.

> I got reports from our users who see a performance drop between the
> old netcdf-3.4.0 version and the latest 3.5.1-beta11 on our NEC SX6
> machines or with any other 3.5.x release for that matter. I have
> narrowed this down to ncio_px_get() in posixio.c:
> 
>       if (*vpp == NULL)
>       {
>           ncio_px_sync(nciop);
>           pxp->bf_offset = OFF_NONE;
>           pxp->bf_cnt = 0;
>       }
> 
> That statement seems to be new in 3.5.x. Tracing back a little bit I
> found in putget.m4 the function putNCvx_$1_$2() which uses a local
> pointer
> 
> void *xp;
> 
> and calls
> 
> int lstatus = ncp->nciop->get(ncp->nciop, offset,extent, RGN_WRITE, &xp);
> 
> xp is undefined in the first round and happens to be NULL in a lot of
> cases on our system which causes the slowdowns in ncio_px_get().
> 
> This is probably related to another report I found in your mail
> archives but using 3.5.1-beta11 doesn't help much in our case.
> 
> http://www.unidata.ucar.edu/cgi-bin/msgout?/glimpse/netcdf/5141
> 
> Would it be safe to disable the "if (*vpp == NULL)" statement in
> posixio.c or make xp static?

Thanks for digging into this problem and reporting what you found.

It looks like the ncio_px_get() change was made to fix a
synchronization bug in which a reader's nc_sync() call did not make
visible the changes made by a concurrent writer.  It appears that the
"fix" may have done more than was intended.

I definitely would not advise making xp static.  It looks to me as if
xp is supposed to be an output-only variable for the
ncp->nciop->get() call, so ncio_px_get() should never read *vpp to
test its value against NULL.
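
As a rough illustration of why reading an output-only argument makes
the cost depend on whatever the caller's variable happened to contain,
here is a minimal sketch.  All names in it (region, fake_sync,
get_reads_output, get_output_only, count_syncs) are hypothetical
stand-ins and not the library's code; the pointer is preset explicitly
to stand in for leftover stack contents, since reading a truly
uninitialized variable is not well defined in C.

```c
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real netCDF types or functions. */
static char region[64];       /* pretend I/O buffer handed back to callers */
static long sync_calls = 0;   /* counts simulated expensive syncs */

static void fake_sync(void)
{
    sync_calls++;             /* stands in for a costly flush/re-read */
}

/* Pattern like the quoted snippet: the incoming *vpp is read as if it
 * were an input, so the caller's leftover value decides whether we sync. */
static int get_reads_output(void **vpp)
{
    if (*vpp == NULL)
        fake_sync();
    *vpp = region;
    return 0;
}

/* Output-only contract: *vpp is written and never read. */
static int get_output_only(void **vpp)
{
    *vpp = region;
    return 0;
}

/* Calls get() n times, each time with a fresh local pointer preset to
 * `initial`, and returns how many syncs those calls triggered. */
static long count_syncs(int (*get)(void **), void *initial, int n)
{
    long before = sync_calls;
    int i;

    for (i = 0; i < n; i++) {
        void *xp = initial;   /* explicit value models leftover stack contents */
        (void) get(&xp);
    }
    return sync_calls - before;
}
```

With this sketch, count_syncs(get_reads_output, NULL, 1000) reports a
sync on every call, while the same loop with a non-NULL preset, or with
get_output_only, reports none.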

If you aren't using concurrent writers and readers, you may be able to
safely remove the "if (*vpp == NULL)" statement, but we aren't yet
sure of the right fix for the case of concurrent access.

> Thanks in advance for your help

Thanks again for the help in debugging this problem.

> Mathis
> 
> -- 
> Mathis Rosenhauer
> Wissenschaftliches Rechnen
> Deutsches Klimarechenzentrum                       http://www.dkrz.de

--Russ

_____________________________________________________________________

Russ Rew                                         UCAR Unidata Program
address@hidden                      http://my.unidata.ucar.edu