
[netCDF #BKA-178769]: Parallel hang in nc_put_vara_double



Hi Greg,

> Additional information on this ticket:
> 
> I am able to get code to run by making a couple changes to nc4hdf.c.
> 
> Near line 603, change the "if (start[d2] >= (hssize_t)fdims[d2])" to "if
> (start[d2] > (hssize_t)fdims[d2])"  That is, change ">=" to ">".
> 
> Near line 612, remove the code block below:
> 
> /* A little quirk: if any of the count values are zero, then
>    return success and forget about it. */
> for (d2 = 0; d2 < var->ndims; d2++)
>    if (count[d2] == 0)
>       goto exit;
> 
> The line numbers are for netcdf-4.1.3; I get similar behavior in
> netcdf-4.2.1.1.  I'm not sure how those changes affect other behavior in
> serial runs, but they get me past the current hang...
> 
> --Greg

Unfortunately, in both netcdf-4.1.3 and the current snapshot, making the two
changes you suggest causes nc_test to fail under "make check" in a serial
build:

  *** testing nc_put_var1_text ... 
        FAILURE at line 1465 of test_put.c: bad index: status = -57
    ...
        ### 48 FAILURES TESTING nc_put_var1_text! ###
    ...
  *** testing nc_put_vara ... 
        FAILURE at line 854 of test_write.c: bad index: status = -57
        ### 284 FAILURES TESTING nc_put_vara! ###
    ...
  *** Total number of failures: 5668
  *** nc_test FAILURE!!!
  FAIL: nc_test
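
One plausible reading of the "status = -57" failures, sketched below as a
simplified model rather than the actual nc4hdf.c code: -57 is NC_EEDGE in
netcdf.h, while a start index equal to the dimension length would normally be
reported as NC_EINVALCOORDS (-40). The function check_dim and its "strict"
flag are hypothetical names used only for illustration, and the sketch assumes
the start check is followed by a start+count edge check.

```c
#include <stddef.h>

/* Error codes as defined in netcdf.h. */
#define NC_NOERR          0
#define NC_EINVALCOORDS (-40)  /* index exceeds dimension bound */
#define NC_EEDGE        (-57)  /* start+count exceeds dimension bound */

/* Hypothetical model of the per-dimension bounds check.
 * strict != 0 models the original ">=" comparison,
 * strict == 0 models the proposed ">" comparison. */
static int check_dim(size_t start, size_t count, size_t dimlen, int strict)
{
    if (strict ? (start >= dimlen) : (start > dimlen))
        return NC_EINVALCOORDS;
    /* Assumed follow-up check on the far edge of the hyperslab. */
    if (start + count > dimlen)
        return NC_EEDGE;
    return NC_NOERR;
}
```

In this model, check_dim(4, 1, 4, 1) returns NC_EINVALCOORDS, while the
relaxed form check_dim(4, 1, 4, 0) falls through to NC_EEDGE, which would
match the status reported above. The relaxed form does accept a zero-length
request at the end of a dimension, check_dim(4, 0, 4, 0), which may be the
case a parallel rank with nothing to write needs.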

Removing just the "quirky" code block doesn't cause any test failures, but
I assume by itself that's not a fix for the problem you encountered with
parallel netCDF-4, so I would need a better justification to accept that fix.
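
For anyone weighing that justification, here is a toy model of why an early
exit on a zero count could matter in parallel. It assumes the write is a
collective operation that every rank must enter, which is an assumption about
the failing run, not something confirmed in this ticket. No MPI is used; the
function names and the per-rank count array are hypothetical.

```c
#include <stddef.h>

/* Count the ranks that would reach the (modelled) collective write.
 * early_exit_on_zero != 0 models the "goto exit" on a zero count. */
static size_t ranks_reaching_write(const size_t *counts, size_t nranks,
                                   int early_exit_on_zero)
{
    size_t reached = 0;
    for (size_t r = 0; r < nranks; r++) {
        if (early_exit_on_zero && counts[r] == 0)
            continue;  /* this rank returns before the write */
        reached++;
    }
    return reached;
}

/* A collective call can complete only if every rank takes part. */
static int collective_completes(const size_t *counts, size_t nranks,
                                int early_exit_on_zero)
{
    return ranks_reaching_write(counts, nranks, early_exit_on_zero) == nranks;
}
```

With per-rank counts {10, 10, 0, 10}, the model reports that the collective
cannot complete while the early exit is in place, and can complete once it is
removed. Whether that is the actual cause of the hang would still need to be
confirmed in a real parallel build.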

We may soon have help to make up for our lack of parallel I/O expertise, but
until then I'll put this on hold, at least until we upgrade our parallel
testing environment to the latest HDF5, pnetCDF, MPI-IO, etc.

Your report of the problem and your workaround may help someone else
searching for a solution to a similar problem.

I see you've also provided a patch for additional problems with pnetcdf, and
I'll try testing it soon for incorporation into our development snapshot.
Thanks for your contributions!

--Russ

> On 11/29/12 3:48 PM, Unidata netCDF Support wrote:
> > Gregory Sjaardema,
> >
> > Your Ticket has been received, and a Unidata staff member will review it 
> > and reply accordingly. Listed below are details of this new Ticket. Please 
> > make sure the Ticket ID remains in the Subject: line on all correspondence 
> > related to this Ticket.
> >
> >      Ticket ID: BKA-178769
> >      Subject: Parallel hang in nc_put_vara_double
> >      Department: Support netCDF
> >      Priority: Normal
> >      Status: Open
> 
> 
> 
Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu



Ticket Details
===================
Ticket ID: BKA-178769
Department: Support netCDF
Priority: High
Status: Closed