
[netCDF #PDL-125161]: Writing parallel files with zero-size chunks


> On 08/29/2012 12:18 AM, Unidata netCDF Support wrote:
> > I just succeeded in running a test case that used count[0] = 0 on an MPI
> > parallel file system using the netCDF-4 parallel I/O inherited from HDF5,
> > and it ran fine.
> >
> > The test I ran just inserted the following code in a loop after line 136 in
> > nc_test4/tst_parallel.c:
> >
> >        /* See if count dimension == 0 returns error */
> >        count_save = count[0];
> >        count[0] = 0;
> >        if (nc_put_vara_int(ncid, v1id, start, count, slab_data)) ERR;
> >        count[0] = count_save ;
> >
> > Discussing this with CISL consultants indicates the problem may be 
> > platform-specific.
> thanks for developing further test programs. Unfortunately I don't see how
> all processes writing with count = 0 answers my question about all processes
> but one writing with count=0. What I'd like to know is

Sorry, I see I was confusing your support ticket with another, similar question
that asked whether having count[i]==0 for any i in nc_put_var calls is permitted
on parallel platforms.
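
To make the condition in that other question concrete: a request is zero-size
when count[i]==0 for any i, since the number of elements requested is the
product of the counts. A minimal sketch in plain C follows; the helper name
slab_is_empty is hypothetical and is not part of the netCDF API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the netCDF API: returns 1 if the
 * hyperslab described by count[0..ndims-1] contains no elements,
 * i.e. if count[i] == 0 for any i, and 0 otherwise. */
static int slab_is_empty(const size_t *count, size_t ndims)
{
    size_t i;

    assert(count != NULL || ndims == 0);
    for (i = 0; i < ndims; i++) {
        if (count[i] == 0)
            return 1;
    }
    return 0;
}
```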

Now I've compiled and run the bug demonstration code you provided and have
reproduced the problem, with the program hanging at the same place you observed:

  $ mpirun -n 5 ./nc4partest
  mpi_name: spock.unidata.ucar.edu size: 5 rank: 0, isDataWriter=0
  mpi_name: spock.unidata.ucar.edu size: 5 rank: 1, isDataWriter=0
  mpi_name: spock.unidata.ucar.edu size: 5 rank: 2, isDataWriter=1
  mpi_name: spock.unidata.ucar.edu size: 5 rank: 3, isDataWriter=0
  mpi_name: spock.unidata.ucar.edu size: 5 rank: 4, isDataWriter=0
  mpi_rank=0 start[0]=0 start[1]=0 count[0]=0 count[1]=0
  mpi_rank=2 start[0]=0 start[1]=0 count[0]=24 count[1]=24
  mpi_rank=3 start[0]=0 start[1]=0 count[0]=0 count[1]=0
  mpi_rank=4 start[0]=0 start[1]=0 count[0]=0 count[1]=0
  mpi_rank=1 start[0]=0 start[1]=0 count[0]=0 count[1]=0
  mpi_rank=1 start[0]=0 start[1]=0 count[0]=0 count[1]=0  C-c C-c
  Ctrl-C caught... cleaning up processes
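
The access pattern behind this output can be sketched without any MPI or netCDF
calls. This is only an illustration under assumptions: the function name
set_slab_for_rank and its parameters are hypothetical, and the writer rank (2)
and the 24 x 24 extent are read off the log lines rather than taken from the
source of the demonstration program.

```c
#include <assert.h>
#include <stddef.h>

#define NDIMS 2

/* Hypothetical helper: fill start[] and count[] for one MPI rank so that
 * only writer_rank requests the full dim_len x dim_len slab, while every
 * other rank requests a zero-size slab (count[i] == 0 for all i). */
static void set_slab_for_rank(int rank, int writer_rank, size_t dim_len,
                              size_t start[NDIMS], size_t count[NDIMS])
{
    int i;

    assert(rank >= 0 && writer_rank >= 0);
    for (i = 0; i < NDIMS; i++) {
        start[i] = 0;
        count[i] = (rank == writer_rank) ? dim_len : 0;
    }
}
```

Each rank would then pass its own start and count to nc_put_vara_int; with
writer_rank 2 and dim_len 24 the helper gives the start and count values
printed in the log.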

> * Given the available API documentation is my program incorrect or triggering
> undocumented behaviour?

It seems to be correct according to the meager API documentation.  The developer
who implemented the netCDF-4 parallel I/O is no longer at Unidata, and we don't
have anyone here currently with the expertise to diagnose and fix this problem.
I have been trying to contract some help for this area, but have not yet
succeeded.

> * Since you mention the problem might be platform-specific and my program is
> using a fairly widely available platform (Debian GNU/Linux with only two
> self-compiled libraries used, in this case HDF5 1.8.9 and netcdf both
> passing all tests invoked by make check) is there a bug on this platform I
> should be aware of? Is there another platform I should use instead? I'm all
> for stable testing platforms but I'm not aware of a binary download at
> http://www.unidata.ucar.edu/downloads/netcdf/netcdf-4_2_1_1/index.jsp

Please ignore my comment that the bug might be platform-specific, as that was
merely a repetition of what I heard from NCAR CISL consultants about the other
related bug that they have been looking at.  I will enter this bug
into our Jira issue tracking system, but I don't know if we will be able to fix
it in the near future.  For now, I can only recommend that you contact the NCAR
CISL consulting office:


Sorry we can't be of more help ...


Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu

Ticket Details
Ticket ID: PDL-125161
Department: Support netCDF
Priority: Normal
Status: Closed
