Re: [netcdf-hdf] collective I/O ?

On Tue, Sep 18, 2007 at 04:16:51PM -0500, MuQun Yang wrote:
> Another possibility is that HDF5 "magically" figure out your case is
> not good or not possible for collective IO and it will change the 
> route to do an independent IO call instead. To verify that, we have 
> to get your program, the platform, mpi-io compiler information.

Sure, no problem.  I was wondering myself whether HDF5 was doing some
magic.  

I'm using MPICH2-1.0.6, compiled with gcc-4.1.2.  I'm just testing on
my laptop, running Ubuntu.  

On Tue, Sep 18, 2007 at 03:32:03PM -0600, Ed Hartnett wrote:
> It would be best if we can reproduce this, but at the moment I don't
> quite know how to tell if independent or collective I/O is actually
> being used. What is Jumpshot - your debugger?

There are a couple ways I can tell collective I/O is not being used:

- I can run the program in a debugger and set a break point on
  MPI_File_{read,write}_at_all and MPI_File_{read,write}_all and
  observe those routines are never hit.

- I can use Jumpshot and see no collective I/O routines end up in the
  logs.  
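
The first check above might look like this under gdb (a sketch only:
launching a debugger under MPI varies by system; with MPICH2, mpiexec
can start gdb on a rank directly):

```
$ mpiexec -n 1 gdb ./writer
(gdb) break MPI_File_write_at_all
(gdb) break MPI_File_write_all
(gdb) run
```

If the program runs to completion without stopping at either
breakpoint, the writes went out as independent I/O.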

So, what's Jumpshot?  The MPI standard defines a uniform way to hook
profiling tools into an MPI implementation.  In MPICH2, we call our
profiling library "MPE".  MPE wraps every MPI routine with code to log
the start and end of that routine.  Jumpshot is a visualizer for that
log file, showing a trace of the program's run on a timeline.  

On Tue, Sep 18, 2007 at 04:36:52PM -0500, MuQun Yang wrote:
> Perhaps you can share your testing program,machine and compiler
> information with both Ed and me. I may steal some time to reproduce
> here.

That'd be great.  I've attached both my writer and my reader.  I'm
only putting a single attribute on the dataset and writing a 1D
variable (each process writes its rank to the variable).

When you reproduce, be sure to take note of the emails I've been
sending the last few days: NetCDF4 needs a bit of work to build
against HDF5 built with MPICH2.

Let me know if I can provide any more information.
==rob

-- 
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                 B29D F333 664A 4280 315B
/* simple demonstration of parallel netcdf-4
 * text attribute on dataset
 * read from 1-d array, compare with rank */

#include <stdlib.h>
#include <mpi.h>
#include <hdf5.h>       /* declares H5close(), called below */
#define USE_PARALLEL_MPIO
#include <netcdf.h>
#include <stdio.h>

static void handle_error(int status)
{
        fprintf(stderr, "%s\n", nc_strerror(status));
        exit(-1);
}


int main(int argc, char **argv)
{
        int ret, ncfile, rank, varid, value;
        size_t start, count=1;
        char buf[13];

        MPI_Init(&argc, &argv);

        ret = nc_open_par("demo.nc", NC_MPIIO|NC_NETCDF4, 
                        MPI_COMM_WORLD, MPI_INFO_NULL, &ncfile);
        if (ret != NC_NOERR) handle_error(ret);

        /* the writer stored 13 bytes including the terminating nul,
         * so buf comes back as a valid C string */
        ret = nc_get_att_text(ncfile, NC_GLOBAL, "string", buf);
        if (ret != NC_NOERR) handle_error(ret);

        ret = nc_inq_varid(ncfile, "v1", &varid);
        if (ret != NC_NOERR) handle_error(ret);

        /* unnecessary: collective access is the default */
        ret = nc_var_par_access(ncfile, varid, NC_COLLECTIVE);
        if (ret != NC_NOERR) handle_error(ret);

        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        /* each process reads back the element at index [rank] */
        start = rank;
        ret = nc_get_vara_int(ncfile, varid, &start, &count, &value);
        if (ret != NC_NOERR) handle_error(ret);

        printf("rank: %d variable: %d att: %s", rank, value, buf);

        ret = nc_close(ncfile);
        if (ret != NC_NOERR) handle_error(ret);

        /* explicitly close the HDF5 library before MPI_Finalize */
        H5close();
        MPI_Finalize();

        return 0;
}
/* simple demonstration of parallel netcdf-4
 * text attribute on dataset
 * write out rank into 1-d array */

#include <stdlib.h>
#include <mpi.h>
#define USE_PARALLEL_MPIO
#include <netcdf.h>
#include <stdio.h>

static void handle_error(int status)
{
        fprintf(stderr, "%s\n", nc_strerror(status));
        exit(-1);
}


int main(int argc, char **argv) {

        int ret, ncfile, nprocs, rank, dimid, varid, ndims=1;
        size_t start, count=1;
        char buf[13] = "Hello World\n";

        MPI_Init(&argc, &argv);

        ret = nc_create_par("demo.nc", NC_MPIIO|NC_NETCDF4, 
                        MPI_COMM_WORLD, MPI_INFO_NULL, &ncfile);
        if (ret != NC_NOERR) handle_error(ret);

        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        ret = nc_def_dim(ncfile, "d1", nprocs, &dimid);
        if (ret != NC_NOERR) handle_error(ret);

        ret = nc_def_var(ncfile, "v1", NC_INT, ndims, &dimid, &varid);
        if (ret != NC_NOERR) handle_error(ret);

        ret = nc_put_att_text(ncfile, NC_GLOBAL, "string", 13, buf);
        if (ret != NC_NOERR) handle_error(ret);
        
        ret = nc_enddef(ncfile);
        if (ret != NC_NOERR) handle_error(ret);

        /* unnecessary: collective access is the default */
        ret = nc_var_par_access(ncfile, varid, NC_COLLECTIVE);
        if (ret != NC_NOERR) handle_error(ret);

        /* each process writes its rank into element [rank] of v1 */
        start = rank; 
        ret = nc_put_vara_int(ncfile, varid, &start, &count, &rank);
        if (ret != NC_NOERR) handle_error(ret);

        ret = nc_close(ncfile);
        if (ret != NC_NOERR) handle_error(ret);

        MPI_Finalize();

        return 0;
}