Re: HDF5 chunking questions...

Ed,
There's a fairly extensive chapter on chunking and chunk caching at http://hdf.ncsa.uiuc.edu/UG41r3_html/Perform.fm2.html#149138. This covers the material Quincey provided, and quite a bit more.

This is in the HDF4 documentation. We still need to do something similar for HDF5, but the principles are quite similar.

Mike


At 11:06 AM 12/15/2003, Quincey Koziol wrote:
Hi Ed,

> Quincey et al.,
>
> Given an n-dimensional dataspace, with only one unlimited
> (i.e. extendable) dimension, tell me how to select the chunk size for
> each dimension to get a good read performance for large data files.
>
> Would you care to suggest any smart algorithms to yield better
> performance for various situations?
    Unfortunately there aren't generic instructions for this sort of thing,
it's very application-I/O-pattern dependent.  A general heuristic is to pick
lower and upper bounds on the size of a chunk (in bytes) and try to make the
chunks "squarish" (in n-D).  One thing to keep in mind is that the default
chunk cache in HDF5 is 1MB, so it's probably worthwhile to keep chunks under
half of that.  A reasonable lower limit is a small multiple of the block size
of a disk (usually 4KB).
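
A minimal sketch of the heuristic above, assuming the first dimension is the unlimited one and that "squarish" can be approximated by repeatedly halving the longest chunk edge. The helper names (chunk_bytes, pick_chunk_shape) are hypothetical and not part of HDF5, and only the upper bound is enforced here; the caller would still compare the result against the lower bound.

```c
#include <assert.h>
#include <stddef.h>

/* Size in bytes of a chunk with the given edge lengths. */
static size_t chunk_bytes(const size_t *chunk, int ndims, size_t elem_size)
{
    size_t n = elem_size;
    for (int d = 0; d < ndims; d++)
        n *= chunk[d];
    return n;
}

/* Hypothetical helper: start with one full record (edge 1 along the
 * unlimited dimension 0, full extent elsewhere) and halve the longest
 * edge until the chunk is no larger than max_bytes. */
static void pick_chunk_shape(const size_t *dims, int ndims, size_t elem_size,
                             size_t max_bytes, size_t *chunk)
{
    assert(ndims > 0 && elem_size > 0 && max_bytes >= elem_size);

    chunk[0] = 1;
    for (int d = 1; d < ndims; d++)
        chunk[d] = dims[d];

    while (chunk_bytes(chunk, ndims, elem_size) > max_bytes) {
        int longest = 0;
        for (int d = 1; d < ndims; d++)
            if (chunk[d] > chunk[longest])
                longest = d;
        if (chunk[longest] == 1)
            break;                                  /* cannot shrink further */
        chunk[longest] = (chunk[longest] + 1) / 2;  /* halve, rounding up */
    }
}
```

With a 512KB upper bound (half of the 1MB default cache) and 4-byte elements, a 300x500 record comes out as a 1x300x250 chunk of 300,000 bytes.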
    Generally, you are trying to avoid the situation below:

        Dataset with 10 chunks (dimension sizes don't really matter):
        +----+----+----+----+----+
        |    |    |    |    |    |
        |    |    |    |    |    |
        | A  | B  | C  | D  | E  |
        +----+----+----+----+----+
        |    |    |    |    |    |
        |    |    |    |    |    |
        | F  | G  | H  | I  | J  |
        +----+----+----+----+----+

        If you are writing hyperslabs to part of each chunk like this:
        (hyperslab 1 is in chunk A, hyperslab 2 is in chunk B, etc.)
        +----+----+----+----+----+
        |1111|2222|3333|4444|5555|
        |6666|7777|8888|9999|0000|
        | A  | B  | C  | D  | E  |
        +----+----+----+----+----+
        |    |    |    |    |    |
        |    |    |    |    |    |
        | F  | G  | H  | I  | J  |
        +----+----+----+----+----+

        If the chunk cache is only large enough to hold 4 chunks, then chunk
    A will be preempted from the cache for chunk E (when hyperslab 5 is
    written), but will immediately be re-loaded to write hyperslab 6 out.

    Unfortunately, our general purpose software can't predict the I/O pattern
that users will access the data in, so it is a tough problem. On the one hand,
you want to keep the chunks small enough that they will stick around in the
cache until they are finished being written/read, but you want the chunks to
be larger so that the I/O on them is more efficient. :-/

    Quincey

--
Mike Folk, Scientific Data Tech (HDF)   http://hdf.ncsa.uiuc.edu
NCSA/U of Illinois at Urbana-Champaign          217-244-0647 voice
605 E. Springfield Ave., Champaign IL 61820 217-244-1987 fax
From owner-netcdf-hdf@xxxxxxxxxxxxxxxx 16 2003 Dec -0700 08:12:39
Message-ID: <wrxpteo4x08.fsf@xxxxxxxxxxxxxxxxxxxxxxx>
Date: 16 Dec 2003 08:12:39 -0700
From: Ed Hartnett <ed@xxxxxxxxxxxxxxxx>
To: netcdf-hdf@xxxxxxxxxxxxxxxx
Subject: timing HDF5 1.6.1 code...
Received: (from majordo@localhost)
        by unidata.ucar.edu (UCAR/Unidata) id hBGFCeuv013594
        for netcdf-hdf-out; Tue, 16 Dec 2003 08:12:40 -0700 (MST)
Received: from rodney.unidata.ucar.edu (rodney.unidata.ucar.edu 
[128.117.140.88])
        by unidata.ucar.edu (UCAR/Unidata) with ESMTP id hBGFCdp2013588
        for <netcdf-hdf@xxxxxxxxxxxxxxxx>; Tue, 16 Dec 2003 08:12:39 -0700 (MST)
Organization: UCAR/Unidata
Keywords: 200312161512.hBGFCdp2013588
Lines: 39
User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-netcdf-hdf@xxxxxxxxxxxxxxxx
Precedence: bulk

Howdy all!

A looming question for netcdf-4 is one of speed. Can it keep up with
netcdf-3?

To quantify this we have the following requirement (see
http://my.unidata.ucar.edu/content/software/netcdf/netcdf-4/reqs.html):

"The performance of netCDF-4 is not more than 10% slower than
netCDF-3 for large contiguous data writes. It is not more than 100%
slower for any other operation."
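
One way to turn that requirement into a check on measured timings is sketched below. It assumes "10% slower" means elapsed time of at most 1.10 times the netCDF-3 time, and "100% slower" at most 2.00 times; meets_requirement is a hypothetical helper, not part of either library.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical check of the stated requirement against two measured
 * elapsed times (same units). Assumes "N% slower" means the netCDF-4
 * time is at most (1 + N/100) times the netCDF-3 time. */
static bool meets_requirement(double t_netcdf3, double t_netcdf4,
                              bool large_contiguous_write)
{
    assert(t_netcdf3 > 0.0 && t_netcdf4 >= 0.0);
    double max_ratio = large_contiguous_write ? 1.10 : 2.00;
    return t_netcdf4 <= max_ratio * t_netcdf3;
}
```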

Of course I want to make netcdf-4 as fast as possible. To do that I
need to make sure I'm using HDF5 properly, and getting the best
performance for the set of tasks that we select to optimize for.

I'm going to take a stab at quantifying some things here, in the hope
that Russ will correct or expand it:

In netcdf the task is the reading (and, less importantly, writing) of
large 2-, 3-, and 4-dimensional arrays of floats/doubles/longs. By large
we mean thousands of records, with each of the other dimensions on the
order of 100 in size.
For example, a 4D file with 2000 records, each of size 100x100x100,
holds 2x10^9 values. At one byte per value that is about 2GB, the
maximum now available under 32-bit netcdf-3.x; with 4-byte floats it
would be about 8GB.

I would like to get some idea of the "native" speed of HDF5 on such
tasks, so that I can make sure that netcdf-4 gets that speed.

Can you guys look at some HDF5 code for me and tell me if it conforms
to your idea of good HDF5 code, which will be as fast as it can be?

I will send the code as a separate message...

Ed



From owner-netcdf-hdf@xxxxxxxxxxxxxxxx 16 2003 Dec -0700 08:16:49
Message-ID: <wrxllpc4wta.fsf@xxxxxxxxxxxxxxxxxxxxxxx>
Date: 16 Dec 2003 08:16:49 -0700
From: Ed Hartnett <ed@xxxxxxxxxxxxxxxx>
To: netcdf-hdf@xxxxxxxxxxxxxxxx
Subject: code to write a bunch of integer records...


HDF5 Guys,

How does this code look? This is what I'm timing to get the baseline
answer: how fast can HDF5 write a file of longs?

Thanks!

Ed

#define NDIMS 3
#define XLEN 2000
#define YLEN 300
#define ZLEN 500
#define HDF5_FILE "/tmp/a1.h5"
#define VAR_NAME "V"

  hid_t hdfid, mem_spaceid, file_spaceid, datasetid, plistid;
  hsize_t h5dim[] = {XLEN, YLEN, ZLEN}, h5count[] = {1, YLEN, ZLEN};
  hssize_t h5start[] = {0, 0, 0};
  hsize_t h5dimmax[] = {H5S_UNLIMITED, YLEN, ZLEN}, chunksize[NDIMS];
  int *data;
  int i;

  /* Allocate memory for data and fill it with a phoney value. */
  {
     size_t len = YLEN*ZLEN*sizeof(int);
     if (!(data = (int *)malloc(len)))
         BAIL(-2);
     for (i=0; i<YLEN*ZLEN; i++)
         data[i] = i;
  }

     /* Create a HDF5 file, with an unlimited dimension, and write
         the data that way. */
     {
         /* Create the file and dataset. */
         if ((hdfid = H5Fcreate(HDF5_FILE, H5F_ACC_TRUNC,
                                H5P_DEFAULT, H5P_DEFAULT)) < 0)
            BAIL(-1);
         h5dim[0] = 0;
         h5dim[1] = YLEN;
         h5dim[2] = ZLEN;
         if ((file_spaceid = H5Screate_simple(NDIMS, h5dim, h5dimmax)) < 0)
            BAIL(-3);
         if ((plistid = H5Pcreate (H5P_DATASET_CREATE)) < 0)
            BAIL(-10);
         chunksize[0] = 1;
         chunksize[1] = YLEN;
         chunksize[2] = ZLEN;
         if (H5Pset_chunk(plistid, NDIMS, chunksize) < 0)
            BAIL(-11);
         if ((datasetid = H5Dcreate(hdfid, VAR_NAME, H5T_STD_I32BE,
                                    file_spaceid, plistid)) < 0)
            BAIL(-4);
         H5Sclose(file_spaceid);
         H5Pclose(plistid);

         /* Now write the data. Use the same mem space for all
            writes. This memspace is only big enough to hold one
            record. */
         h5dim[0] = 1;
         h5dim[1] = YLEN;
         h5dim[2] = ZLEN;
         if ((mem_spaceid = H5Screate_simple(NDIMS, h5dim, NULL)) < 0)
            BAIL(-3);
         for (h5start[0] = 0; h5start[0]<XLEN; h5start[0]++)
         {
            h5dim[0] = h5start[0] + 1;
            if (H5Dextend(datasetid, h5dim) < 0)
               BAIL(-3);
            if ((file_spaceid = H5Dget_space(datasetid)) < 0)
               BAIL(-3);
            if (H5Sselect_hyperslab(file_spaceid, H5S_SELECT_SET, h5start,
                                    NULL, h5count, NULL) < 0)
               BAIL(-3);
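            /* The memory type describes the buffer, so use the native
               int type; HDF5 converts to the big-endian file type. */
            if (H5Dwrite(datasetid, H5T_NATIVE_INT, mem_spaceid,
                         file_spaceid, H5P_DEFAULT, data) < 0)
               BAIL(-5);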
            H5Sclose(file_spaceid);
         }

         /* Clean up. */
         H5Sclose(mem_spaceid);
         H5Dclose(datasetid);
         H5Fclose(hdfid);
         free(data);
     }
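
For the timing itself, a small wall-clock harness along these lines could wrap the write loop. It is a sketch only: it assumes C11 timespec_get is available, and the helper names (elapsed_seconds, throughput_mb_per_s, time_call) are hypothetical. CPU time from clock() would understate I/O waits, which is why wall-clock time is used here.

```c
#include <assert.h>
#include <time.h>

/* Seconds elapsed between two timestamps. */
static double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Throughput in MB/s, taking 1 MB as 1024*1024 bytes. */
static double throughput_mb_per_s(double bytes, double seconds)
{
    assert(seconds > 0.0);
    return bytes / (1024.0 * 1024.0) / seconds;
}

/* Wall-clock time taken by one call of `work`, e.g. a function that
 * wraps the record-writing loop being measured. */
static double time_call(void (*work)(void))
{
    struct timespec t0, t1;
    timespec_get(&t0, TIME_UTC);
    work();
    timespec_get(&t1, TIME_UTC);
    return elapsed_seconds(t0, t1);
}
```

The byte count for the loop above would be XLEN*YLEN*ZLEN*sizeof(int), i.e. 1.2x10^9 bytes for 4-byte ints.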

From owner-netcdf-hdf@xxxxxxxxxxxxxxxx 16 2003 Dec -0700 08:26:38
Message-ID: <wrx1xr44wcx.fsf@xxxxxxxxxxxxxxxxxxxxxxx>
Date: 16 Dec 2003 08:26:38 -0700
From: Ed Hartnett <ed@xxxxxxxxxxxxxxxx>
In-Reply-To: <6.0.1.1.2.20031215112120.02080a00@xxxxxxxxxxxxxxxxx>
To: netcdf-hdf@xxxxxxxxxxxxxxxx
Subject: Re: HDF5 chunking questions...
References: <ullpe8ig5.fsf@xxxxxxxxxx>
        <200312151706.hBFH6o5q049894@xxxxxxxxxxxxxxxxxxxxxx>
        <6.0.1.1.2.20031215112120.02080a00@xxxxxxxxxxxxxxxxx>

Mike Folk <mfolk@xxxxxxxxxxxxx> writes:

> Ed,
> There's a fairly extensive chapter on chunking and chunk caching at
> http://hdf.ncsa.uiuc.edu/UG41r3_html/Perform.fm2.html#149138.  This
> covers the material Quincey provided, and quite a bit more.

Thanks, I've read that...

>     Unfortunately there aren't generic instructions for this sort of thing,
>it's very application-I/O-pattern dependent.  A general heuristic is to pick
>lower and upper bounds on the size of a chunk (in bytes) and try to make the
>chunks "squarish" (in n-D).  One thing to keep in mind is that the default
>chunk cache in HDF5 is 1MB, so it's probably worthwhile to keep chunks under
>half of that.  A reasonable lower limit is a small multiple of the block size
>of a disk (usually 4KB).

Can the chunk cache size be increased programmatically?

1 MB seems low for scientific applications. Even cheap consumer PCs come
with about half a gig of RAM. Scientific machines much more
so. Wouldn't it be helpful to have 100 MB, for example?
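
For reference, HDF5 does expose the raw data chunk cache size through H5Pset_cache on a file access property list (its rdcc_nbytes argument, in bytes). The sketch below only does the sizing arithmetic that would feed that argument, so it needs no HDF5 headers; both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* How many whole chunks of chunk_bytes fit in a cache of cache_bytes. */
static size_t chunks_that_fit(size_t cache_bytes, size_t chunk_bytes)
{
    assert(chunk_bytes > 0);
    return cache_bytes / chunk_bytes;
}

/* Cache size in bytes needed to keep nchunks chunks resident at once.
 * The result is the kind of value one would pass as rdcc_nbytes to
 * H5Pset_cache (not called here). */
static size_t cache_bytes_needed(size_t chunk_bytes, size_t nchunks)
{
    return chunk_bytes * nchunks;
}
```

A 1x300x500 chunk of 4-byte ints is 600,000 bytes, so the 1MB default holds one such chunk, while a 100MB cache would hold 174.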

>     Generally, you are trying to avoid the situation below:
>
>         Dataset with 10 chunks (dimension sizes don't really matter):
>         +----+----+----+----+----+
>         |    |    |    |    |    |
>         |    |    |    |    |    |
>         | A  | B  | C  | D  | E  |
>         +----+----+----+----+----+
>         |    |    |    |    |    |
>         |    |    |    |    |    |
>         | F  | G  | H  | I  | J  |
>         +----+----+----+----+----+
>
>         If you are writing hyperslabs to part of each chunk like this:
>         (hyperslab 1 is in chunk A, hyperslab 2 is in chunk B, etc.)
>         +----+----+----+----+----+
>         |1111|2222|3333|4444|5555|
>         |6666|7777|8888|9999|0000|
>         | A  | B  | C  | D  | E  |
>         +----+----+----+----+----+
>         |    |    |    |    |    |
>         |    |    |    |    |    |
>         | F  | G  | H  | I  | J  |
>         +----+----+----+----+----+
>
>         If the chunk cache is only large enough to hold 4 chunks, then chunk
>     A will be preempted from the cache for chunk E (when hyperslab 5 is
>     written), but will immediately be re-loaded to write hyperslab
>     6 out.

OK, great. Let me see if I can start to come up with the rules by
which I can select chunk sizes:

1 - Min chunk size should be 4 KB.
2 - Max chunk size should allow n chunks to fit in the chunk cache,
where n is around the max number of chunks the user will access at
once in a hyper-slab.
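
The two rules can be written down as a pair of bounds. This is a sketch under the assumptions stated in the rules themselves (a 4 KB minimum, and n chunks sharing the cache); chunk_size_bounds is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_CHUNK_BYTES 4096   /* rule 1: minimum chunk size of 4 KB */

/* Rule 2: the maximum chunk size lets n_chunks_at_once chunks fit in
 * the cache together. Returns 1 if both rules can be satisfied, or 0
 * if the cache is too small for that many chunks of at least 4 KB. */
static int chunk_size_bounds(size_t cache_bytes, size_t n_chunks_at_once,
                             size_t *min_bytes, size_t *max_bytes)
{
    assert(n_chunks_at_once > 0);
    *min_bytes = MIN_CHUNK_BYTES;
    *max_bytes = cache_bytes / n_chunks_at_once;
    return *max_bytes >= *min_bytes;
}
```

With the 1MB default cache and n = 4, chunks may range from 4 KB to 256 KB; with n = 512 the upper bound drops to 2 KB, below the minimum, so a larger cache or fewer simultaneous chunks would be needed.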

>
>     Unfortunately, our general purpose software can't predict the I/O pattern
> that users will access the data in, so it is a tough problem.  On
> the one hand,
>you want to keep the chunks small enough that they will stick around in the
>cache until they are finished being written/read, but you want the chunks to
>be larger so that the I/O on them is more efficient. :-/

I think we can make some reasonable guesses for netcdf-3.x access
patterns, so that we can at least ensure the common tasks are working
fast enough.

Obviously any user can flummox our optimizations by doing some odd
things we don't expect. As my old engineering professors told me: you
can make it foolproof, but you can't make it damn-foolproof.

Ed