Re: [netcdf-hdf] a question about HDF5 and large file - why so long to write one value?

  • To: Ed Hartnett <ed@xxxxxxxxxxxxxxxx>
  • Subject: Re: [netcdf-hdf] a question about HDF5 and large file - why so long to write one value?
  • From: Quincey Koziol <koziol@xxxxxxxxxxxx>
  • Date: Tue, 21 Aug 2007 15:11:30 -0500
Hi Ed,

On Aug 21, 2007, at 3:00 PM, Ed Hartnett wrote:

Quincey Koziol <koziol@xxxxxxxxxxxx> writes:


        I do think it's better to force the user to give you a chunk
size.  Definitely _don't_ use a chunk size of one; the B-tree used to
locate the chunks will be insanely huge.  :-(
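(For scale: a 4 GiB variable of doubles stored one element per chunk would need 4 GiB / 8 B = 536,870,912 chunks, each with its own entry in the chunk B-tree.)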

The user may specify a chunk size in netCDF-4. With a 1 MB chunk size,
wow, it's sure a whole lot faster! Now it takes less than a second.

Also the output file is only 4 MB. Is that expected? I presume this
is because it does not write more than 1 MB for each of the 4
variables. Neat!

        Yes, that's what's happening. :-)
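(The arithmetic checks out: 4 variables x one 1 MB chunk each = 4,194,304 bytes of data, and the 4,208,887-byte file listed below is that plus roughly 14 KB of HDF5 metadata.)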

Here's the netCDF code to do chunking. (Note the nc_def_var_chunking call
after the nc_def_var call.)

        Ah, I didn't see that call when debugging, thanks.

       /* One chunk holds 1 MB worth of doubles. */
       chunksize[0] = MEGABYTE/DOUBLE_SIZE;
       for (i = 0; i < NUMVARS; i++)
       {
          /* Define each variable, then set its chunk sizes. */
          if (nc_def_var(ncid, var_name[i], NC_DOUBLE, NUMDIMS,
                         dimids, &varid[i])) ERR;
          if (nc_def_var_chunking(ncid, varid[i], NULL, chunksize, NULL)) ERR;
       }
       if (nc_enddef(ncid)) ERR;
       /* Write a single value into each variable. */
       for (i = 0; i < NUMVARS; i++)
          if (nc_put_var1_double(ncid, varid[i], index, &pi)) ERR;
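
Under the hood, netCDF-4 passes these chunk sizes to HDF5 through a
dataset-creation property list. Here is a minimal standalone sketch of
the same idea in the raw HDF5 API (the file name, dataset name, and
sizes are made up for illustration, and H5Dcreate2 is the HDF5 1.8
spelling):

    #include <hdf5.h>

    int main(void)
    {
        hsize_t dims[1]       = {536870912};            /* 4 GiB of doubles */
        hsize_t chunk_dims[1] = {1048576 / sizeof(double)}; /* 1 MB chunks */

        hid_t file  = H5Fcreate("chunked.h5", H5F_ACC_TRUNC,
                                H5P_DEFAULT, H5P_DEFAULT);
        hid_t space = H5Screate_simple(1, dims, NULL);

        /* Ask for chunked layout with 1 MB chunks. */
        hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
        H5Pset_chunk(dcpl, 1, chunk_dims);

        hid_t dset = H5Dcreate2(file, "var", H5T_NATIVE_DOUBLE, space,
                                H5P_DEFAULT, dcpl, H5P_DEFAULT);

        H5Dclose(dset);
        H5Pclose(dcpl);
        H5Sclose(space);
        H5Fclose(file);
        return 0;
    }

Because the layout is chunked, HDF5 allocates chunks only as they are
actually written, which is why writing one value per variable leaves the
file at a few MB rather than the full variable size.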

bash-3.2$ time ./tst_large

*** Testing really large files in netCDF-4/HDF5 format, quickly.
*** Testing create of simple, but large, file...ok.
*** Tests successful!

real    0m0.042s
user    0m0.014s
sys     0m0.028s
bash-3.2$ ls -l tst_large.nc
-rw-r--r-- 1 ed ustaff 4208887 2007-08-21 13:52 tst_large.nc

        However, if you are going to attempt to create a heuristic for
picking a chunk size, here are my current best thoughts on it: try to
get a chunk of a reasonable size (1 MB, say), but make certain that it
will contain at least one element, in the case of _really_ big compound
datatypes :-).  Then try to make the chunk as "square" as possible
(i.e., try to get the chunk size in all dimensions to be equal).  That
should give you something reasonable, at least... ;-)
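
Here is a sketch of that heuristic in C; the function name, the 1 MB
target, and taking the ndims-th root to get equal extents are
illustrative choices, not netCDF-4's actual implementation:

    #include <math.h>
    #include <stddef.h>

    /* Pick default chunk sizes: aim for ~1 MB per chunk, always at least
     * one element, with roughly equal ("square") extents per dimension. */
    void pick_chunk_sizes(int ndims, const size_t *dimlen,
                          size_t type_size, size_t *chunksize)
    {
        size_t target = (1024 * 1024) / type_size;  /* elements per chunk */
        if (target < 1)
            target = 1;        /* _really_ big compound types */

        /* "Square" chunk: the ndims-th root of the element target. */
        size_t edge = (size_t)pow((double)target, 1.0 / ndims);
        if (edge < 1)
            edge = 1;

        for (int d = 0; d < ndims; d++)
            chunksize[d] = dimlen[d] < edge ? dimlen[d] : edge;
    }

For a 3-D variable of doubles this gives roughly 50 x 50 x 50 chunks,
since 50^3 * 8 bytes is about 1 MB.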

Thanks!

Will that heuristic code get invoked when the application doesn't set a chunk size?

                Quincey
