Re: [netcdf-hdf] a question about HDF5 and large file - why so long to write one value?

Quincey Koziol <koziol@xxxxxxxxxxxx> writes:


>       I do think it's better to force the user to give you a chunk
> size.  Definitely _don't_ use a chunk size of one, the B-tree to
> locate the chunks will be insanely huge.  :-(

The user may specify a chunksize in netCDF-4. With a 1 MB chunksize,
wow, it's sure a whole lot faster! Now it takes less than a second.

Also, the output file is only about 4 MB. Is that expected? I presume
this is because it does not write more than one 1 MB chunk for each of
the 4 variables. Neat!

Here's the netCDF code to do chunking. (Note the nc_def_var_chunking
call after the nc_def_var call.)

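       /* Chunk length in elements: 1 MB worth of doubles. */
       chunksize[0] = MEGABYTE/DOUBLE_SIZE;
       for (i = 0; i < NUMVARS; i++)
       {
          if (nc_def_var(ncid, var_name[i], NC_DOUBLE, NUMDIMS,
                         dimids, &varid[i])) ERR;
          /* Set the chunk sizes for the variable just defined. */
          if (nc_def_var_chunking(ncid, varid[i], NULL, chunksize, NULL)) ERR;
       }
       if (nc_enddef(ncid)) ERR;
       /* Write a single value to each variable at position index[]. */
       for (i = 0; i < NUMVARS; i++)
          if (nc_put_var1_double(ncid, varid[i], index, &pi)) ERR;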

bash-3.2$ time ./tst_large

*** Testing really large files in netCDF-4/HDF5 format, quickly.
*** Testing create of simple, but large, file...ok.
*** Tests successful!

real    0m0.042s
user    0m0.014s
sys     0m0.028s
bash-3.2$ ls -l tst_large.nc
-rw-r--r-- 1 ed ustaff 4208887 2007-08-21 13:52 tst_large.nc

>       However, if you are going to attempt to create a heuristic for
> picking a chunk size, here's my best current thoughts on it: try to
> get a chunk of a reasonable size (1MB, say) (but make certain that it
> will contain at least one element, in the case of _really_ big
> compound datatypes :-), then try to make the chunk as "square" as
> possible (i.e. try to get the chunk size in all dimensions to be
> equal).  That should give you something reasonable, at least... ;-)
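
For what it's worth, here is a rough sketch of how I read that
heuristic. It is only an illustration, not code from netCDF or HDF5:
suggest_chunks() and int_root() are hypothetical names, and treating a
dimension length of 0 as "no limit" is my own assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Largest s >= 1 with s^ndims <= n, using integer math only. */
static size_t int_root(size_t n, int ndims)
{
    size_t s = 1;

    assert(n >= 1 && ndims >= 1);
    if (ndims == 1)
        return n;
    for (;;) {
        size_t next = s + 1, prod = 1;
        int d, too_big = 0;

        for (d = 0; d < ndims; d++) {
            /* Stop if prod * next would exceed n. */
            if (prod > n / next) { too_big = 1; break; }
            prod *= next;
        }
        if (too_big)
            break;
        s = next;
    }
    return s;
}

/*
 * Hypothetical helper: fill chunk[] with a roughly "square" chunk shape
 * of about target_bytes. dimlen[d] == 0 is taken to mean "no limit"
 * (an assumption of this sketch). A chunk always holds at least one
 * element, even if a single element is larger than target_bytes.
 * Clamping to short dimensions can leave the chunk under the target.
 */
void suggest_chunks(int ndims, const size_t *dimlen, size_t type_size,
                    size_t target_bytes, size_t *chunk)
{
    size_t nelems, side;
    int d;

    assert(ndims > 0 && type_size > 0);
    nelems = target_bytes / type_size;
    if (nelems < 1)
        nelems = 1;                      /* really big datatypes */
    side = int_root(nelems, ndims);      /* equal length in every dim */
    for (d = 0; d < ndims; d++)
        chunk[d] = (dimlen[d] > 0 && dimlen[d] < side) ? dimlen[d] : side;
}
```

With a 1048576-byte target and 8-byte doubles in one dimension, this
gives 131072 elements per chunk, which should match
MEGABYTE/DOUBLE_SIZE in the test above if those constants are 1048576
and 8.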

Thanks!

Ed

-- 
Ed Hartnett  -- ed@xxxxxxxxxxxxxxxx