On Aug 20, 2007, at 5:02 PM, Ed Hartnett wrote:
Quincey Koziol <koziol@xxxxxxxxxxxx> writes:

> The problem is in your computation of the chunk size for the dataset, in
> libsrc4/nc4hdf.c, around lines 1059-1084. The current computations end up
> with a chunk of size equal to the dimension size (2147483644/4 in the code
> below), i.e. a single 4GB chunk for the entire dataset. This is not going
> to work well, since HDF5 always reads an entire chunk into memory, updates
> it and then writes the entire chunk back out to disk. ;-) That section of
> code looks like it has the beginning of some heuristics for automatically
> tuning the chunk size, but it would probably be better to let the
> application set a particular chunk size, if possible.

Ah ha! Well, that's not going to work! What would be a good chunk size for this (admittedly weird) test case: writing one value at a time to a huge array? Would a chunk size of one be crazy? Or the right size?
I do think it's better to force the user to give you a chunk size. Definitely _don't_ use a chunk size of one, the B-tree to locate the chunks will be insanely huge. :-(
However, if you are going to attempt to create a heuristic for picking a chunk size, here are my best current thoughts on it: try to get a chunk of a reasonable size (1MB, say) (but make certain that it will contain at least one element, in the case of _really_ big compound datatypes :-), then try to make the chunk as "square" as possible (i.e. try to get the chunk size in all dimensions to be equal). That should give you something reasonable, at least... ;-)