Re: best practice of using parallel hdf5

NOTE: The netcdf-hdf mailing list is no longer active. The list archives are made available for historical reasons.

Hi Quincey,

Thanks for the information. It's very helpful. I guess I will have to
create a new file for every record instead.
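
For reference, a rough sketch of what I mean by one file per record. It assumes a POSIX system and uses plain binary files rather than HDF5, only to keep the example short. The helper name write_record and the record.NNNNNN.bin naming scheme are made up for illustration. The idea is to write to a temporary name, force the data to disk, and rename only afterwards, so that a crash should leave either a complete record file or none at all.

```c
#include <fcntl.h>    /* open */
#include <stdio.h>    /* snprintf, rename, remove */
#include <unistd.h>   /* write, fsync, close */

/* Hypothetical helper: store one record in its own file under `dir`.
 * Returns 0 on success, -1 on any failure. */
int write_record(const char *dir, int step, const double *data, size_t n)
{
    char dst[512], tmp[520];
    int len = snprintf(dst, sizeof dst, "%s/record.%06d.bin", dir, step);
    if (len < 0 || (size_t)len >= sizeof dst) return -1;  /* path too long */
    snprintf(tmp, sizeof tmp, "%s.tmp", dst);

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;

    /* write() may write fewer bytes than requested, so loop until done.
     * Any error (including an interrupted call) is treated as failure. */
    const char *p = (const char *)data;
    size_t left = n * sizeof *data;
    int ok = 1;
    while (ok && left > 0) {
        ssize_t w = write(fd, p, left);
        if (w <= 0) ok = 0;
        else { p += w; left -= (size_t)w; }
    }

    if (ok && fsync(fd) != 0) ok = 0;   /* push the data to the disk */
    if (close(fd) != 0) ok = 0;

    /* Publish the record under its final name only once it is complete.
     * A stricter version would also fsync the directory; omitted here. */
    if (!ok || rename(tmp, dst) != 0) {
        remove(tmp);
        return -1;
    }
    return 0;
}
```

With this layout a crash during output should cost at most the record being written; earlier record files are never reopened.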



Quincey Koziol wrote:

> On Nov 27, 2006, at 8:24 PM, Eh Tan wrote:
>> In the event of a system crash, how can I prevent file corruption and
>> how can I minimize the loss of data?
>> Should I flush the buffer after each output, or close the dataset after
>> each output, or save each record in a new group, or save each record
>> in a new file? How much data loss should I expect in the worst case
>> (e.g., the system crashes during disk I/O)?
> Generally, it's a good idea to call H5Fflush (or the equivalent
> netCDF API call) after each major "phase" of writing to the file.
> This will flush metadata changes out to the disk. However, it is
> still possible that incremental changes may be made to the file as
> metadata is evicted from the HDF5 internal caches that would create a
> "corrupt" file if the rest of the changes don't make it into the
> file. Flushing too often may create additional I/O though, so you'll
> need to find a balance that's appropriate for your application.

Eh Tan
Staff Scientist
Computational Infrastructure for Geodynamics
2750 E. Washington Blvd. Suite 210
Pasadena, CA 91107
(626) 395-1693
