netcdf-hdf mailing list is no longer active. The list archives are made available for historical reasons.
"John Urbanic" <urbanic@xxxxxxx> writes: > NetCDF gurus: > > > > After successfully prototyping our parallel netcdf code, we have rolled it > into a large community app (MFIX) and are now getting sporadic "NetCDF: > HDF error" errors during runs. This, unsurprisingly, coincides with > failure to write portions of related variable fields. > > > > These happen during put_vars(), and occurs across all PEs at that random > time, and also only one associated PE's subsequent close() as well. In > one of the smallest cases, we are writing ~100, 600K files. This problem > will strike every 15 or 20 files, and will vary both in the file and the > fields that are affected. With larger files it occurs more frequently - > almost every other file with the 300MB files we need for production. > Again, it occurs in different fields and files within runs and from run to > run. We are using netcdf 4.1.3 and hdf 1.8.7. > > > > My question is, how can I possibly drill further into this problem? I am > at a loss as to how to proceed. It would be nice to force HDF to be more > specific, or course, but all debugging suggestions most welcome. If you build netCDF with --enable-logging, then put the following in your code: nc_set_log_level(3); (There is also a fortran version.) You will then get a ton of output. Trying changing the "3" to a "1" to get less output, or to a 5 to get more. If this doesn't work, fire up the parallel debugger and see where HDF5 and netCDF are failing to get along... Good luck, Ed -- Ed Hartnett -- ed@xxxxxxxxxxxxxxxx