Hi Jeorg,

> I have seen some problems with netcdf caused by a large blksize when
> writing a file on a lustre file system. On our systems (x86_64 based
> linux system, self compiled with intel compilers 11.1.046) fstat returns
> a recommended block size of 2097152, and while I haven't looked any
> further, a lvl field is all zeroed (which is incorrect ;) ).
>
> If I change the return value of blksize (to 8192, but some larger values
> work, too), the field is written back correctly. Also if I open the
> netcdf file with disabled buffering:
>
> NCID = NCCRE(NCFILE(1:LNCFILE),NCCLOB,IER)
> ! JH: This fixes the problem as well.
> !NCID = NCCRE(NCFILE(1:LNCFILE),NCCLOB+nf_share,IER)
>
> the field is written back correctly. I have seen the error with netcdf
> versions 3.6.2, 3.6.3, 4.1.1, 4.1.2-beta1, 4.1.2-beta2. For now we are
> using a patched version of netcdf (as above), but would obviously be
> interested in a proper fix :)
>
> Unfortunately I don't know the application (grib to netcdf converter),
> nor too much about netcdf (I am working as "application support" for the
> Australian Bureau of Meteorology, but haven't used much netcdf or grib).
> And it's rather complicated to package up the application (it has
> implicit dependencies on several shared data files, so it would need
> some time to create a test case for you, and also I have to find out if
> I can give you the data files in the first place).
>
> Do you have any suggestion on what to do next? Any debugging features I
> could enable?

Make sure your netCDF library is built without turning off assertions. The default is to leave assertion checking on, and it would only be turned off if you configured with something like CFLAGS="-DNDEBUG". The best thing would be if some assertion was violated while you were running the test you described, in which case we would be very interested in a gdb backtrace resulting from the assertion violation.
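
As a side note on the block size mentioned in the quoted report: the value fstat recommends can be confirmed independently of netCDF with a few lines of C. This is only a minimal sketch, assuming a POSIX system; `preferred_blksize` is a hypothetical helper name and is not part of the netCDF library.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return the preferred I/O block size (st_blksize) that fstat() reports
 * for `path`, or -1 if the path cannot be opened or queried.
 * Hypothetical helper for illustration; not a netCDF function. */
long preferred_blksize(const char *path)
{
    struct stat sb;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    int rc = fstat(fd, &sb);
    close(fd);
    if (rc != 0)
        return -1;

    return (long)sb.st_blksize;
}
```

Calling it on the directory where the netCDF file is written (for example `preferred_blksize(".")`) should show whether that file system reports an unusually large value such as the 2097152 described above. It does not by itself say anything about how the library uses that value.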
We don't have access to a Lustre file system on which to reproduce and debug this problem, and haven't seen reports of it on other platforms. When you build netCDF from source, does running "make check" on such a platform (with assertion checking left on) result in any errors?

Otherwise, this sounds like a serious problem, if it just returns zeros instead of crashing. We would need some way to reproduce it here on a relatively small test case. For now, I'll check to see if we can find a file system we can configure with a much larger block size to test if we can see a bug. I'll let you know if I can reproduce the problem that way.

Thanks for reporting the problem.

--Russ

Russ Rew UCAR Unidata Program
address@hidden http://www.unidata.ucar.edu

Ticket Details
===================
Ticket ID: RPZ-106941
Department: Support netCDF
Priority: Normal
Status: Closed
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.