Hi Si,

> We are meeting another severe netCDF performance problem when preparing
> for our new Yellowstone machine.
> As you know, the new Glade file system will have a 2M or 4M block size,
> and it looks like this really raises some performance issues.
>
> Here is a typical case our CESM group writes/uses:
>
>   status = nf90_inq_varid(ncid, "time", varid)
>   status = nf90_get_var(ncid, varid, data)
>
> The nf90_get_var call is very slow and inefficient when "time" is
> unlimited.
> In my test case, "time" is an unlimited variable with 248 values:
>
>   time = UNLIMITED ; // (248 currently)
>
> I used "strace" on the executable and noticed that "lseek" and "read"
> are called 248+ times (one for each record), because it is an unlimited
> variable. If it is not an unlimited variable, "lseek" and "read" are
> called only twice.
> This is not a big deal when the block size of the file system is small
> (4k, e.g.), but it takes a significantly long time when the block size
> is large (2M or 4M).
>
> My question is: is it possible to modify nf90_get_var or the related
> code in netCDF so that so many "lseek" and "read" calls are not needed,
> to improve the performance? My feeling is that maybe you can do most of
> the job in memory? Just my guess. You must have better ideas and
> suggestions.

This is the same performance problem reported in an earlier support
ticket, for which the support responses are here:

http://www.unidata.ucar.edu/support/help/MailArchives/netcdf/msg10905.html
http://www.unidata.ucar.edu/support/help/MailArchives/netcdf/msg10908.html

and in the NCO forum thread here:

http://sourceforge.net/projects/nco/forums/forum/9829/topic/4898620

There is an additional workaround now, if you're willing to test the
netCDF snapshot release (soon to be version 4.2.1), which I list as
number 6 below:

1. Data writer: Don't use the unlimited dimension if it's not needed.

2.
Data writer: Make sure the record size for each variable is at least as
large as the disk block size where it will be read.

3. Data reader: Convert record-oriented data to use only fixed-size
   dimensions before using it in processing. There's an NCO operator for
   this:

     ncks --fix_rec_dmn in.nc out.nc

   or you can use nccopy (version 4.2 or the current snapshot, for the
   -w option):

     nccopy -u in.nc out.nc

   or, faster, if there is enough memory for the output file:

     nccopy -w -u in.nc out.nc

4. If processing multiple record variables, read the input a record at a
   time instead of a variable at a time, processing all the record
   variables after each record has been read. This has already been done
   for nccopy and for some NCO operators.

5. Convert record-oriented netCDF-3 data to netCDF-4 classic model files
   (or regular netCDF-4 files), using chunk sizes that are less than or
   equal to a small multiple of the disk block size. The nccopy utility
   may be used for this purpose.

6. Use the new "NC_DISKLESS" option when opening the record-oriented
   file, assuming you have enough memory to hold the file. This will
   read the whole file into memory on open, after which reads will be
   fast and will not depend on the disk block size. This is in the
   current snapshot release and will be in version 4.2.1.

Note that we haven't incorporated the NC_DISKLESS functionality into the
Fortran APIs yet, but this should be relatively easy after we have
released the 4.2.1 C netCDF library.

> I created a test case for you, which can be reached at
> /glade/home/siliu/DAV/NCTest.
>
> The directory includes:
>
> 1) Two netCDF data files: unlimitedD.nc and limitedD.nc
>    The only difference is that time is an unlimited dimension in the
>    first one and a fixed-size dimension in the second; "ncdump -h"
>    will show it.
>
> 2) unlimited.f90 and limited.f90
>    Two Fortran programs that call the netCDF function nf90_get_var and
>    print out the time cost.
>
> 3) unlimited.exe and limited.exe
>    The executables built from unlimited.f90 and limited.f90.
>    I am trying to get rpath set for you so you can run them directly.
>    In case you cannot run them directly, the environment settings and
>    compiler commands can be found inside the file runme-4.2.
>
> 4) limited_512k.trace and unlimited_512k.trace
>    The tracing results from the "strace" command. They show that
>    "lseek" and "read" are called for each time step in the unlimited
>    case.
>
> Please let me know if you have any questions I may answer.

The main question is the use case in which the performance problem
occurs.

If the reading program currently accesses record variables one at a time
but needs to access all or most of the record variables, it may be worth
modifying the reading program to access data a record at a time instead
of a variable at a time, if you want to stick with the netCDF-3 format.

If the file is written once but read many times, with variables
extracted from it one at a time, then it makes sense to convert the file
so it does not use record variables, for the convenience of readers.
This also involves only the netCDF-3 format.

If the file is written once but potentially read many times, sometimes
accessing all the data in a single record and other times accessing all
the data in a single variable, then it makes sense to convert the file
to chunked netCDF-4 classic model format, which can support either kind
of access efficiently. This requires that the reading program be linked
against the netCDF-4 library so it can handle chunked data.

If the file is written once and single variables are read from it once
or only a few times, it may make sense to modify the writing program to
write the data without using an unlimited dimension, or as a netCDF-4
classic format file.

I don't think there's any reason to duplicate the timings you have
observed, as the cause is clear, and the possible workarounds I know
about are listed above.
There is no netCDF-3 library modification that can cure this problem; it
is inherent in the netCDF-3 format and the trade-offs involved in
implementing an unlimited dimension versus fixed-size dimensions. The
netCDF-4 format uses a different set of engineering trade-offs,
involving a more complicated format and API, that provides better
performance in some cases and worse in others.

I plan to write a user document on the performance issues involved in
this problem, with recommendations similar to those above for data
providers, developers of data access software, and users puzzled about
why some access patterns are slow.

--Russ

Russ Rew                                         UCAR Unidata Program
address@hidden                          http://www.unidata.ucar.edu

Ticket Details
===================
Ticket ID: VUP-836503
Department: Support netCDF
Priority: Normal
Status: Closed
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.