Hi Constantine,

> My profiling results show that NetCDF Classic is very slow if the
> following output scheme is used:
>
> // case 1
> for var in variables
>   define var
>   write var
> endfor

Rob Latham is exactly right about case 1, including the recommendation
to use the "underbar underbar" function nc__enddef() to reserve extra
space in the header to optimize for this case if you want netCDF-3
files.

> or even if something like this is done (assuming that all the
> variables are defined already and they depend on an unlimited
> dimension):
>
> // case 2
> for var in variables
>   append var
> endfor
>
> It seems to me that case 1 is slow because NetCDF (Classic) keeps the
> file header as small as possible (Section 4 of the NetCDF User's Guide
> is perfectly clear about this). Case 2, on the other hand, seems to be
> slow because (please correct me if I'm wrong) variables are stored
> contiguously. (In other words: if variables A and B are defined in
> this order, then appending X bytes to A requires moving B over by X
> bytes.)

No, that's not the case. The data for record variables (those that use
an unlimited dimension) is interlaced by the unlimited dimension, so
appending data for the nth record of all record variables is efficient,
especially if you append the variables in the same order in which they
were defined. In that case appending data is just sequential I/O, except
that the number of records in the header must also be updated once, when
a new record is first written.

You may be seeing what you think is an inefficiency because all the fill
values for a record are written the first time the file is extended to
contain that record, unless you have "no-fill mode" set. Hence all the
record values are typically written twice: once to fill the record with
fill values of the appropriate type for each variable, and a second time
when a data value overwrites the associated fill value. If you know you
will always write all the values in a record, you can set no-fill mode
before writing to eliminate the overhead of writing fill values.

> My question is:
>
> How does NetCDF-4 compare to NetCDF Classic in this regard? Would
> switching to it improve write performance? (This is two questions,
> really: I'm interested in cases 1 and 2 separately.)

For case 1, netCDF-4 supports efficient addition of new variables, with
no need to move data around to make more space in the "header", because
there is no single contiguous header that stores all the metadata;
instead, the metadata is distributed throughout the file. So either use
nc__enddef() to reserve extra space in the header of netCDF-3 files, or
use netCDF-4 if the software that accesses the data has been upgraded to
netCDF-4.

For case 2, netCDF-4 is no more efficient than netCDF-3, but it is more
flexible, because it supports multiple unlimited dimensions.

--Russ
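
For case 1, a minimal sketch of reserving header space with the C API's
nc__enddef(); the file name, dimension size, and 4096-byte pad here are
arbitrary choices for illustration, not from the message above:

    #include <stdio.h>
    #include <stdlib.h>
    #include <netcdf.h>

    #define CHECK(e) do { int rc_ = (e); if (rc_ != NC_NOERR) { \
        fprintf(stderr, "%s\n", nc_strerror(rc_)); exit(1); } } while (0)

    int main(void) {
        int ncid, dimid, varid;

        CHECK(nc_create("example.nc", NC_CLOBBER, &ncid));
        CHECK(nc_def_dim(ncid, "x", 1000, &dimid));
        CHECK(nc_def_var(ncid, "v0", NC_DOUBLE, 1, &dimid, &varid));

        /* Leave at least 4096 free bytes after the header; 4-byte
         * alignment for the data sections, no other padding. */
        CHECK(nc__enddef(ncid, 4096, 4, 0, 4));

        /* ... write v0 here; later nc_redef()/nc_def_var() calls stay
         * cheap until the reserved header pad is exhausted ... */

        CHECK(nc_close(ncid));
        return 0;
    }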
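For case 2, a sketch of appending one record with no-fill mode set,
assuming two double-precision record variables defined in this order
along the unlimited dimension; the helper name and arguments are made
up, and the CHECK macro is reused from the sketch above:

    #include <netcdf.h>

    void append_record(int ncid, int varid_a, int varid_b, size_t rec,
                       const double *a, const double *b, size_t nx)
    {
        int old_fill_mode;
        size_t start[2] = { rec, 0 };   /* record index, offset in x */
        size_t count[2] = { 1, nx };    /* one record, nx values     */

        /* Skip pre-writing fill values when the file grows by a record. */
        CHECK(nc_set_fill(ncid, NC_NOFILL, &old_fill_mode));

        /* Writing in definition order keeps the I/O sequential, since
         * record data is interlaced by variable. */
        CHECK(nc_put_vara_double(ncid, varid_a, start, count, a));
        CHECK(nc_put_vara_double(ncid, varid_b, start, count, b));
    }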
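And a sketch of the netCDF-4 flexibility mentioned for case 2: a
netCDF-4 file may have more than one unlimited dimension, which the
classic format cannot express. File and dimension names are invented
for the example, and CHECK is again the macro from the first sketch:

    #include <netcdf.h>

    int make_nc4_file(void)
    {
        int ncid, time_dim, obs_dim;

        CHECK(nc_create("example4.nc", NC_CLOBBER | NC_NETCDF4, &ncid));
        CHECK(nc_def_dim(ncid, "time", NC_UNLIMITED, &time_dim));
        CHECK(nc_def_dim(ncid, "obs",  NC_UNLIMITED, &obs_dim));
        /* Variables can also be added to this file later without the
         * header-rewrite cost of the classic format. */
        return nc_close(ncid);
    }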