Hi Qin,

> I am using NetCDF as the data environment for a weather forecast
> operational system, and get pretty good performance from that.
>
> In our operational system, we store gridded data in 4 dimensions
> (X, Y, Z, TIME) for the latest 1 month of the 3D data set.
>
> And we usually extract a subset of grid data from a huge dataset, or a
> point time series from NetCDF files (usually 18 GBytes), by interpolation.
>
> The data interface uses the NetCDF API to provide network services
> for remote users of the Local Area Network (LAN) via SOCKET.
>
> The time consumed for each operation is less than 100 ms.
>
> I have noticed that more and more people (or units) are switching to the
> HDF format, because HDF has the ability to compress data to save storage
> space.
>
> But I am more concerned about efficiency than about saving storage space.
> So I want to know how the performance of HDF and NetCDF compares,
> especially when I want to extract time series data.

HDF5 has some performance benefits, but you may not notice them if you are
just accessing data sequentially. Using netCDF-4 provides most of the same
performance benefits as HDF5.

NetCDF-3 classic format and 64-bit offset format files always store
numeric data in "big-endian" byte order, so byte-swapping is required for
both reading and writing on little-endian platforms. HDF5 instead uses a
"reader-makes-right" strategy, storing data in the same byte order as the
writing computer uses. It also permits specifying the byte order of the
reading computers, in case the data will be read more often than it is
written.

HDF5 supports compression and chunking, both of which can have
performance benefits. Compression means there is less data to read, when
the data has some redundancy that makes it compressible. Chunking is like
multidimensional tiling: data is stored in blocks, which can make access
along different axes faster than if the data is stored to favor just the
most rapidly varying dimension.
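To make the chunking point concrete for the time-series case, here is a
minimal sketch in plain Python (no netCDF or HDF5 library involved) that
counts how many chunks a read would touch under two chunk layouts. The
grid shape, the chunk shapes, and the helper name chunks_touched are all
hypothetical, chosen only for illustration. The count is a rough proxy for
I/O cost, not a measurement; real performance also depends on chunk size
in bytes, the chunk cache, and compression.

```python
def chunks_touched(shape, chunk_shape, start, count):
    """Count the chunks that intersect a hyperslab (start/count per dimension).

    Hypothetical helper for illustration only; it does no I/O.
    """
    total = 1
    for dim, chunk, s, n in zip(shape, chunk_shape, start, count):
        if n <= 0 or s < 0 or s + n > dim:
            raise ValueError("hyperslab is out of bounds")
        first = s // chunk            # index of first chunk touched on this axis
        last = (s + n - 1) // chunk   # index of last chunk touched on this axis
        total *= last - first + 1
    return total


# Assumed grid, ordered (TIME, Z, Y, X): 720 time steps of a 30 x 200 x 300 volume.
shape = (720, 30, 200, 300)

# Layout A: one chunk per time step (favors reading whole volumes).
per_step = (1, 30, 200, 300)
# Layout B: chunks that span many time steps but only a small spatial tile.
tiled = (72, 5, 20, 30)

# A point time series: every time step at a single grid point.
series = dict(start=(0, 5, 100, 150), count=(720, 1, 1, 1))
# A horizontal slice: one time step, one level, the full Y-X plane.
plane = dict(start=(10, 5, 0, 0), count=(1, 1, 200, 300))

for name, layout in (("per_step", per_step), ("tiled", tiled)):
    print(name,
          "series:", chunks_touched(shape, layout, **series),
          "plane:", chunks_touched(shape, layout, **plane))
```

With these assumed shapes, the point time series touches 720 chunks in
the per-time-step layout but only 10 in the tiled layout, while the
horizontal slice goes the other way (1 chunk versus 100). That trade-off
is why the chunk shape should match the most frequent access pattern.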
The netCDF-4 format and the netCDF-4 classic model format can both take
advantage of compression, chunking, and whatever byte order is most
efficient. If you use the netCDF-4 classic model format, your netCDF-3
programs will continue to work by just relinking to a netCDF-4 library.
Some users who are concerned with performance use the netCDF-4 classic
model format for this reason: existing software (relinked) will still
work with the resulting compressed/chunked files.

> Could anyone give me the answer? It has been bothering me for months,
> since I actually don't want to change, but I or somebody else has to
> be convinced.

It's hard to give a single answer; it depends on your data and how it is
accessed. If you install netCDF-4 and try it out, you can measure whether
the difference is worthwhile. The "nccopy" utility in the current netCDF
snapshot is useful for converting data from one netCDF format variant to
another, for specifying various levels of compression, and for specifying
chunking parameters to match how the data will be most frequently
accessed.

Both HDF5 and netCDF-4 also support parallel I/O, which can improve
performance for models writing large amounts of data.

--Russ

Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu

Ticket Details
===================
Ticket ID: PTE-396337
Department: Support netCDF
Priority: Normal
Status: Closed