[netCDF #PTE-396337]: Services for Weather Forecast Operational System. HDF or NetCDF when considering the data access efficiency?



Hi Qin,

> I am using NetCDF as the data environment for weather forecast operational
> system, and get pretty good performance from that.
> 
> In our operational system, we store gridded data in 4 dimensions (X, Y, Z,
> TIME) for the latest month of 3D data.
> 
> And we usually extract a subset of grid data from a huge dataset, or a point
> time series from NetCDF files (usually 18 GBytes), by interpolation.
> 
> The data interface uses the NetCDF API to provide a network service
> for remote users on the Local Area Network (LAN) via sockets.
> 
> The time consumed for each operation is less than 100 ms.
> 
> I have noticed that more and more people (or units) are switching to the
> HDF format, because HDF can compress data to save storage
> space.
> 
> But I am more concerned about efficiency than about saving storage. So I
> want to know how HDF and NetCDF compare in performance, especially when I
> want to extract time series data.

HDF5 has some performance benefits, but you may not notice if you are
just accessing data sequentially.  Using netCDF-4 provides most of the
same performance benefits as HDF5.

NetCDF-3 classic format and 64-bit offset format files always store
numeric data in "big-endian" byte order, so byte-swapping is required
for both reading and writing on little-endian platforms.  But HDF5
uses a "reader-makes-right" strategy, storing data in the same byte
order as the writing computer uses.  It also permits specifying the
byte order of the reading computers, in case the data will be read
more often than it is written.
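
As a rough illustration of the byte-order point (not netCDF or HDF5
code, just Python's standard "struct" module, with made-up sample
values), the sketch below packs the same three floats once in
big-endian order and once in the order of the writing machine.  On a
little-endian host the two layouts differ, which is the swap a
netCDF-3 reader or writer has to do.

```python
import struct
import sys

# Made-up sample values, all exactly representable as 32-bit floats.
values = [1013.25, 287.5, -4.0]

# NetCDF-3 style: always big-endian on disk (">" prefix in struct).
big_endian = struct.pack(">3f", *values)

# "Native" style: whatever byte order the writing machine uses ("=" prefix).
native = struct.pack("=3f", *values)

# A reader has to know which byte order the data was written in.
from_big = struct.unpack(">3f", big_endian)
from_native = struct.unpack("=3f", native)

# On a little-endian host the two layouts differ, so using the big-endian
# copy implies a byte swap; on a big-endian host they are identical.
needs_swap = (sys.byteorder == "little")
print(sys.byteorder, big_endian != native, needs_swap)
```

Both copies decode to the same numbers; only the cost of getting
there differs, and that cost is usually small next to disk I/O.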

HDF5 supports compression and chunking, both of which can have
performance benefits.  Compression means there is less data to read,
when the data has some redundancy that makes it compressible.
Chunking is like multidimensional tiling, storing data in blocks that
can make accessing data along different axes faster than if the data
is stored to favor just the most rapidly varying dimension.
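
To make the chunking trade-off concrete, here is a small back-of-the-
envelope sketch in plain Python.  It assumes a simplified cost model
in which every chunk a request overlaps is read whole, and it ignores
compression and the chunk cache.  The grid shape and the three chunk
shapes are hypothetical, not taken from your files.

```python
from math import prod

def chunks_touched(start, count, chunk_shape):
    """Number of chunks overlapped by a hyperslab (start, count per dimension)."""
    n = 1
    for s, c, ch in zip(start, count, chunk_shape):
        first = s // ch              # index of first chunk hit along this axis
        last = (s + c - 1) // ch     # index of last chunk hit along this axis
        n *= last - first + 1
    return n

def bytes_read(start, count, chunk_shape, itemsize=4):
    """Simplified cost: chunks touched times the size of one whole chunk."""
    return chunks_touched(start, count, chunk_shape) * prod(chunk_shape) * itemsize

# Hypothetical grid (time, z, y, x) = (720, 20, 200, 300) of 4-byte floats.
# Point time series: every time step at one (z, y, x) location.
series = ((0, 5, 100, 150), (720, 1, 1, 1))
# Horizontal slice: one time and one level, all of y and x.
slice_ = ((10, 5, 0, 0), (1, 1, 200, 300))

layouts = {
    "one time step per chunk": (1, 20, 200, 300),
    "time-favoring chunks": (720, 1, 10, 10),
    "compromise": (24, 5, 50, 50),
}
for name, chunk in layouts.items():
    print(name, bytes_read(*series, chunk), bytes_read(*slice_, chunk))
```

Under these assumptions the time-favoring layout is far cheaper for a
point time series and far more expensive for a horizontal slice, and
the compromise shape sits in between for both.  Real timings will
differ, so treat this only as a way to compare candidate chunk shapes
before testing them.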

The netCDF-4 format and the netCDF-4 classic model format can both
take advantage of compression, chunking, and using whatever byte order
is most efficient.  If you use the netCDF-4 classic model format, your
netCDF-3 programs will continue to work by just relinking to a
netCDF-4 library.  Some users who are concerned with performance use
netCDF-4 classic-model format for this reason, because existing
software (relinked) will still work with the resulting
compressed/chunked files.

> Could anyone give me the answer? It has been bothering me for months,
> since I actually don't want to change, but I or somebody else has to
> be convinced.

It's hard to give a single answer; it depends on your data and how it
is accessed.  If you install netCDF-4 and try it out, you can measure
whether the difference is worthwhile.  The "nccopy" utility in the
current netCDF snapshot is useful for converting data from one netCDF
format variant to another, for specifying various levels of
compression, and for specifying chunking parameters to match how the
data will be most frequently accessed.
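
For example, a conversion along these lines could be scripted as
below.  This is only a sketch: the file names and the dimension names
(time, z, y, x) are placeholders for whatever your files actually use,
and the helper nccopy_command is made up for illustration.  It only
builds the command line, using nccopy's -k (format kind), -d (deflate
level), and -c (chunk spec) options, and does not run it.

```python
import shlex

def nccopy_command(src, dst, deflate=1, chunks=None,
                   kind="netCDF-4 classic model"):
    """Build an nccopy argument list (hypothetical helper; nothing is executed)."""
    argv = ["nccopy", "-k", kind, "-d", str(deflate)]
    if chunks:
        # The chunk spec is a comma-separated list of dimname/length pairs.
        spec = ",".join(f"{dim}/{n}" for dim, n in chunks.items())
        argv += ["-c", spec]
    argv += [src, dst]
    return argv

# Placeholder file and dimension names; substitute your own.
argv = nccopy_command("forecast.nc", "forecast_chunked.nc", deflate=1,
                      chunks={"time": 720, "z": 1, "y": 10, "x": 10})
print(shlex.join(argv))
# To run it: import subprocess; subprocess.run(argv, check=True)
```

After converting, time your usual subset and time series requests
against both the original and the converted file, and try a few
different chunk shapes and deflate levels before deciding.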

Both HDF5 and netCDF-4 also support parallel I/O, which can improve
performance for models writing large amounts of data.

--Russ

Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu



Ticket Details
===================
Ticket ID: PTE-396337
Department: Support netCDF
Priority: Normal
Status: Closed