Hi Jin,

> > 1. If we use the netCDF-4 file format, does it store a larger
> > amount of data than netCDF-3?
> It is not easy to answer that question. I can think of
> two factors that will affect the answer.
>
> First, if fill is not used in either case, then netCDF-3 files
> should be smaller than the corresponding
> netCDF-4 files. If fill is enabled, then it may be the case that
> the netCDF-4 file is smaller because, if memory serves, it can
> avoid allocating space for the fill data until the data is
> actually read.
>
> Second, the record dimension (UNLIMITED) can affect the space used.
> In netCDF-3, if many variables have a first unlimited dimension, and
> the number of records has to grow for only a single variable, then
> significant space can be allocated for the other variables as well.
> The netCDF-4 format can avoid this.

Right, and those are important for large datasets. But for small datasets, or large datasets represented in many small files, HDF5 will generally require more space due to its larger fixed overhead per variable and its internal use of B-trees for indexing chunks. There are some pathological examples, e.g. a large number of record variables, each with a small number of values per record, where netCDF-3 can store the data very compactly compared with HDF5.

> > 2. If we use netCDF-4, is it faster than netCDF-3 to write and read?
> I am not sure of the answer. Perhaps other people here can comment. [Russ?]

It depends. The use of chunking and compression can make accessing subsets of multidimensional data in netCDF-4 significantly faster than netCDF-3 in some cases. However, netCDF-4 access can be slower if the chunk shapes and sizes aren't appropriate for common data access patterns, especially if large chunks need to be uncompressed to access small amounts of data, or if chunks must be repeatedly compressed or uncompressed due to an inadequate chunk cache.
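A rough way to see the chunking trade-off is to count how many chunks a read touches. The Python sketch below is illustrative only: `chunks_touched` and `read_cost` are hypothetical helpers, not part of the netCDF or HDF5 APIs. It assumes every touched chunk must be decompressed in full (the chunk is the unit of compression), ignores the chunk cache, and counts edge chunks at their full nominal size.

```python
import math
from itertools import product


def chunks_touched(start, count, chunk_shape):
    """Return the index of every chunk overlapped by the hyperslab
    start[i] .. start[i] + count[i] - 1 along each dimension."""
    ranges = []
    for s, c, ch in zip(start, count, chunk_shape):
        first = s // ch            # chunk holding the first requested index
        last = (s + c - 1) // ch   # chunk holding the last requested index
        ranges.append(range(first, last + 1))
    return list(product(*ranges))


def read_cost(start, count, chunk_shape):
    """Return (values requested, chunks touched, values decompressed),
    assuming each touched chunk is decompressed in full."""
    requested = math.prod(count)
    n_chunks = len(chunks_touched(start, count, chunk_shape))
    decompressed = n_chunks * math.prod(chunk_shape)
    return requested, n_chunks, decompressed


if __name__ == "__main__":
    # Hypothetical variable with shape (time=1000, lat=180, lon=360).
    layouts = {"one map per chunk": (1, 180, 360), "100x10x20 tiles": (100, 10, 20)}
    reads = {
        "time series at one point": ((0, 90, 180), (1000, 1, 1)),
        "one full map": ((0, 0, 0), (1, 180, 360)),
    }
    for lname, chunk in layouts.items():
        for rname, (start, count) in reads.items():
            req, n, dec = read_cost(start, count, chunk)
            print(f"{lname:18} | {rname:24} | {n:5} chunks, "
                  f"{dec:>10,} values decompressed for {req:,} requested")
```

With one 180x360 map per chunk, the time-series read touches 1,000 chunks and decompresses 64,800,000 values to return 1,000; with 100x10x20 tiles it touches 10 chunks and 200,000 values. The full-map read reverses the picture (1 chunk and 64,800 values versus 324 chunks and 6,480,000 values), which is why chunk shapes should suit the most common access pattern.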
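Returning to question 1: the record-dimension point, and the pathological case where netCDF-3 is very compact, both follow from how the classic (netCDF-3) format lays out record variables. As we read the format specification, each record holds one slab of every record variable, each slab padded to a 4-byte boundary (no padding when there is only one record variable), and the file holds as many records as the longest record variable. The Python sketch below models only that arithmetic; `TYPE_SIZE`, `record_size`, and `record_section_bytes` are hypothetical helpers, and the header and non-record variables are ignored.

```python
# Bytes per value for the classic netCDF-3 external types.
TYPE_SIZE = {"byte": 1, "char": 1, "short": 2, "int": 4, "float": 4, "double": 8}


def pad4(n):
    """Round n up to the next multiple of 4 bytes."""
    return (n + 3) // 4 * 4


def record_size(record_vars):
    """Bytes per record for a list of (type_name, values_per_record).
    Each variable's slab is padded to 4 bytes, except in the special case
    of a single record variable, where records are not padded."""
    slabs = [TYPE_SIZE[t] * n for t, n in record_vars]
    if len(slabs) == 1:
        return slabs[0]
    return sum(pad4(s) for s in slabs)


def record_section_bytes(record_vars, numrecs):
    """Every record holds a slab for every record variable, so the record
    section is sized by the number of records of the longest variable."""
    return numrecs * record_size(record_vars)


if __name__ == "__main__":
    # Hypothetical record variables sharing the unlimited "time" dimension.
    rec_vars = [("float", 1000), ("double", 500), ("short", 3)]
    print(record_size(rec_vars))                   # 8008
    # Writing 10,000 records of only the small "short" variable still
    # sizes the record section for all three variables.
    print(record_section_bytes(rec_vars, 10_000))  # 80080000
    print(10_000 * 3 * TYPE_SIZE["short"])         # 60000
    # 1,000 record variables with one int each: 4 bytes per variable.
    print(record_size([("int", 1)] * 1000))        # 4000
```

So growing one small variable to 10,000 records sizes the record section at 80,080,000 bytes, although that variable holds only 60,000 bytes. Conversely, 1,000 record variables with one int each cost exactly 4,000 bytes per record, with no per-variable index stored inside the records.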

If you are just reading data in the same order in which it was written and the data are not compressed, the two formats are approximately equivalent. NetCDF-4 access is faster when there is a very large number of variables or attributes, because it indexes them for O(log N) access, whereas netCDF-3 locates variables and attributes by name with a simple O(N) search, where N is the number of variables or attributes. For parallel I/O, netCDF-3 has a performance advantage due to its simpler data layout. For adding new metadata to an existing file, netCDF-4 is superior: it never has to move data to make space for large amounts of new metadata in a file header, because new metadata is appended.

--Russ

> > 3. Does the netCDF Java library use the netCDF C library to read files
> > in netCDF-4 format? Is that a faster way to read the file?
> For reading, the Java library does not actually need to use the C library.
> However, Java is likely to be somewhat slower than the C library.
> I should note that in the newest Java library, netCDF-4 file writing
> is possible, and it uses the C library to do that writing.
>
> =Dennis Heimbigner
> Unidata

Russ Rew                                         UCAR Unidata Program
address@hidden                     http://www.unidata.ucar.edu

Ticket Details
===================
Ticket ID: KSJ-559079
Department: Support netCDF
Priority: Normal
Status: Closed
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.