Hi James,

> We recently began working on a transition from netCDF 3.6.2 to 4.1.1.
>
> The process was trouble free and things seem to be working, but we have
> been surprised to find the HDF variant producing extremely large files
> relative to the old netCDF native form. Our measurement files are already
> enormous, and further growth would be deadly.
>
> Has anyone else encountered this?

There is a larger fixed-size overhead for metadata (names and properties of
variables, dimensions, and attributes) in the HDF5-based netCDF-4 format,
but in our experience it's not significant for files with lots of data and
only a moderate amount of metadata. And use of compression can make
equivalent netCDF-4 files significantly smaller than netCDF-3 classic
format files.

As an example we use in our netCDF training workshop, a netCDF file with
only one dimension of size 2 and one variable that uses that dimension is
very small in the classic or 64-bit offset formats:

     88 test.nc1   # classic format
     92 test.nc2   # 64-bit offset format
   5072 test.nc3   # netCDF-4 format
   5108 test.nc4   # netCDF-4 classic model format

However, if you change the dimension size to 10000, the sizes are much
closer:

  40080 test.nc1   # classic format
  40084 test.nc2   # 64-bit offset format
  45064 test.nc3   # netCDF-4 format
  45101 test.nc4   # netCDF-4 classic model format

And if you apply level-1 deflate compression to the variable, the netCDF-4
files are significantly smaller for this (artificial) data:

  40080 test.nc1   # classic format
  40084 test.nc2   # 64-bit offset format
  21055 test.nc3   # netCDF-4 format
  21092 test.nc4   # netCDF-4 classic model format

Finally, if you apply the shuffle filter along with compression for this
test file, the result is significantly better compression:

  40080 test.nc1   # classic format
  40084 test.nc2   # 64-bit offset format
   7777 test.nc3   # netCDF-4 format
   7814 test.nc4   # netCDF-4 classic model format

It's easy to run little experiments like this
with the "nccopy" utility in the latest netCDF snapshot release (soon to
be in version 4.1.2), since you can specify format conversions and
compression on the command line:

  http://www.unidata.ucar.edu/netcdf/workshops/2010/utilities/NccopyExamples.html

This is a very artificial example, and it's unlikely you'll get results
this good with your real data, but experimenting with nccopy's compression
options on some real data should tell you what to expect from netCDF-4 for
your data.

--Russ

Russ Rew                                        UCAR Unidata Program
address@hidden                                  http://www.unidata.ucar.edu

Ticket Details
===================
Ticket ID: AIQ-275071
Department: Support netCDF
Priority: Normal
Status: Closed
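[Archive note] The shuffle-plus-deflate gain described above can be illustrated outside netCDF with a short Python sketch. This is not the HDF5 filter pipeline itself: the data values are hypothetical, and plain zlib stands in for HDF5's deflate filter; only the byte-transposition step mimics what the shuffle filter actually does.

```python
import struct
import zlib

# Synthetic stand-in for the workshop variable: 10000 doubles that vary
# slowly, as smooth scientific data often does (hypothetical values).
values = [1.0 + i * 1e-6 for i in range(10000)]
raw = struct.pack("<%dd" % len(values), *values)

# Deflate alone, at level 1 (zlib stands in for HDF5's deflate filter).
deflated = zlib.compress(raw, 1)

# Shuffle, then deflate: the shuffle filter transposes the bytes of the
# values, so byte 0 of every value is stored first, then byte 1, and so
# on.  The nearly constant high-order bytes of these doubles end up in
# long runs that deflate compresses very well.
width = 8  # bytes per double
shuffled = bytes(raw[j * width + k]
                 for k in range(width)
                 for j in range(len(values)))
shuffled_deflated = zlib.compress(shuffled, 1)

print(len(raw), len(deflated), len(shuffled_deflated))
```

On data like this, shuffle-then-deflate comes out well below deflate alone, which mirrors the drop from 21055 to 7777 bytes in the workshop example; in nccopy these steps correspond to the -d (deflate level) and -s (shuffle) options.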
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.