
Re: netCDF library



More options for compression.

Dave Allured wrote:
Nilesh,

Since the Netcdf format is a simple matrix of fixed-width cells, there is no simple way to save space by not storing zero values.

I think you are saying that a standard scientific file format is important to you. Since you have had such good luck with gridded data in Netcdf, I suggest that you stay with it. Consider these options to reduce archival file size:

1. Keep your current Netcdf format, but store your files gzip'ed. Make uncompressing a standard part of opening the file. Many application languages let you call the shell to gunzip into a temporary file and delete it afterward, so you can automate this. As I recall, gunzip is rather fast. As you stated, this reduces your file size by 99%.

The Netcdf-Java 2.2 library looks for the ".Z", ".zip", ".gzip", ".gz", or ".bz2" file extensions. If it finds one, it uncompresses/unzips the file and then reads from the uncompressed copy. It caches the unzipped file and can clean up the cache area automatically, deleting older files to keep the cache size within a specified limit. The next time the file is opened, it first checks whether the uncompressed version already exists in the cache.

This works for read-only applications such as servers. Writing is usually done only once, and we haven't tried to optimize that.
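Here is a rough Python sketch of the decompress-and-cache behavior described above. It uses only the standard library. It is not the NetCDF-Java implementation or API. The function name `open_cached`, the cache-naming scheme, the least-recently-used eviction by file modification time, and the single-member assumption for `.zip` archives are all hypothetical. The ".Z" (Unix compress) format is left out because Python's standard library has no decoder for it.

```python
import bz2
import gzip
import os
import shutil
import zipfile

# Hypothetical sketch of a decompress-on-open cache; NOT the NetCDF-Java API.
# ".Z" (Unix compress / LZW) is omitted: Python's stdlib cannot decode it.
_STREAM_OPENERS = {".gz": gzip.open, ".gzip": gzip.open, ".bz2": bz2.open}


def _evict_oldest(cache_dir, max_bytes, keep):
    """Delete the oldest cached files (except `keep`) until the cache fits."""
    paths = [os.path.join(cache_dir, name) for name in os.listdir(cache_dir)]
    paths = [p for p in paths if os.path.isfile(p)]
    total = sum(os.path.getsize(p) for p in paths)
    # Oldest modification time first; `keep` is the file just requested.
    for path in sorted((p for p in paths if p != keep), key=os.path.getmtime):
        if total <= max_bytes:
            break
        total -= os.path.getsize(path)
        os.remove(path)


def open_cached(path, cache_dir, max_bytes):
    """Return the path of an uncompressed copy of `path`, reusing the cache."""
    base, ext = os.path.splitext(path)
    if ext not in _STREAM_OPENERS and ext != ".zip":
        return path  # not compressed: read the file directly
    os.makedirs(cache_dir, exist_ok=True)
    # Simplification: the cache key is the file name without the compression
    # extension, so same-named files from different directories would collide.
    cached = os.path.join(cache_dir, os.path.basename(base))
    if os.path.exists(cached):
        os.utime(cached)  # cache hit: mark as recently used, skip decompressing
        return cached
    if ext == ".zip":
        with zipfile.ZipFile(path) as archive:
            # Assumption: the archive holds a single member, the netCDF file.
            member = archive.namelist()[0]
            with archive.open(member) as src, open(cached, "wb") as dst:
                shutil.copyfileobj(src, dst)
    else:
        with _STREAM_OPENERS[ext](path, "rb") as src, open(cached, "wb") as dst:
            shutil.copyfileobj(src, dst)
    _evict_oldest(cache_dir, max_bytes, keep=cached)
    return cached
```

Suppose you call `open_cached("data.nc.gz", "/tmp/nccache", 500_000_000)` twice (hypothetical path and limit). Only the first call decompresses. The second call returns the cached `data.nc` directly. Dave's temporary-file approach is the same idea without the cache: delete the uncompressed copy as soon as you are done with it.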



2. Netcdf 16-bit packed format. This reduces file size by 50% relative to 32-bit floats. You get 16 bits for your combined precision and dynamic range.

3. Netcdf 8-bit packed format. This reduces file size by 75% relative to 32-bit floats. You get 8 bits for your combined precision and dynamic range.
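Here is a Python sketch of the arithmetic behind options 2 and 3. It assumes the original data are 32-bit floats: 2 bytes per value instead of 4 saves 50%, and 1 byte saves 75%. It uses one common way of choosing the attributes. `add_offset` is centered on the data range, and the range is spread over 2^n - 2 integer steps. The helper names are hypothetical, and other tools may choose `scale_factor` and `add_offset` differently.

```python
# Hypothetical helpers illustrating n-bit packing with scale_factor/add_offset.
# Assumes the unpacked data are 32-bit floats, so 16-bit packing halves the
# storage (2 bytes vs 4) and 8-bit packing quarters it (1 byte vs 4).


def packing_params(values, nbits):
    """Choose (scale_factor, add_offset) to pack `values` into signed n-bit ints."""
    lo, hi = min(values), max(values)
    add_offset = (hi + lo) / 2.0  # center the range on zero
    # Spreading the range over 2**nbits - 2 steps keeps packed values within
    # +/-(2**(nbits - 1) - 1), leaving the most negative integer unused
    # (it could serve as a fill value).
    scale_factor = (hi - lo) / (2 ** nbits - 2) if hi > lo else 1.0
    return scale_factor, add_offset


def pack(values, scale_factor, add_offset, nbits):
    """Quantize to integers; error is at most scale_factor / 2 (plus float noise)."""
    limit = 2 ** (nbits - 1) - 1
    return [max(-limit, min(limit, round((v - add_offset) / scale_factor)))
            for v in values]


def unpack(packed, scale_factor, add_offset):
    """Invert the packing: value = packed * scale_factor + add_offset."""
    return [p * scale_factor + add_offset for p in packed]


if __name__ == "__main__":
    data = [0.0, 12.5, 37.25, 99.9, 100.0]
    for nbits in (16, 8):
        scale, offset = packing_params(data, nbits)
        restored = unpack(pack(data, scale, offset, nbits), scale, offset)
        worst = max(abs(a - b) for a, b in zip(data, restored))
        print(f"{nbits}-bit: scale_factor={scale:.6g}, worst error={worst:.3g}")
```

This shows the trade-off that "combined precision and dynamic range" refers to. For data spanning 0 to 100, the rounding error is bounded by scale_factor / 2. That is about 0.00076 with 16 bits (100 / 65534 / 2) and about 0.197 with 8 bits (100 / 254 / 2).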


If you use the standard attributes "scale_factor" and "add_offset", the Netcdf-Java 2.2 library can optionally handle the packing transparently. It promotes the variable from byte or short to float or double and applies the scale and offset. Again, this is only on the reading side.


These features are available only to Java applications.
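Outside Java, the read-side convention is simple to apply by hand: `unpacked = packed * scale_factor + add_offset`. The Python sketch below imitates that promotion for the netCDF `byte` and `short` types, which the classic format stores as big-endian signed integers. The function and its arguments are hypothetical, not the NetCDF-Java API. A missing `scale_factor` defaults to 1 and a missing `add_offset` to 0.

```python
import struct

# Hypothetical reader-side helper; names are illustrative, not NetCDF-Java's API.
# netCDF "byte" and "short" are signed; classic files store them big-endian.
_FORMATS = {"byte": "b", "short": "h"}


def read_unpacked(raw, dtype, attributes):
    """Decode packed integers from `raw` and apply scale_factor/add_offset."""
    code = _FORMATS[dtype]
    count = len(raw) // struct.calcsize(code)
    packed = struct.unpack(">%d%s" % (count, code), raw)
    if "scale_factor" not in attributes and "add_offset" not in attributes:
        return list(packed)  # not a packed variable: keep the integers
    scale = attributes.get("scale_factor", 1.0)
    offset = attributes.get("add_offset", 0.0)
    # Promote to Python float (a C double) while applying scale and offset.
    return [float(p * scale + offset) for p in packed]
```

For example, the shorts -100, 0, and 250 with `scale_factor` 0.5 and `add_offset` 10.0 unpack to -40.0, 10.0, and 135.0.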