netCDF library
Dave Allured
dave.allured at noaa.gov
Tue Aug 1 15:03:32 MDT 2006
Nilesh,
Since Netcdf format is a simple matrix of fixed-width cells, a zero takes
up as much space as any other value, and there is no simple way to save
space by not storing zero values.
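To make the fixed-width point concrete, here is a rough size estimate for the grid described in the quoted message below. It is only a sketch: the 4-byte value width is an assumption (the message says "floating point" without giving a precision), and the function name is made up for illustration.

```python
def fixed_width_bytes(dims, bytes_per_value=4):
    """Bytes needed for one variable stored as a dense array of fixed-width cells.

    bytes_per_value=4 assumes 32-bit floats; that is an assumption, not
    something stated in the original message.
    """
    total = bytes_per_value
    for n in dims:
        total *= n
    return total


# Grid from the quoted message: 172 x 172 x 22 cells, 24 hourly time steps.
per_variable = fixed_width_bytes((172, 172, 22, 24))
print(per_variable)  # 62481408 bytes, about 62 MB per variable

# The count depends only on the dimensions, not on how many cells are zero.
```

At roughly 62 MB per variable, a file near 1 GB would correspond to something like 16 such variables, though the actual number of pollutants is not given.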
I think you are saying that a standard scientific file format is
important to you. Since you have had such good luck with gridded data
in Netcdf, I suggest that you stay with it. Consider these options to
reduce archival file size:
1. Keep your current Netcdf format, but store your files gzip'ed. Make
uncompressing a standard part of opening the file. Many application
languages will allow you to call the shell to gunzip into a temporary
file and delete it when you are done, so you can automate this. gunzip
is rather fast, as I recall. As you stated, your file size is reduced
by 99%.
2. Netcdf 16-bit packed format. Reduces file size by 50%, assuming your
data are now stored as 32-bit floats. You get 16 bits for your combined
precision and dynamic range.
3. Netcdf 8-bit packed format. Reduces file size by 75%, on the same
assumption. You get 8 bits for your combined precision and dynamic
range.
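Options 1 to 3 can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not a tested recipe for your files: it uses Python's standard gzip module in place of a shell call to gunzip, the helper names (gunzipped, packing_params, pack, unpack) are invented here, and the packing follows the usual scale_factor / add_offset convention, where unpacked = packed * scale_factor + add_offset.

```python
import gzip
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def gunzipped(gz_path):
    """Option 1: yield the path of a temporary uncompressed copy, then delete it."""
    fd, tmp_path = tempfile.mkstemp(suffix=".nc")
    try:
        with os.fdopen(fd, "wb") as dst, gzip.open(gz_path, "rb") as src:
            shutil.copyfileobj(src, dst)
        yield tmp_path  # open this path with your usual netCDF reader
    finally:
        os.remove(tmp_path)  # the temporary copy never outlives the block


def packing_params(data_min, data_max, nbits):
    """Options 2 and 3: (scale_factor, add_offset) mapping the data range
    onto signed integers of nbits bits (16 or 8)."""
    if data_max == data_min:
        return 1.0, float(data_min)  # constant field: any scale works
    scale_factor = (data_max - data_min) / (2 ** nbits - 1)
    add_offset = data_min + 2 ** (nbits - 1) * scale_factor
    return scale_factor, add_offset


def pack(values, scale_factor, add_offset, nbits):
    """Convert floats to packed integers, clamped to the signed nbits range."""
    lo, hi = -(2 ** (nbits - 1)), 2 ** (nbits - 1) - 1
    return [min(hi, max(lo, round((v - add_offset) / scale_factor)))
            for v in values]


def unpack(packed, scale_factor, add_offset):
    """Recover approximate floats from packed integers."""
    return [p * scale_factor + add_offset for p in packed]
```

A reader would then wrap its open call in `with gunzipped("emis.nc.gz") as path:` and pass `path` to whatever netCDF library you already use (the file name is hypothetical). For packing, the worst-case rounding error is half of scale_factor, so the wider your data range, the coarser the stored values. That is the trade-off between precision and dynamic range.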
It is possible to write support for a custom, non-Netcdf or
contorted-Netcdf format to efficiently hold sparse data and exclude
zeros. This would be very costly in terms of programming time and lack
of compatibility. I recommend against this, and I say that as one who
has done it the wrong way a few times. ;-)
--Dave Allured
CIRES Climate Diagnostics Center (CDC)
NOAA/ESRL, Physical Sciences Division (PSD)
Nilesh Lahoti wrote:
> Dear Sir,
>
> We are an air quality modeling group at Rutgers University, New Jersey.
> We process emissions and run simulation models to study the
> long-range transport of ozone and particulate matter, both for our
> research and for regulatory work.
>
> The netCDF library works great for us. However, I have come across one
> particular issue with netCDF and would like to ask whether there is a
> solution to it, or anything we can do to improve performance. When we
> process emissions for our three-dimensional grid of size
> (172 x 172 x 22) over a 24-hour period with hourly data, the file size
> is around 1 gigabyte (GB). Several cells have zero values, so the
> floating-point values for pollutants in the netCDF file are zero. When
> we use the gzip utility on Unix to compress these files, the file size
> drops to about 10 MB, which saves us 99% of the disk space. So the
> question arises: if netCDF is the most compressed scientific format, is
> it possible to suppress these zero values of the floating-point
> variables, or is there any switch that can be used to handle zero
> values and reduce the file size?
>
> Looking forward to hearing from you.
>
> from,
>
> Nilesh Lahoti
> Research Specialist
> CCL, EOHSI,
> Rutgers University
> Email: nilesh at fidelio.rutgers.edu
> Phone: 732-445-1416
>
> ===============================================================================
>
> To unsubscribe netcdfgroup, visit:
> http://www.unidata.ucar.edu/mailing-list-delete-form.html
> ===============================================================================