netCDF library

Dave Allured dave.allured at noaa.gov
Tue Aug 1 15:03:32 MDT 2006


Nilesh,

Since the Netcdf format stores data as a simple matrix of fixed-width 
cells, there is no simple way to save space by not storing zero values.

I think you are saying that a standard scientific file format is 
important to you.  Since you have had such good luck with gridded data 
in Netcdf, I suggest that you stay with it.  Consider these options to 
reduce archival file size:

1.  Keep your current Netcdf format, but store your files gzip'ed.  Make 
uncompressing a standard part of opening the file.  Many application 
languages will allow you to call the shell to gunzip into a temporary 
file and delete it afterward, so you can automate this.  gunzip is rather 
fast, as I recall.  As you stated, this reduces your file size by 99%.

2.  Netcdf 16-bit packed format.  Reduces file size by 50% relative to 
32-bit floats.  You get 16 bits for your combined precision and dynamic 
range.

3.  Netcdf 8-bit packed format.  Reduces file size by 75% relative to 
32-bit floats.  You get 8 bits for your combined precision and dynamic 
range.
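
As a rough illustration of options 1 and 2, here is a minimal Python sketch 
using only the standard library.  It is not from the original message, and 
the function names (gunzip_to_temp, packing_params, pack, unpack) are 
hypothetical.  The packing part assumes the usual convention of a linear 
mapping, unpacked = packed * scale_factor + add_offset, onto signed 
integers; how the integers and the two parameters are actually written to 
the file is left out.

```python
import gzip
import os
import shutil
import tempfile


def gunzip_to_temp(gz_path):
    """Decompress gz_path into a temporary file and return its path.

    The caller is responsible for deleting the temporary file afterward.
    """
    fd, tmp_path = tempfile.mkstemp(suffix=".nc")
    with os.fdopen(fd, "wb") as dst, gzip.open(gz_path, "rb") as src:
        shutil.copyfileobj(src, dst)
    return tmp_path


def packing_params(vmin, vmax, nbits=16):
    """Return (scale_factor, add_offset) for packing [vmin, vmax].

    Assumption: values map onto the symmetric signed integer range
    -(2**(nbits-1) - 1) .. 2**(nbits-1) - 1, i.e. 2**nbits - 2 steps.
    """
    steps = 2 ** nbits - 2
    scale = (vmax - vmin) / steps if vmax > vmin else 1.0
    offset = (vmax + vmin) / 2.0
    return scale, offset


def pack(values, scale, offset):
    """Convert floats to integer codes (lossy: rounds to the nearest step)."""
    return [round((v - offset) / scale) for v in values]


def unpack(packed, scale, offset):
    """Recover approximate floats from integer codes."""
    return [p * scale + offset for p in packed]
```

With this mapping the round-trip error of any value inside [vmin, vmax] is 
at most half of scale_factor, which is what "16 bits for your combined 
precision and dynamic range" amounts to in practice.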

It is possible to write support for a custom, non-Netcdf or 
contorted-Netcdf format to efficiently hold sparse data and exclude 
zeros.  This would be very costly in terms of programming time and lack 
of compatibility.  I recommend against this, and I say that as one who 
has done it the wrong way a few times.   ;-)

--Dave Allured
CIRES Climate Diagnostics Center (CDC)
NOAA/ESRL, Physical Sciences Division (PSD)

Nilesh Lahoti wrote:
> Dear Sir,
>
> We are an air quality modeling group at Rutgers University, New Jersey. 
> We process emissions and run simulation models to study the long-range 
> transport of ozone and particulate matter, both for our research and 
> for regulatory work.
>
> The netCDF library works great for us. However, I came across one 
> particular issue with netCDF and would like to ask whether there is a 
> solution to it, or something we can do to improve its performance. 
> When we process emissions for our three-dimensional grid of size 
> (172 x 172 x 22) for a 24-hour period of hourly data, the file size is 
> around 1 gigabyte (GB). Several cells have zero values, so the 
> floating-point pollutant values stored for them in the netCDF file are 
> zero. When we use the gzip utility on Unix to compress these files, 
> the file size drops to almost 10 MB, which saves us 99% of the disk 
> space. So the question is: if netCDF is the most compressed scientific 
> format, is it possible to suppress these zero values of the 
> floating-point variables, or is there a switch that can be used to 
> handle zero values and reduce the file size?
>
> Looking forward to hearing from you.
>
> from,
>
> Nilesh Lahoti
> Research Specialist
> CCL, EOHSI,
> Rutgers University
> Email: nilesh at fidelio.rutgers.edu
> Phone: 732-445-1416
>
> =============================================================================== 
>
> To unsubscribe netcdfgroup, visit:
> http://www.unidata.ucar.edu/mailing-list-delete-form.html
> =============================================================================== 
