Fwd: netCDF library
Valliappa Lakshmanan
valliappa.lakshmanan at noaa.gov
Wed Aug 2 08:45:23 MDT 2006
Nilesh and others,
The problem with the gzip and chunking solutions proposed is that,
ultimately, you will still be hitting the disk to read the large
uncompressed file. We discovered that things simply didn't work for
real-time operations, since we're reading and writing 6000x3000x20
grids created every 5 minutes.
A much better solution that has worked well for us with 3D radar data
is to write the data out as sparse grids. For example, instead of
writing a variable as:
float emissions( x, y, z )
with three dimensions x,y,z
have the following layout:
dimension i UNLIMITED
float emissions( i )
int x (i)
int y (i)
int z (i)
and store only non-zero values and the location of those values in the
netcdf file.
While this is not a "natural grid", this retains the
platform-independence and easy readability of netcdf files with a
simple-enough compression that large grids actually work.
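A minimal sketch of the conversion, for illustration only. Plain Python
lists stand in for the netcdf variables emissions(i), x(i), y(i), z(i);
the function names and the `missing` parameter are assumptions, not part
of any netcdf API. The grid shape is passed to the decoder separately
because the four arrays alone do not record it; how the shape is kept in
the real files is not stated here.

```python
# Sketch only: hypothetical helpers, lists in place of netcdf variables.

def dense_to_sparse(grid, missing=0.0):
    """Keep only cells that differ from `missing`, with their indices."""
    values, xs, ys, zs = [], [], [], []
    for x, plane in enumerate(grid):
        for y, column in enumerate(plane):
            for z, v in enumerate(column):
                if v != missing:
                    values.append(v)
                    xs.append(x)
                    ys.append(y)
                    zs.append(z)
    return values, xs, ys, zs


def sparse_to_dense(values, xs, ys, zs, shape, missing=0.0):
    """Rebuild the full grid; cells not listed are filled with `missing`."""
    nx, ny, nz = shape
    grid = [[[missing] * nz for _ in range(ny)] for _ in range(nx)]
    for v, x, y, z in zip(values, xs, ys, zs):
        grid[x][y][z] = v
    return grid


# A tiny 2x2x2 grid with two non-zero cells.
grid = [[[0.0, 0.0], [1.5, 0.0]],
        [[0.0, 0.0], [0.0, 2.5]]]
emissions, x, y, z = dense_to_sparse(grid)
restored = sparse_to_dense(emissions, x, y, z, shape=(2, 2, 2))
```

Each stored cell costs one float plus three ints, so under these
assumptions the layout only pays off when well under a quarter of the
cells hold data.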
Lak
p.s. This solution works only for the case you described, where there
are lots of zero/missing values. It won't work for large grids of
fields like temperature, which have values throughout the domain.
p.s. 2: If your data are highly correlated in space, you can also store
a repeated count, making this essentially run-length encoded data.
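One way the repeated count could look, as a hedged sketch. It assumes
runs are taken along z (the last index) and that a fifth parallel array,
here called `counts`, is stored next to the values and indices. Both
choices and all function names are illustrative assumptions; the actual
run direction used in the radar files is not stated.

```python
# Sketch only: run-length encoding along z, lists in place of netcdf variables.

def rle_encode(grid, missing=0.0):
    """Store each run of equal non-missing values as (value, x, y, z_start, count)."""
    values, xs, ys, zs, counts = [], [], [], [], []
    for x, plane in enumerate(grid):
        for y, column in enumerate(plane):
            z = 0
            while z < len(column):
                start, v = z, column[z]
                z += 1
                # Extend the run while the next cell repeats the same value.
                while z < len(column) and column[z] == v:
                    z += 1
                if v != missing:
                    values.append(v)
                    xs.append(x)
                    ys.append(y)
                    zs.append(start)
                    counts.append(z - start)
    return values, xs, ys, zs, counts


def rle_decode(values, xs, ys, zs, counts, shape, missing=0.0):
    """Expand the runs back into a full grid filled with `missing` elsewhere."""
    nx, ny, nz = shape
    grid = [[[missing] * nz for _ in range(ny)] for _ in range(nx)]
    for v, x, y, z, n in zip(values, xs, ys, zs, counts):
        for k in range(n):
            grid[x][y][z + k] = v
    return grid


# A 2x2x4 grid with three runs of non-zero values.
grid = [[[0.0, 3.0, 3.0, 3.0], [0.0, 0.0, 0.0, 0.0]],
        [[1.0, 1.0, 0.0, 2.0], [0.0, 0.0, 0.0, 0.0]]]
values, xs, ys, zs, counts = rle_encode(grid)
restored = rle_decode(values, xs, ys, zs, counts, shape=(2, 2, 4))
```

Six non-zero cells collapse to three stored entries here; the saving
grows with the length of the runs.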
On 8/1/06, Dave Allured <dave.allured at noaa.gov> wrote:
>
> Nilesh,
>
> Since Netcdf format is a simple matrix of fixed width cells, there is no
> simple way to save space by not storing zero values.
>
> I think you are saying that a standard scientific file format is
> important to you. Since you have had such good luck with gridded data
> in Netcdf, I suggest that you stay with it. Consider these options to
> reduce archival file size:
>
> 1. Keep your current Netcdf format, but store your files gzip'ed. Make
> uncompressing a standard part of opening the file. Many application
> languages will allow you to call the shell to gunzip and delete a
> temporary file, so you can automate this. gunzip is rather fast, as I
> recall. As you stated, your file size is reduced by 99%.
>
> 2. Netcdf 16-bit packed format. Reduce file size by 50%. You get 16
> bits for your combined precision and dynamic range.
>
> 3. Netcdf 8-bit packed format. Reduce file size by 75%. You get 8
> bits for your combined precision and dynamic range.
>
> It is possible to write support for a custom, non-Netcdf or
> contorted-Netcdf format to efficiently hold sparse data and exclude
> zeros. This would be very costly in terms of programming time and lack
> of compatibility. I recommend against this, and I say that as one who
> has done it the wrong way a few times. ;-)
>
> --Dave Allured
> CIRES Climate Diagnostics Center (CDC)
> NOAA/ESRL, Physical Sciences Division (PSD)
>
> Nilesh Lahoti wrote:
> > Dear Sir,
> >
> > We are air quality modeling group at Rutgers University, New Jersey.
> > We are processing emissions and running simulation models for our
> > study of long range transport of Ozone and Particulate matter for our
> > research and for regulatory work.
> >
> > The netCDF library works great for us. However, I came across with one
> > particular issue of netCDF and would like to discuss if there are any
> > solution to this problem or something that can do to make its
> > performance better. When we process emissions for our three
> > dimensional grid of size (172 x 172 x 22) for 24 hours time period
> > having hourly data, the file size is around 1 gigabyte(GB). There are
> > several cells that have zero values and therefore the floating point
> > value for pollutants in netCDF file has zero values. When we use gzip
> > utility on unix to compress this files, the file size become almost 10
> > MB which saves us 99% of disk space. Now the question arise that if
> > the netCDF is most compress scientific format, than is it possible to
> > suppress this zero values of the floating point variable or is there
> > any switch that can be used to handle zero values and reduce file size
> > by any chance.
> >
> > Looking forward to hear from you.
> >
> > from,
> >
> > Nilesh Lahoti
> > Research Specialist
> > CCL, EOHSI,
> > Rutgers University
> > Email: nilesh at fidelio.rutgers.edu
> > Phone: 732-445-1416
> >
> >
> ==============================================================================
> >
> > To unsubscribe netcdfgroup, visit:
> > http://www.unidata.ucar.edu/mailing-list-delete-form.html
> >
> ==============================================================================
> >
> >