Re: netCDF library

Nilesh -- As Dave Allured pointed out, if you want to use a standard netCDF format, your options are limited.

We faced a similar dilemma when we needed to store large amounts of climate data efficiently. We opted to create a netCDF variant that keeps the data compressed and uses an index to uncompress small blocks of the data as they are requested. We have been using it successfully for over a decade, and most of the Regional Climate Centers' data is stored as 'compressed netCDF'. We generally see a 90-97% reduction in file size, with benchmarked access equal to or slightly faster than standard netCDF files (especially over a network or from slower storage devices).
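To make the "compressed blocks plus an index" idea concrete, here is a minimal Python sketch. It is not the actual 'compressed netCDF' format described above, whose layout is not given in this message. The block size, the function names `compress_blocks` and `read_range`, and the use of zlib are all assumptions chosen for illustration.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # hypothetical block size in bytes (an assumption)


def compress_blocks(raw, block_size=BLOCK_SIZE):
    """Compress raw bytes in independent blocks.

    Returns (payload, index), where index[i] is the (offset, length)
    of compressed block i inside payload.
    """
    payload = bytearray()
    index = []
    for start in range(0, len(raw), block_size):
        comp = zlib.compress(raw[start:start + block_size])
        index.append((len(payload), len(comp)))
        payload += comp
    return bytes(payload), index


def read_range(payload, index, offset, length, block_size=BLOCK_SIZE):
    """Return raw[offset:offset + length], decompressing only the
    blocks that overlap the requested byte range."""
    if length <= 0:
        return b""
    first = offset // block_size
    last = min((offset + length - 1) // block_size, len(index) - 1)
    parts = []
    for i in range(first, last + 1):
        pos, n = index[i]
        parts.append(zlib.decompress(payload[pos:pos + n]))
    chunk = b"".join(parts)
    skip = offset - first * block_size
    return chunk[skip:skip + length]
```

Because each block is compressed independently, a request for a small hyperslab only pays for the blocks it touches, which is one plausible reason access can stay competitive with uncompressed files on slow storage or over a network.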

If you do go this route, be aware that you are on your own: you will have to uncompress any files you want to send to other researchers.

Given your particular situation, you may be more interested in looking at some other options:

1. HDF5 is supposed to have a compressed format and an interface similar to netCDF.

2. If you are using Linux, you may be able to use a compressed file system in loopback mode to keep the netCDF files compressed while still accessing them through the standard netCDF libraries. This is effectively what my library modifications do, but on a per-file basis rather than per-filesystem. It is probably most effective in a read-only situation.

Just some thoughts.

--Bill Noon
Northeast Regional Climate Center
Cornell University

On Aug 1, 2006, at 12:29 PM, Nilesh Lahoti wrote:

Dear Sir,

We are an air quality modeling group at Rutgers University, New Jersey. We process emissions and run simulation models to study the long-range transport of ozone and particulate matter, both for our research and for regulatory work.

The netCDF library works great for us. However, I came across one particular issue with netCDF and would like to ask whether there is a solution to it, or something that can be done to improve performance. When we process emissions for our three-dimensional grid of size (172 x 172 x 22) over a 24-hour period of hourly data, the file size is around 1 gigabyte (GB). Many cells have zero values, so the floating-point pollutant variables in the netCDF file are mostly zeros. When we use the gzip utility on Unix to compress these files, the file size drops to about 10 MB, which saves us 99% of the disk space. So the question arises: if netCDF is the most compressed scientific format, is it possible to suppress these zero values of the floating-point variables, or is there a switch that can be used to handle zero values and reduce the file size?
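The figures above can be checked with a little arithmetic. The Python sketch below assumes 4-byte floats and uses zlib from the standard library as a stand-in for gzip (both use the DEFLATE algorithm). The all-zero layer is an illustrative extreme, not the actual emissions data.

```python
import array
import zlib

# Grid dimensions and hourly time steps quoted in the message.
NX, NY, NZ, NT = 172, 172, 22, 24
BYTES_PER_FLOAT = 4  # assumption: single-precision values

cells = NX * NY * NZ * NT
bytes_per_variable = cells * BYTES_PER_FLOAT  # about 62 MB per variable

# One horizontal layer filled with zeros, as an extreme case of a
# zero-dominated emissions field.
layer = array.array("f", [0.0]) * (NX * NY)
raw = layer.tobytes()
packed = zlib.compress(raw)
ratio = len(packed) / len(raw)

print(f"one variable, uncompressed: {bytes_per_variable / 1e6:.1f} MB")
print(f"all-zero layer compresses to {ratio:.4%} of its size")
```

At roughly 62 MB per variable, a file of around 1 GB would hold on the order of 16 such variables. That count is an inference from the stated sizes; the message does not say how many pollutant variables the file contains.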

Looking forward to hearing from you.


Nilesh Lahoti
Research Specialist
Rutgers University
Email: nilesh@xxxxxxxxxxxxxxxxxxx
Phone: 732-445-1416

==============================================================================
To unsubscribe netcdfgroup, visit:
==============================================================================