Re: [thredds] How are compressed netcdf4 files handled in TDS

On 4/25/2011 1:37 PM, Roy Mendelssohn wrote:
yes, internal compression.  All the files were made from netcdf3 files using 
NCO with the options:

ncks -4 -L 1

The results so far show a decrease in file size ranging from 40% of the 
original to 1/100th of the original file size. If requests for internally 
compressed data are cached differently from requests to netcdf3 files, we want 
to take that into account when we do the tests, so that we do not just see the 
effect of differential caching.

When we have done tests on just local files, the reads were about 8 times 
slower from a compressed file. But Rich Signell has found that the combination 
of serialization/bandwidth is the bottleneck, and you hardly notice the 
difference in a remote access situation. That is what we want to find out, 
because we run on very little money and, with compression as mentioned above, 
our RAIDs would go a lot farther, as long as the hit to the access time is not 
too great.

Thanks,

-Roy

In netcdf4/hdf5, compression is tied to the chunking. Each chunk is individually compressed, and must be completely decompressed to retrieve even one value from that chunk. So the trick is to make your chunks correspond to your "common cases" of data access. If that's possible, you should find that compressed access is faster than non-compressed access, because the IO is smaller, but it will be highly dependent on that.
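
To make the chunking point concrete, here is a rough sketch (plain Python, no netCDF library involved) that counts how many chunks a request overlaps, and therefore how many values must be decompressed to satisfy it. The variable shape (1000 time steps on a 180 x 360 grid), the two chunk shapes, and the function names are all assumptions chosen for illustration; they are not taken from the datasets discussed in this thread.

```python
from math import prod


def chunks_touched(start, count, chunk_shape):
    """Count the chunks overlapped by a hyperslab request.

    start, count: per-dimension origin and extent of the request, in elements.
    chunk_shape: per-dimension chunk size, in elements.
    """
    per_dim = []
    for s, c, k in zip(start, count, chunk_shape):
        first = s // k            # index of the first chunk hit on this dimension
        last = (s + c - 1) // k   # index of the last chunk hit on this dimension
        per_dim.append(last - first + 1)
    return prod(per_dim)


def elements_decompressed(start, count, chunk_shape):
    """Every overlapped chunk is decompressed in full, so the cost is
    (number of chunks touched) x (elements per chunk)."""
    return chunks_touched(start, count, chunk_shape) * prod(chunk_shape)


# Hypothetical request: a full time series at a single grid point of a
# (time, lat, lon) = (1000, 180, 360) variable.
start, count = (0, 90, 180), (1000, 1, 1)

# Hypothetical layout A: one chunk per time step (good for synoptic maps).
map_chunks = (1, 180, 360)
# Hypothetical layout B: long in time, small in space (good for time series).
series_chunks = (1000, 10, 10)

for name, shape in [("map-oriented", map_chunks), ("series-oriented", series_chunks)]:
    print(name,
          chunks_touched(start, count, shape), "chunks,",
          elements_decompressed(start, count, shape), "values decompressed")
```

For the same 1000 requested values, the map-oriented layout touches 1000 chunks and decompresses 64,800,000 values, while the series-oriented layout touches one chunk and decompresses 100,000. This counts only decompression volume; it says nothing about disk seeks, chunk caches, or network transfer, which would need to be measured.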




On Apr 25, 2011, at 12:28 PM, John Caron wrote:

On 4/25/2011 11:30 AM, Roy Mendelssohn wrote:
Hi All:

We just converted one of our larger datasets (larger in terms of the number of 
files that are aggregated) into compressed netCDF4. There is a substantial 
savings in storage, but we wanted to do a series of tests to see what hit in 
access time we would take, if any, since many of our users will make requests 
involving a lot of time periods.

In order to design these tests properly, we need to get a better understanding 
of how the TDS handles netcdf4 datasets that have compression. Are the 
decompressed data cached, or, more accurately, cached any differently from data 
read from an uncompressed series of netcdf3 files? Or, since the decompression 
is handled automatically on the read, is everything handled the same after that?

We would also be interested in other people's experience with compressed 
netcdf4 files in TDS, in particular when the extracts are not synoptic, but 
cover a lot of time periods in a region, or make a lot of very small calls to a 
large number of time periods, such as we need to do for tagging data.

Thanks for any info,

-Roy
Hi Roy:

I assume you mean internally compressed, not externally (like zipping up a 
file) ?

_______________________________________________
thredds mailing list
thredds@xxxxxxxxxxxxxxxx
For list information or to unsubscribe,  visit: 
http://www.unidata.ucar.edu/mailing_lists/
**********************
"The contents of this message do not reflect any position of the U.S. Government or 
NOAA."
**********************
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
1352 Lighthouse Avenue
Pacific Grove, CA 93950-2097

e-mail: Roy.Mendelssohn@xxxxxxxx (Note new e-mail address)
voice: (831)-648-9029
fax: (831)-648-8440
www: http://www.pfeg.noaa.gov/

"Old age and treachery will overcome youth and skill."
"From those who have been given much, much will be expected"