Re: [thredds] How are compressed netcdf4 files handled in TDS

On Apr 25, 2011, at 3:51 PM, John Caron wrote:

> On 4/25/2011 1:46 PM, Peter Cornillon wrote:
>> 
>> On Apr 25, 2011, at 3:42 PM, John Caron wrote:
>> 
>>> On 4/25/2011 1:37 PM, Roy Mendelssohn wrote:
>>>> Yes, internal compression. All the files were made from netcdf3 files 
>>>> using NCO with the options:
>>>> 
>>>> ncks -4 -L 1
>>>> 
>>>> The results so far show file sizes ranging from 40% of the original down 
>>>> to 1/100th of the original. If requests for internally compressed data 
>>>> are cached differently from requests to netcdf3 files, we want to 
>>>> take that into account when we do the tests, so that we do not just see 
>>>> the effect of differential caching.
>>>> 
>>>> When we have done tests on just local files, the reads were about 8 
>>>> times slower from a compressed file. But Rich Signell has found that the 
>>>> combination of serialization and bandwidth is the bottleneck, and you 
>>>> hardly notice the difference in a remote access situation. That is what 
>>>> we want to find out, because we run on very little money, and with 
>>>> compression as mentioned above our RAIDs would go a lot farther, as long 
>>>> as the hit to the access time is not too great.
>>>> 
>>>> Thanks,
>>>> 
>>>> -Roy
>>> 
>>> In netcdf4/hdf5, compression is tied to the chunking. Each chunk is 
>>> individually compressed, and must be completely decompressed to retrieve 
>>> even one value from that chunk. So the trick is to make your chunks 
>>> correspond to your "common cases" of data access. If that's possible, you 
>>> should find that compressed access is faster than non-compressed access, 
>>> because IO is smaller. But it will be highly dependent on that.
>> 
>> John, is there a loss of efficiency when compressing chunks compared to 
>> compressing the entire file? I vaguely recall that for some compression 
>> algorithms, compression efficiency is a function of the volume of data 
>> compressed.
>> 
>> Peter
>> 
> 
> Hi Peter:
> 
> I think dictionary methods such as deflate get better as the file size goes 
> up, but the tradeoff here is to try to decompress only the data you actually 
> want. Decompressing very large files can be very costly.

Yes, this is why I chunk. The reason that I asked the question is that this 
might influence the chunk size that one chooses.
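
To make the tradeoff concrete, here is a rough sketch using Python's standard
zlib module (deflate, level 1 to mirror "-L 1"). It is not the netcdf4/hdf5
code path: the synthetic data, the chunk sizes, and the helper names
(make_data, compress_chunks, read_value) are all assumptions made up for
illustration. It compresses the same buffer whole and as independent chunks,
and reads a single value by decompressing only the chunk that holds it.

```python
import struct
import zlib

VALUE_BYTES = 2  # each synthetic value is stored as a little-endian int16


def make_data(n_values=200_000):
    """Build a synthetic, run-heavy field (illustrative only, not real data)."""
    values = [(i // 50) % 1000 for i in range(n_values)]
    return struct.pack(f"<{n_values}h", *values)


def compress_chunks(data, chunk_bytes, level=1):
    """Deflate each chunk independently, as a chunked store would."""
    return [
        zlib.compress(data[start:start + chunk_bytes], level)
        for start in range(0, len(data), chunk_bytes)
    ]


def read_value(chunks, chunk_bytes, index):
    """Fetch one value; the whole containing chunk must be decompressed.

    Assumes chunk_bytes is a multiple of VALUE_BYTES so that no value
    straddles a chunk boundary.
    """
    chunk_no, offset = divmod(index * VALUE_BYTES, chunk_bytes)
    raw = zlib.decompress(chunks[chunk_no])
    return struct.unpack_from("<h", raw, offset)[0]


if __name__ == "__main__":
    data = make_data()
    print("uncompressed bytes:", len(data))
    print("whole buffer      :", len(zlib.compress(data, 1)))
    for chunk_bytes in (64, 1024, 16384):
        total = sum(len(c) for c in compress_chunks(data, chunk_bytes))
        print(f"chunks of {chunk_bytes:>5} B :", total)
```

Very small chunks pay a fixed per-stream overhead and give deflate little
history to work with, so the total compressed size grows; larger chunks
compress better but cost more to decompress for a single-value read. Where the
balance falls will depend on the actual data and access pattern.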

Peter

> 
> John
> 
> _______________________________________________
> thredds mailing list
> thredds@xxxxxxxxxxxxxxxx
> For list information or to unsubscribe,  visit: 
> http://www.unidata.ucar.edu/mailing_lists/ 

--
Peter Cornillon
215 South Ferry Road                  Telephone: (401) 874-6283
Graduate School of Oceanography       Fax: (401) 874-6283
University of Rhode Island            Internet: pcornillon@xxxxxxxxxxx
Narragansett, RI 02882   USA