[netCDF #HHS-891616]: NetCDF4 Internal Compression Performance



Hi Logan,

We are aware of the overhead, although there is unfortunately little we can do
about it.  I assume you are working with either the netCDF-C or netCDF-Fortran
library, using the netCDF4 data model/file format.  In this case, the file I/O
is handled by the underlying HDF5 library.  We are looking at other compression
alternatives, but the tradeoff between compression and I/O speed is fairly 
immutable (as I'm sure you're aware).  As for converting the data from
floating point to integer, or adopting any sort of lossy compression: these
would certainly benefit netCDF, but we have received a lot of pushback from our
community when the topic has been broached in the past.  The objection, as I
recall, was that they didn't want to lose any of their data.
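
To make the data-loss concern concrete, here is a minimal sketch of
scale_factor/add_offset packing into 16-bit integers, written in plain Python
rather than against the netCDF API.  The helper names (compute_packing, pack,
unpack) and the sample streamflow values are hypothetical, and reserving the
most negative integer for a fill value is one common choice, not a
requirement.

```python
# Sketch only: helper names and sample values are assumptions for illustration.

def compute_packing(vmin, vmax, nbits=16):
    """Return (scale_factor, add_offset) mapping [vmin, vmax] onto signed integers.

    The most negative integer is left unused so it can serve as a fill value.
    """
    scale_factor = (vmax - vmin) / (2 ** nbits - 2)
    add_offset = (vmax + vmin) / 2.0
    return scale_factor, add_offset

def pack(values, scale_factor, add_offset):
    """Quantize floats to integers: packed = round((value - add_offset) / scale_factor)."""
    return [int(round((v - add_offset) / scale_factor)) for v in values]

def unpack(packed, scale_factor, add_offset):
    """Recover approximate floats: value = packed * scale_factor + add_offset."""
    return [p * scale_factor + add_offset for p in packed]

streamflow = [0.0, 0.37, 12.5, 250.0, 999.9]  # made-up sample values
sf, off = compute_packing(min(streamflow), max(streamflow))
packed = pack(streamflow, sf, off)
restored = unpack(packed, sf, off)

# The round trip is lossy: each value can be off by up to half a scale_factor.
max_error = max(abs(a - b) for a, b in zip(streamflow, restored))
print("scale_factor:", sf)
print("max round-trip error:", max_error)
```

The packed values fit in a signed 16-bit integer, and the round-trip error is
bounded by scale_factor / 2.  That bound is the precision users give up when
they pack.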

I'm sorry I can't provide a more immediately useful solution; we're hoping to 
have alternative compression techniques available in the future that will 
provide a better speed/storage tradeoff.  
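
The speed/storage tradeoff can be seen on a small scale with zlib, the library
behind the deflate filter.  The sketch below compresses a synthetic field of
16-bit integers at a few deflate levels and reports size and elapsed time.  The
field, its size, and the chosen levels are assumptions for illustration, and
the timings say nothing about real HDF5 chunked I/O.

```python
import array
import math
import time
import zlib

# Synthetic, smoothly varying field stored as 16-bit integers (assumed data).
n = 200_000
field = array.array("h", (int(1000 * math.sin(i / 500.0)) for i in range(n)))
raw = field.tobytes()

results = {}
for level in (1, 4, 9):
    start = time.perf_counter()
    compressed = zlib.compress(raw, level)
    elapsed = time.perf_counter() - start
    # Deflate is lossless: decompressing must give back the original bytes.
    assert zlib.decompress(compressed) == raw
    results[level] = (len(compressed), elapsed)
    print(f"level {level}: {len(compressed)} of {len(raw)} bytes, {elapsed:.4f} s")
```

Higher levels generally spend more CPU time for a smaller output, which is why
a lower deflate level is often the first thing to try when write time matters
more than file size.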

-Ward

> To whom it may concern,
> 
> I am currently working on refactoring I/O code for the National Water
> Model, which is being run operationally at NCEP to support hydrologic
> prediction for the National Weather Service. Part of this refactoring
> involves converting both gridded and point values from floating point to
> integer values via the scale_factor/add_offset attributes. I am also using
> internal NetCDF compression when writing output. The scale of this
> modeling system permits output on 1 km grids across CONUS, along with a
> couple of variables on 250 meter grids. The point output is across 2.7 million
> river reaches. I have been testing the model with and without internal
> compression. In my tests, I have seen that the compression adds a
> significant amount of time to I/O. In some cases, up to 25%, with a minimum
> of 13% additional I/O time. While this overhead may be negligible for
> smaller model or research projects, it becomes an issue in an operational
> environment. Is this a consequence of internal compression that the
> Unidata NetCDF team is aware of? In some work I
> did years ago, we converted output from floating point to integer, and
> wrote the output directly to a gzipped file from Fortran via a C wrapper.
> In that case, the I/O time was significantly reduced, even though we were
> compressing the data.
> 
> Thanks for any input you may have.
> 
> LK
> 
> --
> Logan Karsten
> Associate Scientist III
> Research Applications Laboratory
> National Center for Atmospheric Research
> 303-497-2693
> 
> 


Ticket Details
===================
Ticket ID: HHS-891616
Department: Support netCDF
Priority: Normal
Status: Closed
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.