
[netCDF #FNJ-424791]: Is lossy compression with zlib and least_significant_digit coming to C/F90 APIs?



At one time, we were experimenting with the zfp and fpzip
compressors from LLNL 
(https://computation.llnl.gov/projects/floating-point-compression).
I think zfp is lossy. However, we made no attempt to provide those as part
of our standard distribution.
There is no particular problem in using lossy compressors as far as I know.


> Hello Greg,
> 
> The C interface has support for dynamic compression filters starting a 
> couple/few releases ago.  See the following page for information:
> 
> * https://www.unidata.ucar.edu/software/netcdf/docs/md_filters.html
> 
> We are preparing a new Fortran release that should enable this functionality 
> for Fortran users as well.
> 
> To be honest, the response from Logan Karsten is what we most typically hear 
> from our colleagues and community when I speak with them, either at 
> conferences or informally.  It is not that dissimilar from my experience in 
> other disciplines as well.  Many scientists want to keep all of the data, 
> even when that data is not significantly different from noise.  It is a 
> divisive issue, to be sure.  I'd be interested in speaking with you further 
> about this, as you are one of the few domain scientists to take this 
> position.  Is there any chance, perhaps, you'd be interested in and/or willing 
> to write a blog post for the Unidata blog on this? If you're willing (and I 
> would really encourage you to), I'd love to coordinate that with you.  Also, 
> if you're on the netCDF mailing list, that would be another good place to 
> start a broader discussion.
> 
> So, to answer the question you've asked, we have a technical route for users 
> to adopt whatever compression they care to, as long as there is an HDF5 
> plugin available.  However, the more esoteric the plugin, the less 
> distributable the data becomes.  To have lossy compression baked in as an 
> option would require community buy-in, and community buy-in will have to 
> start with scientists such as yourself.  As a computer scientist (albeit with 
> a background in scientific research), I can attest that trying to convince 
> domain scientists that it's ok to lose some of their data is a very, very 
> hard sell.
> 
> Thank you for reaching out and sharing this; if you'd be interested in doing 
> a blog post, please feel free to reach out to me directly at address@hidden, 
> and I can coordinate with you.
> 
> Have a great day!
> 
> -Ward
> 
> > Dear Unidata,
> >
> > I found a web page about netCDF with Python that mentions what is in the
> > subject line with zlib and least_significant_digit settings in order to get
> > lossy compression.
> >
> > Is this yet available within the F90 interface?
> >
> > Personally, I see this as being of **paramount** importance.  As a numerical
> > weather model community here at NCAR (and beyond), we are really suffering
> > from a shortage of disk and storage space, with people running WRF and
> > MPAS and other models and saving 32-bit floats for all our variables.  This
> > is obviously mandatory for "restart" files, but it is almost entirely
> > wasted when the final output is only being viewed as a plot.
> >
> > I found a Unidata community forum post from just 2 years ago, in
> > correspondence with fellow NCAR-RAL coworker Logan Karsten, where the
> > answer was that the community "didn't want to lose any of their data." This
> > is true to an extent, but giving individual users the ability to chop off
> > significant bytes as an option while giving others the chance to keep "all
> > their data" would be a massive benefit to the community.
> >
> > The NCAR HPSS is basically full, costs millions of dollars, and
> > model users are struggling with where to store simulations of months,
> > seasons, and years.  Rather than having to code multiple steps of
> > post-processing to take advantage of this idea I found related to Python, I
> > am asking about F90, because I would gladly help spearhead the effort to
> > put lossy compression into the WRF/MPAS codes directly just to reap its
> > benefits of not building yet more codes external to the models themselves
> > and having one copy of full bytes while getting around to a post-processing
> > step.
> >
> > I just wanted to weigh in that the "community" needs a direct lossy
> > compression in netCDF as soon as possible.  If it already exists and I do
> > not know about it, then please let me know.  I would be ecstatic to learn
> > about it and begin using it as rapidly as I can.  But I cannot begin to say
> > how daunting it is for a Fortran scientist of 29 years to figure out Python
> > and have to learn yet more languages and code yet more post-processing
> > software and deal with intermediary files and so forth.
> >
> > Thank you for listening to an alternative viewpoint of community needs.
> >
> > Regards,
> >
> > Greg Thompson, NCAR-RAL
> >
> >
> 

=Dennis Heimbigner
  Unidata
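
For reference, the zlib plus least_significant_digit approach discussed in
this thread can be illustrated with a minimal sketch. It assumes the
quantization works roughly as the netCDF4-python documentation describes:
values are rounded to a power-of-two step fine enough to preserve the
requested decimal digit, which zeroes the trailing mantissa bits so that zlib
compresses them better. This is not the library's own code. The names
quantize and deflated_size and the synthetic temperature data are
hypothetical, and only the Python standard library is used.

```python
import math
import random
import struct
import zlib


def quantize(values, least_significant_digit):
    """Round values to a power-of-two step that keeps the requested decimal digit."""
    # Smallest number of binary fraction bits that resolves 10**-least_significant_digit.
    bits = math.ceil(math.log2(10.0 ** least_significant_digit))
    scale = 2.0 ** bits
    return [round(v * scale) / scale for v in values]


def deflated_size(values):
    """Pack values as little-endian 32-bit floats and return the zlib-compressed size."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return len(zlib.compress(raw, 6))


# Synthetic "temperature" field in kelvin (made-up data, for illustration only).
random.seed(0)
temps = [250.0 + 60.0 * random.random() for _ in range(10000)]

# Keep roughly one decimal digit, i.e. a step of 1/16 K.
quantized = quantize(temps, 1)

full_size = deflated_size(temps)
lossy_size = deflated_size(quantized)
max_error = max(abs(a - b) for a, b in zip(temps, quantized))

print("deflated bytes, full precision:", full_size)
print("deflated bytes, quantized:     ", lossy_size)
print("largest absolute error:        ", max_error)
```

The quantization step is the lossy part, and the deflate step stays lossless,
so a file written this way can still be read by any reader that understands
zlib-compressed variables. The worst-case error in this sketch is half the
quantization step (1/32 K here), which stays below half a unit in the
requested decimal digit.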


Ticket Details
===================
Ticket ID: FNJ-424791
Department: Support netCDF
Priority: Normal
Status: Open
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.