
[netCDF #FNJ-424791]: Is lossy compression with zlib and least_significant_digit coming to C/F90 APIs?



Hello Greg,

The C interface has supported dynamic compression filters for the last few 
releases.  See the following page for details:

* https://www.unidata.ucar.edu/software/netcdf/docs/md_filters.html
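
For illustration, here is a minimal C sketch of attaching a dynamic filter to a 
variable with nc_def_var_filter().  The filter ID 307 (bzip2 in the HDF5 filter 
registry) and its single parameter are just placeholders, and the matching 
plugin library has to be discoverable through HDF5_PLUGIN_PATH at both write 
and read time:

    /* Sketch: define a variable compressed by an HDF5 dynamic filter. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <netcdf.h>

    #define CHECK(e) do { int s = (e); if (s != NC_NOERR) { \
        fprintf(stderr, "netCDF error: %s\n", nc_strerror(s)); exit(1); } } while (0)

    int main(void) {
        int ncid, dimid, varid;
        unsigned int level = 9;          /* filter-specific parameter */

        CHECK(nc_create("filtered.nc", NC_NETCDF4 | NC_CLOBBER, &ncid));
        CHECK(nc_def_dim(ncid, "x", 1000, &dimid));
        CHECK(nc_def_var(ncid, "data", NC_FLOAT, 1, &dimid, &varid));

        /* Attach dynamic filter 307 with one parameter, before nc_enddef. */
        CHECK(nc_def_var_filter(ncid, varid, 307, 1, &level));

        CHECK(nc_enddef(ncid));
        CHECK(nc_close(ncid));
        return 0;
    }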

We are preparing a new Fortran release that should expose this functionality 
to Fortran users as well.  

To be honest, the response from Logan Karsten reflects what we most typically 
hear from our colleagues and community, whether at conferences or in informal 
conversations.  It is not that dissimilar from my experience in other 
disciplines, either.  Many scientists want to keep all of their data, even when 
that data is not significantly different from noise.  It is a divisive issue, 
to be sure.  I'd be interested in speaking with you further about this, as you 
are one of the few domain scientists to take this position.  Is there any 
chance you'd be interested in and/or willing to write a post for the Unidata 
blog on this?  If you are willing (and I would really encourage you to), I'd 
love to coordinate that with you.  Also, if you're on the netCDF mailing list, 
that would be another good place to start a broader discussion.  

So, to answer the question you've asked: we have a technical route that lets 
users adopt whatever compression they care to use, as long as an HDF5 plugin 
is available.  However, the more esoteric the plugin, the less distributable 
the data becomes.  Having lossy compression baked in as an option would 
require community buy-in, and that buy-in will have to start with scientists 
such as yourself.  As a computer scientist (albeit one with a background in 
scientific research), I can attest that convincing domain scientists that it 
is okay to lose some of their data is a very, very hard sell.
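
For concreteness, one route that already works today is to quantize the data 
yourself before the write: round each value to however many decimal digits are 
scientifically meaningful (roughly what the Python interface's 
least_significant_digit option does) and let the built-in zlib deflate filter 
exploit the resulting redundancy.  A minimal sketch in C, where the quantize() 
helper is purely illustrative and not part of the netCDF API:

    /* Sketch: round data to a chosen decimal precision, then write it with
     * the built-in zlib (deflate) filter enabled. */
    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <netcdf.h>

    #define CHECK(e) do { int s = (e); if (s != NC_NOERR) { \
        fprintf(stderr, "netCDF error: %s\n", nc_strerror(s)); exit(1); } } while (0)

    /* Round each value to 'digits' decimal places before writing. */
    static void quantize(float *data, size_t n, int digits) {
        double scale = pow(10.0, digits);
        for (size_t i = 0; i < n; i++)
            data[i] = (float)(round(data[i] * scale) / scale);
    }

    int main(void) {
        enum { N = 1000 };
        int ncid, dimid, varid;
        float data[N];

        for (size_t i = 0; i < N; i++)
            data[i] = sinf(i * 0.01f);   /* stand-in for model output */

        quantize(data, N, 3);            /* keep ~3 decimal digits */

        CHECK(nc_create("lossy.nc", NC_NETCDF4 | NC_CLOBBER, &ncid));
        CHECK(nc_def_dim(ncid, "x", N, &dimid));
        CHECK(nc_def_var(ncid, "data", NC_FLOAT, 1, &dimid, &varid));

        /* shuffle=1, deflate=1, level=4: built-in lossless zlib compression */
        CHECK(nc_def_var_deflate(ncid, varid, 1, 1, 4));

        CHECK(nc_enddef(ncid));
        CHECK(nc_put_var_float(ncid, varid, data));
        CHECK(nc_close(ncid));
        return 0;
    }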

Thank you for reaching out and sharing this; if you'd be interested in doing a 
blog post, please feel free to reach out to me directly at address@hidden, and 
I can coordinate with you.  

Have a great day!

-Ward

> Dear Unidata,
> 
> I found a web page about using netCDF from Python that mentions what is in
> the subject line: zlib and least_significant_digit settings used together to
> get lossy compression.
> 
> Is this yet available within the F90 interface?
> 
> Personally, I see this as being of **paramount** importance.  As a numerical
> weather model community here at NCAR (and beyond), we are suffering greatly
> from disk usage and storage-space problems, with people running WRF and
> MPAS and other models and saving 32-bit floats for all our variables.  This
> is obviously mandatory for "restart" files, but it is a massive waste for
> final output that will only ever be looked at as a plot.
> 
> I found a community forum (Unidata) post from just 2 years ago containing
> correspondence with my fellow NCAR-RAL coworker Logan Karsten, in which the
> answer was that the community "didn't want to lose any of their data."  This
> is true to an extent, but giving individual users the option to chop off the
> least-significant bytes, while giving others the chance to keep "all their
> data," would be a massive benefit to the community.
> 
> The NCAR HPSS is basically full, costs millions of dollars, and model users
> are struggling with where to store simulations spanning months, seasons, and
> years.  Rather than coding multiple post-processing steps to take advantage
> of this idea I found in Python, I am asking about F90, because I would gladly
> help spearhead the effort to put lossy compression directly into the WRF/MPAS
> codes.  That would spare us from building yet more code external to the
> models themselves, and from keeping a full-precision copy around until we get
> to a post-processing step.
> 
> I just wanted to weigh in that the "community" needs direct lossy compression
> in netCDF as soon as possible.  If it already exists and I simply do not know
> about it, then please let me know; I would be ecstatic to learn about it and
> begin using it as rapidly as I can.  But I cannot begin to say how daunting
> it is for a Fortran scientist of 29 years to figure out Python, learn yet
> more languages, code yet more post-processing software, and deal with
> intermediate files and so forth.
> 
> Thank you for listening to an alternative viewpoint of community needs.
> 
> Regards,
> 
> Greg Thompson, NCAR-RAL
> 
> 


Ticket Details
===================
Ticket ID: FNJ-424791
Department: Support netCDF
Priority: Normal
Status: Closed
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.