Re: distribution issues - moving zlib into HDF5...

> Having seen the arguments from the HDF side and the netCDF side, I still 
> agree, wholeheartedly, with Ed.  If we say HDF5 supports a certain kind of 
> compression, there should be no if's about it.  It has to be complete 
> support, not 95% support.

    I'm not terribly opposed to requiring zlib in order to build the HDF5
distribution (at least by default); it's a fairly pervasive technology by this
point.  The main point I was trying to make was about not shipping zlib with
the HDF5 (or netCDF-4) distribution and building/installing our own version.


> Mike
> At 12:23 PM 3/16/2006, Albert Cheng wrote:
> >Hi, Ed and everyone,
> >
> >I would like to share some of my experience installing HDF5 on
> >various platforms, some classic like Linux/AIX/Sun and some
> >one-of-a-kind new platforms like IBM Blue Gene and Cray XT3.
> >
> >On the classic platforms, nearly all hosts have zlib installed, so
> >for them it is a non-issue.  Of those that don't have zlib installed,
> >some have a policy of no non-supported software, and zlib is considered
> >non-supported according to them.  In those cases, I usually just build
> >HDF5 without zlib.  (You may ask "Is HDF5 software supported?"  For them,
> >yes, because they can contact the HDF group for support.)
> >
> >On some new platforms, things are still in flux.  Zlib compression is
> >a luxury for them at the moment.  If it works, great.  If not, well,
> >they have other higher priorities like file systems or compilers. :-)
> >I would have to build HDF5 without zlib in order for HDF5 users to
> >test their software on these new platforms.
> >
> >Yes, it is true that HDF5 files with data compression produced on some
> >other systems will not be fully usable on the above platforms. (Actually,
> >one can still access parts of the file that are not compressed, just not
> >read the data of any compressed datasets.)  On the other hand,
> >HDF5 files without compression will be totally usable on the above
> >platforms.
> >
> >Then there is a different dimension to the zlib issue, due to its own
> >success.  Since zlib is so popular now, 95% of the machines I run into have
> >a version of zlib installed by the system.  If HDF5 brings in its
> >own version, HDF5 users now have a new confusion: which version
> >of zlib is my application linking with, the system version or the HDF5
> >version?
> >
> >There are also applications that already use zlib on their
> >own and have been linking with their system version of zlib.  Now they
> >want to use the HDF5 library in their application, and if HDF5 insists on
> >"inserting" its own version of zlib, these groups of users would be
> >annoyed.  To further complicate the issue, if shared-library versions of
> >zlib are available, the user's application code will use different versions
> >of zlib depending on how the runtime environment (such as $LD_LIBRARY_PATH)
> >is set.  It is confusing.  (Honestly, shared libraries, though a great tool,
> >can be confusing at times.)
> >
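A minimal sketch, not part of the original thread, of how the "which zlib
am I linking with" question raised above can be checked. It uses Python's
standard zlib module only as a stand-in for an application; the helper names
zlib_versions and versions_match are hypothetical. The same comparison can be
made in C with the ZLIB_VERSION macro and the zlibVersion() function.

```python
import zlib


def zlib_versions():
    """Return (compile-time, run-time) zlib version strings for this interpreter."""
    return zlib.ZLIB_VERSION, zlib.ZLIB_RUNTIME_VERSION


def versions_match(built, loaded):
    """True when the version built against and the version loaded agree."""
    return built == loaded


if __name__ == "__main__":
    built, loaded = zlib_versions()
    print("built against zlib:", built)
    print("running with zlib: ", loaded)
    if not versions_match(built, loaded):
        # Typical cause: a different shared library was found at run time,
        # for example through $LD_LIBRARY_PATH.
        print("warning: a different zlib was picked up at run time")
```

If the two strings differ, the shared library found at run time is not the
one the program was built against, which is the situation described above.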
> >There are valid pros and cons on each approach. I think the current HDF5
> >setting is a reasonable compromise for the current conditions.  Since
> >netCDF-4 must run with zlib compression, would it work for netCDF4 to be
> >distributed with a version of zlib code?  I know, it is not quite right for
> >netCDF4, being the upper layer, to provide the source code of a library two
> >layers down.  But is it a possible solution?
> >
> >-Albert
> >
> >At 09:47 AM 3/16/2006, you wrote:
> >>Quincey Koziol <koziol@xxxxxxxxxxxxx> writes:
> >>
> >>Howdy Quincey!
> >>
> >> >> I agree with Ed's concern about the netCDF library not being able
> >> >> to read netCDF files that are created using zlib.  Same goes for HDF5
> >> >> itself.  However it's done, the libraries should have zlib available.
> >> >
> >> >     If we want to make zlib a requirement, we should decide that for
> >> >     certain.
> >>
> >>By "we" in this, you mean the HDF5 team, right?
> >>
> >>Because for netCDF-4, zlib is a requirement. We have the following
> >>netCDF-4 requirement:
> >>
> >>"Compression can be applied on a per-variable basis."
> >>
> >>I think this is an important feature of netCDF-4. I don't think we
> >>want the user confusion that would go with some installations having
> >>zlib, others not. So I don't want to build netCDF-4 without zlib. It's
> >>a firm requirement for netCDF-4.
> >>
> >>For HDF5, my opinion would be that you are making it harder for the
> >>users. With the current situation, it is possible to write an HDF5 file
> >>on one system, and be unable to read it on another. It is possible to
> >>write an HDF5 program that works great on one platform, but won't
> >>compile on another. Does this not cause a lot of user confusion?
> >>
> >>I suspect most people just install HDF5 without zlib. But per-variable
> >>compression is a *GREAT* feature, and one that I think most
> >>researchers would very much like to use (my opinion).
> >>
> >> > Then, if it's a requirement, we should make the absolute best attempt
> >> > we can to use the system's installed zlib - that's what it's there for.
> >> > Only if there is no zlib installed on the system should we attempt to
> >> > help the user install a copy of zlib from the _zlib_ web-site.  We
> >> > should _not_ be in the "business" of distributing zlib.
> >>
> >>Here I must disagree. Your current method makes it considerably harder
> >>for the user to ensure that zlib is installed and that HDF5 is built
> >>with the --with-zlib option. In fact, I would speculate that the vast
> >>majority of users will screw this part up. (I often do myself!)
> >>
> >>It is usual in autotools to provide what cannot be found on the
> >>system. For example, if I want to use the "strtok" function, the
> >>autotools way would be to look for it on the host system, and, if not
> >>found, to provide one, *not* to have my code work one way if it's
> >>there, and another way if it's not.
> >>
> >>Similarly, it would make sense for HDF5 to look for zlib, and, if not
> >>found, to provide it.
> >>
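The "look for it, and provide it if not found" pattern described above can be
sketched as follows. This is an illustration in Python, not autoconf code and
not how the HDF5 configure script works; choose_zlib and the bundled path are
hypothetical names introduced here.

```python
from ctypes.util import find_library


def choose_zlib(bundled_path, finder=find_library):
    """Prefer the system zlib; fall back to a bundled copy if none is found.

    bundled_path is a hypothetical location of a copy shipped in the tarball.
    finder is injectable so the decision can be exercised without touching
    the real system.
    """
    system = finder("z")  # library name or path, or None if not found
    if system is not None:
        return ("system", system)
    return ("bundled", bundled_path)


if __name__ == "__main__":
    source, lib = choose_zlib("./bundled/libz.a")
    print(f"using {source} zlib: {lib}")
```

The system copy always wins when it exists, so the bundled copy is only a
fallback and never replaces what is already installed.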
> >>The amount of extra code in your distribution is trivial. The amount
> >>of extra work to get it working this way is zero for you, since I am
> >>going to do it anyway, and you can just take my solution if you want
> >>it.
> >>
> >>In general, this also guards against versioning problems. What if the
> >>user has a version of zlib other than 1.2.3 installed? Surely you
> >>don't test against every possible version of zlib.
> >>
> >>I understand your concerns about taking on some responsibility for
> >>zlib. But that is already the case - it is already an important
> >>component of our products. Making it harder to install does not really
> >>help this problem. If there is a bug in zlib, we all will be hurting
> >>- no matter how it was installed.
> >>
> >>In fact, by taking a known, tested version of zlib inside our
> >>products, we guard against zlib risks. If version 1.2.4 of zlib comes
> >>out, and it sucks, we will be protected. All our users will be using
> >>1.2.3, which they got with their tarball.
> >>
> >> >> At first blush, Ed's solution seems to me like a good one.
> >> >
> >> >     It's _very much_ against all the current open source "package"
> >> > installation systems' philosophy and I don't think it's a good idea.
> >> > All package installation systems allow each package in the system to
> >> > list other packages as prerequisites that must be installed on the
> >> > system before the package the user is interested in is installed.  We
> >> > just _finished_ removing zlib and libjpeg from the HDF4 distribution... :-)
> >>
> >>In the world of package distribution, if zlib is required for HDF5,
> >>that means that HDF5 will not build without zlib.
> >>
> >>The problem is that HDF5 *will* build without zlib, yielding an HDF5
> >>build which is not usable for netCDF-4. To fit the current package
> >>model, you need to make the HDF5 configure script dumb enough to just
> >>give up when zlib is not present. Then there would be only one way to
> >>build HDF5 - with zlib.
> >>
> >>Also, package distribution, while wonderful, is not the usual way for
> >>people to get either HDF5 or netCDF - they get these as tarballs from
> >>our site. So we cannot rely on every user having a package manager to
> >>hide this complexity.
> >>
> >>I would be interested to hear why you removed zlib from HDF4. Was
> >>there a problem?
> >>
> >> >
> >> >     Shipping someone else's library with our package and installing it
> >> > on a user's system (most likely _in place of_ the system version of that
> >> > library) has caused _lots_ of headaches for us in the past.
> >>
> >>Installation is different from building.
> >>
> >>I would not suggest that zlib be installed from the HDF5 build. You
> >>can build it and use it internally within the HDF5 package, without
> >>exposing it to anyone else.
> >>
> >>Then you have not installed anything on the user's system other than
> >>HDF5.
> >>
> >> >     Frankly, I think this is the course that netCDF-4 should take
> >> > with HDF5 also, but that'll ultimately be up to them.
> >>
> >>I would love to. Instead of programming autoconf and automake I would
> >>be out on a bike ride!
> >>
> >>But that's just going to kill uptake of netCDF-4. Our users are Earth
> >>scientists, not computer scientists.
> >>
> >>I intend that users will be able to install netCDF-4 in one step, just
> >>like netCDF-3. It makes extra work for me up front, but also saves me
> >>hundreds or thousands of support emails later. (Recall - we don't have
> >>a large support team here at netCDF - just me and Russ!)
> >>
> >>In this new build method, users will not get HDF5 from you and
> >>netCDF-4 from me, they will get the whole tarball from me, and build
> >>it all at once.
> >>
> >>(There will also be provision for users who want to
> >>use an already-installed version of HDF5 - but this will be a small
> >>minority of users. However, for them, everything can work just as it
> >>does now.)
> >>
> >>It was also not my intention to install HDF5 on their machines with
> >>netCDF-4, but just to use it internally, and end up installing
> >>netCDF-4 only.
> >>
> >>(This also would mean that the user would not have to provide the link
> >>arguments "-lhdf5 -lhdf5_hl -lz -lnetcdf", just "-lnetcdf". It will
> >>make the netcdf library files larger, but so what?)
> >>
> >>If the HDF5 build can take on the task of building zlib when not
> >>present that would make things a lot easier for me. I can give you the
> >>changes that you would need for that to happen.
> >>
> >>If not, then I will possibly have to hack every HDF5 release before
> >>including it in the master tarball with netCDF-4. This will add some
> >>work for me every time you do a release, and some delay in adopting
> >>new HDF5 versions in netCDF-4.
> >>
> >>My goal is to make life easier for the netCDF user. Current netCDF4
> >>installation is far too complicated - it *must* be
> >>simplified. Otherwise we will lose half our users or more before they
> >>even get to try it.
> >>
> >>Thanks,
> >>
> >>Ed
> >>
> >>
> >>--
> >>Ed Hartnett  -- ed@xxxxxxxxxxxxxxxx
> >
> >--
> >Mike Folk, Scientific Data Tech (HDF)
> >NCSA/U of Illinois at Urbana-Champaign  Voice: 217-244-0647
> >1205 W. Clark St, Urbana IL 61801    Fax: 217-244-5521