Re: distribution issues - moving zlib into HDF5...

NOTE: The netcdf-hdf mailing list is no longer active. The list archives are made available for historical reasons.

Having seen the arguments from the HDF side and the netCDF side, I still agree, wholeheartedly, with Ed. If we say HDF5 supports a certain kind of compression, there should be no if's about it. It has to be complete support, not 95% support.

Mike

At 12:23 PM 3/16/2006, Albert Cheng wrote:
Hi, Ed and everyone,

I would like to share some of my experience installing HDF5 on
various platforms, some classic, like Linux/AIX/Sun, and some
one-of-a-kind new platforms, like IBM Blue Gene and Cray XT3.

On the classic platforms, nearly all hosts have zlib
installed, so for them it is a non-issue.  Of those that don't
have zlib installed, some have a policy of no non-supported software,
and zlib is considered non-supported according to them. In those cases,
I usually just build HDF5 without zlib. (You may ask, "Is HDF5 software
supported?"  For them, yes, because they can contact the HDF group
for support.)

On some new platforms, things are still in flux.  Zlib compression is
a luxury for them at the moment.  If it works, great.  If not, well,
they have other higher priorities like file systems or compilers. :-)
I would have to build HDF5 without zlib in order for HDF5 users to
test their software on these new platforms.

Yes, it is true that HDF5 files with data compression produced on some
other systems will not be quite usable on the above platforms. (Actually,
one can still access the parts of the file that are not compressed, just
not read the data of any compressed datasets.)  On the other hand,
HDF5 files without compression will be totally usable on the above
platforms.

Then there is a different dimension to zlib, due to its own success.
Since zlib is so popular now, 95% of the machines I run into have a
version of zlib installed by the system.  If HDF5 brings in its
own version, HDF5 users face a new confusion--which version
of zlib is my application linking with, the system version or the HDF5
version?
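(For what it's worth, the two-versions confusion can be seen even from a
language with zlib built in. A small sketch in Python, just to illustrate:
the version zlib was compiled against and the version of the shared library
actually loaded at run time can differ.)

```python
import zlib

# The zlib version the interpreter was compiled against...
compile_time = zlib.ZLIB_VERSION
# ...and the version of the shared library actually loaded at run time.
runtime = zlib.ZLIB_RUNTIME_VERSION

# If these two differ, the application is running against a different
# zlib than the one it was built with -- exactly the confusion above.
print(compile_time, runtime)
```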

There are also applications that already use zlib on their
own and have been linking with their system version of zlib.  Now they
want to use the HDF5 library in their applications, and if HDF5 insists on
"inserting" its own version of zlib, these groups of users would be
annoyed.  To further complicate the issue, if shared-library versions of
zlib are available, the application code will use different versions
of zlib depending on its runtime environment, such as $LD_LIBRARY_PATH.
It is confusing.  (Honestly, shared libraries, though a great tool, can be
confusing at times.)

There are valid pros and cons to each approach. I think the current HDF5
setting is a reasonable compromise for the current conditions. Since
netCDF-4 must run with zlib compression, would it work for netCDF-4 to be
distributed with a version of the zlib code? I know it is not quite right
for netCDF-4, being the upper layer, to provide the source code of a
library two layers down.  But is it a possible solution?

-Albert

At 09:47 AM 3/16/2006, you wrote:
Quincey Koziol <koziol@xxxxxxxxxxxxx> writes:

Howdy Quincey!

>> I agree with Ed's concern about the netCDF library not being able to read
>> netCDF files that are created using zlib.  Same goes for HDF5
>> itself.  However it's done, the libraries should have zlib available.
>
>     If we want to make zlib a requirement, we should decide that for
>     certain.

By "we" in this, you mean the HDF5 team, right?

Because for netCDF-4, zlib is a requirement. We have the following
netCDF-4 requirement:

"Compression can be applied on a per-variable basis."

I think this is an important feature of netCDF-4. I don't think we
want the user confusion that would go with some installations having
zlib, others not. So I don't want to build netCDF-4 without zlib. It's
a firm requirement for netCDF-4.
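(To illustrate what per-variable compression means, here is a hypothetical
sketch -- not the netCDF-4 API, just the idea: each variable's data is
deflated independently, so a reader without zlib could still use the
uncompressed variables. The variable names are made up.)

```python
import zlib

# Hypothetical data for two variables.
variables = {"temperature": b"\x00" * 1000, "station_name": b"KDEN"}
# Compression is chosen per variable, not per file.
compressed = {"temperature": True, "station_name": False}

# Each variable's bytes are deflated (or not) independently.
stored = {name: zlib.compress(data) if compressed[name] else data
          for name, data in variables.items()}

# A reader with zlib recovers everything; a reader without zlib can
# still read "station_name", just not "temperature".
```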

For HDF5, my opinion would be that you are making it harder for the
users. With the current situation, it is possible to write an HDF5 file
on one system, and be unable to read it on another. It is possible to
write an HDF5 program that works great on one platform, but won't
compile on another. Does this not cause a lot of user confusion?

I suspect most people just install HDF5 without zlib. But per-variable
compression is a *GREAT* feature, and one that I think most
researchers would very much like to use (my opinion).

> Then, if it's a requirement, we should make the absolute best attempt we can
> to use the system's installed zlib - that's what it's there for. Only if there
> is no zlib installed on the system should we attempt to help the user install
> a copy of zlib from the _zlib_ web-site. We should _not_ be in the "business"
> of distributing zlib.

Here I must disagree. Your current method makes it considerably harder
for the user to ensure that zlib is installed, and that HDF5 is built
with the --with-zlib option. In fact, I would speculate that the vast
majority of users will screw this part up. (I often do myself!)

It is usual in autotools to provide what cannot be found on the
system. For example, if I want to use the "strtok" function, the
autotools way would be to look for it on the host system, and, if not
found, to provide one, *not* to have my code work one way if it's
there, and another way if it's not.

Similarly, it would make sense for HDF5 to look for zlib, and, if not
found, to provide it.
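(A minimal configure.ac sketch of that idiom, for what it's worth --
assuming a replacement strtok.c is kept in the distribution:)

```
# If strtok is missing from the host's libc, add our replacement
# strtok.o to LIBOBJS so it is compiled and linked in transparently.
AC_REPLACE_FUNCS([strtok])
```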

The amount of extra code in your distribution is trivial. The amount
of extra work to get it working this way is zero for you, since I am
going to do it anyway, and you can just take my solution if you want
it.

In general, this also guards against versioning problems. What if the
user has a version of zlib other than 1.2.3 installed? Surely you
don't test against every possible version of zlib.

I understand your concerns about taking on some responsibility for
zlib. But that is already the case - it is already an important
component of our products. Making it harder to install does not really
help this problem. If there is a bug in zlib, we will all be hurting
- no matter how it was installed.

In fact, by taking a known, tested version of zlib inside our
products, we guard against zlib risks. If version 1.2.4 of zlib comes
out, and it sucks, we will be protected. All our users will be using
1.2.3, which they got with their tarball.

>> At first blush, Ed's solution seems to me like a good one.
>
> It's _very much_ against all the current open source "package" installation
> systems' philosophy and I don't think it's a good idea.  All of package
> installation systems allow each package in the system to list other packages as
> prerequisites that must be installed on the system before the package the
> user is interested in is installed.  We just _finished_ removing zlib and
> libjpeg from the HDF4 distribution... :-)

In the world of package distribution, if zlib is required for HDF5,
that means that HDF5 will not build without zlib.

The problem is that HDF5 *will* build without zlib, yielding an HDF5
build which is not usable for netCDF-4. To fit the current package
model, you need to make the HDF5 configure script dumb enough to just
give up when zlib is not present. Then there would be only one way to
build HDF5 - with zlib.
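(In configure terms, that would be something like this sketch -- making
zlib a hard requirement rather than an option:)

```
# Fail the configure step outright if no usable zlib is found,
# instead of silently building HDF5 without compression support.
AC_CHECK_LIB([z], [deflate], [],
             [AC_MSG_ERROR([zlib is required to build HDF5])])
```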

Also, package distribution, while wonderful, is not the usual way for
people to get either HDF5 or netCDF - they get these as tarballs from
our site. So we cannot rely on every user having a package manager to
hide this complexity.

I would be interested to hear why you removed zlib from HDF4. Was
there a problem?

>
>     Shipping someone else's library with our package and installing it
> on a user's system (most likely _in place of_ the system version of that
> library) has caused _lots_ of headaches for us in the past.

Installation is different from building.

I would not suggest that zlib be installed from the HDF5 build. You
can build it and use it internally within the HDF5 package - use it
within HDF5 but not expose it to anyone else.

Then you have not installed anything on the user's system other than
HDF5.

>     Frankly, I think this is the course that netCDF-4 should take
> with HDF5 also, but that'll ultimately be up to them.

I would love to. Instead of programming autoconf and automake I would
be out on a bike ride!

But that's just going to kill uptake of netCDF-4. Our users are Earth
scientists, not computer scientists.

I intend that users will be able to install netCDF-4 in one step, just
like netCDF-3. It makes extra work for me up front, but also saves me
hundreds or thousands of support emails later. (Recall - we don't have
a large support team here at netCDF - just me and Russ!)

In this new build method, users will not get HDF5 from you and
netCDF-4 from me, they will get the whole tarball from me, and build
it all at once.

(There will also be provision for users who want to
use an already-installed version of HDF5 - but this will be a small
minority of users. However, for them, everything can work just as it
does now.)

It was also not my intention to install HDF5 on their machines with
netCDF-4, but just to use it internally, and end up installing
netCDF-4 only.

(This also would mean that the user would not have to provide the link
arguments "-lhdf5 -lhdf5_hl -lz -lnetcdf", just "-lnetcdf". It will
make the netcdf library files larger, but so what?)
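(That is, the link line would go from naming the whole stack to naming
just libnetcdf -- a sketch, with library names as in the current setup:)

```
# Today the user must name every layer on the link line:
cc myprog.c -o myprog -lnetcdf -lhdf5_hl -lhdf5 -lz
# With HDF5 and zlib folded into libnetcdf, this would suffice:
cc myprog.c -o myprog -lnetcdf
```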

If the HDF5 build can take on the task of building zlib when not
present that would make things a lot easier for me. I can give you the
changes that you would need for that to happen.

If not, then I will possibly have to hack every HDF5 release before
including it in the master tarball with netCDF-4. This will add some
work for me every time you do a release, and some delay in adopting
new HDF5 versions in netCDF-4.

My goal is to make life easier for the netCDF user. The current netCDF-4
installation is far too complicated - it *must* be
simplified. Otherwise we will lose half our users or more before they
even get to try it.

Thanks,

Ed


--
Ed Hartnett  -- ed@xxxxxxxxxxxxxxxx

--
Mike Folk, Scientific Data Tech (HDF)   http://hdf.ncsa.uiuc.edu
NCSA/U of Illinois at Urbana-Champaign  Voice: 217-244-0647
1205 W. Clark St, Urbana IL 61801 Fax: 217-244-5521

