Re: distribution issues - moving zlib into HDF5...

Quincey Koziol <koziol@xxxxxxxxxxxxx> writes:

Howdy Quincey!

>> I agree with Ed's concern about the netCDF library not being able to read 
>> netCDF files that are created using zlib.  Same goes for HDF5 
>> itself.  However it's done, the libraries should have zlib available.
>
>     If we want to make zlib a requirement, we should decide that for
>     certain.

By "we" in this, you mean the HDF5 team, right?

Because for netCDF-4, zlib is a requirement. We have the following
netCDF-4 requirement:

"Compression can be applied on a per-variable basis."

I think this is an important feature of netCDF-4. I don't think we
want the user confusion that would go with some installations having
zlib, others not. So I don't want to build netCDF-4 without zlib. It's
a firm requirement for netCDF-4.

For HDF5, my opinion would be that you are making it harder for the
users. With the current situation, it is possible to write an HDF5 file
on one system, and be unable to read it on another. It is possible to
write an HDF5 program that works great on one platform, but won't
compile on another. Does this not cause a lot of user confusion?

I suspect most people just install HDF5 without zlib. But per-variable
compression is a *GREAT* feature, and one that I think most
researchers would very much like to use (my opinion).

> Then, if it's a requirement, we should make the absolute best attempt we can
> to use the system's installed zlib - that's what it's there for.  Only if
> there is no zlib installed on the system should we attempt to help the user
> install a copy of zlib from the _zlib_ web-site.  We should _not_ be in the
> "business" of distributing zlib.

Here I must disagree. Your current method makes it considerably harder
for the user to ensure that zlib is installed, and that HDF5 is built
with the --with-zlib option. In fact, I would speculate that the vast
majority of users will screw this part up. (I often do myself!)

It is usual in autotools to provide what cannot be found on the
system. For example, if I want to use the "strtok" function, the
autotools way would be to look for it on the host system, and, if not
found, to provide one, *not* to have my code work one way if it's
there, and another way if it's not.
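
To make the strtok example concrete, here is a minimal sketch of such a
fallback. It assumes the configure script defines HAVE_STRTOK when the
host C library already has strtok (autoconf's AC_REPLACE_FUNCS is the
usual way to arrange this); the name portable_strtok is made up for
illustration and is not taken from any existing package.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* HAVE_STRTOK is assumed to be defined by configure when the host
 * C library provides strtok(). portable_strtok is a hypothetical name. */
#ifdef HAVE_STRTOK
#define portable_strtok strtok
#else
/* Fallback with the same contract as strtok(): it modifies the input
 * string in place and keeps its position in a static variable, so it
 * is not reentrant. */
static char *portable_strtok(char *str, const char *delims)
{
    static char *next;              /* where the next call resumes */
    char *token;

    assert(delims != NULL);
    if (str != NULL)
        next = str;                 /* start scanning a new string */
    if (next == NULL)
        return NULL;                /* previous scan already finished */

    next += strspn(next, delims);   /* skip leading delimiters */
    if (*next == '\0') {
        next = NULL;
        return NULL;                /* nothing left but delimiters */
    }

    token = next;
    next += strcspn(next, delims);  /* advance to the end of the token */
    if (*next != '\0')
        *next++ = '\0';             /* terminate token, step past delimiter */
    else
        next = NULL;                /* token ran to the end of the string */
    return token;
}
#endif
```

The calling code uses portable_strtok everywhere and behaves the same
whether or not the host provides strtok.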

Similarly, it would make sense for HDF5 to look for zlib, and, if not
found, to provide it.

The amount of extra code in your distribution is trivial. The amount
of extra work to get it working this way is zero for you, since I am
going to do it anyway, and you can just take my solution if you want
it.

In general, this also guards against versioning problems. What if the
user has a version of zlib other than 1.2.3 installed? Surely you
don't test against every possible version of zlib.
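
As a rough sketch of what checking for a tested version could look
like: the helper below compares two "major.minor.patch" strings. The
function names are hypothetical; in a real build the "found" string
would come from something like zlib's zlibVersion(), which I leave out
here so that the example stands alone.

```c
#include <assert.h>
#include <stdio.h>

/* Parse "major.minor.patch" into v[0..2]. Returns 1 on success, 0 if
 * the string does not start with three dot-separated integers. */
static int parse_version(const char *s, int v[3])
{
    assert(s != NULL);
    return sscanf(s, "%d.%d.%d", &v[0], &v[1], &v[2]) == 3;
}

/* Compare version strings a and b (hypothetical helper).
 * On success returns 1 and sets *cmp to a negative, zero, or positive
 * value when a is older than, equal to, or newer than b.
 * Returns 0 and leaves *cmp untouched if either string cannot be parsed. */
static int version_compare(const char *a, const char *b, int *cmp)
{
    int va[3], vb[3], i;

    assert(cmp != NULL);
    if (!parse_version(a, va) || !parse_version(b, vb))
        return 0;

    *cmp = 0;
    for (i = 0; i < 3 && *cmp == 0; i++)
        *cmp = (va[i] > vb[i]) - (va[i] < vb[i]);
    return 1;
}
```

A build could use this to warn when the zlib it finds differs from the
one the release was tested against (1.2.3 in this discussion).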

I understand your concerns about taking on some responsibility for
zlib. But that is already the case - it is already an important
component of our products. Making it harder to install does not really
help this problem. If there is a bug in zlib, we all will be hurting
- no matter how it was installed.

In fact, by taking a known, tested version of zlib inside our
products, we guard against zlib risks. If version 1.2.4 of zlib comes
out, and it sucks, we will be protected. All our users will be using
1.2.3, which they got with their tarball.

>> At first blush, Ed's solution seems to me like a good one.
>
>     It's _very much_ against all the current open source "package"
> installation systems' philosophy and I don't think it's a good idea.  All of
> package installation systems allow each package in the system to list other
> packages as prerequisites that must be installed on the system before the
> package the user is interested in is installed.  We just _finished_ removing
> zlib and libjpeg from the HDF4 distribution... :-)

In the world of package distribution, if zlib is required for HDF5,
that means that HDF5 will not build without zlib.

The problem is that HDF5 *will* build without zlib, yielding an HDF5
build which is not usable for netCDF-4. To fit the current package
model, you need to make the HDF5 configure script dumb enough to just
give up when zlib is not present. Then there would be only one way to
build HDF5 - with zlib.

Also, package distribution, while wonderful, is not the usual way for
people to get either HDF5 or netCDF - they get these as tarballs from
our site. So we cannot rely on every user having a package manager to
hide this complexity.

I would be interested to hear why you removed zlib from HDF4. Was
there a problem?

>
>     Shipping someone else's library with our package and installing it
> on a user's system (most likely _in place of_ the system version of that
> library) has caused _lots_ of headaches for us in the past.

Installation is different from building.

I would not suggest that zlib be installed from the HDF5 build. You
can build it and use it internally within the HDF5 package, without
exposing it to anyone else.

Then you have not installed anything on the user's system other than
HDF5.

>     Frankly, I think this is the course that netCDF-4 should take
> with HDF5 also, but that'll ultimately be up to them.

I would love to. Instead of programming autoconf and automake I would
be out on a bike ride! 

But that's just going to kill uptake of netCDF-4. Our users are Earth
scientists, not computer scientists.

I intend that users will be able to install netCDF-4 in one step, just
like netCDF-3. It makes extra work for me up front, but also saves me
hundreds or thousands of support emails later. (Recall - we don't have
a large support team here at netCDF - just me and Russ!)

In this new build method, users will not get HDF5 from you and
netCDF-4 from me, they will get the whole tarball from me, and build
it all at once. 

(There will also be provision for users who want to use an
already-installed version of HDF5 - but this will be a small minority
of users. However, for them, everything can work just as it does now.)

It was also not my intention to install HDF5 on their machines with
netCDF-4, but just to use it internally, and end up installing
netCDF-4 only.

(This also would mean that the user would not have to provide the link
arguments "-lhdf5 -lhdf5_hl -lz -lnetcdf", just "-lnetcdf". It will
make the netcdf library files larger, but so what?)

If the HDF5 build can take on the task of building zlib when not
present that would make things a lot easier for me. I can give you the
changes that you would need for that to happen.

If not, then I will possibly have to hack every HDF5 release before
including it in the master tarball with netCDF-4. This will add some
work for me every time you do a release, and some delay in adopting
new HDF5 versions in netCDF-4.

My goal is to make life easier for the netCDF user. Current netCDF-4
installation is far too complicated - it *must* be simplified.
Otherwise we will lose half our users or more before they even get to
try it.

Thanks,

Ed


-- 
Ed Hartnett  -- ed@xxxxxxxxxxxxxxxx