
Re: 20030915:File Offset questions related to 2GB dataset sizes



Greg,

Thanks very much for your efforts!  I have a few specific comments
below.

> Attached is a patch file and a tar file of the patched source for my
> initial version of the modifications to netcdf-3.5.1-beta13 to
> support a 64-bit offset field. The modifications were made such that
> the patched library can read/write both files with both a 32-bit
> offset (compatible with current netcdf) and a 64-bit offset ("new"
> format).
> 
> The "new" format starts with the magic string "CDF2" instead of
> "CDF1".  On file create, it is specified by passing the flag
> NC_64BIT_OFFSET in the mode field of the nc_create call.

That's an excellent way to support backward compatibility.  I
especially like that creating CDF2 files is not the default, and our
best practices would recommend that it not be used except for files
that are intended to be larger than CDF1 can handle, at least for the
near future.  That way, even software linked with a CDF2-capable
library would still mostly write portable CDF1 files unless they were
too large, in which case CDF1 software would not have been able to
access the data anyway.

> On read, the library queries the magic string and determines whether
> format '1' or '2' is present in the file. This seems to work pretty
> well in our environment, but perhaps there is a better method.

It would be best to determine whether the file is format 1 or 2 once,
at open time, and cache the format version rather than re-checking it
on every read, but maybe that's what you mean.  I haven't had a chance
to look at the patch yet to see how you implement this.

> I assume that the library will be compiled with
> -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE and there are some
> asserts that check this, but I may need some better error checking
> eventually.... I think it is also possible to fix the code so that 
> "new" files could be read/written on systems with 32-bit offsets if
> the offset was less than 2^31, but I haven't looked into that.

We'll look into that, and also what autoconf/configure changes are
needed to autodetect whether large files are supported.

> If you've got any suggestions or criticisms of the code, let me
> know. I've been using it for a couple weeks and haven't noticed any
> problems, but my tests are exclusively targeted at the way we use
> netcdf.  We were limited to finite element meshes of approximately
> 44 million elements with the standard netcdf. I have created meshes
> of 150 million elements with the "new" version and can go even
> larger with some modifications to the way we use netcdf.
> 
> Thanks for the advice and steering me in this direction. It looks
> like it will have minimal impact on our software that uses netcdf;
> most of the time we will be able to just relink with new libraries
> (netcdf and our exodusII).

Thanks again for implementing this idea.  It sounds like something we
may want to consider incorporating into our next release of netCDF-3.
Meanwhile our netCDF-4 work will proceed with an interface over HDF5.

What you have implemented may also be of use in the
Argonne/Northwestern parallel netCDF effort.  I'm CC:ing some of the
developers involved with that effort, since they have recently come up
against the 2 GB limitation and are looking for a way around it that
still maintains compatibility with the current format as closely as
possible.  They may want to get your patch independently, but I hope
we can work together to make sure we all support the same format for
CDF2 and the same best practices for when to use the NC_64BIT_OFFSET
create flag.

--Russ

_____________________________________________________________________

Russ Rew                                         UCAR Unidata Program
address@hidden          http://www.unidata.ucar.edu/staff/russ