
Re: Large file support (LFS)



Hi Rick,

> Howdy, I'd like to make certain an application here in SCD that uses
> netCDF can properly take advantage of LFS and netCDF's support of it.
> 
> I know the steps I have to follow with regard to compilation/etc. for my
> application to use the LFS API.

The situation with LFS is about to change with version 3.6.0-beta, and
there will be an announcement about it soon, but here's a summary.

The original netCDF format (known by its file "magic number" as CDF1)
has a 32-bit size and a 32-bit file offset for each variable, which
limits the size of files even when the library is compiled with LFS
support.  Basically, the size of the last fixed-size variable or the
last record variable is unconstrained, as long as its offset from the
beginning of the file is less than 2^31.  There are examples of how
you can exploit this to write terabyte netCDF files here:

 
http://my.unidata.ucar.edu/content/software/netcdf/f90/documentation/f90-html-docs/guide9.html#2236524

but the limitations of permitting only a single large fixed-size
variable or multiple large record variables are fairly constraining.
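
To see why only the last fixed-size variable can be large in a CDF1
file, here is a minimal sketch in C.  It assumes the fixed-size
variables are laid out one after another following the header and
ignores any padding; the function name and the header_bytes parameter
are hypothetical and not part of the netCDF library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CLASSIC_MAX_OFFSET UINT64_C(2147483648)   /* 2^31 */

/*
 * Hypothetical helper, not part of netCDF.  Walks the fixed-size
 * variables in definition order and returns the index of the first
 * one whose starting offset is not below 2^31, or -1 if every
 * variable starts at a representable offset.
 */
long first_unreachable_variable(uint64_t header_bytes,
                                const uint64_t *sizes, size_t n)
{
    uint64_t offset = header_bytes;
    size_t i;

    assert(sizes != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (offset >= CLASSIC_MAX_OFFSET)
            return (long)i;
        /* Saturate rather than wrap if the running offset overflows. */
        if (sizes[i] > UINT64_MAX - offset)
            offset = UINT64_MAX;
        else
            offset += sizes[i];
    }
    return -1;
}
```

With a 1 Kbyte header, sizes of {1000, 2^40} pass this check because
the huge variable comes last, while {2^40, 1000} fails at index 1.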

With 3.6.0, we're introducing the first new format for netCDF access
since 1988, changing the 32-bit file offsets to 64-bit offsets with
some code contributed by Greg Sjaardema of Sandia.  The new file
"magic number" will be 'C' 'D' 'F' '\002', and the library will still
read and write CDF1 files by default.  However, if a user creates a
file with the NC_64BIT_OFFSET flag (or the equivalent for the Fortran,
C++, or Java interfaces), the new format with 64-bit offsets will be
used.  Assuming the library is compiled with LFS support, this
eliminates many of the constraints for creating large netCDF files.
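
If you want to tell the two formats apart without going through the
library, the magic number is enough.  Here is a minimal sketch in C
that looks only at the first four bytes of a file; the enum and
function names are made up for illustration and are not part of the
netCDF API.  It assumes the fourth byte is '\001' for CDF1 and '\002'
for the new 64-bit offset format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names, not defined by the netCDF library. */
enum magic_kind { MAGIC_UNKNOWN = 0, MAGIC_CDF1 = 1, MAGIC_CDF2 = 2 };

/* Classify a 4-byte file prefix by its magic number. */
enum magic_kind classify_magic(const unsigned char bytes[4])
{
    assert(bytes != NULL);
    if (memcmp(bytes, "CDF", 3) != 0)
        return MAGIC_UNKNOWN;
    if (bytes[3] == 1)
        return MAGIC_CDF1;      /* 32-bit offsets */
    if (bytes[3] == 2)
        return MAGIC_CDF2;      /* 64-bit offsets */
    return MAGIC_UNKNOWN;
}

/* Read the first four bytes of a file and classify them. */
enum magic_kind file_magic(const char *path)
{
    unsigned char buf[4];
    size_t nread;
    FILE *fp;

    assert(path != NULL);
    fp = fopen(path, "rb");
    if (fp == NULL)
        return MAGIC_UNKNOWN;
    nread = fread(buf, 1, sizeof buf, fp);
    fclose(fp);
    return nread == sizeof buf ? classify_magic(buf) : MAGIC_UNKNOWN;
}
```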

The remaining rules are:

 - The size of each fixed-size variable except the last fixed-size
   variable has to be strictly less than 2**32 = 4294967296 bytes.
   The last fixed-size variable can be any size supported by the file
   system, e.g. terabytes.

 - The size of one record's worth of data for any record variable
   except for the last record variable in each record must also be
   strictly less than 2**32 bytes.  The size of one record's worth of
   data for the last record variable is unconstrained except by the
   file system.

So in particular, you'll be able to have as many fixed-size variables
of just under 4 Gbytes each as you want, with the last variable even
larger.
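
The two rules above can be checked before writing any data.  Here is
a minimal sketch in C, assuming you already know the size in bytes of
each fixed-size variable and of one record's worth of each record
variable, in definition order.  The function names are hypothetical;
this is not the error checking being added to the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NONLAST_BYTES UINT64_C(4294967296)   /* 2**32 */

/*
 * Hypothetical helper, not part of netCDF.  Returns the index of the
 * first variable, other than the last one, whose size is not strictly
 * less than 2**32 bytes, or -1 if the list satisfies the rule.
 */
long first_oversized(const uint64_t *sizes, size_t n)
{
    size_t i;

    assert(sizes != NULL || n == 0);
    for (i = 0; i + 1 < n; i++) {   /* the last entry is exempt */
        if (sizes[i] >= MAX_NONLAST_BYTES)
            return (long)i;
    }
    return -1;
}

/* Returns 1 if both the fixed-size and per-record lists pass. */
int sizes_ok_64bit_offset(const uint64_t *fixed, size_t nfixed,
                          const uint64_t *rec, size_t nrec)
{
    return first_oversized(fixed, nfixed) == -1 &&
           first_oversized(rec, nrec) == -1;
}
```

For example, two fixed-size variables of 4294967295 bytes followed by
a 1 Tbyte variable pass, while a 4294967296-byte variable followed by
anything else does not.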

> Could you please pass along some hints/pointers as to how the netCDF 
> goes about dealing with large files (stat()'ing, open()'ing, etc)?  I'll
> start poking around to see what I find, but figure (at least one of) you
> could send me to the exact spot to look, thanks!

I don't think there is any difference in the way large files are
handled in terms of stat()'ing, open()'ing, etc. since this all "just
works" for large files when compiled with LFS support.

I'm currently working on some needed additions to netCDF 3.6.0 to
return errors in case the rules on sizes of variables are violated.
In case you're wondering, the size of a variable is a size_t, which is
still 32 bits on most systems even when compiled with LFS, and that's
why there are still constraints on maximum variable sizes.
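
To make the size_t point concrete, here is a minimal sketch in C that
computes a variable's size from its dimension lengths in 64-bit
arithmetic and then asks whether the result would fit in 32 bits.
The helper names are hypothetical and this is not how the library
itself computes sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of netCDF.  Multiplies the element
 * size by each dimension length using 64-bit arithmetic.  Returns 1
 * and stores the byte count in *out on success, or returns 0 if the
 * product would overflow even 64 bits.
 */
int variable_bytes(const uint64_t *dims, size_t ndims,
                   uint64_t elem_size, uint64_t *out)
{
    uint64_t total = elem_size;
    size_t i;

    assert(out != NULL);
    assert(dims != NULL || ndims == 0);
    for (i = 0; i < ndims; i++) {
        if (dims[i] != 0 && total > UINT64_MAX / dims[i])
            return 0;
        total *= dims[i];
    }
    *out = total;
    return 1;
}

/* Returns 1 if the byte count cannot be held in a 32-bit size. */
int exceeds_32bit_size(uint64_t nbytes)
{
    return nbytes > (uint64_t)UINT32_MAX;
}
```

A 1000 x 1000 x 1000 array of 8-byte values comes to 8000000000
bytes, which is representable here but too big for a 32-bit size.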

In a year, when netCDF-4 is available, we may even be able to eliminate
these constraints with the HDF5-based file format that will be
supported in netCDF-4 (with full backward compatibility for the
current formats, of course ...).

_____________________________________________________________________

Russ Rew                                         UCAR Unidata Program
address@hidden          http://www.unidata.ucar.edu/staff/russ