
Re: 20020510: 2 Gig limit of netCDF files



>To: address@hidden
>From: Steve Hankin <address@hidden>
>Subject: 2 Gig limit of netCDF files
>Organization: UCAR/Unidata
>Keywords: 200205102009.g4AK9Ia05392

Hi Steve,

> I was looking at the Web page "What other future work on
> netCDF is planned?"
> (http://www.unidata.ucar.edu/packages/netcdf/faq.html#plans)
> and I see that lifting the 2 gigabyte limit is listed as a
> plan for netCDF V4.

I'll have to update that web page to better reflect reality one of
these days.  The current netCDF V3 actually supports files much larger
than 2 Gbytes in several ways, summarized in a new "Large File
Support" section that will soon make its way into new editions of the
User's Guides.  I've appended a text version below, or you can read it
in the Fortran90 User's Guide:

  
http://www.unidata.ucar.edu/packages/netcdf/f90/Documentation/f90-html-docs/guide9.html#2236524

> My question:  can you estimate the availability date for
> this?
> 
> If not, then might the >2 gig limit feature be made
> available earlier as a patch or a beta version?

Going beyond the current Large File Support requires a new format for
netCDF data, not just code changes, so it cannot be provided as a
patch.  That's the idea behind the proposal I submitted last year (URL
available on request) to base netCDF-4 on HDF5.  Until something like
that ambitious effort gets funded, I hope you can get along with the
existing Large File Support and the constraints it imposes.

--Russ

9.3 Large File Support

It is possible to write netCDF files that exceed 2 Gbytes on platforms
that have "Large File Support" (LFS). Such files are portable to other
LFS platforms, but if you call nc_open to access data from such a file
on an older platform without LFS, you can expect a "file too large"
error.
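
For example, a minimal C sketch of detecting that condition (the file
name "bigfile1.nc" is just a placeholder) might look like the
following, using only the standard nc_open and nc_strerror calls:

  #include <stdio.h>
  #include <netcdf.h>

  int main(void)
  {
      int ncid;
      int status = nc_open("bigfile1.nc", NC_NOWRITE, &ncid);
      if (status != NC_NOERR) {
          /* On a platform without Large File Support, opening a file
             larger than 2 Gbytes is expected to fail; nc_strerror()
             turns the status code into a readable message. */
          fprintf(stderr, "nc_open: %s\n", nc_strerror(status));
          return 1;
      }
      /* ... read data here ... */
      nc_close(ncid);
      return 0;
  }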

There are important constraints on the structure of large netCDF files
that result from the 32-bit relative offsets that are part of the
netCDF file format:

* If you don't use the unlimited dimension, only one variable can
  exceed 2 Gbytes in size, but it can be as large as the underlying
  file system permits. It must be the last variable in the dataset,
  and the offset to the beginning of this variable must be less than
  about 2 Gbytes. For example, the structure of the data might be
  something like:

  netcdf bigfile1 {
     dimensions: 
        x=2000;
        y=5000;
        z=10000;
     variables:
        double x(x);         // coordinate variables
        double y(y);
        double z(z);
        double var(x, y, z); // 800 Gbytes
  }

* If you use the unlimited dimension, any number of record variables
  may exceed 2 Gbytes in size, as long as the offset of the start of
  each record variable within a record is less than about 2
  Gbytes. For example, the structure of the data in a 2.4 Tbyte file
  might be something like:

  netcdf bigfile2 {
     dimensions: 
        x=2000;
        y=5000;
        z=10;
        t=UNLIMITED;         // 1000 records, for example
     variables:
        double x(x);         // coordinate variables
        double y(y);
        double z(z);
        double t(t);
                             // 3 record variables, 2.4 Gbytes per record
        double var1(t, x, y, z);
        double var2(t, x, y, z);
        double var3(t, x, y, z);
  }
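
As a rough illustration, a C sketch of creating a file with the
structure of bigfile2 above might look like the following (the file
name and the CHECK error-checking macro are just conveniences for the
sketch; only standard netCDF-3 calls are used):

  #include <stdio.h>
  #include <netcdf.h>

  /* Abort with a message if a netCDF call fails. */
  #define CHECK(call) do { int stat_ = (call); if (stat_ != NC_NOERR) { \
      fprintf(stderr, "netCDF error: %s\n", nc_strerror(stat_)); \
      return 1; } } while (0)

  int main(void)
  {
      int ncid, x_dim, y_dim, z_dim, t_dim;
      int x_var, y_var, z_var, t_var, var1, var2, var3;
      int dimids[4];

      CHECK(nc_create("bigfile2.nc", NC_CLOBBER, &ncid));

      CHECK(nc_def_dim(ncid, "x", 2000, &x_dim));
      CHECK(nc_def_dim(ncid, "y", 5000, &y_dim));
      CHECK(nc_def_dim(ncid, "z", 10, &z_dim));
      CHECK(nc_def_dim(ncid, "t", NC_UNLIMITED, &t_dim));

      /* Coordinate variables. */
      CHECK(nc_def_var(ncid, "x", NC_DOUBLE, 1, &x_dim, &x_var));
      CHECK(nc_def_var(ncid, "y", NC_DOUBLE, 1, &y_dim, &y_var));
      CHECK(nc_def_var(ncid, "z", NC_DOUBLE, 1, &z_dim, &z_var));
      CHECK(nc_def_var(ncid, "t", NC_DOUBLE, 1, &t_dim, &t_var));

      /* Three record variables of 0.8 Gbytes per record each, so the
         offset of each within a record (0, 0.8, and 1.6 Gbytes) stays
         below the 2 Gbyte limit described above. */
      dimids[0] = t_dim;
      dimids[1] = x_dim;
      dimids[2] = y_dim;
      dimids[3] = z_dim;
      CHECK(nc_def_var(ncid, "var1", NC_DOUBLE, 4, dimids, &var1));
      CHECK(nc_def_var(ncid, "var2", NC_DOUBLE, 4, dimids, &var2));
      CHECK(nc_def_var(ncid, "var3", NC_DOUBLE, 4, dimids, &var3));

      CHECK(nc_enddef(ncid));
      /* ... write the data one record at a time, e.g. with
         nc_put_vara_double() ... */
      CHECK(nc_close(ncid));
      return 0;
  }

The fixed-size case (bigfile1) would follow the same pattern, with the
single variable larger than 2 Gbytes defined after all the others.  In
practice it may also be worth calling nc_set_fill(ncid, NC_NOFILL,
&old_mode) before writing, so the library does not spend time
prefilling such large variables with fill values.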