Questions About netCDF Large File Support



What is Large File Support?

Large File Support (LFS) refers to operating system and C library facilities to support files larger than 2 GiB. On many 32-bit platforms the default size of a file offset is still a 4-byte signed integer, which limits the maximum size of a file to 2 GiB. Using LFS interfaces and the 64-bit file offset type, the maximum size of a file may be as large as 2^63 bytes, or 8 EiB. For many current platforms, large file macros or appropriate compiler flags have to be set to build a library with support for large files. This is handled automatically in netCDF 3.6.

More information about Large File Support is available from Adding Large File Support to the Single UNIX Specification.


What does Large File Support have to do with netCDF?

When the netCDF format was created in 1988, 4-byte fields were reserved for file offsets, specifying where the data for each variable started relative to the beginning of the file or the start of a record boundary.

This first netCDF format variant, the only format supported in versions 3.5.1 and earlier, is referred to as the netCDF classic format. The 32-bit file offset in the classic format limits the total sizes of all but the last non-record variables in a file to less than 2 GiB, with a similar limitation for the data within each record for record variables. The netCDF classic format is also identified as version 1 or CDF1 in reference to the format label at the start of a file.

With netCDF 3.6, a second variant of netCDF format is now supported in addition to the classic format. The new variant is referred to as the 64-bit offset format, version 2, or CDF2. The primary difference from the classic format is the use of 64-bit file offsets instead of 32-bit offsets, but it also supports larger variable and record sizes.


Do I have to know which netCDF file format variant is used in order to access or modify a netCDF file?

No, version 3.6 of the netCDF library detects which variant of the format is used for each file when it is opened for reading or writing, so it is not necessary to know which variant of the format is used. The version of the format will be preserved by the library on writing. If you want to modify a classic format file to use the 64-bit offset format so you can make it much larger, you will have to create a new file and copy the data to it.


Will future versions of the netCDF library continue to support accessing files in the classic format?

Yes, the 3.6 library and all planned future versions of the library will continue to support reading and writing files using the classic (32-bit offset) format as well as the new 64-bit offset format. There is no need to convert existing archives from the classic to the 64-bit offset format. Even netCDF-4, which will introduce a third variant of the netCDF format based on HDF5, will continue to support accessing classic format netCDF files as well as 64-bit offset netCDF files.


Should I start using the new 64-bit offset format for all my netCDF files?

No, we discourage users from making use of the new format unless they need it for very large files. It may be some time until third-party software that uses the netCDF library is upgraded to 3.6 or later versions that support the new large file facilities, so we advise continuing to use the classic netCDF format for data that doesn't require huge file offsets. The library makes this recommendation easy to follow, since the default for file creation is the classic format.


How can I tell if a netCDF file uses the classic format or new 64-bit offset format?

The short answer is that under most circumstances you should not need to care, provided you use version 3.6.0 or later of the netCDF library. But the difference is indicated in the first four bytes of the file, which are 'C', 'D', 'F', '\001' for the classic netCDF format and 'C', 'D', 'F', '\002' for the new 64-bit offset format. On a Unix system, one way to display the first four bytes of a file, say foo.nc, is to run the following command:

      od -An -c -N4 foo.nc
which will output
      C   D   F 001
or
      C   D   F 002
depending on whether foo.nc is a classic or 64-bit offset netCDF file, respectively.


What happens if I create a 64-bit offset format netCDF file and try to open it with an older netCDF application that hasn't been upgraded to netCDF 3.6?

The application will indicate an error trying to open the file and present an error message equivalent to "not a netCDF file". This is why it's a good idea not to create 64-bit offset netCDF files until you actually need them.


Can I create 64-bit offset files on 32-bit platforms?

Yes, by specifying the appropriate file creation flag you can create 64-bit offset netCDF files the same way on 32-bit platforms as on 64-bit platforms.


How do I create a 64-bit offset netCDF file from C, Fortran-77, Fortran-90, or C++?

With netCDF version 3.6.0 or later, use the NC_64BIT_OFFSET flag when you call nc_create(), as in:

  err = nc_create("foo.nc",
                  NC_NOCLOBBER | NC_64BIT_OFFSET,
                  &ncid);

In Fortran-77, use the NF_64BIT_OFFSET flag when you call nf_create(), as in:

  iret = nf_create('foo.nc',
                   IOR(NF_NOCLOBBER,NF_64BIT_OFFSET),
                   ncid)

In Fortran-90, use the NF90_64BIT_OFFSET flag when you call nf90_create(), as in:

  iret = nf90_create(path="foo.nc",
                     cmode=ior(nf90_clobber,nf90_64bit_offset),
                     ncid=ncFileID)

In C++, use the Offset64Bits enum in the NcFile constructor, as in:

  NcFile nc("foo.nc",
            FileMode=NcFile::New,
            FileFormat=NcFile::Offset64Bits);


How do I create a 64-bit offset netCDF file using the ncgen utility?

A new flag, '-v', has been added to ncgen to specify the file format variant. By default or if '-v 1' or '-v classic' is specified, the generated file will be in netCDF classic format. If '-v 2' or '-v 64-bit-offset' is specified, the generated file will use the new 64-bit offset format. To permit creating very large files quickly, another new ncgen flag, '-x', has been added to specify use of nofill mode when generating the netCDF file.
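Putting the flags together, a command line along the following lines would generate a 64-bit offset file in nofill mode from a CDL description (the input file name big.cdl is illustrative; '-b' and '-o' are ncgen's usual options for producing a binary netCDF file with a given name):

      ncgen -v 2 -x -b -o big.nc big.cdl

Omitting '-v', or giving '-v 1', would produce a classic format file from the same CDL input.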


Have all netCDF size limits been eliminated?

No, there are still some limits on sizes of netCDF objects, even with the new 64-bit offset format. Each fixed-size variable and the data for one record's worth of a record variable are limited in size to a little less than 4 GiB, which is twice the size limit in versions earlier than netCDF 3.6.

The maximum number of records remains 2^32 - 1.


Why are variables still limited in size?

While most platforms support a 64-bit file offset, many platforms only support a 32-bit size for allocated memory blocks, array sizes, and memory pointers. In C developers' jargon, these platforms have a 64-bit off_t type for file offsets, but a 32-bit size_t type for sizes of arrays. Changing netCDF to assume the 64-bit size_t available on 64-bit platforms would make it suitable only for 64-bit platforms.

We expect to be able to remove remaining variable size constraints with netCDF-4 using the HDF5 format, but that won't be released until mid-2005.


Why do I get an error message when I try to create a file larger than 2 GiB with the new library?

There are several possible reasons why creating a large file can fail that are not related to the netCDF library:

  • User quotas may prevent you from creating large files. On a Unix system, you can use the "ulimit" command to report limitations such as the file-size writing limit.
  • The file system in which you are writing may not be configured to allow large files. On a Unix system, you can test this with a command such as
      dd if=/dev/zero bs=1000000 count=3000 of=./test
    
    which should write a 3 GB file named "test" in the current directory.
  • There is insufficient disk space for the file you are trying to write.

If you get the netCDF library error "One or more variable sizes violate format constraints", you are trying to define a variable larger than permitted for the file format variant. This error typically occurs when leaving "define mode" rather than when defining a variable. The error cannot necessarily be determined when a variable is first defined, because the last fixed-size variable is permitted to be larger than other fixed-size variables when there are no record variables. Similarly, the last record variable may be larger than other record variables. This means that subsequently adding a small variable to an existing file may be invalid, because it makes what was previously the last variable now in violation of the format size constraints. For details on the format size constraints, see the Users Guide sections NetCDF Classic Format Limitations and NetCDF 64-bit Offset Format Limitations.

If you get the netCDF library error "Invalid dimension size", you are exceeding the size limit of netCDF dimensions, which must be less than 2,147,483,644 for classic files with no large file support and otherwise less than 4,294,967,292.


Do I need to use special compiler flags to compile and link my applications that use netCDF with Large File Support?

No, except that 32-bit applications should link with a 32-bit version of the library and 64-bit applications should link with a 64-bit library, similarly to use of other libraries that can support either a 32-bit or 64-bit model of computation.

Is it possible to create a "classic" format netCDF file with netCDF version 3.6.0 that cannot be accessed by applications compiled and linked against earlier versions of the library?

No, classic files created with the new library should be compatible with all older applications, both for reading and writing, with one minor exception. The exception is due to a correction of a netCDF bug that prevented creating records larger than 4 GiB in classic netCDF files with software linked against versions 3.5.1 and earlier. This limitation in total record size was not a limitation of the classic format, but an unnecessary restriction due to the use of too small a type in an internal data structure in the library. If you want to always make sure your classic netCDF files are readable by older applications, make sure you don't exceed 4 GiB for the total size of a record's worth of data. (All records are the same size, computed by adding the size for a record's worth of each record variable, with suitable padding to make sure each record begins on a byte boundary divisible by 4.)