
Re: question about netCDF file size limits



I'm resending this because the original reply had email addresses such
as "address@hidden" and "address@hidden"
that apparently couldn't be delivered from outside the firewall.  So
apologies if this is a repeat ...

Frank,

You wrote:

> I am working with an even higher res. version of the global eddy-resolving
> ocean model: 106 vertical levels instead of 40. This increases the size of
> a single 3D variable (4 byte) from about 1.4 Gbyte to 3.7Gbyte.
>
> For the high-res. model I have been creating netCDF files with 1 3D
> variable per file. Everything was hunky-dory for the 40 level cases. I am
> having problems with the 106 level case.
>
> I am able to write the file (1 horizontal slice at a time, with the
> horizontal dimensions the leading dimensions) without errors.  I can also
> read the data back into IDL 1 horizontal slice at a time.  However, when I
> try to slice out in different dimensions, I get an error:
>
> IDL> ncid = ncdf_open("TEMP.t.es_02a.00010701.nc")
> % Loaded DLM: NCDF.
> IDL> id = ncdf_varid(ncid,"TEMP")
> IDL> ncdf_varget,ncid,id,teq,count=[3600,1,106,1],offset=[0,1105,0,0]
> % NCDF_VARGET: Operation Failed, bad file (0) or variable (4) id ?
>                (NC_ERROR=0)
>
> IDL> ncdf_varget,ncid,id,sst,count=[3600,2400,1,1]
>
> The last statement returns a reasonable looking field.

This error may be because the version of IDL you are using isn't
compiled and linked with Large File Support.  Do you know what version
of netCDF is used in the IDL you are using and whether it can access
files >4GByte?
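
For scale, the sizes involved can be worked out from the dimensions in
your IDL calls.  The following is a rough Python sketch, assuming a
3600 x 2400 horizontal grid (taken from the count arguments above) and
4-byte values; it ignores the file header, and the helper names are
made up for illustration:

```python
# Back-of-the-envelope sizes for one 3D variable stored level by level.
# Assumptions: 3600 x 2400 horizontal grid, 4-byte values, header ignored.
NX, NY, BYTES_PER_VALUE = 3600, 2400, 4
LEVEL_BYTES = NX * NY * BYTES_PER_VALUE  # one horizontal slice


def variable_bytes(n_levels):
    """Total size in bytes of a variable with n_levels vertical levels."""
    return n_levels * LEVEL_BYTES


def first_level_at_or_past(limit):
    """Index of the first level whose starting byte offset is >= limit."""
    return -(-limit // LEVEL_BYTES)  # ceiling division


if __name__ == "__main__":
    for n in (40, 106):
        print(n, "levels:", variable_bytes(n) / 1e9, "GB")
    print("first level starting past 2^31 bytes:",
          first_level_at_or_past(2**31))
```

Under these assumptions the 40-level variable stays below 2^31 bytes
while the 106-level one does not, so a read of the surface slice
touches only low offsets, whereas a slice through all 106 levels has
to seek well past 2 GBytes.  That would be consistent with a reader
built without Large File Support, though it doesn't prove it.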

> I thought I remembered that the file size limit for netCDF was 4 Gbyte
> (2^32). Is it actually 2Gbyte? If it matters, I am creating a record
> dimension (time) with a length of 1.

The offsets in netCDF 3.5.x are limited to 2^31 bytes (2 GBytes),
because the C off_t type is signed, and negative offsets have meaning
in the underlying system calls.  However, only the variable offsets
are limited to 2 GBytes; netCDF file sizes can be much larger, as
explained here:

  http://www.unidata.ucar.edu/packages/netcdf/faq.html#lfs

and

  http://www.unidata.ucar.edu/packages/netcdf/f90/Documentation/f90-html-docs/guide9.html#2236524

but those files will still only have 32-bit offsets, so the
restrictions on their structure are fairly constraining.  However, you
should be able to write and read a single variable even larger than 4
GBytes, provided the library you use was compiled with the right flags
to support Large File access.  For the C interface, this means the
library would have to have been compiled with -D_FILE_OFFSET_BITS=64
-D_LARGEFILE_SOURCE.  I suspect IDL was compiled without these flags,
so it can't access the big file you can create.

What Dennis was referring to is an alpha release of a version of
netCDF with 64-bit offsets instead of 32-bit offsets, incorporating
changes Greg Sjaardema of Sandia Labs made and tested (at my
suggestion):

  ftp://ftp.unidata.ucar.edu/pub/netcdf/exp/netcdf-3.6.0-alpha.tar.Z

This actually changes the format of the files: the first 4 bytes of
the file change from "CDF1" to "CDF2" (the fourth byte is the binary
version number 1 or 2, not an ASCII digit).  If IDL tried to access
one of these CDF2 files, it would just claim the file was not a
netCDF file.
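
If you want to see which kind of file you have without going through
the library, the first four bytes are enough.  Here is a minimal
Python sketch; the function name and the labels it returns are my own
for illustration, not part of any netCDF interface:

```python
# Classify a file by its first four bytes.
# The labels returned here are illustrative, not official library names.
MAGIC = {
    b"CDF\x01": "classic (32-bit offsets)",
    b"CDF\x02": "64-bit offsets",
}


def netcdf_variant(path):
    """Return a label for the file's netCDF variant, or None if the
    first four bytes match neither magic number."""
    with open(path, "rb") as f:
        return MAGIC.get(f.read(4))
```

Calling netcdf_variant("TEMP.t.es_02a.00010701.nc") on a file written
by netCDF 3.5.x should give the classic label; a None result only
means the header is not one of these two forms.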

So far only the C interface supports the changes, and I'd like to make
them available from the Fortran and C++ interfaces before announcing
the release.

The only change to the interface is that you can now create netCDF
files that use 64-bit offsets (on either 64-bit or 32-bit platforms)
by supplying the flag NC_64BIT_OFFSET in the mode field of the
nc_create call.  By default, if this flag is not supplied, files will
still have 32-bit offsets.  When reading a file with the new library,
you don't have to know whether its format is CDF1 (32-bit offsets) or
CDF2 (64-bit offsets), since the library will notice which format is
used when the file is opened and handle it appropriately.  On 32-bit
systems, the library will check on read that each offset does not
exceed the value representable in 32 bits, and return an error code
if it does.
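
The kind of check described for 32-bit systems can be sketched as
follows.  This is only an illustration in Python, not the library's
code: the function name is hypothetical, the offset is taken to be an
8-byte big-endian integer as stored in the 64-bit-offset format, and
the limit of 2^31 - 1 is an assumption based on a signed 32-bit off_t.

```python
import struct

MAX_SIGNED_32 = 2**31 - 1  # assumed limit for a signed 32-bit off_t


def decode_offset(raw, limit=MAX_SIGNED_32):
    """Decode an 8-byte big-endian offset; raise if it exceeds limit."""
    (offset,) = struct.unpack(">q", raw)
    if offset < 0 or offset > limit:
        raise OverflowError(
            f"offset {offset} not representable (limit {limit})"
        )
    return offset
```

The real library returns an error code rather than raising an
exception; the exception here just stands in for that.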

Note that ultimately all size constraints will be removed with
netCDF-4, which will use the HDF5 format, but that won't be released
until 2005.

--Russ