
Re: 2GB limit and netcdf




> I have a quick user question:
> Are the netCDF datasets limited by the 2GB limit?
> I've looked at the code and the lseek()s, etc. use "long"s which
> are 64 bit (on Crays) and can handle >2GB;
> however, the meta-data in the header
> is limited to 4 bytes ... which would probably limit the dataset size
> to 2GB (or 4GB if negative values are not needed for flagging exceptional
> conditions).
> Anyway, I just wanted to run this by you before I tell the
> user: no, datasets must be < 2GB.

R.K.:

As you observe, the _calculation_ of the offset of a particular datum,
and the number handed to lseek() (or ffio_seek()), are done in the system
off_t, which is 64 bits on Crays and several other systems (including SGI).

The largest offset stored on disk is the offset of the
*first* record variable. This is typically well within the 2^31-1 limit.
So, it is possible to create valid files which have data beyond the 2^31-1
limit.
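
To make that concrete, here is a minimal sketch in C (the variable names
and sizes are made up for illustration; they are not the library's
internals): the offsets stored in the header stay small -- the largest is
the begin of the first record variable -- while the offset of an
individual datum is computed in off_t and can land well past 2^31-1.

    /* Illustration only: offsets of record data are computed in off_t,
     * so on a 64-bit off_t system they can exceed 2^31-1 even though
     * the header-stored "begin" offset is small. */
    #include <stdio.h>
    #include <sys/types.h>   /* off_t */

    int main(void)
    {
        long  begin = 1000000L;            /* stored in header; < 2^31-1  */
        off_t recsize = 8 * 1024 * 1024;   /* bytes per record (made up)  */
        off_t recno = 300;                 /* a record well into the file */

        off_t offset = (off_t)begin + recno * recsize;   /* ~2.5 GB */

        printf("offset = %lld bytes\n", (long long)offset);
        return 0;
    }

On a system with a 32-bit off_t the same arithmetic would overflow, which
is where the next warning comes in.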

Warning! If you try to read such a file on a system that has a 32-bit
off_t, you are out of luck. Of course, you couldn't have copied the file
to that system without truncation anyway.

An interesting sub-case is systems like SunOS 5 and AIX, which now
support large files in the filesystem but whose default off_t is a 32-bit
quantity. These systems include a second set of interfaces and types
(lseek64(), off64_t, ...) to support the large files. Typically, they
also provide compilation flags which map the old interfaces onto the
64-bit ones. If that is the case, and the netcdf library is configured
and compiled appropriately, things will work.
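
As a rough illustration (the exact macro and flag names vary by platform
and release, so treat these as examples rather than a recipe), the
mapping is usually driven by feature-test macros defined before any
system header is included:

    /* Illustration only: common large-file feature-test macros.
     * Solaris (SunOS 5) typically uses _LARGEFILE_SOURCE and
     * _FILE_OFFSET_BITS=64 (see `getconf LFS_CFLAGS`); AIX uses
     * _LARGE_FILES.  Where they take effect, lseek() is mapped onto
     * lseek64() and off_t becomes a 64-bit type. */
    #define _LARGEFILE_SOURCE
    #define _FILE_OFFSET_BITS 64

    #include <stdio.h>
    #include <sys/types.h>   /* off_t   */
    #include <unistd.h>      /* lseek() */

    int main(void)
    {
        /* If the mapping took effect, this prints 8. */
        printf("sizeof(off_t) = %lu\n", (unsigned long)sizeof(off_t));
        return 0;
    }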

Warning #2! Netcdf library versions prior to netcdf-3.4 may not deal with
the offset calculation properly.
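
If you are unsure which library version a program is actually linked
against, the netCDF-3 C interface provides a version-string call that
makes this easy to check (the format of the string varies by release):

    /* Illustration only: print the version of the linked netcdf library. */
    #include <stdio.h>
    #include <netcdf.h>   /* nc_inq_libvers() */

    int main(void)
    {
        printf("linked netcdf library: %s\n", nc_inq_libvers());
        return 0;
    }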

The upshot is that you can create large netcdf files,
but the file portability is compromised.

-glenn