
Re: 20010808: netcdf 3.4 help



>To: address@hidden
>From: "Alan S. Dawes" <address@hidden>
>Subject: netcdf 3.4 help
>Organization: UCAR/Unidata
>Keywords: 200108082235.f78MZM110902, huge files, large file support, record

Hi Alan,

Here's the CDL for a small file that has two "record variables", x and
y, of different shapes:

 netcdf big2 {
 dimensions:
         m = 4 ;
         n = 5 ;
         r = UNLIMITED ; // (3 currently)
 variables:
         float x(r, m) ;
         float y(r, n) ;
 data:

  x =
   2, 3, 4, 5,
   3, 4, 5, 6,
   4, 5, 6, 7 ;

  y =
   1, 2, 3, 4, 5,
   2, 4, 6, 8, 10,
   3, 6, 9, 12, 15 ;
 }

A small Fortran program that writes the netCDF file corresponding to
the above CDL is appended below.  Most of it was generated by the
"ncgen -f" utility, and I edited that output a bit for this example.

In this example, if r were an ordinary dimension declared to be of
length 3, then all the values for x would be stored in the file,
followed by all the values of y.  However, since r is declared to be
the unlimited dimension, the first slice of x (corresponding to r=1)
is followed by the first slice of y, then the second slices of x and
y, and so on.  All your netCDF calls for reading and writing the data
are the same as if r were a fixed-size dimension.  With r UNLIMITED,
the data are simply organized differently in the file, and it's
possible to append more data efficiently along the r dimension.
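To make the two layouts concrete, here is a minimal sketch in C (the language of the underlying netCDF library) that computes where each value lands. It assumes 4-byte floats and measures byte offsets from the start of the data, ignoring the file header and any padding a real netCDF file contains. The function names are hypothetical and are not part of the netCDF API. Indices are 0-based, so r=0 here corresponds to r=1 in the Fortran program.

```c
#include <stdint.h>

/* Assumed size of one netCDF float (4-byte IEEE value). */
#define FLOAT_BYTES 4

/* Hypothetical helpers: byte offset of x(r, i) or y(r, j) from the
 * start of the data (no header or padding modeled). r, i, j are
 * 0-based; m and n are the fixed dimension lengths; nrecs is the
 * number of records. */

/* r as an ordinary fixed dimension: all of x, then all of y. */
int64_t fixed_offset_x(int64_t r, int64_t i, int64_t m) {
    return (r * m + i) * FLOAT_BYTES;
}

int64_t fixed_offset_y(int64_t r, int64_t j, int64_t m, int64_t n,
                       int64_t nrecs) {
    /* y starts only after every value of x has been stored */
    return (nrecs * m + r * n + j) * FLOAT_BYTES;
}

/* r as the UNLIMITED dimension: each record holds one x slice
 * followed by one y slice, so every record is (m + n) floats long. */
int64_t record_offset_x(int64_t r, int64_t i, int64_t m, int64_t n) {
    return (r * (m + n) + i) * FLOAT_BYTES;
}

int64_t record_offset_y(int64_t r, int64_t j, int64_t m, int64_t n) {
    /* skip r whole records, then the x slice of record r */
    return (r * (m + n) + m + j) * FLOAT_BYTES;
}
```

With the small CDL above (m=4, n=5, 3 records), the first y value is at byte 48 when r is fixed but at byte 16 when r is unlimited, immediately after the first x slice.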

To have this program generate a 6-Gbyte file corresponding to the
similar CDL:

 netcdf big2 {
 dimensions:
         m = 400000 ;
         n = 600000 ;
         r = UNLIMITED ; // (1500 currently)
 variables:
         float x(r, m) ;
         float y(r, n) ;
 data:

  x =
   ...  // long list of values

  y =
   ...  // long list of values
 }

it's only necessary to change the three parameters in the Fortran
program to

      parameter(MFIXED=400000)
      parameter(NFIXED=600000)
      parameter(NUMRECS=1500)

and link the resulting program against the netCDF library compiled
with large file support.  Even though x is a 1500 x 400000 array of
600,000,000 floats (requiring 2.4 Gbytes to store) and y is a 1500 x
600000 array of 900,000,000 floats (requiring 3.6 Gbytes to store),
both variables can be written to and read from the netCDF file,
because they are record variables stored one slice at a time: the x
slice for r=1, then the y slice for r=1, then the x slice for r=2, and
so on.  To simplify this example, I haven't included any fixed-size
variables, but they don't change anything as long as the total size of
all fixed-size variables is less than 2 Gbytes.
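The size arithmetic can be checked with a few lines of C using 64-bit integers, since a 32-bit int would overflow on these totals. This sketch assumes 4-byte floats and uses INT32_MAX (2^31 - 1 bytes, the largest signed 32-bit offset) as the 2 Gbyte threshold. The helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed size of one netCDF float. */
#define FLOAT_BYTES 4

/* Total bytes for a record variable with nrecs records, each
 * record holding a slice of slice_len floats. */
int64_t var_bytes(int64_t nrecs, int64_t slice_len) {
    return nrecs * slice_len * FLOAT_BYTES;
}

/* Bytes in one record: one slice of each record variable
 * (here x with m floats and y with n floats). */
int64_t record_bytes(int64_t m, int64_t n) {
    return (m + n) * FLOAT_BYTES;
}

/* 1 if a byte count fits in a signed 32-bit offset, else 0. */
int fits_in_32bit_offset(int64_t bytes) {
    return bytes <= (int64_t)INT32_MAX;
}
```

For the big CDL, var_bytes(1500, 400000) gives 2,400,000,000 and var_bytes(1500, 600000) gives 3,600,000,000, both past the threshold, while a single record, record_bytes(400000, 600000), is only 4,000,000 bytes.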

I hope this clarifies one way to write very large netCDF files on a
32-bit platform with large file support.  I don't think any special
Fortran flags are required for this, but the C library had to be built
with:

  -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE
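As an illustration of what those two flags do, here is a small C sketch (not taken from the netCDF sources). With _FILE_OFFSET_BITS=64, off_t becomes a 64-bit type on 32-bit glibc systems, and _LARGEFILE_SOURCE makes fseeko and ftello available, which take an off_t rather than a long. The defines must come before any system header, just as -D flags on the command line would. On a 64-bit Linux system off_t is already 64 bits, so there the flags make no difference. The function names are hypothetical; the seek check uses a temporary file and writes nothing.

```c
/* Equivalent to -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE;
 * must precede every system header. */
#define _FILE_OFFSET_BITS 64
#define _LARGEFILE_SOURCE

#include <stdio.h>
#include <sys/types.h>

/* Width of off_t in bits: 64 when large file support is active. */
int off_t_bits(void) {
    return (int)(sizeof(off_t) * 8);
}

/* Seek a temporary file to the given byte position with fseeko and
 * report where ftello says we are, or -1 on failure. Only the file
 * position moves; no data is written. */
long long position_after_seek(long long bytes) {
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;
    long long pos = -1;
    if (fseeko(fp, (off_t)bytes, SEEK_SET) == 0)
        pos = (long long)ftello(fp);
    fclose(fp);
    return pos;
}
```

For example, position_after_seek(3221225472LL) (3 GiB) should return the same value only when offsets wider than 32 bits are supported.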

--Russ

      program fgennc
      parameter(MFIXED=4)
      parameter(NFIXED=5)
      parameter(NUMRECS=3)
      include 'netcdf.inc'
* error status return
      integer  iret
* netCDF id
      integer  ncid
* dimension ids
      integer  m_dim
      integer  n_dim
      integer  r_dim
* dimension lengths
      integer  m_len
      integer  n_len
      integer  r_len
      parameter (m_len = MFIXED)
      parameter (n_len = NFIXED)
      parameter (r_len = NF_UNLIMITED)
* variable ids
      integer  x_id
      integer  y_id
* rank (number of dimensions) for each variable
      integer  x_rank
      integer  y_rank
      parameter (x_rank = 2)
      parameter (y_rank = 2)
* variable shapes
      integer  x_dims(x_rank)
      integer  y_dims(y_rank)
* data variables
      real  x(m_len)
      real  y(n_len)
* starts and counts for array sections of record variables
      integer  x_start(x_rank), x_count(x_rank)
      integer  y_start(y_rank), y_count(y_rank)

* create the file (this also enters define mode)
      iret = nf_create('big2.nc', NF_CLOBBER, ncid)
      call check_err(iret)
* define dimensions
      iret = nf_def_dim(ncid, 'm', MFIXED, m_dim)
      call check_err(iret)
      iret = nf_def_dim(ncid, 'n', NFIXED, n_dim)
      call check_err(iret)
      iret = nf_def_dim(ncid, 'r', NF_UNLIMITED, r_dim)
      call check_err(iret)
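* define variables; the Fortran interface lists dimensions in the
* reverse of CDL order, so x(r, m) in CDL is declared as (m, r) here,
* with the unlimited dimension r last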
      x_dims(2) = r_dim
      x_dims(1) = m_dim
      iret = nf_def_var(ncid, 'x', NF_REAL, x_rank, x_dims, x_id)
      call check_err(iret)
      y_dims(2) = r_dim
      y_dims(1) = n_dim
      iret = nf_def_var(ncid, 'y', NF_REAL, y_rank, y_dims, y_id)
      call check_err(iret)
* leave define mode
      iret = nf_enddef(ncid)
      call check_err(iret)
       
* Write record variables one record at a time
       
      do irec=1, NUMRECS
       
*     store some arbitrary values in data variable slices
       
         do ix = 1, m_len
            x(ix) = ix + irec
         enddo
         
         do iy = 1, n_len
            y(iy) = iy * irec
         enddo

*     store x slice
         x_start(1) = 1
         x_start(2) = irec
         x_count(1) = m_len
         x_count(2) = 1
         iret = nf_put_vara_real(ncid, x_id, x_start, x_count, x)
         call check_err(iret)
*     store y slice
         y_start(1) = 1
         y_start(2) = irec
         y_count(1) = n_len
         y_count(2) = 1
         iret = nf_put_vara_real(ncid, y_id, y_start, y_count, y)
         call check_err(iret)
      enddo
       
      iret = nf_close(ncid)
      call check_err(iret)

      end
       
      subroutine check_err(iret)
* print the netCDF error message and stop if iret reports an error
      integer iret
      include 'netcdf.inc'
      if (iret .ne. NF_NOERR) then
         print *, nf_strerror(iret)
         stop
      endif
      end