
Re: 970904: netcdf unlimited dim



>To: address@hidden
>From: Ben Foster <address@hidden>
>Subject: netcdf unlimited dim
>Organization: High Altitude Observatory/NCAR, Boulder CO
>Keywords: 199709041605.KAA10198

Hi Ben,

> I have been writing netcdf files containing 3d variables (lev,lat,lon)
> from a model at a single model time. For time series, I have been
> writing a separate file for each time, encoding the time into the
> file name. (see attached ncdump for example of one of these files)
> 
> Now I want to put multiple times in a single file so I can animate
> along the time dimension. Thus the variables will be 4-dimensional
> (time,lev,lat,lon). The model is running in a time loop, and I
> have 3d variables at each iteration, i.e., I do not want to save
> 4d variables before creating the netcdf file.
> 
> My question is how best to do this. Should time be an unlimited
> dimension? It doesn't have to be, as I do know the number of times
> that it will contain at the time of creation. I will plan to call
> a fortran routine at each time iteration. If it's the 1st iteration,
> I'll create the netcdf file and define dimensions, attributes, etc.
> If the time iteration is > 1, then I will reopen the existing file
> and add the 3d vars from the model at the
> appropriate time index of the netcdf vars (i,lev,lat,lon). 
> 
> Several of the global file attributes will become arrays, w/ different
> values at each time. Should they be initially sized for all times,
> w/ new values added at each time iteration, or resized at each iter?
> Does an unlimited dimension affect i/o performance?  It would also
> be convenient to add new times to an existing netcdf file in later
> continuation runs of the model.
> 
> I've looked through the doc, and found examples of how to set up
> an unlimited dimension, but can't find examples of how to grow the
> variables along that dimension from inside a loop. Thanks for any
> advice or help,

If you really want to be able to "add new times to an existing netcdf
file in later continuation runs of the model", and you can't anticipate
the maximum number of times you will want to add, then time should
definitely be an unlimited dimension.  A file with an unlimited
dimension takes only the space needed for the current size of that
dimension; it doesn't have to be initially created for as large as it
will ever get.  Even if you know what the maximum size of the time
dimension will eventually be, using an unlimited dimension has
advantages:

 - the file only grows as needed during the model run;
 - all the data for each time is stored together, which can mean more
   efficient I/O if it is later to be accessed this way;
 - it's easy for a file reader to tell how many times are present
   in the file by the current size of the time dimension, without
   looking at missing values or fill values of variables.
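
As a rough sketch of the setup step, assuming the netCDF-3 Fortran
interface, a file with an unlimited time dimension could be defined as
below.  The file name 'model.nc', the variable name 'TN', the grid
sizes, and the body of handle_err are placeholders chosen for
illustration, not taken from your model:

```fortran
      program mkfile
      implicit none
      include 'netcdf.inc'
c     File name, variable name and grid sizes are placeholders.
      integer NLON, NLAT, NLEV
      parameter (NLON = 8, NLAT = 4, NLEV = 3)
      integer ncid, status, tnid
      integer londim, latdim, levdim, timdim
      integer dims(4)

      status = nf_create('model.nc', nf_clobber, ncid)
      if (status .ne. nf_noerr) call handle_err(status)

c     Fixed-size dimensions.
      status = nf_def_dim(ncid, 'lon', NLON, londim)
      if (status .ne. nf_noerr) call handle_err(status)
      status = nf_def_dim(ncid, 'lat', NLAT, latdim)
      if (status .ne. nf_noerr) call handle_err(status)
      status = nf_def_dim(ncid, 'lev', NLEV, levdim)
      if (status .ne. nf_noerr) call handle_err(status)

c     The time dimension is unlimited: its length starts at 0
c     and grows as records are written.
      status = nf_def_dim(ncid, 'time', nf_unlimited, timdim)
      if (status .ne. nf_noerr) call handle_err(status)

c     In Fortran the unlimited dimension must come last.
      dims(1) = londim
      dims(2) = latdim
      dims(3) = levdim
      dims(4) = timdim
      status = nf_def_var(ncid, 'TN', nf_real, 4, dims, tnid)
      if (status .ne. nf_noerr) call handle_err(status)

c     Leave define mode so data can be written, then close.
      status = nf_enddef(ncid)
      if (status .ne. nf_noerr) call handle_err(status)
      status = nf_close(ncid)
      if (status .ne. nf_noerr) call handle_err(status)
      end

c     Minimal error handler (an assumed implementation).
      subroutine handle_err(status)
      implicit none
      include 'netcdf.inc'
      integer status
      print *, nf_strerror(status)
      stop 'Stopped'
      end
```

Running ncdump -h on the resulting file should show the time dimension
as UNLIMITED with 0 records currently.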

Global attributes should be initially sized for all times, since
attributes don't have associated dimensions.  You might consider using
variables instead of global attributes for information that depends on
time, since the result will be clearer and more "self-describing".  Any
association between the nth time and the nth value of a global attribute
relies on implicit knowledge that is not in the file, whereas a variable
dimensioned by time makes the association explicit.
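
For instance, a per-time quantity could be stored along these lines.
This is only a sketch: the file name 'meta.nc', the variable name
'f107', the values written, and the body of handle_err are all made up
for illustration:

```fortran
      program tvars
      implicit none
      include 'netcdf.inc'
c     'meta.nc', 'f107' and the values written are hypothetical.
      integer NTIMES
      parameter (NTIMES = 3)
      integer ncid, status, timdim, fid, itime
      integer dims(1), idx(1)
      real val

      status = nf_create('meta.nc', nf_clobber, ncid)
      if (status .ne. nf_noerr) call handle_err(status)
      status = nf_def_dim(ncid, 'time', nf_unlimited, timdim)
      if (status .ne. nf_noerr) call handle_err(status)

c     A 1-d variable on the time dimension takes the place of
c     an array-valued global attribute.
      dims(1) = timdim
      status = nf_def_var(ncid, 'f107', nf_real, 1, dims, fid)
      if (status .ne. nf_noerr) call handle_err(status)
      status = nf_enddef(ncid)
      if (status .ne. nf_noerr) call handle_err(status)

c     One value per model time, written as each time is reached.
      do 10 itime = 1, NTIMES
         idx(1) = itime
         val = 70.0 + itime
         status = nf_put_var1_real(ncid, fid, idx, val)
         if (status .ne. nf_noerr) call handle_err(status)
   10 continue

      status = nf_close(ncid)
      if (status .ne. nf_noerr) call handle_err(status)
      end

c     Minimal error handler (an assumed implementation).
      subroutine handle_err(status)
      implicit none
      include 'netcdf.inc'
      integer status
      print *, nf_strerror(status)
      stop 'Stopped'
      end
```

Because the variable shares the time dimension with the model fields,
the nth value lines up with the nth time without any outside knowledge.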

Trying to resize a global attribute at each model iteration would
perform badly, as expanding attributes is implemented by essentially
copying all the data to a new file with more space for the larger
attribute.

An unlimited dimension may improve performance over using a fixed
dimension for time.  The values of all variables that use the unlimited
dimension are stored together for each time, so each iteration writes
the file essentially sequentially, with fewer page faults.  However,
this effect on performance is probably minor.

To grow a variable from inside a loop, you have to use the START and
COUNT arrays, something like this (in the netCDF-3 Fortran interface):

   real tn(NLON, NLAT, NLEV)
   real un(NLON, NLAT, NLEV)
   ...
   integer start(4)
   integer count(4)

   start(1) = 1
   start(2) = 1
   start(3) = 1
   start(4) = 1

   count(1) = NLON
   count(2) = NLAT
   count(3) = NLEV
   count(4) = 1

   do 10 itime = 1, NTIMES
        start(4) = itime
        status = nf_put_vara_real (ncid, tnid, start, count, tn)
        if (status .ne. nf_noerr) call handle_err(status)
        status = nf_put_vara_real (ncid, unid, start, count, un)
        if (status .ne. nf_noerr) call handle_err(status)
        ...
10 continue

or something similar in the netCDF-2 Fortran interface.
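
For a later continuation run, one possible approach is to reopen the
file for writing, ask for the current length of the time dimension, and
write the next record just past it.  The sketch below assumes a file
named 'model.nc' already exists with a real variable 'TN' dimensioned
(lon,lat,lev,time) on an 8 x 4 x 3 grid; those names and sizes, the
zero-filled data, and the body of handle_err are placeholders:

```fortran
      program append
      implicit none
      include 'netcdf.inc'
c     File name, variable name and grid sizes are placeholders
c     and must match the existing file.
      integer NLON, NLAT, NLEV
      parameter (NLON = 8, NLAT = 4, NLEV = 3)
      integer ncid, status, timdim, tnid, ntimes
      integer start(4), count(4)
      integer i, j, k
      real tn(NLON, NLAT, NLEV)

c     Stand-in for one time step of model output.
      do 30 k = 1, NLEV
         do 20 j = 1, NLAT
            do 10 i = 1, NLON
               tn(i, j, k) = 0.0
   10       continue
   20    continue
   30 continue

c     Reopen the existing file with write access.
      status = nf_open('model.nc', nf_write, ncid)
      if (status .ne. nf_noerr) call handle_err(status)

c     Find out how many times the file already holds.
      status = nf_inq_dimid(ncid, 'time', timdim)
      if (status .ne. nf_noerr) call handle_err(status)
      status = nf_inq_dimlen(ncid, timdim, ntimes)
      if (status .ne. nf_noerr) call handle_err(status)
      status = nf_inq_varid(ncid, 'TN', tnid)
      if (status .ne. nf_noerr) call handle_err(status)

c     Write one full 3-d field at the next time index.
      start(1) = 1
      start(2) = 1
      start(3) = 1
      start(4) = ntimes + 1
      count(1) = NLON
      count(2) = NLAT
      count(3) = NLEV
      count(4) = 1
      status = nf_put_vara_real(ncid, tnid, start, count, tn)
      if (status .ne. nf_noerr) call handle_err(status)

      status = nf_close(ncid)
      if (status .ne. nf_noerr) call handle_err(status)
      end

c     Minimal error handler (an assumed implementation).
      subroutine handle_err(status)
      implicit none
      include 'netcdf.inc'
      integer status
      print *, nf_strerror(status)
      stop 'Stopped'
      end
```

Each run of this program should extend the time dimension by one, so
the same routine could serve both the first run and later continuations.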

--Russ

_____________________________________________________________________

Russ Rew                                         UCAR Unidata Program
address@hidden                     http://www.unidata.ucar.edu