
Re: 20040630: adding another entry to unlimited dimension



>To: address@hidden
>From: Stacy Brodzik <address@hidden>
>Subject: adding another entry to unlimited dimension
>Organization: > Stacy Brodzik <address@hidden>
>Keywords: 200406300005.i5U05WWb002233 netCDF time

Hi Stacy,

> I've added variables, attributes, etc to netcdf files but I've never
> tried to add another time offset and its accompanying data to a
> netcdf file.  I've looked through the functions and don't really see
> an easy way to do it.  If there's a way to do it without opening the
> dataset, copying all the data out of it into arrays, etc, adding the
> new data, and creating a completely new netcdf file, I'd be
> interested in hearing back from you.

If by "add another time offset" you mean just add all the data
associated with another time record to all the variables that use the
time dimension, where the time dimension is declared to be unlimited
and is the first dimension of each variable that uses it, then you can
do what you want by merely writing the variable data slices using one
of the nc_put_vara C interfaces, for example.  That will merely append
the data efficiently to the netCDF file without copying, and is what
the unlimited dimension is designed to support.

But I suspect you probably know that and mean something else by "add
another time offset", for example adding a new dimension and some new
variables that use it.  If it's a fixed-size dimension, then you can
add it and also add some additional fixed size variables efficiently,
but only if you anticipated you might need to do this by calling the
"underbar underbar" version of nc_enddef(), namely nc__enddef(), which has
additional parameters for reserving space in the header for
additional dimensions, variables, and attributes, and additional space
before the first record for additional fixed-size variable data.

This function is currently only documented in the man page reference
documentation, which should be available online, but the script that
produces it is currently broken.  So I've appended the relevant part
of that below.

If you didn't reserve any extra space in the header and fixed-variable
data section, then when you call nc_redef(), add new dimensions,
variables, and attributes, then call nc_enddef(), the data will be
copied to make space for the new data.  And if you want to add a new
record variable, this also requires a copy of all the data, because
the original format unfortunately doesn't allow for reserving extra
space in records for later addition of new record variable data.

In netCDF-4, currently under development, you will be able to
efficiently add new variables, attributes, and dimensions without
restriction, and without worrying about the data being copied.

--Russ

_____________________________________________________________________

Russ Rew                                         UCAR Unidata Program
address@hidden          http://www.unidata.ucar.edu/staff/russ


     int nc__enddef(int ncid, size_t h_minfree, size_t v_align,
          size_t v_minfree, size_t r_align)

          Like nc_enddef() but has additional performance tuning
          parameters.

          Caution: this function exposes internals of the netcdf
          version 1 file format.  It may not be available on future
          netcdf implementations.

          The current netcdf file format has three sections, the
          "header" section, the data section for fixed size variables,
          and the data section for variables which have an unlimited
          dimension (record variables).  The header begins at the
          beginning of the file.  The (offset) of the beginning of the
          other two sections is contained in the header.  Typically,
          there is no space between the sections.  This causes copying
          overhead to accrue if one wishes to change the size of the
          sections, as may happen when changing names of things, text
          attribute values, adding attributes or adding variables.
          Also, for buffered i/o, there may be advantages to aligning
          sections in certain ways.

          The minfree parameters allow one to control costs of future
          calls to nc_redef(), nc_enddef() by requesting that minfree
          bytes be available at the end of the section.  The h_minfree
          parameter sets the pad at the end of the "header" section.
          The v_minfree parameter sets the pad at the end of the data
          section for fixed size variables.

          The align parameters allow one to set the alignment of the
          beginning of the corresponding sections.  The beginning of
          the section is rounded up to an index which is a multiple of
          the align parameter.  The flag value NC_ALIGN_CHUNK tells the
          library to use the chunksize (see above) as the align
          parameter.  The v_align parameter controls the alignment of
          the beginning of the data section for fixed size variables.
          The r_align parameter controls the alignment of the beginning
          of the data section for variables which have an unlimited
          dimension (record variables).

          The file format requires mod 4 alignment, so the align
          parameters are silently rounded up to multiples of 4.  The
          usual call, nc_enddef(ncid) is equivalent to
          nc__enddef(ncid, 0, 4, 0, 4).
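
The rounding described above can be modelled with a little arithmetic.
This is only a sketch based on the description in this man page, not
the library's own code; the function names are made up for
illustration, and treating an align value of 0 as 4 is an assumption.

```c
#include <stddef.h>

/* Round n up to the next multiple of m (m must be nonzero). */
size_t round_up(size_t n, size_t m)
{
    return ((n + m - 1) / m) * m;
}

/* Align parameters are rounded up to a multiple of 4.
 * Assumption: a value of 0 is treated as the minimum alignment, 4. */
size_t effective_align(size_t align)
{
    return align == 0 ? 4 : round_up(align, 4);
}

/* Where the next section would begin, given where the previous one
 * ends, the requested free space after it, and the alignment. */
size_t section_start(size_t prev_end, size_t minfree, size_t align)
{
    return round_up(prev_end + minfree, effective_align(align));
}
```

For example, with a header that ends at byte 150, the defaults
(minfree 0, align 4) put the next section at byte 152, while
h_minfree = 1000 and v_align = 512 put it at byte 1536.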

          The file format does not contain a "record size" value,
          this is calculated from the sizes of the record variables.
          This unfortunate fact prevents us from providing minfree and
          alignment control of the "records" in a netcdf file.  If you
          add a variable which has an unlimited dimension, the third
          section will always be copied with the new variable added.
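
To see why adding a record variable forces a copy, it may help to model
how the record size is derived.  The sketch below is a simplification:
the struct and function names are hypothetical, each variable's
per-record data is padded to a multiple of 4 bytes, and the format's
special case for a file with a single record variable is ignored.

```c
#include <stddef.h>

/* Per-record footprint of one record variable (hypothetical model). */
struct rec_var {
    size_t elems_per_record;  /* product of the non-record dimension lengths */
    size_t elem_size;         /* bytes per element, e.g. 4 for float */
};

/* Record size: the sum of each record variable's per-record bytes,
 * each padded up to a multiple of 4. */
size_t record_size(const struct rec_var *vars, size_t nvars)
{
    size_t total = 0;
    size_t i;

    for (i = 0; i < nvars; i++) {
        size_t bytes = vars[i].elems_per_record * vars[i].elem_size;
        total += (bytes + 3) / 4 * 4;
    }
    return total;
}

/* Byte offset of record number recno, given the start of the record
 * section and the record size. */
size_t record_offset(size_t begin_rec, size_t recsize, size_t recno)
{
    return begin_rec + recno * recsize;
}
```

With a double time(time) and a float temp(time, 10), the record size
is 8 + 40 = 48 bytes.  Adding a short flag(time, 3) contributes 6
bytes padded to 8, making it 56, so every record after the first
starts at a different offset and the existing records have to be
moved.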