
Re: Allocation of Space in NetCDF



> Date: Tue, 10 Dec 1996 12:38:50 -0800
> To: address@hidden
> From: Roy Mendelssohn <address@hidden>
> Subject: Allocation of Space in NetCDF

Hi Roy,

> I have a question about how and when NetCDF allocates space.  Perhaps it is
> answered in the documentation, but I couldn't find it.  Suppose I have a
> data array, say dimensioned by lat,lon,time and a fourth dimension that is
> an unlimited dimension.  So we have real numbers, in FORTRAN terms say it
> is dimensioned (2,2,2,*), and we have 32-bit integers.
> 
>       1) How big of a file would be created by default (i.e., I create the
> NetCDF file, define all the dimensions, etc., define a variable with those
> dimensions, but don't actually write any data to the file)?

The size of the netCDF file specified by the CDL:

    netcdf r {
    dimensions:
        lon = 2;
        lat = 2;
        time = 2;
        rec = unlimited;
    variables:
        float var(rec,time,lat,lon);
    }

is 128 bytes, as you can verify by running "ncgen -b" on it.  It could
be larger if the names of the variables and dimensions were longer.

It's possible to glean this from the User's Guide chapter on File
Structure and Performance, but it's easier to just run ncgen on the CDL
file and look at the size of the generated netCDF file.
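
If it helps to see the same thing programmatically, here is a minimal
sketch of the equivalent definition using the netCDF-3 C interface
(error checking omitted; the output name "r.nc" is just an arbitrary
choice for this example).  After nc_enddef() returns, the file on disk
holds only the header, since no record data has been written yet:

    #include <netcdf.h>

    int main(void) {
        int ncid, lon_dim, lat_dim, time_dim, rec_dim, varid;
        int dimids[4];

        /* create the file and define the same dimensions as the CDL above */
        nc_create("r.nc", NC_CLOBBER, &ncid);
        nc_def_dim(ncid, "lon",  2, &lon_dim);
        nc_def_dim(ncid, "lat",  2, &lat_dim);
        nc_def_dim(ncid, "time", 2, &time_dim);
        nc_def_dim(ncid, "rec",  NC_UNLIMITED, &rec_dim);

        /* var(rec,time,lat,lon): the record dimension comes first */
        dimids[0] = rec_dim;
        dimids[1] = time_dim;
        dimids[2] = lat_dim;
        dimids[3] = lon_dim;
        nc_def_var(ncid, "var", NC_FLOAT, 4, dimids, &varid);

        nc_enddef(ncid);    /* header is written to disk here */
        nc_close(ncid);     /* file contains only the header */
        return 0;
    }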
 
>       2) This is really the question I have.  Suppose now I have one
> observation, for convenience at location (1,1,1,1).  How big of a
> file do I have now?  If it is what Fortran does, I would have an array that
> is (2,2,2,1), so the file would grow by 8 x 4 bytes = 32 bytes.  What
> would be ideal is if the file size increased by only 4 bytes.

Sorry, but adding 1 data value increases the size of the file by
lat*lon*time*4 bytes, in this case 32 bytes.  The smallest increment by
which a netCDF file grows is one record's worth of data, which is the
amount of space for one slice along the unlimited dimension of all the
variables that use the unlimited dimension.
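
Continuing the sketch above (again with error checking omitted), writing
a single value with nc_put_var1_float() illustrates this: the file grows
by one whole record, time*lat*lon*4 = 32 bytes here, even though only 4
bytes of actual data were supplied:

    #include <netcdf.h>

    int main(void) {
        int ncid, varid;
        size_t index[4] = {0, 0, 0, 0};   /* (rec,time,lat,lon) indices */
        float value = 42.0f;              /* arbitrary sample value */

        /* open the file created above and write one datum */
        nc_open("r.nc", NC_WRITE, &ncid);
        nc_inq_varid(ncid, "var", &varid);
        nc_put_var1_float(ncid, varid, index, &value);
        nc_close(ncid);

        /* the file now holds the header plus one full record:
           2*2*2 floats = 32 bytes, not just the 4 bytes written */
        return 0;
    }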

> What brings this up is I am thinking of using NetCDF for a dataset where
> lat, lon, and time are very large, and the unlimited dimension represents
> separate observations for that lat, lon, time coordinate, but the number of
> observations will vary greatly depending on the particular lat,lon,time
> combination, some even having no observations (i.e. a grid with varying
> number of obs at any grid point).  If the file size were determined like
> Fortran dimensioned arrays, it would be huge, many locations would just be
> missing data, and it wouldn't be practical.  If the storage were the other
> way, then it would be very practical.
> 
> Any help, advice etc. would be greatly appreciated.

There are various ways to represent such data without wasting space, but
you trade off ease of access by location and time.  For example, you
could let the unlimited dimension be "obsnum" representing observation
number, and use something like:

    netcdf sparse {
    dimensions:
       obsnum = unlimited;  // observation number
    variables:
       float lat(obsnum);
       float lon(obsnum);
       float time(obsnum);
       float var(obsnum);
    }

and now you can have as many (lat,lon,time) tuples as you want, with
each observation adding only 16 bytes to the file (four floats), but
without an index, it may be costly to find all the observations
corresponding to any particular (lat,lon,time) interval.
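
As a concrete (and purely illustrative) sketch of appending to that
layout in C, the following helper writes one observation as the next
record along the unlimited "obsnum" dimension.  It assumes a file
already created from the "sparse" CDL above, and omits error checking:

    #include <netcdf.h>

    /* append one observation (lat, lon, time, value) as the next
       record along the unlimited "obsnum" dimension */
    void append_obs(const char *path, float lat, float lon,
                    float time, float value) {
        int ncid, dimid, varid;
        size_t nrecs, index[1];

        nc_open(path, NC_WRITE, &ncid);

        /* next free record index = current length of "obsnum" */
        nc_inq_dimid(ncid, "obsnum", &dimid);
        nc_inq_dimlen(ncid, dimid, &nrecs);
        index[0] = nrecs;

        nc_inq_varid(ncid, "lat", &varid);
        nc_put_var1_float(ncid, varid, index, &lat);
        nc_inq_varid(ncid, "lon", &varid);
        nc_put_var1_float(ncid, varid, index, &lon);
        nc_inq_varid(ncid, "time", &varid);
        nc_put_var1_float(ncid, varid, index, &time);
        nc_inq_varid(ncid, "var", &varid);
        nc_put_var1_float(ncid, varid, index, &value);

        nc_close(ncid);   /* file grew by one 16-byte record */
    }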

Another approach uses "ragged arrays", similar to what is described in
the "Data Structures" section of the manual, available on-line at

    http://www.unidata.ucar.edu/packages/netcdf/guide_5.html#SEC31

Hope this helps.

--Russ
_____________________________________________________________________

Russ Rew                                         UCAR Unidata Program
address@hidden                     http://www.unidata.ucar.edu