
Re: 20020820: how to store different sets of attributes for each image of a collection of images in a netCDF file



>To: address@hidden
>From: Wen Jiang <address@hidden>
>Subject: how to store different sets of attributes for each image of a 
>collection of images in a netCDF file 
>Organization: Baylor College of Medicine
>Keywords: 200208201457.g7KEvYK12457

Hi Wen,

> I am thinking of adopting the netCDF file format for our data. The reason that 
> we like netCDF is that netCDF allows adding new attributes to the data 
> after the data is created, which is much better than the fixed image 
> formats we are using. This is nice, but we need to store multiple (and 
> the number is not predetermined) 2D images in the file, and a set of 
> attributes (same names but different values) is needed for each of the 
> images. I don't know how efficiently this could be done with netCDF. My 
> understanding is that the whole file will be rewritten / copied whenever 
> a new image or a new attribute is added to the data sets, so the 
> performance might be horrible. Is there a better way to meet this need?

If all your 2D images are the same size, or if you can tolerate
specifying a maximum size and wasting space for some images that are
smaller than that maximum size, then you can use the "record
dimension" or "unlimited dimension" to permit efficiently appending
new images and values of new variables associated with each image
along the record dimension.  On the other hand, netCDF is not ideal
for appending new images of diverse sizes without wasting some space,
since it relies on being able to seek to the start of any slice of any
data variable quickly, without searching or following a chain of
pointers.  It's also not possible to add new record variables
efficiently.
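A simplified model of the classic-format record layout shows why this
works: each record holds one slot for every record variable, padded to
a multiple of 4 bytes, so the file offset of any slice is plain
arithmetic. This Python sketch is illustrative only. It is not a
netCDF library call, it ignores the format's special case of a single
unpadded byte/char/short record variable, and the variable sizes and
the begin_rec value are hypothetical:

```python
def pad4(nbytes):
    """Round a byte count up to a multiple of 4, as the classic format does."""
    return (nbytes + 3) // 4 * 4


def record_layout(slice_sizes):
    """Return (recsize, offsets) for record variables in definition order.

    slice_sizes maps variable name -> bytes in one record's slice;
    offsets maps variable name -> offset of that slice within a record.
    """
    offsets = {}
    pos = 0
    for name, size in slice_sizes.items():
        offsets[name] = pos
        pos += pad4(size)
    return pos, offsets


def slice_offset(begin_rec, recsize, offsets, name, k):
    """File offset of variable `name` in record k: no search, no pointer chain."""
    return begin_rec + k * recsize + offsets[name]


# Hypothetical example: 100x100 byte images plus two int fields per image.
sizes = {"image": 100 * 100, "mx": 4, "my": 4}
recsize, offsets = record_layout(sizes)
begin_rec = 4096  # hypothetical start of the record section
print(recsize)                                             # 10008
print(slice_offset(begin_rec, recsize, offsets, "my", 3))  # 44124
```

Because the record size is fixed, reading image k never requires
scanning records 0..k-1; the price is that every image slot is as
large as the largest image.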

If you can use netCDF variables rather than attributes for the
ancillary data associated with each image, then these can also be
appended efficiently, without copying any data, but you have to
anticipate what these are when you create the file.
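The same arithmetic means an append touches only the end of the file.
In this minimal in-memory sketch, a bytearray stands in for the file,
and each record is a hypothetical 4x4 byte image slot followed by two
big-endian ints (mx, my). Real code would write through the netCDF
library rather than pack bytes by hand:

```python
import struct

IMAGE_BYTES = 4 * 4              # hypothetical fixed image slot: 4x4 bytes
RECSIZE = IMAGE_BYTES + 4 + 4    # image slot + int mx + int my (already 4-aligned)
BEGIN_REC = 64                   # hypothetical start of the record section


def append_record(buf, numrecs, pixels, mx, my):
    """Append one record (image + ancillary ints) and return the new record count.

    Only bytes at and after BEGIN_REC + numrecs * RECSIZE are written;
    nothing already in the file is moved or copied.
    """
    start = BEGIN_REC + numrecs * RECSIZE
    assert len(buf) == start, "records must be appended at the end"
    # Flattened pixels, zero-filled up to the fixed slot size.
    slot = bytes(pixels).ljust(IMAGE_BYTES, b"\x00")
    buf += slot + struct.pack(">ii", mx, my)  # classic netCDF data is big-endian
    return numrecs + 1


buf = bytearray(BEGIN_REC)       # stand-in for header + fixed variables
n = append_record(buf, 0, range(16), 4, 4)
snapshot = bytes(buf)
n = append_record(buf, n, range(6), 3, 2)  # a smaller 3x2 image in the same slot
print(n, len(buf))                         # 2 112
print(buf[:len(snapshot)] == snapshot)     # True: earlier bytes untouched
```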

It's also possible to reserve space when you create a netCDF file for
the future efficient addition of global and variable-specific
attributes, more values for existing attributes, and additional
non-record variables, without any copying.  If you can estimate an
upper bound on how much extra space you are likely to need in the
header and the non-record variable section for the later addition of
attributes and fixed variables that don't use the unlimited dimension,
reserving this extra space is generally a good idea.
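The benefit of reserving space can be modeled with simple arithmetic:
the data section starts at a fixed offset, so header growth that fits
in the gap before it is cheap, while growth past it forces every data
byte to be moved. The Python sketch below is a simplified model with
hypothetical sizes, not the library's actual logic:

```python
def header_change_cost(header_size, data_begin, data_bytes, added_header_bytes):
    """Return how many data bytes must be moved to grow the header.

    header_size: current encoded header length in bytes
    data_begin: file offset where variable data starts (header + reserved gap)
    data_bytes: total bytes of variable data following the header
    added_header_bytes: size of new attributes / variable definitions
    """
    if header_size + added_header_bytes <= data_begin:
        return 0           # fits in the reserved gap: rewrite the header in place
    return data_bytes      # no room: all data must be shifted (the file copied)


# Hypothetical file: 900-byte header in front of 2 GB of images.
print(header_change_cost(900, 4096, 2_000_000_000, 500))  # 0 (4 KiB reserved)
print(header_change_cost(900, 1024, 2_000_000_000, 500))  # 2000000000
```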

The C and Fortran interfaces for the functions reserving this extra
space are documented only in the man-page reference documentation, not
yet in the Users Guides.  For example, in the C man pages, see the
description of the nc__enddef() function (two underbars!):

  http://www.unidata.ucar.edu/cgi-bin/man-cgi?netcdf+-s3
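In C, the call is nc__enddef(ncid, h_minfree, v_align, v_minfree,
r_align): h_minfree pads the end of the header, v_minfree pads the end
of the fixed-size variable section, and v_align and r_align set the
alignment of the fixed and record data sections. The Python sketch
below is a simplified model of how those four values place the data
sections in a new file. It is not the library's code, special
alignment values are ignored, and all sizes are hypothetical:

```python
def roundup(n, align):
    """Round n up to a multiple of align (align >= 1)."""
    return (n + align - 1) // align * align


def section_begins(header_size, fixed_bytes, h_minfree, v_align, v_minfree, r_align):
    """Approximate where the data sections start for nc__enddef-style parameters.

    Leaves at least h_minfree free bytes after the header and at least
    v_minfree free bytes after the fixed (non-record) variables, rounding
    each section start up to its requested alignment.
    """
    begin_var = roundup(header_size + h_minfree, v_align)
    begin_rec = roundup(begin_var + fixed_bytes + v_minfree, r_align)
    return begin_var, begin_rec


# Hypothetical: 900-byte header, 8 MB of fixed data (e.g. a float 1000x2000
# calibration array), 4 KiB reserved for header growth, 1 MiB for new fixed vars.
calib = 1000 * 2000 * 4
print(section_begins(900, calib, 4096, 4096, 1 << 20, 4096))  # (8192, 9060352)
```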

Just to be clear, the structure of such a file might be something like
this (where I'm using the simple netCDF CDL notation described in the
Users Guide):

netcdf imdata {

dimensions:
  nx = 1000;           // maximum x size of an image
  ny = 2000;           // maximum y size of an image
  titleLen = 80;       // maximum length of an image title (illustrative value)
  nimage = unlimited;  // number of images currently in dataset

variables:
  byte image(nimage, nx, ny);   // an image
  int mx(nimage);               // the actual x size of this image
  int my(nimage);               // the actual y size of this image
  char title(nimage, titleLen); // title of the image
  float date-time(nimage);
      date-time:units = "hours since 2000-1-1";
  float calibration(nx, ny);
      calibration:units = "rad/s";

// global attributes
    :image_subject = "id of subject";   
}
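For a schema like this one, each record holds one slot of every record
variable, so the record size follows directly from the dimensions. A
small Python calculation, assuming titleLen is 80 characters (an
illustrative value) and the classic format's 4-byte padding of each
slot:

```python
def pad4(n):
    """Classic netCDF pads each variable's per-record slot to a 4-byte boundary."""
    return (n + 3) // 4 * 4


NX, NY = 1000, 2000      # maximum image size, from the CDL dimensions
TITLE_LEN = 80           # assumed value for the titleLen dimension (illustrative)

# Bytes per record for each record variable, in definition order.
slot = {
    "image": NX * NY * 1,    # byte
    "mx": 4,                 # int
    "my": 4,                 # int
    "title": TITLE_LEN * 1,  # char
    "date-time": 4,          # float
}
recsize = sum(pad4(n) for n in slot.values())
print(recsize)               # 2000092

# Space left unused in the image slot by a hypothetical 640x480 image.
mx, my = 640, 480
wasted = NX * NY - mx * my
print(wasted, f"{wasted / (NX * NY):.1%}")  # 1692800 84.6%
```

This is the space trade-off mentioned at the top: the fixed record
size is what makes seeking and appending cheap, and small images pay
for it in unused bytes.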

I made up that example partly to be clear about what you can and can't
reserve space for in advance.  When creating a netCDF file, you can
reserve any amount of space for later adding new fixed-size variables
such as "calibration", variable attributes such as
"calibration:units", and global attributes such as ":image_subject".
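To choose how much header space to reserve, it helps to know what an
attribute costs. In the classic format an attribute is stored as its
name (a 4-byte length plus the characters, padded to 4 bytes), a
4-byte type code, a 4-byte value count, and the values padded to 4
bytes; the 8-byte attribute-list tag is present even when a list is
empty. A rough Python estimator under that encoding (new variable
definitions also consume header space, which this sketch ignores):

```python
TYPE_SIZES = {"byte": 1, "char": 1, "short": 2, "int": 4, "float": 4, "double": 8}


def pad4(n):
    """Round a byte count up to a multiple of 4."""
    return (n + 3) // 4 * 4


def attr_header_bytes(name, nc_type, nvalues):
    """Encoded size of one attribute in a classic-format netCDF header."""
    name_part = 4 + pad4(len(name.encode("utf-8")))   # length + padded name
    type_and_count = 4 + 4                             # nc_type + nelems
    values = pad4(nvalues * TYPE_SIZES[nc_type])       # padded values
    return name_part + type_and_count + values


print(attr_header_bytes("image_subject", "char", len("id of subject")))  # 44
print(attr_header_bytes("units", "char", len("rad/s")))                  # 28
```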

You can efficiently add additional record values for any
record-oriented variables (that use the unlimited dimension) such as
"image", "mx", "my", "title", and "date-time".

You *cannot* later efficiently add new record variables, such as
"moonPhase(nimage)".  The record variables and the total size of a
record are fixed at file creation time.
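The reason shows up in the record arithmetic: a record variable's slot
sits inside every record, so adding one changes the record size, and
with it the starting offset of every record after the first. A Python
sketch with hypothetical sizes, treating moonPhase as a 4-byte float:

```python
def record_starts(begin_rec, recsize, numrecs):
    """File offsets at which each record begins."""
    return [begin_rec + k * recsize for k in range(numrecs)]


BEGIN_REC = 8192             # hypothetical record-section start
old_recsize = 2_000_092      # hypothetical: image, mx, my, title, date-time
new_recsize = old_recsize + 4  # plus a float moonPhase(nimage)

old = record_starts(BEGIN_REC, old_recsize, 5)
new = record_starts(BEGIN_REC, new_recsize, 5)
moved = sum(1 for a, b in zip(old, new) if a != b)
print(moved)  # 4: every record except the first must be moved
```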

--Russ

_____________________________________________________________________

Russ Rew                                         UCAR Unidata Program
address@hidden                     http://www.unidata.ucar.edu