
[netCDF #DVS-605789]: Getting values of compound attributes



> > I've written general C code for compound types in ncdump and nccopy,
> > though for attributes in nccopy I can just use the netCDF-4 library
> > function nc_copy_att() which does the attribute work.  
> > ...
> 
> Right. Although I can see how to do it, I'm still grappling with the
> 'atomic reads' aspect of the API. To wit: How atomic is 'atomic'?

Good question.  In the HDF5 API, it's possible to read individual
fields (which I usually call members), or a subset of fields, from a
variable of compound type with a single call to H5Dread().  However,
the netCDF-4 API does not expose this functionality, and requires that
a whole compound value be read all at once into a memory block big
enough to hold all the members, using the memory offsets originally
used in the type definition.  I'm CC:ing Ed to make sure I've got that
right, as he wrote the netCDF-4 API.

> > ...
> > So the main idea with compound types is to treat them as a bunch of fields
> > and to handle each field with another call to the function that handles the
> > compound type, with primitive types at the bottom bottoming out the 
> > recursion.
> 
> Yes, I'm doing that too, although I'm currently doing the array stuff
> and will tackle the recursion in a bit.
> 
> Here are some specific questions about what I can expect from
> nc_get_vara() for different types.
> 
> Here's my base case (and I have this working, so I don't think I have
> any questions about it, but including it makes for a better story ;-)
> 
> compound obs_t {
>     byte day ;
>     short elev ;
> }
> 
> obs_t obs ;
> 
> For this, nc_get_var() (note: not nc_get_vara()) returns data for both
> day and elev in one call and I am using nc_inq_compound_field(ncid,
> datatype, i, field_name, &field_offset, &field_typeid, &field_ndims,
> &field_sizes[0]); to get the offsets into the buffer returned. This
> works.
> 
> Array of compound:
> compound obs_t {
>     byte day ;
>     short elev ;
> };
> 
> obs_t obs(3) ;
> 
> I'm guessing that nc_get_vara() will return a buffer with
> <day,elev,day,elev,day,elev> in it where the field offsets for each
> are obtained using nc_inq_compound_field() and that the start and
> length (and stride) params work as for a cardinal type.

That's my understanding, though I think there is padding between
compound values to ensure that each compound value starts at a
multiple of 4 bytes (right, Ed?).  If there is such padding, it's
incorporated into the type size, which gives the size in bytes that
each value would require in an array.  So you can use the type size to
index into an array of values.

> compound with child compound (your test file w/o the array dimension):
>   compound cmp1 {
>     short i ;
>     int j ;
>   }; // cmp1
>   compound cmp2 {
>     float x ;
>     double y(3, 2) ;
>   }; // cmp2
>   compound cmp3 {
>     cmp1 xx ;
>     cmp2 yy ;
>   }; // cmp3
> 
> variables:
>       cmp3 phony_compound_var ;
> }
> 
> What does nc_get_var() return in this case? 

My understanding is that the values would appear together in the
returned data in the order 
   i, j, x, y(0,0), y(0,1), y(1,0), y(1,1), y(2,0), y(2,1)
with the offsets being determined by the size of cmp1, cmp2, and the
offsets within each of those compound types.  Any padding is accounted
for in the offsets of the individual members of each compound type and
the size of the compound types (right, Ed?).

> Currently I have code like:
> 
> if (has_stride)
>     errstat = nc_get_vars(ncid, varid, cor, edg, step, &values[0]);
> else
>     errstat = nc_get_vara(ncid, varid, cor, edg, &values[0]);
> 
> and it works fine. But am I really gaining anything using it? That
> is, is nc_get_vara() more efficient than nc_get_vars()?

It is certainly more efficient to read a block of N contiguous values
than a block of N strided values for stride > 1, as the latter in
general requires more disk accesses for large N and slightly more CPU
to gather the desired values into a contiguous result, but I suspect
that's not what you're asking.

In netCDF-3, I believe reading the strided values avoided decoding
from XDR to native until they were all gathered in a contiguous block,
whereas just using nc_get_vara() and extracting the strided values
yourself would incur the cost of un-XDRing all the values, including
the ones being skipped over.  So using the strided API was somewhat
more efficient, though only in CPU time, not disk accesses, so
probably not significantly more efficient.

For netCDF-4, I don't know the answer to your question; maybe Ed
knows.  If the data is compressed and chunked, then even a strided
access incurs the overhead of decompressing (inflating), in full,
every chunk that contains any requested value, and that may dominate.

--Russ


Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu



Ticket Details
===================
Ticket ID: DVS-605789
Department: Support netCDF
Priority: Normal
Status: Closed