Re: 1.8 file format

Hi Elena:

Thanks, that was the missing piece. I am starting to understand what they are
trying to do. Apparently VIIRS-AF-EDR_Aggr is intended to mean the aggregation
of the 5 referenced datasets. I'm still trying to figure out the region reference.

John


Elena Pourmal wrote:
John,

We have PPT slides from the NPOESS people (HDF-EOS X Workshop). They should give you an idea of how they use region references to point to raw data that is scattered among several datasets in an HDF5 file.

Here is a link: http://hdfeos.org/workshops/ws10/presentations/day3/Profile_of_NPOESS_HDF5_Files.ppt

I tried to attach the file, but my previous email bounced because the attachment was too large.

Elena

At 12:35 PM -0600 7/24/07, John Caron wrote:
Hi Quincey: comments are inline:

Quincey Koziol wrote:
Hi John,

On Jul 24, 2007, at 12:22 PM, John Caron wrote:

Meanwhile yet another question on a different topic:

I'm trying to figure out what the NPOESS data files are doing with "Reference Types".

Case 1 is a Reference with type = 0 (Object Reference)

long VIIRS-AF-EDR_Aggr(5);
   :AggregateBeginningDate = "2003125";
   :AggregateBeginningGranuleID = "NPP001212126088";
   :AggregateBeginningOrbitNumber = 9; // int
   :AggregateBeginningTime = "10109.840960z";
   :AggregateCreationDate = "2003125";
   :AggregateEndingDate = "2003125";
   :AggregateEndingGranuleID = "NPP001212126088";
   :AggregateEndingOrbitNumber = 9; // int
   :AggregateEndingTime = "101038.325248z";
   :_LastModified = "2005-08-29T15:54:58Z";

data:

{2928, 3528, 3800, 4072, 4344}

The 5 values are indeed object references to 5 other datasets in the file. Any clues on how this is used, or is it an internal structure that should be left alone?

An object reference is simply the offset in the file of the object header for the object referenced. Users shouldn't modify them directly, but should use the H5R* API routines for working with them.
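
As a rough illustration of the point above (this is not HDF5 library code): assuming 8-byte little-endian file offsets, the raw data of a 5-element object-reference dataset is just five addresses, which is why the values print as ordinary longs. The helper name decode_object_refs is hypothetical, and real code should go through the H5R* routines rather than reading the bytes itself.

```python
import struct

def decode_object_refs(raw, offset_size=8):
    """Split raw object-reference data into object-header file offsets.

    Assumes each reference is a little-endian offset of `offset_size` bytes.
    """
    if len(raw) % offset_size:
        raise ValueError("raw data is not a whole number of references")
    return [
        int.from_bytes(raw[i:i + offset_size], "little")
        for i in range(0, len(raw), offset_size)
    ]

# The five values shown for VIIRS-AF-EDR_Aggr, packed the way they are
# assumed to sit in the dataset's raw data (5 x 8 bytes).
raw = struct.pack("<5Q", 2928, 3528, 3800, 4072, 4344)
offsets = decode_object_refs(raw)
```

Each entry in offsets would then be the file address of one referenced dataset's object header, under the stated assumptions.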

I can understand a reference to another object as an alias.
But what does an array of length 5 of references to 5 other objects mean?



h5dump is:

DATASET "VIIRS-AF-EDR_Aggr" {
           DATATYPE  H5T_REFERENCE
           DATASPACE  SIMPLE { ( 5 ) / ( 5 ) }
           ATTRIBUTE "AggregateBeginningDate" {
              DATATYPE  H5T_STRING {
                    STRSIZE 7;
                    STRPAD H5T_STR_NULLTERM;
                    CSET H5T_CSET_ASCII;
                    CTYPE H5T_C_S1;
                 }
              DATASPACE  SIMPLE { ( 1, 1 ) / ( 1, 1 ) }
           }
           ATTRIBUTE "AggregateBeginningGranuleID" {
              DATATYPE  H5T_STRING {
                    STRSIZE 15;
                    STRPAD H5T_STR_NULLTERM;
                    CSET H5T_CSET_ASCII;
                    CTYPE H5T_C_S1;
                 }
              DATASPACE  SIMPLE { ( 1, 1 ) / ( 1, 1 ) }
           }
           ....



Case 2 is a Reference, with type = 1 (Dataset Region Reference).

long VIIRS-AF-EDR_Gran_0(5);

data:

 {3299824, 14172636162555905, 8589934592, 3299824, 14172636162555907}

h5dump is:

DATASET "VIIRS-AF-EDR_Gran_0" {
           DATATYPE  H5T_REFERENCE
           DATASPACE  SIMPLE { ( 5 ) / ( 5 ) }

The docs on this are pretty sketchy. I wonder if I could get an expanded description of what the Dataset Region Reference structure looks like? Here's what's there (last page of 1.6.5 doc):

"Dataset region references are stored as a heap-ID which points to the following information within the file-heap: an offset of the object pointed to, number-type information (same format as header message), dimensionality information (same format as header message), sub-set start and end information (i.e. a coordinate location for each), and field start and end names (i.e. a [pointer to the] string indicating the first field included and a [pointer to the] string name for the last field)."

specifically:
 "an offset of the object pointed to" = object id?
 "number-type information (same format as header message)" = datatype message?
 "dimensionality information (same format as header message)" = dataspace message?

and then I'm even more lost as to what the fields are....

Hmm, looks like the docs are out of date and/or wrong for region references. Because region references are variable in length, the actual reference information is stored in the global heap and a (fixed-length) heap ID is stored in the dataset's raw data. That part is right, although too briefly described. The format of the data in the heap is wrong, however. The information actually stored in the heap is as follows:

- The offset of the object header of the object (i.e. dataset) pointed to (yes, an object ID)
- A serialized form of a dataspace _selection_ of elements (in the dataset pointed to). I don't have a formal description of this information now, but it's encoded in the H5S_<foo>_serialize() routines in src/H5S<foo>.c, where foo = {all, hyper, point, none}.

There is _no_ datatype information stored for this sort of selection currently.
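
To make the fixed-length heap ID concrete, here is a minimal sketch. It assumes each region reference is stored as 12 bytes: an 8-byte little-endian global heap collection address followed by a 4-byte index into that heap. Under that assumption, the odd-looking longs printed for VIIRS-AF-EDR_Gran_0 are 12-byte records read in 8-byte steps. Only the five printed values (40 bytes) are available here, so only the first three records can be recovered. decode_region_refs is a hypothetical helper, not part of the HDF5 API.

```python
import struct

REF_SIZE = 12  # assumed layout: 8-byte heap address + 4-byte heap object index

def decode_region_refs(raw):
    """Return (heap_address, heap_index) for each complete 12-byte record."""
    refs = []
    for start in range(0, len(raw) - REF_SIZE + 1, REF_SIZE):
        # "<QI" = little-endian, unsigned 64-bit then unsigned 32-bit, no padding
        address, index = struct.unpack_from("<QI", raw, start)
        refs.append((address, index))
    return refs

# The five values printed for VIIRS-AF-EDR_Gran_0, packed back into bytes.
printed = [3299824, 14172636162555905, 8589934592, 3299824, 14172636162555907]
raw = struct.pack("<5Q", *printed)

refs = decode_region_refs(raw)
for address, index in refs:
    print(f"heap collection at {address}, object index {index}")
```

The three recovered records all name the same heap collection (3299824) with indices 1, 2 and 3, which fits the description above. The byte layout itself remains an assumption, and the serialized selection stored in the heap is not decoded here.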

I'll file a bug report in our bugzilla to make certain this description gets fixed in the file format spec.

OK, thanks. Any further docs as you develop them would be appreciated.

BTW, it doesn't seem like h5dump distinguishes between type 0 and type 1 references.

===============================================================================
To unsubscribe netcdf-hdf, visit:
http://www.unidata.ucar.edu/mailing-list-delete-form.html
===============================================================================


