Re: something startling I just noticed...

NOTE: The netcdf-hdf mailing list is no longer active. The list archives are made available for historical reasons.

To: netcdf-hdf@xxxxxxxxxxxxxxxx
Subject: Re: something startling I just noticed...
From: Quincey Koziol <koziol@xxxxxxxxxxxxx>
Date: Thu, 13 Nov 2003 07:11:23 -0600 (CST)

Hi Russ,

> > >There are some advantages of sequence numbers over times:
> > >  - you don't have to worry about clock resolution and the possibility
> > >    that creation times of two objects are equal
> >     Hmm, we use the gettimeofday() routine, which returns values in
> > microseconds, so this probably would not be too much of an issue, but I
> > admit it certainly is possible.
> 
> We ran into just this problem on a skiplist implementation (for LDM
> not netCDF) that required a total ordering.  Time stamps worked most
> of the time, but if two events happened to get assigned the same
> microsecond clock tick, we lost track of one of the corresponding
> objects.  On old machines, we never saw the problem, but it bit us
> when we tried running on faster hardware.  We ended up adding what was
> essentially a sequence number to the timestamp to disambiguate
> matching microsecond clock times.
    Well, I hope that we can create objects in the file fast enough that having
only a microsecond resolution is a problem for HDF5 also... :-)
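
The fix Russ describes (appending what is essentially a sequence number to the timestamp) might look roughly like the sketch below. This is not the LDM or HDF5 code; `order_key`, `next_key_at`, `next_key`, and `key_cmp` are hypothetical names made up for illustration, and the only real library call used is gettimeofday().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical ordering key: a microsecond timestamp plus a tie-breaker. */
typedef struct {
    int64_t  usec;  /* microseconds since the epoch, from gettimeofday() */
    uint32_t seq;   /* disambiguates keys that share the same usec value */
} order_key;

/* Build the key that follows *last, given a clock reading of now_usec.
 * If the clock has not advanced past the previous key, reuse the previous
 * time and bump the counter, so keys stay strictly increasing. */
order_key next_key_at(const order_key *last, int64_t now_usec)
{
    order_key k;
    assert(last != NULL);
    if (now_usec > last->usec) {
        k.usec = now_usec;
        k.seq = 0;
    } else {
        k.usec = last->usec;
        k.seq = last->seq + 1;
    }
    return k;
}

/* Read the clock and produce the next key. */
order_key next_key(const order_key *last)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return next_key_at(last, (int64_t)tv.tv_sec * 1000000 + tv.tv_usec);
}

/* Total order: compare by time first, then by the tie-breaking counter. */
int key_cmp(const order_key *a, const order_key *b)
{
    if (a->usec != b->usec)
        return a->usec < b->usec ? -1 : 1;
    if (a->seq != b->seq)
        return a->seq < b->seq ? -1 : 1;
    return 0;
}
```

Two objects created within the same microsecond then get keys such as (t, 0) and (t, 1), so neither is lost in a structure that requires a total ordering.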

> > Hmm, I think there may be some issues with a creation sequence number also:
> >     - The "last number issued" will need to be stored in the file (unlike
> >         creation times).
> >     - Should it be local to the group, or global to the file? There are
> >         pros and cons to both:
> >             Global:
> >                 - Pro: One number to track for file
> >                 - Con: May have contention for updating this number in a
> >                     parallel environment.
> >                 - Con: Faster to roll over than a sequence number per group.
> >                 - Con: Sequence numbers in one group will have gaps, if
> >                     objects are created in other groups, which does not
> >                     imply objects were deleted in the group.
> > 
> >             Local:
> >                 - Pro: More consistent numbering within one group than a
> >                     sequence number per file.
> >                 - Con: May have contention for updating this number in a
> >                     parallel environment.
> >                 - Con: A new piece of metadata to update with every object
> >                     created in a group.
> > 
> > I guess I would tend toward a local (i.e. per group) sequence number.
> > How's that sit with people?
> 
> Good analysis of sequence number problems.  I agree with you, local
> seems to be adequate unless we chose to ignore Group semantics for the
> netCDF-4 interface and just treated the Group name as part of a global
> name for a netCDF-4 object.  In that case, local would be a problem,
> because two netCDF-4 objects that we wanted to iterate over in order
> could get the same sequence number.  Maybe this is an argument not to
> treat Groups as just part of the name.
    Yes, local sequence numbers cut both ways sometimes...  Since most (all?)
current netCDF users should be used to a 'flat' file, putting all the objects
in the root group of the file and using the creation order in that group seems
like a reasonable default.  Then, you could change the definition of the way the
creation order information is used for netCDF 4 users so that the group
structure was accounted for.
    BTW, I was looking through the netCDF 3 API for functions that take or
return an 'index' in the file and I can't find one.  Which function(s) apply
to this situation?
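
The global versus local trade-off discussed above can be illustrated with a minimal sketch. The structures and functions here (`file_state`, `group_state`, `issue_global`, `issue_local`) are hypothetical in-memory stand-ins, not HDF5 or netCDF data structures or API calls; in a real file the counter would be a piece of stored metadata.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a file-wide "last number issued" counter. */
typedef struct {
    uint64_t next_seq;
} file_state;

/* Hypothetical stand-in for a per-group counter. */
typedef struct {
    uint64_t next_seq;
} group_state;

/* Global scheme: every object in the file draws from one counter. */
uint64_t issue_global(file_state *f)
{
    assert(f != NULL);
    return f->next_seq++;
}

/* Local scheme: each group hands out its own numbers, starting at 0. */
uint64_t issue_local(group_state *g)
{
    assert(g != NULL);
    return g->next_seq++;
}
```

With the global counter, creating objects in group A, then group B, then group A again gives A the numbers 0 and 2: a gap that does not mean anything was deleted from A. With local counters, A gets 0 and 1, but the first object in B also gets 0, which is the collision Russ points out if Group names are treated as just part of a global object name.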

> For us, a different kind of local would also work: a set of sequence
> numbers for Datasets, for each Dataset's Attributes, and for shared
> dimension Scales.  But if you have other uses for time stamps or
> sequence numbers, our use shouldn't dictate the requirements, since
> anything that allows us to determine the creation order of netCDF
> variables, dimensions, and attributes would work.
    This is along the lines of something we've thought about for a long time:
adding a "live" index capability to HDF5 files, where every change to the file's metadata
(object creation, modification, deletion and attribute creation & deletion)
could update an index in the file in some way.  I think this is a great idea,
but I think it would be too much work at the current time. :-(

    Quincey
