Re: Schedule decision

Hi Russ,

> >     I've been working on revising groups in HDF5 files (to allow for
> > creating groups which track creation order, among other things) and it's
> > become obvious to me that my first attempt at implementing the required
> > indices will not be a good long-term solution.  Switching horses midstream
> > could delay getting the HDF5 1.8.0 beta release out by ~6 weeks if I change
> > the indexing implementation right now.  I can, however, continue with the
> > flawed index implementation I currently have, build up to most of the API
> > changes that would be required, and then go back and revise the guts of the
> > library to use a better on-disk data structure for storing the indices.
> > 
> >     This would allow outside applications/libraries (like netCDF-4) to
> > mostly stabilize their code on the new API while I went back and reworked
> > internal things.  This has several trade-offs that I can think of:
> >       A - It gets a [reasonably] stable API to testers somewhat sooner.
> >       B - It's going to take longer, because I'll have to redo some work.
> >       C - Files created during the "transition period" will _never_ be
> >           readable by any other version of the HDF5 library; they must be
> >           discarded by testers.
> > 
> >     If we've got enough flexibility in our schedules, I would prefer to
> > avoid doing the rework and just get things right the first time.  But, since
> > there is an alternate plan that could work, I thought I would raise the issue.
> > 
> >     What does everyone think?
> 
> We think you should "switch horses in midstream" (being careful not to
> slip into the current :-) and implement the index using the better
> data structure you've discovered.

    Ok, I'll go in that direction then.

> Speaking of horses and mangled metaphors, allow me to try beating a
> dead horse to see if it bears fruit :-).

    :-)

> If you are reconsidering the implementation of creation order
> tracking, we would also suggest reconsidering whether timestamps are
> the right way to store information about order of creation.  It seems
> entirely plausible that two Datasets could be created in the same
> Group within a very short time interval, get the same time stamp, and
> then information about their creation order would be lost.  A simple
> sequence number that is incremented for each object would preserve the
> creation order no matter how fast creation occurs, and would represent
> all the information netCDF-4 needs.
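To make the concern concrete, here is a minimal Python sketch.  The
LinkRecord type, its one-second "ctime" field, and the per-group "seq" counter
are hypothetical stand-ins for illustration, not the HDF5 on-disk format:

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class LinkRecord:
    # Hypothetical per-link metadata; not the real HDF5 layout.
    name: str
    ctime: int  # creation time in whole seconds
    seq: int    # per-group creation counter


def create_links(names, clock, counter):
    """Create one record per name, stamping each with clock() and the next counter value."""
    return [LinkRecord(n, clock(), next(counter)) for n in names]


def order_by_time_is_ambiguous(links):
    """True if two links share a timestamp, so time alone cannot order them."""
    times = [link.ctime for link in links]
    return len(set(times)) < len(times)


def order_by_seq(links):
    """Recover creation order from the counter alone."""
    return [link.name for link in sorted(links, key=lambda link: link.seq)]


# Three datasets created within the same second (the clock is frozen for the demo).
links = create_links(["temp", "pressure", "humidity"], clock=lambda: 1_000_000, counter=count())
print(order_by_time_is_ambiguous(links))  # True
print(order_by_seq(links))                # ['temp', 'pressure', 'humidity']
```

Sorting these records by ctime would also happen to return them in input
order, but only because Python's sort is stable over the list as built;
records read back from an index in some other order carry no timestamp
information that can break the tie.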

    I was planning on including a hidden field to disambiguate objects that
were created at the same time, so this wouldn't happen.  Since there's no
advantage to using a creation order field instead of the creation time when
determining the n'th object inserted into a group (once deleted objects are
factored into the equation), I'm still leaning toward using a time instead of
a sequence number for this purpose.  Using the time provides the same
functionality and adds information as well.
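A sketch of the hidden tie-breaking field described above, again in Python
with hypothetical names (LinkRecord, Group, tiebreak) rather than anything in
the HDF5 API.  It assumes the clock never runs backwards between creations,
and it stores both fields so either one can serve as the ordering key:

```python
from dataclasses import dataclass


@dataclass
class LinkRecord:
    # Hypothetical per-link metadata; not the real HDF5 layout.
    name: str
    ctime: int     # creation time in whole seconds
    tiebreak: int  # hidden field: disambiguates links created in the same second
    seq: int       # per-group creation counter, kept for comparison


class Group:
    """Toy group that records both a creation time and a creation counter."""

    def __init__(self):
        self.links = []
        self._next_seq = 0
        self._last_ctime = None
        self._tiebreak = 0

    def create(self, name, now):
        # Assumes the clock never runs backwards between creations.
        # The tie-breaker lives in the group, not in the surviving links,
        # so deleting a link can never cause a (ctime, tiebreak) key to repeat.
        if now == self._last_ctime:
            self._tiebreak += 1
        else:
            self._last_ctime, self._tiebreak = now, 0
        self.links.append(LinkRecord(name, now, self._tiebreak, self._next_seq))
        self._next_seq += 1

    def delete(self, name):
        self.links = [link for link in self.links if link.name != name]

    def nth_surviving(self, n, key):
        """Name of the n'th (0-based) surviving link in the order given by key."""
        return sorted(self.links, key=key)[n].name


def by_time(link):
    return (link.ctime, link.tiebreak)


def by_seq(link):
    return link.seq


g = Group()
for name, t in [("a", 100), ("b", 100), ("c", 101), ("d", 101)]:
    g.create(name, t)
g.delete("b")
print(g.nth_surviving(1, by_time))  # c
print(g.nth_surviving(1, by_seq))   # c
```

After "b" is deleted the counter has a gap, so the n'th surviving link cannot
be found by looking up seq == n directly; with either key the lookup has to
walk the links in order (the sort here stands in for an on-disk index
traversal).  Both keys yield the same order; the time-based key also records
when each link was created.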

    I'm still somewhat split on the issue, however, and would welcome
persuasive arguments in favor of one mechanism or the other.  :-)  I'm also
thinking about including both fields (creation order and creation time) and
allowing users to create an index on either, to suit their particular needs...

    Quincey