[jwlong@xxxxxxxxxxxxxxxxx: ]

Hi,

For your information, I'm forwarding this reply I just got from Jeff Long of
Lawrence Livermore Laboratories describing the SILO extensions to netCDF.
Jeff's description makes it clear that there is more to his extensions than
can be easily represented with attribute and variable-name conventions, as I
had earlier characterized these extensions.  I haven't responded to this
note yet, but Jeff's extensions appear tastefully done.  They make some
things a bit more complicated (e.g. an open netCDF file now has a "current
working directory") and I'm not convinced that adding two primitives
(directories and objects) was necessary rather than one, but I think this
deserves more study.  If you have comments, I can try to incorporate them
into a reply.

--Russ



Russ,

I have defined a database API, called SILO, which is based on the netCDF
interface.  SILO is intended to be fully compatible with existing netCDF
applications. What distinguishes SILO are two new 'primitives' it adds to
the netCDF model: directories and objects. Both of these extensions were
added in a non-obtrusive way; that is, no changes were made to existing
netCDF functions. Our current implementation of SILO rests on a local
database library, but we are very interested in using the netCDF/HDF merge
being done by NCSA.

The directory primitive allows the user to organize a database file into a
hierarchical structure analogous to the Unix file system. Each directory
created in a SILO file can be thought of conceptually as a virtual netCDF
file: it has its own dimensions, variables, attributes and so on. An
inquiry function will only return the contents of the current directory. In
keeping with the netCDF model, however, there is just one set of global
attributes, and just one unlimited dimension ID.

One difficult decision I had to make regarding directories dealt with
identifiers for variables and dimensions. As you know, with netCDF you can
determine how many variables there are in a file, and automatically know
that their identifiers range from 0 to nvars-1 (for C). When a file
contains multiple directories, however, an extra level of complexity is
added. What I finally decided was to treat each directory like netCDF
treats the file -- within a directory, variable identifiers range from 0 to
nvars-1, where nvars is the number of variables IN THAT DIRECTORY. I refer
to this scheme as "relative" identifiers. Therefore, to uniquely identify
any entity in the file, one needs three items: the parent directory ID, the
entity type (variable, dimension) and the entity ID (a variable ID if the
entity is a variable, a dimension ID if the entity is a dimension).
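
To make the three-part reference concrete, here is a minimal sketch in C. The
type and function names (entity_type, entity_ref, entity_ref_equal) are
hypothetical and chosen only for illustration; they are not part of SILO or
netCDF.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration; not part of the SILO interface. */
typedef enum { ENT_VARIABLE, ENT_DIMENSION } entity_type;

/* Under relative identifiers, all three fields are needed to name an entity. */
typedef struct {
    int         parent_dir;  /* ID of the directory that holds the entity */
    entity_type type;        /* variable or dimension */
    int         id;          /* relative ID, 0..n-1 within parent_dir */
} entity_ref;

/* Two references name the same entity only if all three fields match. */
bool entity_ref_equal(entity_ref a, entity_ref b) {
    assert(a.id >= 0 && b.id >= 0);
    return a.parent_dir == b.parent_dir && a.type == b.type && a.id == b.id;
}
```

Variable 0 in directory 1 and variable 0 in directory 2 compare as different
entities here, as do variable 0 and dimension 0 in the same directory.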

My original design called for "absolute" identifiers, which in effect meant
that any entity (variable, dimension, etc.) in a file could be uniquely
specified with a single identifier. This simplified the interface for the
object functions, but required changes to the inquiry functions so that a
list of identifiers was returned in addition to the number of variables,
dimensions, etc. This was such a big departure from the "natural" netCDF
way of doing things that I switched back to relative identifiers.
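
As a rough illustration of how the two schemes relate, the sketch below maps
between a (directory, relative ID) pair and a single absolute ID. It assumes
directories are numbered 0..ndirs-1 and that absolute IDs are handed out in
directory order; that numbering, the nvars[] bookkeeping array, and the
function names are all assumptions made for this example, not how SILO works.

```c
#include <assert.h>

/* nvars[d] is the number of variables in directory d (assumed bookkeeping). */
int abs_from_rel(const int nvars[], int ndirs, int dirid, int relid) {
    assert(dirid >= 0 && dirid < ndirs);
    assert(relid >= 0 && relid < nvars[dirid]);
    int base = 0;
    for (int d = 0; d < dirid; d++)
        base += nvars[d];            /* skip variables in earlier directories */
    return base + relid;
}

/* Inverse mapping. Returns 0 on success, -1 if absid is out of range. */
int rel_from_abs(const int nvars[], int ndirs, int absid,
                 int *dirid, int *relid) {
    if (absid < 0)
        return -1;
    for (int d = 0; d < ndirs; d++) {
        if (absid < nvars[d]) {
            *dirid = d;
            *relid = absid;
            return 0;
        }
        absid -= nvars[d];
    }
    return -1;
}
```

One weakness of this particular numbering is that absolute IDs shift whenever
a variable is added to an earlier directory, so a real implementation would
probably need a stable table instead.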

The programming interface for directories is described below:

1. Define new directory (mkdir)
        ncdirdef (int sid, char *name);

2. Get current directory (pwd) 
        ncdirget (int sid);

3. Get dir ID from name 
        ncdirid(int sid, char *name);

4. Inquire about a directory 
        ncdirinq (int sid, int dirid, char *name, int *parent, int *nchild);

5. List dirs beneath current dir (lsd)
        ncdirlist (int sid, int dirid, int *ndirs, int dirids[]);       

6. Set the current directory (cd)
        ncdirset (int sid, int dirid);

The function ncdirlist() is necessary because, unlike variables and
dimensions, directory identifiers are absolute. It is essential that a
single identifier can point to any directory within the entire file.
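
The behaviour described above can be modelled with a small in-memory toy in C.
This is only a sketch of the semantics (absolute directory IDs, a current
directory, listing children); the toy_ names, the fixed-size tables, and the
return conventions are assumptions for illustration and are not the SILO
implementation.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_DIRS 32
#define TOY_MAX_NAME 32

typedef struct {
    char name[TOY_MAX_NAME];
    int  parent;                 /* absolute ID of the parent, -1 for root */
} toy_dir;

typedef struct {
    toy_dir dirs[TOY_MAX_DIRS];  /* index in this table = absolute dir ID */
    int     ndirs;
    int     cwd;                 /* current working directory */
} toy_file;

void toy_init(toy_file *f) {
    strcpy(f->dirs[0].name, "/");
    f->dirs[0].parent = -1;
    f->ndirs = 1;
    f->cwd = 0;
}

/* mkdir: create a child of the current directory; returns its ID or -1. */
int toy_dirdef(toy_file *f, const char *name) {
    if (f->ndirs >= TOY_MAX_DIRS || strlen(name) >= TOY_MAX_NAME)
        return -1;
    int id = f->ndirs++;
    strcpy(f->dirs[id].name, name);
    f->dirs[id].parent = f->cwd;
    return id;
}

/* pwd: return the ID of the current directory. */
int toy_dirget(const toy_file *f) { return f->cwd; }

/* cd: any directory in the file is reachable with one absolute ID. */
int toy_dirset(toy_file *f, int dirid) {
    if (dirid < 0 || dirid >= f->ndirs)
        return -1;
    f->cwd = dirid;
    return 0;
}

/* lsd: collect the children of dirid. Because IDs are absolute, they are
   not simply 0..n-1, so the caller needs the list itself.
   dirids must have room for TOY_MAX_DIRS entries. */
int toy_dirlist(const toy_file *f, int dirid, int *ndirs, int dirids[]) {
    if (dirid < 0 || dirid >= f->ndirs)
        return -1;
    *ndirs = 0;
    for (int i = 0; i < f->ndirs; i++)
        if (f->dirs[i].parent == dirid)
            dirids[(*ndirs)++] = i;
    return 0;
}
```

In this toy, creating "block1" and "block2" under the root and then a
subdirectory under "block1" gives the subdirectory an ID that is unrelated to
its position among its siblings, which is why a listing call is needed.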

The second extension to the netCDF model provided by SILO is the concept of
objects. Objects are simply a mechanism for grouping related information.
The components of an object can be variables, dimensions, directories, and
even other objects. Components can be in any directory within the file.

The programming interface for objects is described below:

1. Define an object.
        ncobjdef (int sid, char *name, int type, int ncomps);

2. Get object ID from name.
        ncobjid (int sid, char *name);

3. Inquire about object.
        ncobjinq (int sid, int objid, char *name, int *type, int *ncomps);

4. Write an object.
        ncobjput (int sid, int objid, char *cnames[], int cids[],
                  int ctypes[], int cparents[]);

5. Read an object.
        ncobjget (int sid, int objid, char *cnames[], int cids[],
                  int ctypes[], int cparents[]);


An object is composed of a name, a type, and four parallel lists describing
the components of the object. The lists contain the component names,
identifiers, types, and parent IDs. Component names are arbitrary, and do
not necessarily match the actual names of the variables or dimensions whose
IDs are provided. Note that if absolute IDs were used, the types and
parents lists could be eliminated. SILO itself does not impose meaning on
the objects within a file. I have a higher level interface which reads and
writes certain types of SILO objects.
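
As an illustration of the four parallel lists, here is a sketch in C of what
the data for a small mesh-like object might look like. The component names,
the COMP_ type codes, the directory ID 1, and the toy_object struct are all
hypothetical and invented for this example; SILO's actual type codes and
storage are not shown in this message.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical component type codes, for illustration only. */
enum { COMP_VARIABLE, COMP_DIMENSION, COMP_DIRECTORY, COMP_OBJECT };

/* Four parallel lists: entry i of each list describes component i. */
const char *const mesh_cnames[]   = { "coord0", "coord1", "nodes" };
const int         mesh_cids[]     = { 0, 1, 0 };
const int         mesh_ctypes[]   = { COMP_VARIABLE, COMP_VARIABLE,
                                      COMP_DIMENSION };
const int         mesh_cparents[] = { 1, 1, 1 };

typedef struct {
    const char        *name;
    int                type;      /* meaning is up to the application */
    int                ncomps;
    const char *const *cnames;
    const int         *cids;
    const int         *ctypes;
    const int         *cparents;
} toy_object;

/* Return the index of the component called cname, or -1 if absent. */
int toy_find_component(const toy_object *obj, const char *cname) {
    assert(obj != NULL && cname != NULL);
    for (int i = 0; i < obj->ncomps; i++)
        if (strcmp(obj->cnames[i], cname) == 0)
            return i;
    return -1;
}
```

Note that "coord0" and "nodes" both carry ID 0 in directory 1. With relative
identifiers only the type list tells them apart, which is why the types and
parents lists cannot be dropped.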

Having used SILO for about a year now, we have found directories and
objects to be very useful additions to the netCDF interface. Objects are
essential for dealing with compound data such as physics meshes and their
related information. Directories have been heavily used by applications
which use multi-block meshes; in the past they had to use a flat file
structure and employ a naming scheme to differentiate variables -- they
called their variables "x_block1", "x_block2", etc. Now they can create a
directory called "block1" and in it write the variable with its natural
name, "x". Without these extensions, our applications and databases would
be much more difficult to maintain.

Because the NCSA people have shown an interest in the SILO extensions, I am
very eager to get feedback from real netCDF users such as yourself. I am
completely willing to make modifications to the programming interface or
the underlying model if a more reasonable approach is found. In particular,
I am interested in your responses to the following questions:

1. Should the concept of relative IDs be kept, even though it makes it more
   difficult to specify object components? Should absolute IDs be
   introduced, even though this departs from the netCDF model? It could be
   possible to have both schemes simultaneously, and provide a mechanism
   for mapping between absolute and relative.

2. Is there a better name for objects than 'objects'? Perhaps 'groups'?

If you have any comments, ideas, or questions, I can be reached by email at
"long6@xxxxxxxx" or by phone at (510)423-6421. I have a detailed document in
paper form which I can send to you if you are interested.
Thanks for your help.

Jeff Long
LLNL