To: caron@dilbert.acd.ucar.edu (John Caron)
Cc: nuwg@comet.ucar.edu
Subject: Re: More comments on netcdf "gridded" data conventions. 
Date: Thu, 20 Apr 1995 13:48:31 -0600
From: Russ Rew <russ@unidata.ucar.edu>
Organization: UCAR Unidata Program

Hi John,

Here are my comments on your ideas for improving the NUWG conventions.  I'd
like to keep this discussion going on the nuwg mailing list, so please feel
free to jump in.

> ---------------------------------------
> General NetCDF Conventions and Definitions
> 
> Coordinate Variables
>     A variable with the same name as a dimension is a "coordinate
> variable", and its data values are the coordinate values for that dimension.
> The variable must be indexed by the dimension.
> 
> Coordinate Reference 
>     Coordinate systems cannot always be described by coordinate 
> variables which are single valued and have the same cardinality as
> a dimension.  A "coordinate reference" is one or more variables that
> also describe a coordinate system, and may have different
> sizes than the dimension it describes.
>     A "coordinate reference variable" is a variable whose data values
> describe the coordinate values for a dimension. Its cardinality is a
> function of the size of the dimension. It must have an attribute named
> "dimension" whose value is the name of the dimension.
>     A coordinate reference is defined by an attribute with
> the same name as a dimension. The value of the attribute is the
> name(s) of coordinate reference variable(s) that describe the coordinate 
> values for that dimension.  An attribute that is global becomes the default 
> description for that dimension in the file, and may be overridden by a 
> variable's attribute.

I don't understand the necessity or desirability of having an attribute
named "dimension" for coordinate reference variables.  If the coordinate
attribute names the coordinate reference variables and has the same name as
the dimension, why do the coordinate variables also have to have attributes
that name the dimension?  This seems to be a redundant representation of the
same information.  Problems with requiring redundant representation include
the possibility that the two representations are inconsistent, and the
probability that data providers will neglect to provide the information in
one or the other form.

To make this clearer, here's an example we are currently using in our
ruc.cdl for the output of the MAPS/RUC model:

			
    dimensions:
	    record = UNLIMITED ;	     // (reference time, forecast time)
	...
    variables:
	    double	reftime(record);     // reference time of the model
			reftime:long_name = "reference time";
			reftime:units = "hours since 1992-1-1";

	    double	valtime(record);     // forecast time ("valid" time)
			valtime:long_name = "valid time";
			valtime:units = "hours since 1992-1-1";
	...
	    :record = "reftime, valtime" ;   // "dimension attribute" -- means
					     // (reftime, valtime) uniquely 
					     // determine record

Here "reftime" and "valtime" are like what you have described as coordinate
reference variables for the "record" dimension, and so should have
attributes named "dimension" with value "record".  But the record dimension
has a global coordinate attribute "record" that names the two coordinate
reference variables, so this information is already represented.

The idea here was to be able to use the "record" dimension for representing
an ordered pair of times, (reference time, forecast time), so you could store
model outputs for multiple reference times and multiple forecast times in a
single netCDF file with only one unlimited dimension.  Perhaps this isn't
the use you had in mind for coordinate reference variables, but we do need
this capability.
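
A minimal Python sketch of how a generic reader might interpret the
"record" attribute above, assuming the file contents have already been read
into plain dictionaries.  The helper names (record_coordinates, find_record)
and the sample time values are made up for illustration and are not part of
any netCDF interface.

```python
# Hypothetical in-memory stand-in for the ruc.cdl fragment; not a netCDF API.
global_attrs = {"record": "reftime, valtime"}
variables = {
    # Illustrative values only, both in "hours since 1992-1-1".
    "reftime": [0.0, 0.0, 12.0, 12.0],
    "valtime": [0.0, 6.0, 12.0, 18.0],
}


def record_coordinates(dim_name, global_attrs, variables):
    """Return the named coordinate reference variables for dim_name and
    one tuple of their values per index along that dimension."""
    names = [n.strip() for n in global_attrs[dim_name].split(",")]
    columns = [variables[n] for n in names]
    return names, list(zip(*columns))


def find_record(pairs, reftime, valtime):
    """Return the record index holding the given (reftime, valtime) pair."""
    return pairs.index((reftime, valtime))


names, pairs = record_coordinates("record", global_attrs, variables)
```

With the sample values, find_record(pairs, 12.0, 18.0) picks out the last
record, and the pairs are all distinct, which is what "uniquely determine
record" asks for.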


> ---------------------------------------
> > (1) Time
> > 
> > 1) Time variables are always type double 
> 
> Not always adequate or possible. Should have several approved
> types of time coordinates.

For NUWG conventions, we decided that type double is adequate, especially
since the time units can be anything from picoseconds to eons and can
specify a base time offset, according to the udunits library standard.  I
don't remember the rationale for not permitting offsets from a base time to
use a short type, which would seem to be desirable if the bulk of data
in a file were relatively low-resolution time offsets.

We also agreed that data providers can include other representations for
times as well (e.g. human readable).  This is a general property of the NUWG
conventions, that they specify a minimum of what should be in a netCDF file
so that the generic applications we are developing will be able to deal with
it.  We don't proscribe adding extra information to netCDF data; data
providers are free to include additional attributes, variables, and
dimensions for other purposes, but our applications won't use that
information.  Furthermore, data providers are free to develop a more
restrictive set of conventions (e.g. requiring that a time variable always
be named "time") that our generic applications will be able to handle.

> > 2) Time variables are indexed by another variable (can 
> >    be the unlimited dimension)
> > 3) The names given to the time variables and the indexing variables 
> >    are not subject to convention
> 
> How do you know which dimension is time? Also, if you want to use
> "coordinate variables" (often handy), then the variable name must =
> dimension name, by convention.

Applications that need to know the name of the time variable will have to be
provided that information as an input, either in a table of associated
variable names, as an argument, as a clickable selection, etc.  We decided
very early that we wanted NUWG conventions to avoid explicit variable names,
where possible; the French name for the time variable should be acceptable
for French data, and applications that require specific variable names will
have to support a mapping between file variable names and names used in the
application.  Again, more restrictive conventions that require explicit
variable names should work fine with this set of conventions.

> > 4) This convention supports grids requiring any number of times 
> >    to fully describe the data
> 
> Replace all above with Gridded Data Convention:
>    1) The time dimension is always named "time". It may or may not be
>       an unlimited dimension.
>    2) A coordinate variable or coordinate reference must be defined
> to describe the time coordinates for all fields with the time dimension.
>    3) The following ways of specifying the time coordinate values are agreed
> to, with extensions possible in the future:
> 	3a) a double variable, units "secs since <date and time>"

Since the time variable must have a "units" attribute by our conventions,
this restriction is unnecessary; any units acceptable to the udunits package
are OK for time variables, including "years since <base year>" and
"picoseconds since <base time>".  Applications should use the udunits
library to deal with such differences.
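
To illustrate why the units need not be fixed by convention, here is a rough
Python sketch that decodes a double time value from its "units" string.  It
is only a stand-in and is not the udunits library: it assumes a base date
with no time of day and knows just a few fixed-length units.  The names
decode_time and SECONDS_PER_UNIT are made up for this illustration.

```python
from datetime import datetime, timedelta

# Assumption: only a handful of fixed-length units are handled here;
# the real udunits package accepts far more.
SECONDS_PER_UNIT = {
    "seconds": 1.0,
    "secs": 1.0,
    "minutes": 60.0,
    "hours": 3600.0,
    "days": 86400.0,
}


def decode_time(value, units):
    """Convert a time offset to a datetime, given units of the form
    '<unit> since <year>-<month>-<day>'."""
    unit, since, base = units.split(None, 2)
    if since != "since":
        raise ValueError("expected '<unit> since <date>', got %r" % units)
    # Base date only; a time of day in the units string raises ValueError.
    year, month, day = (int(part) for part in base.strip().split("-"))
    origin = datetime(year, month, day)
    return origin + timedelta(seconds=value * SECONDS_PER_UNIT[unit])
```

An application written this way reads "hours since 1992-1-1" and
"secs since 1992-1-1" through the same call, so the file is free to use
whichever unit suits the data.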

> 	3b) a character string in the form specified by the FGDC "Content
> 	    Standards for Digital Geospatial Metadata", unit = "FGDC". (?)
> 	3c) a character array, suitable for displaying to the user. Define
> 	    attributes to the coordinate variable or coordinate reference
> 	    variable(s) as needed to disambiguate. 

These are certainly OK as auxiliary representations for time, but because
comparing or doing arithmetic with such times can be unwieldy, the NUWG
conventions require the provision of a time that applications can easily
compare and compute with as well.

> Comment: possible problem: do multiple coordinate reference variables
> create unique instances, such as ref_time/valid_time, or do they
> describe alternate possibilities, such as a "secs since..." and a
> descriptive string?

I assumed the former.  Alternate human-readable representations can appear
in auxiliary variables or attributes (e.g. "text_time").

> A better example, from the Summary:
> > ...
> > This specifies that for grid point u(1,2,1,1), the value of the
> > "z" dimension (vertical level) can be found in p(1,2,1,1) or in 
> > vpt(1,2,1,1).
> 
> does this mean that both p and vpt are needed, or just one? are they
> both valid?

You're right that this is not clear from the Summary.  My assumption is that
it means that the values of p and vpt are used to uniquely determine the
value of z at each point of the grid.  How z is to be determined from p and
vpt or even how this information is to be represented in the file is not
clear to me.

> ---------------------------------------
> > (2) Vertical Coordinate Systems/Levels
> > 
> > 1) When necessary, a referential variable can be used as an index
> >    into associated variables
> > 2) This referential indexing is indicated by a variable or global
> >    attribute with the same name as the dimension 
> 
> This is covered implicitly by the "Coordinate Reference"
> convention. A good set of examples would help, though.

Yes, and that is what we ultimately hope to provide.  Example NUWG-approved
CDL files are intended to be an important product of our deliberations, and
in some cases a more practical way to present details of our consensus than
a formal specification.  These also give people a set of examples against
which to test their generic applications.

> ---------------------------------------
> > (3) Navigation
> 
> > Of all the special topics, the conventions concerning navigation are
> > the least mature.  Thus far, we have agreed that the navigation information
> > associated with a grid will be stored in a suite of navigation variables.
> > These variables are defined by the GRIB Edition 1 document by John Stackpole
> > in the section on the GDS (Grid Description Section) octets 7-44.  The
> > actual set of variables stored for any given navigation, will depend on
> > that navigation.  For example, the variables needed to describe a polar
> > stereographic grid are different than the variables needed to describe a
> > simple lat/lon grid.  
> > 
> > Each suite of navigation variables must contain a numeric ID containing 
> > the grid identification number, and an indication of the originating 
> > center, both assigned by the GRIB Edition 1 document. Missing values 
> > may be used if the particular grid is not described by the GRIB document.
> > For example, the navigation variables "grid_number" and "center_id" 
> > may be used.
> > 
> > Each variable that is defined on the grid must have the "navigation_dim"
> > variable attribute associated with it.  The string defined in this
> > attribute is the name of the dimension by which all navigation variables
> > are dimensioned.  In this way, the "navigation_dim" groups all the 
> > navigation variables together (in the same sense that a structure groups
> > quantities of varying types together).  The "navigation_dim" attribute
> > also indicates which variables in a netCDF file are actually defined
> > on the grid.  
> ...
> > Summary of Conventions:
> > 
> > 1) Navigation information stored in variables and dimensioned 
> >    by the value of the variable attribute "navigation_dim" 
> 
>    You seem to actually be trying to define a structure. Why tie
> it together with an artificial dimension called "nav"? 

Because the netCDF model is incapable of directly representing structures;
it can only directly represent scalars and arrays and has no nesting
capabilities.  But an artificial dimension can be used to associate a
collection of named variables into something isomorphic with a structure.
The dimension name can then be used much like a structure name, to identify
the cluster of variables that use it.  This still doesn't support nested
structures, because a dimension can't have a dimension, but we think it's
adequate so far.  The limitations of supporting a fully-functional Fortran77
interface prevent adding nested structures to the netCDF model (though
netCDF actually supports types such as short and byte that can't be used in
strictly conforming Fortran77 programs).

> > 2) All grid variables have the "navigation_dim" attribute
> 
>    Why not just a global attribute? Are you trying to 
> separate the real data from the navigation data? It seems to me
> that a more natural way to do it is through the dimensions. If a
> field looks like var(lon, lat, level) then it's "defined on the grid".

navigation_dim can't be a global attribute because we want to be able to
store variables defined on multiple grids in the same file.  For example,
we want to be able to store a satellite image and model output data for the
same time in the same netCDF file, requiring two different navigation
dimensions for two very different sets of georeferencing parameters.
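
As a sketch of how the per-variable "navigation_dim" attribute lets one file
hold several grids, the Python below groups navigation variables by the
dimension they share.  It assumes the file has been read into a dictionary;
the names "T", "image", "sat_nav", and "sat_lat0" are hypothetical, while
"grid_number" and "center_id" come from the summary quoted above.

```python
# Hypothetical file contents: variable name -> (dimensions, attributes).
variables = {
    "grid_number": (("nav",), {}),
    "center_id": (("nav",), {}),
    "sat_lat0": (("sat_nav",), {}),
    "T": (("record", "level", "y", "x"), {"navigation_dim": "nav"}),
    "image": (("line", "elem"), {"navigation_dim": "sat_nav"}),
}


def navigation_suite(var_name, variables):
    """Return (navigation dimension, sorted navigation variable names) for
    a grid variable, or (None, []) if it has no navigation_dim attribute."""
    _dims, attrs = variables[var_name]
    nav_dim = attrs.get("navigation_dim")
    if nav_dim is None:
        return None, []
    members = sorted(
        name for name, (dims, _attrs) in variables.items() if nav_dim in dims
    )
    return nav_dim, members
```

Here the dimension name plays the role of a structure name: "T" and "image"
each resolve to their own cluster of navigation variables, which a single
global attribute could not express.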

> > 3) Which variables (content, not variable names) defined in
> >    GRIB edition 1 document GDS octets 7-44 "Grid description".
> >    (Table C).  Content determined by which grid navigation described.
> > 4) A numeric ID listing the Grid Identification number and an originating 
> >    center ID from the GRIB edition 1 document must be included in the
> >    navigation variables.  Missing data values are OK for grids not
> >    described by a GRIB document.
> 
>    I disagree:
>         1) makes file not self describing
>         2) GRIB variables are predefined. None of the variables I work
> with are GRIB variables.
>         3) too "GRIB-centric"
> 
>    I don't mind GRIB ID being one of the ways to specify a variable or
> its navigation. I'm sure it's quite convenient if you're translating GRIB
> to netcdf.

Right.  Actually we agreed on another navigation variable, "nav_model",
associated with each navigation that provides the context in which to
interpret all the other variable names:

        char    nav_model(nav, nav_len) ;      // navigation parameterization
                nav_model:long_name = "navigation model name";
	...

and for GRIB-centric data, its value is:

        nav_model = "GRIB1" ;

but for parameterizations based on the Federal Geographic Data Committee
Content Standards for Digital Geospatial Metadata it could be:

        nav_model = "FGDC-1994" ;

and for parameterizations based on the geo-TIFF model, it might be

        nav_model = "geo-TIFF version 1" ;

Notice that it is possible to use multiple navigation parameterizations
within the same netCDF file with this mechanism.

We haven't yet agreed on how many values of nav_model our generic
applications should support, but we want to support at least "GRIB1".
Yesterday you gave us the idea of having a very simple nav_model used when
coordinate variables suffice to specify the georeferencing, for example with
simple lat/lon grids.  Some suggestions for this parameterization were
nav_model = "simple" or "BDN" or "" or even the default interpretation when
a variable has no navigation dimension and hence no nav_model variable.  I
think I like this last convention best, but we need to try it out.  I'll be
proposing a simple lat/lon gridded file soon that uses one of these
conventions.

> > 5) Ordering or naming of grid dimensions not subject to convention.
> >    Dimensions defining grid variables defined by "x_dim" and "y_dim" 
> >    navigation variables
> 
>    How do you know what means what? Why not make some naming conventions
> for the dimensions?

The GRIB1 georeferencing model assumes an "x" and "y" dimension for *some*
projections, so we require that the netCDF file make clear which netCDF
indices correspond to which GRIB1 indices in such cases.  We try to avoid
requiring particular names when we can avoid it with indirection.

> Well, I guess I'm not very happy with this. 
> A few thoughts:
> 	* Are we ok with the notion that gridded data == "geo referenced" data.
> If not, then we are into a specialization of "gridded data". I think we
> might change the name to "Conventions for GeoReferenced Gridded Data".

I agree, since there are lots of examples of gridded netCDF data that's not
georeferenced. 

> 	* Given that, I think that referencing grids comes down to two
> (always orthogonal in my possibly limited experience) parts: 
>         1) specifying the grid in its natural projection plane, and 
>         2) specifying the projection function.
> The first part can (almost) entirely be done with coordinate variables or
> coordinate reference variables. The second part involves enumerating each
> projection and its parameters. There's no obvious reason not to adopt the
> FGDC's work on this enumeration, with extensions for GRIB or other formats.

Our approach permits using either the FGDC or GRIB approach, as well as
others, with the nav_model variable.  One obvious reason for us not to adopt
the FGDC approach is that we have megabytes of GRIB1 data pouring into our
machines every hour, and most of us lack familiarity or experience with the
FGDC standard.  This might be a good place to note where it's available:

    ftp://fgdc.er.usgs.gov/pub/metadata/meta.6894.ps  (PostScript)
    ftp://fgdc.er.usgs.gov/pub/metadata/meta.6894.wp5 (WordPerfect 5.0)
    http://fgdc.er.usgs.gov/			      (FGDC Home Page)

> 	* There is a reasonably big payoff to getting this geo referencing
> right. I assume it's been driven so far by the "real time" RUC feeds, 
> obviously "GRIB-centric". The modelers are all in their own private Idaho,
> but with a new generation of models poised to define their data formats,
> we might get lucky if we "do it right". Does anyone have any opinions on the
> technical merit / politics surrounding the FGDC stuff? I agree with the 
> skepticism at the UCAR data conference about "one size fits all" data formats.
> Nonetheless, we should do what we can when we can. I think gridded data 
> can be standardized along the lines we are discussing. I think a few more
> iterations.... 

Ultimately, if a great data fusion framework or "killer" applications are
developed that require an FGDC parameterization, data will be put in that
form, but right now NUWG feels that most of our constituency is more
familiar with the GRIB georeferencing, and we already are getting huge
volumes of data that use it.

When everyone is more familiar with it, we should consider whether it's a
good candidate for an acceptable value for our "nav_model" variable.  But
right now it seems somewhat incomplete and GIS- and USGS-centric.  For
example, GRIB and BUFR are not mentioned, but DEM (USGS Digital Elevation
Model format) and DTED (Digital Terrain Elevation Data format) are.  But on
page 39 netCDF is included, so maybe it's complete enough :-).

Sorry for the length of this ...

--Russ

______________________________________________________________________________

Russ Rew                                           UCAR Unidata Program
russ@unidata.ucar.edu                              http://www.unidata.ucar.edu


Organization: NCAR ACD
From: caron@dilbert.acd.ucar.edu (John Caron)
Subject: Re: Navigation Information Query
To: nuwg@comet.ucar.edu
Date: Wed, 26 Apr 1995 10:03:58 -0600 (MDT)


Here's how I would think about this problem using the principle that
the dimensions describe the grid via coordinate variables and coordinate
reference variables, and the "nav" structure defines the mapping of the grid
to world coordinates.

For reference, here is a repeat of my proposal for "Coordinate References",
which are an extension of "Coordinate Variables". (Actually, I would like to
reword it somewhat to cover this case more explicitly, as I wasn't
thinking of non-independent coordinate systems at the time.)

   Coordinate Variables
       A variable with the same name as a dimension is a "coordinate
   variable", and its data values are the coordinate values for that dimension.
   The variable must be indexed by the dimension.

   Coordinate Reference
       Coordinate systems cannot always be described by coordinate
   variables which are single valued and have the same cardinality as
   a dimension.  A "coordinate reference" is one or more variables that
   also describe a coordinate system, and may have different
   sizes than the dimension it describes.
       A "coordinate reference variable" is a variable whose data values
   describe the coordinate values for a dimension. Its cardinality is a function
   of the size of the dimension. 
       A coordinate reference is defined by an attribute with
   the same name as a dimension. The value of the attribute is the
   name(s) of coordinate reference variable(s).  An attribute that is global 
   becomes the default description for that dimension in the file, 
   and may be overridden by a variable's attribute.


> // everything starts out normally
>
>         byte   Z(elevs, radials, refs);
>                Z:long_name       = "Reflectivity";
>                Z:units           = "dBZ";
>                Z:navigation      = "navZ";
>
>         byte   V(elevs, radials, vels);
>                V:long_name       = "Velocity";
>                V:units           = "meters / second";
>                V:navigation      = "navV";

So we have a 3D grid. The two variables share two of the dimensions. The
third is different. In the simplest case, we define 4 coordinate variables:

        float elevs(elevs);
                elevs:long_name = "elevation angles";
                elevs:units     = "radians";

        float radials(radials);
                radials:long_name = "radial angles";    // or something
                radials:units     = "radians";

        float refs(refs);
                refs:long_name = "reflectivity range gates";  // or something
                refs:units     = "meters";

        float vels(vels);
                vels:long_name = "velocity range gates";  // or something
                vels:units     = "meters";

This assumes that the coordinates are independent of each other.
In the post, it appears they are not independent, so how do we handle that case?
The first two coord. vars are probably correct:

        float elevs(elevs);
                elevs:long_name = "elevation angles";
                elevs:units     = "radians";

        float radials(radials);
                radials:long_name = "radial angles";    // or something
                radials:units     = "radians";

And we need a way to specify the other coord as a function of elev angle:

         float  rangeZ(elevs, refs);
                rangeZ:long_name        = "Radial reflectivity range";
                rangeZ:units            = "meters";

         float  rangeV(elevs, vels);
                rangeV:long_name        = "Radial velocity range";
                rangeV:units            = "meters";

So here's the coordinate reference, defined as a global or variable-specific 
attribute:
	: refs = "rangeZ";
	: vels = "rangeV";
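
The lookup rule (a variable's own attribute overrides the global one, and a
plain coordinate variable is used otherwise) can be sketched in Python as
follows.  This assumes the file has been read into dictionaries;
coordinates_for is a hypothetical helper, not part of the netCDF library,
and the fall-back to the coordinate variable is my reading of the proposal.

```python
# Hypothetical in-memory view of the radar file described above.
global_attrs = {"refs": "rangeZ", "vels": "rangeV"}
variables = {
    "Z": {"dims": ("elevs", "radials", "refs"), "attrs": {}},
    "V": {"dims": ("elevs", "radials", "vels"), "attrs": {}},
    "elevs": {"dims": ("elevs",), "attrs": {}},
    "radials": {"dims": ("radials",), "attrs": {}},
    "rangeZ": {"dims": ("elevs", "refs"), "attrs": {}},
    "rangeV": {"dims": ("elevs", "vels"), "attrs": {}},
}


def coordinates_for(var_name, dim, variables, global_attrs):
    """Return the names of the variables that supply coordinate values for
    dimension `dim` of variable `var_name`."""
    attrs = variables[var_name]["attrs"]
    if dim in attrs:            # variable-specific coordinate reference
        ref = attrs[dim]
    elif dim in global_attrs:   # file-wide default coordinate reference
        ref = global_attrs[dim]
    elif dim in variables:      # ordinary coordinate variable
        return [dim]
    else:                       # no coordinate information at all
        return []
    return [name.strip() for name in ref.split(",")]
```

For Z this yields "elevs" and "radials" for the first two dimensions and
"rangeZ" for "refs"; the same call on V yields "rangeV" for "vels".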


Now, it's the job of the nav structure to map the grid coordinate system
(defined by the coord vars. "elevs", "radials", and the coord reference vars.
"rangeZ", "rangeV") to world coords, say (lat, lon, altitude). So we imagine
we have a transformation function t(azi, zen, rho) -> (lat, lon, z). What
does it need to know? Probably just the lat, lon position
of the radar. Then you just feed it azi = radials(i), zen = elevs(j), and
rho = rangeZ(j,k) or rangeV(j,k). Note t() can be a very general function 
in this way.
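
For concreteness, here is one very crude Python sketch of such a t().  It is
only an illustration under stated assumptions, not a proposed projection:
it treats the second angle as elevation above the horizon (as the elevs
variable suggests), takes azimuth clockwise from north, uses a flat tangent
plane on a spherical Earth, and ignores Earth curvature along the beam and
atmospheric refraction.  The name radar_to_world and the constant
EARTH_RADIUS_M are made up for this example.

```python
import math

# Assumption: mean radius of a spherical Earth, in meters.
EARTH_RADIUS_M = 6371000.0


def radar_to_world(azi, elev, rho, site_lat, site_lon, site_alt):
    """Map radar coordinates to (lat, lon, z).

    azi  : azimuth in radians, clockwise from north (assumed convention)
    elev : elevation angle in radians above the horizon
    rho  : range along the beam in meters
    site_lat, site_lon : radar position in degrees
    site_alt : radar altitude in meters above mean sea level
    """
    ground = rho * math.cos(elev)      # horizontal distance from the radar
    north = ground * math.cos(azi)
    east = ground * math.sin(azi)
    z = site_alt + rho * math.sin(elev)
    lat = site_lat + math.degrees(north / EARTH_RADIUS_M)
    lon = site_lon + math.degrees(
        east / (EARTH_RADIUS_M * math.cos(math.radians(site_lat)))
    )
    return lat, lon, z
```

The point of the sketch is the signature: the function needs only the site
position from the nav structure, and everything else comes from the
coordinate variables and coordinate reference variables.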

So you just need, for the nav structure:
	 
         char   nav_model(nav, nav_len) ;      // navigation parameterization
                nav_model:long_name = "navigation model name";

         char   projection_type(nav, nav_len) ;

         float  siteLat(nav);
                siteLat:long_name = "Latitude of site";
                siteLat:units     = "degrees_north";

         float  siteLon(nav);
                siteLon:long_name = "Longitude of site";
                siteLon:units     = "degrees_east";

         float  siteAlt(nav);
                siteAlt:long_name = "Altitude of site above mean sea level";
                siteAlt:units     = "meters";

     data:
         nav_model = "Unidata projection library" ;
         projection_type  = "radar spherical coordinates";


Now, if there were some parameters to the projection function that 
depended on whether you were transforming the reflectivity or the velocity, 
you might come back to either separate nav structures (navZ and navV as in the
original post) or to Russ' hierarchy.  In this case, I'm guessing there aren't any.
In any case, the advantage of this approach might be in reducing the 
complexity of the nav structure.  The overall complexity of the problem is
not obviously reduced, other than obviating the need for 2 nav structures
or a hierarchy of nav structures. 

I would argue that the real advantage, however, is making 
explicit the separation of the grid description from
the world mapping. Each is straightforward in itself, and a little bit
confusing munged together, especially from the perspective of an 
automatic file reader. Looking at Stackpole's summary of GRIB grids, I can
see how tempting it is to use their system.  After all, there it all is, nicely
packaged and described. All the work is apparently done for you. Its glaring
weakness is that it describes very specific grids, rather than a family
of grids based on user-settable parameters. Such a family would be exactly
what I mean by a projection function. Note that the projection function maps 
any x,y in the projection plane to lat,lon, so it can handle any grid topology. 
The number of projections in common use is reasonably small, but the number of 
different possible grids is infinite.

I am going to continue these thoughts more abstractly in the email thread to
"Georeferenced gridded data conventions", but it would probably be useful to
include this example in that thread.

Organization: NCAR ACD
From: caron@dilbert.acd.ucar.edu (John Caron)
Subject: Re: More comments on netcdf "gridded" data conventions.
To: nuwg@comet.ucar.edu
Date: Wed, 26 Apr 1995 13:51:14 -0600 (MDT)

Georeferenced gridded data conventions, continued...

...
> I don't understand the necessity or desirability of having an attribute
> named "dimension" for coordinate reference variables.  If the coordinate
> attribute names the coordinate reference variables and has the same name as
> the dimension, why do the coordinate variables also have to have attributes
> that name the dimension?  This seems to be a redundant representation of the
> same information.  Problems with requiring redundant representation include
> the possibility that the two representations are inconsistent, and the
> probability that data providers will neglect to provide the information in
> one or the other form.
> 
Yes, you're right, we don't need the attribute named dimension, since the 
coordinate reference properly associates the dimension with the coordinate 
reference variable(s). 

...
> To make this clearer, here's an example we are currently using in our
> ruc.cdl for the output of the MAPS/RUC model:
> 
> 			
>     dimensions:
> 	    record = UNLIMITED ;	     // (reference time, forecast time)
> 	...
>     variables:
> 	    double	reftime(record);     // reference time of the model
> 			reftime:long_name = "reference time";
> 			reftime:units = "hours since 1992-1-1";
> 
> 	    double	valtime(record);     // forecast time ("valid" time)
> 			valtime:long_name = "valid time";
> 			valtime:units = "hours since 1992-1-1";
> 	...
> 	    :record = "reftime, valtime" ;   // "dimension attribute" -- means
> 					     // (reftime, valtime) uniquely 
> 					     // determine record
> 
> Here "reftime" and "valtime" are like what you have described as coordinate
> reference variables for the "record" dimension, and so should have
> attributes named "dimension" with value "record".  But the record dimension
> has a global coordinate attribute "record" that names the two coordinate
> reference variables, so this information is already represented.
> 
> The idea here was to be able to use the "record" dimension for representing
> an ordered pair of times, (reference time, forecast time), so you could store
> model outputs for multiple reference times and multiple forecast times in a
> single netCDF file with only one unlimited dimension.  Perhaps this isn't
> the use you had in mind for coordinate reference variables, but we do need
> this capability.

No, I was intending to include this case.  The fact that the example here is 
the unlimited dimension is immaterial, so I wanted to state the general rule 
by which such coordinate descriptions are made. 

...
> > ---------------------------------------
> > > (1) Time
> > > 
> > > 1) Time variables are always type double 
> > 
> > Not always adequate or possible. Should have several approved
> > types of time coordinates.
> 
> For NUWG conventions, we decided that type double is adequate, especially
> since the time units can be anything from picoseconds to eons and can
> specify a base time offset, according to the udunits library standard.  I
> don't remember the rationale for not permitting offsets from a base time to
> use a short type, which would seem to be desirable if the bulk of data
> in a file were relatively low-resolution time offsets.
> 
> We also agreed that data providers can include other representations for
> times as well (e.g. human readable).  This is a general property of the NUWG
> conventions, that they specify a minimum of what should be in a netCDF file
> so that the generic applications we are developing will be able to deal with
> it.  We don't proscribe adding extra information to netCDF data; data
> providers are free to include additional attributes, variables, and
> dimensions for other purposes, but our applications won't use that
> information.  Furthermore, data providers are free to develop a more
> restrictive set of conventions (e.g. requiring that a time variable always
> be named "time") that our generic applications will be able to handle.

So if I understand you, the convention is to always include a type double
time variable, and optionally other variables that describe the time. So
if I have monthly averaged data, my time variable could be month number, with
units "months"? Then I provide an alternate description which is the month
name?  A generic application would then provide both descriptions to the user?
The motivation for this presumably is to define a time ordering of the data?
If these are true, then I agree with this convention.

> > > 2) Time variables are indexed by another variable (can 
> > >    be the unlimited dimension)

I presume this means "you must define a coordinate variable for the time
coordinate"?

> > > 3) The names given to the time variables and the indexing variables 
> > >    are not subject to convention
> > 
> > How do you know which dimension is time? Also, if you want to use
> > "coordinate variables" (often handy), then the variable name must =
> > dimension name, by convention.
> 
> Applications that need to know the name of the time variable will have to be
> provided that information as an input, either in a table of associated
> variable names, as an argument, as a clickable selection, etc.  We decided
> very early that we wanted NUWG conventions to avoid explicit variable names,
> where possible; the French name for the time variable should be acceptable
> for French data, and applications that require specific variable names will
> have to support a mapping between file variable names and names used in the
> application.  Again, more restrictive conventions that require explicit
> variable names should work fine with this set of conventions.

Here I continue to disagree. Time is important enough that we have a few
conventions about it, such as requiring a coordinate variable of type
double.  So you know that the generic application needs to figure out
which coordinate is time.  So you live in Quebec and know you better
damn well not assume English. So you define a lookup table that specifies
what the real name of the time variable is. The way that you do that is
you specify the keyword "TIME", and the name of the time variable. Oops!
There we are, using English again. Anyway, the point is, you need to tell
a generic application which is the time coordinate, and you might as well 
make some convention about it.  How about a global attribute named time,
whose value is the name of the time dimension? That way all of the file
is in the native language of the user, and only one place has this 
conventional name. (By the way, I don't care if it's English, but it has to
be fixed for all users, no matter what language.) You already have
English keywords like "data" and "variables" in the netCDF file.

...
> > Comment: possible problem: do multiple coordinate reference variables
> > create unique instances, such as ref_time/valid_time, or do they
> > describe alternate possibilities, such as a "secs since..." and a
> > descriptive string?
> 
> I assumed the former.  Alternate human-readable representations can appear
> in auxiliary variables or attributes (e.g. "text_time").
> 
> > A better example, from the Summary:
> > > ...
> > > This specifies that for grid point u(1,2,1,1), the value of the
> > > "z" dimension (vertical level) can be found in p(1,2,1,1) or in 
> > > vpt(1,2,1,1).
> > 
> > does this mean that both p and vpt are needed, or just one? are they
> > both valid?
> 
> You're right that this is not clear from the Summary.  My assumption is that
> it means that the values of p and vpt are used to uniquely determine the
> value of z at each point of the grid.  How z is to be determined from p and
> vpt or even how this information is to be represented in the file is not
> clear to me.

I think it will work to assume that both are needed, and their combination
is unique.  Even when they are really synonyms, I think that interpretation
is valid.  I would say in this example, that the "z coordinate" IS the
combination (p, vpt).  The projection function would be responsible for
mapping (p, vpt) to altitude in km if that was desired. This also works for
the hybrid coordinate system, which is the most complicated vertical coordinate
I have seen.

...
> >    You seem to actually be trying to define a structure. Why tie
> > it together with an artificial dimension called "nav"? 
> 
> Because the netCDF model is incapable of directly representing structures;
> it can only directly represent scalars and arrays and has no nesting
> capabilities.  But an artificial dimension can be used to associate a
> collection of named variables into something isomorphic with a structure.
> The dimension name can then be used much like a structure name, to identify
> the cluster of variables that use it.  This still doesn't support nested
> structures, because a dimension can't have a dimension, but we think it's
> adequate so far.  The limitations of supporting a fully-functional Fortran77
> interface prevent adding nested structures to the netCDF model (though
> netCDF actually supports types such as short and byte that can't be used in
> strictly conforming Fortran77 programs).

ok, I get it, artificial dimensions are really structures.

> 
> > > 2) All grid variables have the "navigation_dim" attribute
> > 
> >    Why not just a global attribute? Are you trying to 
> > separate the real data from the navigation data? It seems to me
> > that a more natural way to do it is through the dimensions. If a
> > field looks like var(lon, lat, level) then it's "defined on the grid".
> 
> navigation_dim can't be a global attribute because we want to be able to
> store variables defined on multiple grids in the same file.  For example,
> we want to be able to store a satellite image and model output data for the
> same time in the same netCDF file, requiring two different navigation
> dimensions for two very different sets of georeferencing parameters.

I propose that a global attribute be allowed, with a per-variable override,
since in the majority of files there will be only one navigation_dim.
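
As a rough sketch of that lookup rule (Python is used only for illustration;
the dictionaries stand in for attributes already read from a netCDF file, and
the function name and variable names are hypothetical, not part of any NUWG
convention):

```python
def resolve_navigation_dim(var_attrs, global_attrs):
    """Return the navigation dimension name that applies to one variable.

    A variable-level "navigation_dim" attribute wins; otherwise the
    global attribute acts as the file-wide default.  Returns None when
    neither is present.
    """
    if "navigation_dim" in var_attrs:
        return var_attrs["navigation_dim"]
    return global_attrs.get("navigation_dim")


# Hypothetical file: the model output relies on the global default,
# while the satellite image overrides it with its own navigation.
global_attrs = {"navigation_dim": "nav"}
variables = {
    "temperature": {},
    "sat_image": {"navigation_dim": "sat_nav"},
}
for name, attrs in variables.items():
    print(name, "->", resolve_navigation_dim(attrs, global_attrs))
```

With this rule a single-grid file needs only the one global attribute, and a
multi-grid file still works by tagging the exceptional variables.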

> 
> > > 3) Which variables (content, not variable names) defined in
> > >    GRIB edition 1 document GDS octets 7-44 "Grid description".
> > >    (Table C).  Content determined by which grid navigation described.
> > > 4) A numeric ID listing the Grid Identification number and an originating 
> > >    center ID from the GRIB edition 1 document must be included in the
> > >    navigation variables.  Missing data values are OK for grids not
> > >    described by a GRIB document.
> > 
> >    I disagree:
> >         1) makes file not self describing
> >         2) GRIB variables are predefined. None of the variables I work
> > with are GRIB variables.
> >         3) too "GRIB-centric"
> > 
> >    I don't mind GRIB ID being one of the ways to specify a variable or
> > its navigation. I'm sure it's quite convenient if you're translating GRIB
> > to netCDF.
> 
> Right.  Actually we agreed on another navigation variable, "nav_model",
> associated with each navigation that provides the context in which to
> interpret all the other variable names:
> 
>         char    nav_model(nav, nav_len) ;      // navigation parameterization
>                 nav_model:long_name = "navigation model name";
> 	...
> 
> and for GRIB-centric data, its value is:
> 
>         nav_model = "GRIB1" ;
> 
> but for parameterizations based on the Federal Geographic Data Committee
> Content Standards for Digital Geospatial Metadata it could be:
> 
>         nav_model = "FGDC-1994" ;
> 
> and for parameterizations based on the geo-TIFF model, it might be
> 
>         nav_model = "geo-TIFF version 1" ;
> 
> Notice that it is possible to use multiple navigation parameterizations
> within the same netCDF file with this mechanism.

Yes, I think this makes sense, and allows future growth without breaking
existing conventions. I withdraw my objections now that I understand what
you are doing. Thanks for updating the WWW document to reflect your
current convention. I don't have any problem with making GRIB one of the
ways to describe grids, as long as it's not the only way.

> 
> We haven't yet agreed on how many values of nav_model our generic
> applications should support, but we want to support at least "GRIB1".
> Yesterday you gave us the idea of having a very simple nav_model used when
> coordinate variables suffice to specify the georeferencing, for example with
> simple lat/lon grids.  Some suggestions for this parameterization were
> nav_model = "simple" or "BDN" or "" or even the default interpretation when
> a variable has no navigation dimension and hence no nav_model variable.  I
> think I like this last convention best, but we need to try it out.  I'll be
> proposing a simple lat/lon gridded file soon that uses one of these
> conventions.
> 
> > > 5) Ordering or naming of grid dimensions not subject to convention.
> > >    Dimensions defining grid variables defined by "x_dim" and "y_dim" 
> > >    navigation variables
> > 
> >    How do you know what means what? Why not make some naming conventions
> > for the dimensions?
> 
> The GRIB1 georeferencing model assumes an "x" and "y" dimension for *some*
> projections, so we require that the netCDF file make clear which netCDF
> indices correspond to which GRIB1 indices in such cases.  We try to avoid
> requiring particular names when we can avoid it with indirection.

Same problem as with time. You simply have to know which dimension is x, y and z,
if you want to reference the data to the earth. Clearly these are part of the nav
structure, and have to be clarified. Thinking of the recent radar example, which
was using spherical coordinates, perhaps you want to define nav variables "dimen1",
"dimen2", etc. whose values are the names of the dimensions. It's OK with me if
you want to stick with "x_dim", "y_dim", and add "z_dim" to accomplish this, but you
also have to say what order the projection function expects its arguments in. So let's
just define x,y,z as the order in which the projection function expects its arguments.

A related GRIB question: knowing the GRIB nav structure, how do you find out what the
world coords are for each grid cell? Is there a standard library, a lookup table or what?
Because whatever technique you use, there is probably an implicit ordering of the
coordinates that you just have to know. So let's make it explicit (by convention), so that
the file is truly self-describing.

...
> 
> Ultimately, if a great data fusion framework or "killer" applications are
> developed that require an FGDC parameterization, data will be put in that
> form, but right now NUWG feels that most of our constituency is more
> familiar with the GRIB georeferencing, and we already are getting huge
> volumes of data that use it.
> 
> When everyone is more familiar with it, we should consider whether it's a
> good candidate for an acceptable value for our "nav_model" variable.  But
> right now it seems somewhat incomplete and GIS- and USGS-centric.  For
> example, GRIB and BUFR are not mentioned, but DEM (USGS Digital Elevation
> Model format) and DTED (Digital Terrain Elevation Data format) are.  But on
> page 39 netCDF is included, so maybe it's complete enough :-).
> 

The FGDC "Spatial Reference Information" (section 4) seems to be the relevant subset for
"nav" information. I would tentatively suggest we ignore the rest, including section 6
that lists data formats such as netCDF etc.
There are a lot of fields in section 4 I don't completely understand. It has some
satellite projections (Landsat), complete Map Projections, but apparently no
radar spherical projections. Apparently it describes both the grid and the projection,
but the projections are parameterized, and thus describe classes of projections.

If I read Stackpole's GRIB document right, the grids described are fixed and not
parameterized. (Does anyone understand differently?)  If so, it can't be used for
general georeferenced data.  The FGDC clearly parameterizes their projections, and
also separates the grid description from the projection.  I would prefer that the
grid be described by coordinate and coordinate reference variables, and the
projection only be described by the nav variable.  However, if in order to support
FGDC we had to put also the grid description into FGDC form, I think I could
live with it.  Since I'm not sure of the details of implementing the FGDC, it's
possible that the coordinate and coordinate reference variables might even satisfy
what the FGDC means by "content standards for metadata".


In summary, I think my proposals fall into three categories:

	1) Make a general definition of how to create "coordinate reference variables",
of which the (ref_time, valid_time) and hybrid vertical coordinates are instances.
	2) Add enough conventions to make generic applications be able to georeference
gridded data.
	3) Separate the grid description from the projection description. Parameterize
the projection description and encode it in the nav structure. Use coordinate reference
variables to describe the grid.


	I think that the nuwg intends to do the first two, and perhaps my ideas
are merely to clarify that. The third is somewhat different from what exists now. 

To: caron@dilbert.acd.ucar.edu (John Caron)
Cc: nuwg@comet.ucar.edu
Subject: Re: Navigation Information Query 
Organization: UCAR Unidata Program
Date: Thu, 27 Apr 1995 09:51:26 -0600
From: Russ Rew <russ@unidata.ucar.edu>

John,

Before considering the rest of your proposal, I have to correct what I think
may be a misunderstanding:

You wrote:

>                    ... Looking at Stackpole's summary of GRIB grids, I can
> see how tempting it is to use their system.  After all, there it all is,
> nicely packaged and described. All the work is apparently done for
> you. Its glaring weakness is that it describes very specific grids, rather
> than a family of grids based on user-settable parameters. Such a family
> would be exactly what I mean by a projection function. Note that the
> projection function maps any x,y in the projection plane to lat,lon, so it
> can handle any grid topology.  The number of projections in common use is
> reasonably small, but the number of different possible grids is infinite.

and in a second posting:

> If I read Stackpole's GRIB document right, the grids described are fixed and
> not parameterized. (Does anyone understand differently?)  If so, it can't be
> used for general georeferenced data.  The FGDC clearly parameterizes their
> projections, and also separates the grid description from the projection.  I
> would prefer that the grid be described by coordinate and coordinate
> reference variables, and the projection only be described by the nav
> variable.  However, if in order to support FGDC we had to put also the grid
> description into FGDC form, I think I could live with it.  Since I'm not
> sure of the details of implementing the FGDC, it's possible that the
> coordinate and coordinate reference variables might even satisfy what the
> FGDC means by "content standards for metadata".

Stackpole's summary of GRIB grids copies the WMO standard specification of a
GRIB Grid Description Section (GDS) and adds some particular frequently-used
NMC-specific grids that NMC identifies with a small integer grid ID.  

The GDS provides a way to describe infinite families of grids based on
user-settable parameters.  It's the parameterizations in the GDS for GRIB
that I intended to use for grid descriptions.  That's one of the reasons I
stated at our last meeting that I wanted to change our primary reference to
the WMO GRIB document rather than Stackpole's version of it with all the
NMC-specific additions.  The netCDF files we will produce from GRIB products
have a grid parameterization based on the WMO GRIB GDS, even when an NMC
grid ID is also available.  There are some "International Exchange Grids"
for which grid IDs have been assigned that are independent of the
originating center, but my GRIB decoder always manufactures a GDS
parameterization for these even when it's not sent with the GRIB product.
Also for any of the infinite number of non-cataloged grids, a grid ID of
255 is used which specifically means the grid is specified in the GDS.

As far as I can see, the GRIB GDS is very similar to the FGDC Content
Standards for Geospatial Data parameterizations for grids.  Each
parameterization includes some families of grids not included by the other,
but they both include all common projections.  The GRIB1 GDS standard
includes some unique parameterizations, such as the "quasi-regular" grids
that are arguably not even grids, but they are already so commonly used
that we have to deal with them.

If you want to see an on-line version of the WMO GRIB1 GDS specification
look at:

   ftp://nic.fb4.noaa.gov/pub/nws/nmc/docs/gribguide/guide.txt

which is a version of Stackpole's document without most of the NMC-specific
additions.  I don't know of any on-line copy of the original WMO standard.

This is all independent of your suggestion to separate the description of
the grid from the description of the projection.  I'm still looking at that
...

--Russ

Organization: NCAR ACD
From: caron@dilbert.acd.ucar.edu (John Caron)
Subject: Re: Navigation Information Query
To: russ@unidata.ucar.edu (Russ Rew)
Date: Thu, 27 Apr 1995 10:31:20 -0600 (MDT)
Cc: nuwg@comet.ucar.edu

Thanks for clarifying that; it makes GRIB conventions infinitely more usable.
Do you have an answer to this:

> A related GRIB question: knowing the GRIB nav structure, how do you find out what the
> world coords are for each grid cell? Is there a standard library, a lookup table or what?
> Because whatever technique you use, there is probably an implicit ordering of the
> coordinates that you just have to know. So let's make it explicit (by convention), so that
> the file is truly self-describing.

Organization: UCAR Unidata Program
From: Russ Rew <russ@unidata.ucar.edu>
To: caron@dilbert.acd.ucar.edu (John Caron)
Cc: nuwg@comet.ucar.edu
Subject: Re: Navigation Information Query 
Date: Thu, 27 Apr 1995 11:38:07 -0600

John,

> A related GRIB question: knowing the GRIB nav structure, how do you find
> out what the world coords are for each grid cell? Is there a standard
> library, a lookup table or what?  Because whatever technique you use,
> there is probably an implicit ordering of the coordinates that you just
> have to know. So let's make it explicit (by convention), so that the file
> is truly self-describing.

I don't know of any freely-available standard library for finding out world
coordinates based on GRIB1 GDS navigation.  Maybe ATD/RDP's zebra package
<URL:http://www.atd.ucar.edu/rdp/zeb.html> has dealt with this problem?
This is what the mythical "udgeo" library is intended to handle.

Given a navigation (in the form of an open netCDF file id and a navigation
dimension id), the udgeo library would provide a mapping between netCDF
variable indices (in the form of a netCDF variable id and a corresponding
vector of indices) and world coordinates (in some convenient form, e.g. UTM
coordinates, perhaps plus time).  Various convenience functions would permit
mapping cross-sections of indices all at once to arrays of world
coordinates, much like netCDF hyperslab access.  Presumably it could be
initialized to understand navigations in other forms than the navigation
dimension of an open netCDF file, for example by parsing descriptions from a
small navigation language that could be stored as a string.  Steve Emmerson
has proposed such a language and shown some of its advantages.
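
To make the idea concrete, here is a minimal sketch of the kind of mapping
such a library might provide, restricted to the simplest case of a regular
latitude/longitude grid.  This is not udgeo (which does not exist yet); the
class and function names are hypothetical, the field names only loosely
follow the GDS parameters (Ni, Nj, La1, Lo1, Di, Dj), and GRIB scanning-mode
flags are ignored by assuming the sign of dj carries the scan direction:

```python
from dataclasses import dataclass


@dataclass
class LatLonNav:
    """Hypothetical navigation for a regular latitude/longitude grid."""
    ni: int      # number of points along a latitude circle
    nj: int      # number of points along a meridian
    la1: float   # latitude of the first grid point, degrees
    lo1: float   # longitude of the first grid point, degrees
    di: float    # longitude increment, degrees
    dj: float    # latitude increment, degrees (negative when scanning southward)


def index_to_world(nav, i, j):
    """Map zero-based grid indices (i, j) to (latitude, longitude) in degrees."""
    if not (0 <= i < nav.ni and 0 <= j < nav.nj):
        raise IndexError(f"index ({i}, {j}) is outside the {nav.ni} x {nav.nj} grid")
    lat = nav.la1 + j * nav.dj
    lon = (nav.lo1 + i * nav.di) % 360.0   # keep longitude in [0, 360)
    return lat, lon


def slab_to_world(nav, i_range, j_range):
    """Map a rectangular block of indices at once, in the spirit of hyperslab access."""
    return [[index_to_world(nav, i, j) for i in i_range] for j in j_range]


# Hypothetical 2.5-degree global grid starting at the north pole.
nav = LatLonNav(ni=144, nj=73, la1=90.0, lo1=0.0, di=2.5, dj=-2.5)
print(index_to_world(nav, 4, 2))
print(slab_to_world(nav, range(2), range(2)))
```

A real implementation would need one such mapping per projection family and
would have to honor the scanning-mode conventions, which is where most of the
work lies.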

Development of the udgeo library is on a list of possible future Unidata
projects, but plans are not very far along and resources haven't yet been
committed.  I imagine that large parts of the necessary code are already
available as components of USGS Fortran projection libraries, geoTIFF
software, various large application packages, etc.  We would consider
adopting/adapting someone else's library or classes that already solved the
problem, if we knew of any good candidates.

As an aside, someone should probably be looking at the zebra conventions for
storing "Meteorological DataChunk" objects and "Nspace DataChunk" objects in
netCDF files, to see how they have dealt with coordinates and georeferencing
conventions.

--Russ

To: caron@dilbert.acd.ucar.edu (John Caron)
Cc: nuwg@comet.ucar.edu
Subject: Re: More comments on netcdf "gridded" data conventions. 
Organization: UCAR Unidata Program
Date: Mon, 01 May 1995 14:11:47 -0600
From: Russ Rew <russ@unidata.ucar.edu>


Still more on georeferenced gridded data conventions, responding to John
Caron's response ...

> So if I understand you, the convention is to always include a type double
> time variable, and optionally other variables that describe the time. So
> if I have monthly averaged data, my time variable could be month number, with
> units "months"? Then I provide an alternate description which is the month
> name?  A generic application would provide both descriptions to the user?
> The motivation for this presumably is to define a time ordering of the data?
> If these are true, then I agree with this convention.

Yes, mostly, except that NUWG time variables are supposed to use units that
are understood by the udunits library (see
<URL:ftp://ftp.unidata.ucar.edu/pub/udunits/>).  "months" is not a legal
time unit for udunits, since it can't be converted into an exact number of
seconds, like "fortnight" can.  I don't think NUWG conventions have dealt
with time variables such as "months" for monthly averaged data.  I don't
think the NUWG conventions should require that you store a month number as a
double.  In fact since type conversions such as integer to double are more
easily handled than units conversions, I agree with you that it doesn't seem
necessary to require by convention that time variables must be of type
double, but that's currently the NUWG proposal.
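
The distinction can be illustrated with a small sketch.  This is a
hand-written table, not the udunits API; the table and function name are
assumptions made up for the example.  A unit is usable here only if it has a
fixed length in seconds:

```python
# Hypothetical table of time units that have an exact length in seconds.
SECONDS_PER_UNIT = {
    "second": 1.0,
    "minute": 60.0,
    "hour": 3600.0,
    "day": 86400.0,
    "fortnight": 14 * 86400.0,
}


def to_seconds(value, unit):
    """Convert a time interval to seconds, rejecting units of variable length."""
    try:
        factor = SECONDS_PER_UNIT[unit]
    except KeyError:
        raise ValueError(f"{unit!r} has no fixed length in seconds") from None
    return value * factor


print(to_seconds(1, "fortnight"))
try:
    to_seconds(3, "months")
except ValueError as err:
    print("rejected:", err)
```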

> > > > 2) Time variables are indexed by another variable (can 
> > > >    be the unlimited dimension)
> 
> I presume this means "you must define a coordinate variable for the time
> coordinate"?

Within reason, but I can imagine defining a dimension with the purpose of
specifying only a time ordering of events, without specifying the time at
which each event happened, in which case a time coordinate would be
unnecessary.  I think the NUWG conventions are intended for certain kinds of
meteorological and oceanographic data, and that they are appropriate for
only a subset of all the data that might be stored in netCDF.

> > > > 3) The names given to the time variables and the indexing variables 
> > > >    are not subject to convention
> > > 
> > > How do you know which dimension is time? Also, if you want to use
> > > "coordinate variables" (often handy), then the variable name must =
> > > dimension name, by convention.
> > 
> > Applications that need to know the name of the time variable will have
> > to be provided that information as an input, either in a table of
> > associated variable names, as an argument, as a clickable selection,
> > etc.  We decided very early that we wanted NUWG conventions to avoid
> > explicit variable names, where possible; the French name for the time
> > variable should be acceptable for French data, and applications that
> > require specific variable names will have to support a mapping between
> > file variable names and names used in the application.  Again, more
> > restrictive conventions that require explicit variable names should work
> > fine with this set of conventions.
> 
> Here I continue to disagree. Time is important enough that we have a few
> conventions about it, such as requiring a coordinate variable of type
> double.  So you know that the generic application needs to figure out
> which coordinate is time.  So you live in Quebec and know you'd better
> damn well not assume English. So you define a lookup table that specifies
> what the real name of the time variable is. The way that you do that is
> you specify the keyword "TIME", and the name of the time variable. Oops!
> There we are, using English again. Anyway, the point is, you need to tell
> a generic application which is the time coordinate, and you might as well
> make some convention about it.  How about a global attribute named "time",
> whose value is the name of the time dimension? That way all of the file
> is in the native language of the user, and only one place has this
> conventional name. (By the way, I don't care if it's English, but it has to
> be fixed for all users, no matter what language.) You already have
> English keywords like "data" and "variables" etc. in the netCDF file.

Good points.  I think my example of French vs. English missed the original
motivation for not having rigid variable names in the NUWG conventions.
NUWG wants multiple existing application packages that already have their
own conventions for variable names to be able to access netCDF data.  Rather
than requiring existing applications and existing data archives to change,
NUWG chose to avoid contention about names and just specify that tables
would be used to specify variable name mappings.  The idea was
application-independence, not language-independence.  If data from different
source files is to be "fused" in an application, a table may be necessary
for each input file, since they could use different names for the same
variables.
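
A minimal sketch of what such per-file tables might look like follows.  The
file names, variable names, and function name are all invented for
illustration; NUWG has not specified a table format:

```python
# Hypothetical per-file tables: application name -> variable name in that file.
NAME_TABLES = {
    "model_a.nc": {"time": "valtime", "temperature": "T"},
    "model_b.nc": {"time": "temps", "temperature": "temperature_air"},
}


def file_variable_name(tables, filename, app_name):
    """Look up the name a given file uses for an application-level variable."""
    try:
        return tables[filename][app_name]
    except KeyError:
        raise KeyError(f"no mapping for {app_name!r} in {filename!r}") from None


# Fusing data from both files: the application asks for "time" in each
# and gets back whatever name that file happens to use.
for filename in NAME_TABLES:
    print(filename, "time ->", file_variable_name(NAME_TABLES, filename, "time"))
```

The application code refers only to its own names; the tables absorb the
differences between data sources.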

One problem with specifying the time variable with a global attribute is
that there may be multiple time variables (e.g. the time of the 3-hour
forecast for a 12Z model run may be the same as the time of the 12-hour
forecast of a 3Z model run, but these must be distinguished if the data
for both are stored together in the same file).  The COARDS conventions at
<URL:http://ferret.wrc.noaa.gov/noaa_coop/coop_cdf_profile.html> similarly
state that 

    The names of coordinate variables are not standardized by these
    conventions (since data sets may in general contain multiple coordinate
    variables of the same orientation).

By the way, the English keywords "data" and "variables" are not stored in
the netCDF file and have no special significance in the netCDF library.
They are used in the netCDF utilities ncdump and ncgen built on top of the
library, but need not be used by data providers conforming to the NUWG
conventions.  But your point is valid because we have had to sacrifice
language-independence in the library in a few other cases, e.g. the reserved
attribute name "_FillValue".

 ...
> > navigation_dim can't be a global attribute because we want to be able to
> > store variables defined on multiple grids in the same file.  For example,
> > we want to be able to store a satellite image and model output data for the
> > same time in the same netCDF file, requiring two different navigation
> > dimensions for two very different sets of georeferencing parameters.
> 
> I propose that a global attribute be allowed, with a per-variable override,
> since in the majority of files there will be only one navigation_dim.

I think that would work OK.

 ...
> > >    How do you know what means what? Why not make some naming conventions
> > > for the dimensions?
> > 
> > The GRIB1 georeferencing model assumes an "x" and "y" dimension for *some*
> > projections, so we require that the netCDF file make clear which netCDF
> > indices correspond to which GRIB1 indices in such cases.  We try to avoid
> > requiring particular names when we can avoid it with indirection.
> 
> Same problem as with time. You simply have to know which dimension is x, y
> and z, if you want to reference the data to the earth. Clearly these are
> part of the nav structure, and have to be clarified. Thinking of the recent
> radar example, which was using spherical coordinates, perhaps you want to
> define nav variables "dimen1", "dimen2", etc. whose values are the names of
> the dimensions. It's OK with me if you want to stick with "x_dim", "y_dim",
> and add "z_dim" to accomplish this, but you also have to say what order the
> projection function expects its arguments in. So let's just define x,y,z as
> the order in which the projection function expects its arguments.

We use "x" and "y" when the GRIB GDS specification uses "x" and "y"
(e.g. for polar stereographic projections, parameterized in terms of Nx, Ny,
Dx, Dy, ...) and "i" and "j" when the GRIB GDS specification uses "i" and
"j" (e.g. latitude/longitude grids, parameterized in terms of Ni, Nj, Di,
...).  For a different georeferencing model, we would presumably name the
parameters using names specific to a standard description of that model.
The use of parameter names from a standard reference avoids specifying their
order by a NUWG-specific convention.

 ...
> In summary, I think my proposals fall into three categories:
> 
> 	1) Make a general definition of how to create "coordinate reference 
>          variables", of which the (ref_time, valid_time) and hybrid vertical
>          coordinates are instances. 
> 	2) Add enough conventions to make generic applications be able to
>          georeference gridded data.
> 	3) Separate the grid description from the projection
>          description. Parameterize the projection description and encode
>          it in the nav structure. Use coordinate reference variables to
>          describe the grid.
>
> 	I think that the nuwg intends to do the first two, and perhaps my
> ideas are merely to clarify that. The third is somewhat different from what
> exists now.

Yes, and I think I'm convinced that separation of grid descriptions from
projection descriptions can simplify things.  However, if we are relying on
a standard parameterization such as the GRIB edition 1 standard that
doesn't separate these, we end up with a much more complicated task because
we then have to develop the mechanisms ourselves instead of pointing to the
external reference.  I think the FGDC standard doesn't separate these much
better than the GRIB parameterization, but I may be wrong about that.  If
there is a standard for grid descriptions and projections that has a clean
separation, then you may be right that the benefits would be worthwhile.

--Russ