Re: Coordinate Systems Proposals

Steve Emmerson (steve@unidata.ucar.edu)
Fri, 11 Jul 97 13:45:29 -0600

Greetings,

    I've been following this conversation on coordinate systems for
some time now, partly because it's my job and partly because it's
interesting.  Recently, I've been learning about a new data-model called
the "field" data-model (FDM).  I think it provides a useful framework
for understanding the situation and -- with your indulgence -- I'd like
to use it to present an analysis and to propose a solution.

    In a nutshell, the FDM comprises three coordinate systems, the
transformations between them, and the same dependent variable in all
three systems.  The coordinate systems have the following names:

    MANIFOLD
    BASE
    WORLD

We are already familiar with the manifold coordinate system of a
variable: it is nothing more than the variable's netCDF dimensions in
the absence of any "coordinate" variables.  For example, the variable
"var" in the following

    dimensions:
	i = 5;
	j = 6;

    variables:
	float var(i,j);

is defined in the 2-D manifold coordinate system (i,j).

    The base coordinate system, on the other hand, is the coordinate
system in which we normally think of the variable.  For example, the
variable "var" above could actually be surface temperature on the
Earth, in which case the "i" and "j" MANIFOLD coordinates should
undoubtedly map to latitude and longitude BASE coordinates.  Currently,
the netCDF model handles this via "coordinate variables", e.g.

    dimensions:
	lat = 5;
	lon = 6;

    variables:
	float temp(lat,lon);
	float lat(lat);
	float lon(lon);

In the above, the "lat" and "lon" dimensions comprise the "temp"
variable's manifold coordinate system.  The "lat" and "lon" variables,
on the other hand, comprise the variable's base coordinate system.  For
manifold and base coordinate systems that are representable in this way,
this convention allows us to easily convert between the two coordinate
systems.
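
    As a rough illustration of that conversion, here is a minimal Python
sketch.  It is not netCDF library code: the lists stand in for the "lat"
and "lon" coordinate variables, their values are made up (the CDL above
declares only shapes), and the function names are hypothetical.

```python
# Made-up values for the coordinate variables declared above.
lat = [10.0, 20.0, 30.0, 40.0, 50.0]                  # float lat(lat), 5 values
lon = [-120.0, -110.0, -100.0, -90.0, -80.0, -70.0]   # float lon(lon), 6 values


def manifold_to_base(i, j):
    """Map manifold indices (i, j) to base coordinates (latitude, longitude)."""
    return lat[i], lon[j]


def base_to_manifold(latitude, longitude):
    """Inverse lookup; only works for values that lie exactly on the grid."""
    return lat.index(latitude), lon.index(longitude)
```

Each base coordinate depends on exactly one manifold index, which is why
this convention can only describe a regular (Cartesian product) grid.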

    The problem is that not all useful manifold and base coordinate
systems are representable by this convention.  For one thing, a
variable's base coordinates might not be on the regular (i.e. Cartesian
product) grid implied by this convention.  For another, it is difficult
to represent a variable whose manifold coordinate system has fewer
dimensions than its base coordinate system.  As an example of the latter
situation, think of the temperature along a spiral of wire, one end of
which is hot and the other cold.  The manifold coordinate system for
this is one-dimensional: it is the position along the wire from one end.
The base coordinate system, on the other hand, is three dimensional
(one possibility would be a cylindrical coordinate system).  There is
currently no "official" way for the netCDF model to handle these more
complicated situations ("official" in this sense means "in the netCDF
User's Guide" :-).

    Thus, we need a convention for representing the transformation
between a variable's manifold coordinates and the variable's base
coordinates that goes beyond the capabilities of netCDF coordinate
variables.  Ideally, such a convention

    1.  Should be easily understandable.

    2.  Shouldn't break existing applications, i.e. it should be
        possible to operate applications that are unaware of the
        convention -- though the operation might not have all the
        capability of an "aware" application.

    3.  Should be general enough to handle any transformation between
	manifold and base coordinate systems.

    4.  Should be efficient.

    There appear to be two major categories of proposed conventions to
handle this problem.  The categories are: 

    1.  Multidimensional coordinate variables; and
    
    2.  Referential attributes.

Here's an example of a multidimensional coordinate variable approach
for a temperature variable that is defined at irregular positions on the
Earth:

    dimensions:
	lat = 5;
	lon = 6;

    variables:
	float temp(lat,lon);
	float lat(lat,lon);
	float lon(lat,lon);

Here, both "lat" and "lon" have double meanings.  In one sense, "lat" is
one of the manifold coordinates (and could have very little to do with
latitude); in the other sense, "lat" is a variable that associates a base
coordinate (latitude) with every position in the manifold domain.

    Personally, I do not like this use of the same name to refer to
different coordinates in different coordinate systems.  I feel that
it can lead to confusion (for the record, I admit that this argument
can be made against the current system of 1-D coordinate variables).
I also fear that it will break existing applications that assume
one-dimensional coordinate variables.
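
    To illustrate that concern, here is a small Python sketch of the
check an existing application effectively performs.  It relies only on
the usual definition of a coordinate variable (a one-dimensional
variable with the same name as its dimension); the dictionary is a
hand-written stand-in for the example above, and the function name is
hypothetical.

```python
def is_coordinate_variable(name, dims):
    """True if `name` is a classic coordinate variable: 1-D over its own dimension."""
    return dims == (name,)


# The multidimensional example above, written as variable name -> dimensions.
variables = {
    "temp": ("lat", "lon"),
    "lat": ("lat", "lon"),
    "lon": ("lat", "lon"),
}
dimension_names = {"lat", "lon"}

# Variables that share a dimension's name but are not 1-D over that dimension.
ambiguous = sorted(
    name
    for name, dims in variables.items()
    if name in dimension_names and not is_coordinate_variable(name, dims)
)
```

Here "ambiguous" ends up holding both "lat" and "lon": an application
that looks up coordinates by dimension name finds 2-D variables where
it expects 1-D ones.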

    Here's an example of a referential attribute approach for the same
problem:

    dimensions:
	i = 5;
	j = 6;

    variables:
	float temp(i,j);
	    temp:coordinates = "lat, lon";
	float lat(i,j);
	float lon(i,j);

In this system, the names for manifold and base coordinates are
distinct.  I believe this has the advantages of being clearer (at
the expense of being more verbose) and of not breaking existing
applications, which would be limited to working in the manifold (i,j)
coordinate system.  This approach is quite general.  Here's an example
of a base coordinate system comprising a moving x-y grid (e.g. for
tracking a storm):

    dimensions:
	i = 32;
	j = 64;
	time = 100;

    variables:
	float var(time, i, j);
	    var:coordinates = "time, lat, lon";
	float lat(time, i, j);
	float lon(time, i, j);
	float time(time);
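
    One way an "aware" application might resolve such a "coordinates"
attribute is sketched below in Python.  This is only my reading of the
idea, not an agreed convention: the dictionary layout and the function
name are hypothetical, the values are made up, and the grid is shrunk
to time = 2, i = 2, j = 3 so that it can be written out by hand.

```python
# Hand-written stand-in for the dataset above (shrunk, made-up values).
variables = {
    "time": {"dims": ("time",), "data": [0.0, 6.0]},
    "lat": {"dims": ("time", "i", "j"), "data": [
        [[30.0, 30.0, 30.0], [31.0, 31.0, 31.0]],
        [[30.5, 30.5, 30.5], [31.5, 31.5, 31.5]],
    ]},
    "lon": {"dims": ("time", "i", "j"), "data": [
        [[-90.0, -89.0, -88.0], [-90.0, -89.0, -88.0]],
        [[-89.5, -88.5, -87.5], [-89.5, -88.5, -87.5]],
    ]},
}
var_dims = ("time", "i", "j")           # float var(time, i, j)
var_coordinates = "time, lat, lon"      # var:coordinates


def base_coordinates(index):
    """Return the base coordinates of the manifold point `index`."""
    # Pair each manifold dimension name with the requested index along it.
    position = dict(zip(var_dims, index))
    result = []
    for name in var_coordinates.split(","):
        coord = variables[name.strip()]
        value = coord["data"]
        # Index the coordinate variable using only its own dimensions.
        for dim in coord["dims"]:
            value = value[position[dim]]
        result.append(value)
    return tuple(result)
```

Because each coordinate variable is indexed by its own dimensions, the
same lookup handles the 1-D "time" variable and the 3-D "lat" and "lon"
variables without special cases.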

And here's an example of the previously mentioned spiral situation using
a 1-D manifold coordinate system and a 3-D (cylindrical) base coordinate
system:

    dimensions:
	s = 100;

    variables:
	float temp(s);	// temperature along spiral
	    temp:coordinates = "z, rho, theta";
			// cylindrical coordinate system (CCS)
	float rho(s);	// distance from CCS center axis
	float theta(s);	// CCS azimuth
	float z(s);	// CCS height
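
    For concreteness, here is a Python sketch of what the spiral's
coordinate variables might contain.  The number of turns, the radius,
and the height are assumptions of mine (the CDL above fixes only
s = 100), and the function names are hypothetical.

```python
import math

N = 100          # dimension s = 100, as in the CDL above
TURNS = 5        # assumed number of turns in the spiral
RADIUS = 2.0     # assumed constant distance from the CCS axis
HEIGHT = 10.0    # assumed total height of the spiral

# One base-coordinate value per manifold position s along the wire.
rho = [RADIUS] * N
theta = [2 * math.pi * TURNS * s / (N - 1) for s in range(N)]
z = [HEIGHT * s / (N - 1) for s in range(N)]


def base_coordinates(s):
    """Base coordinates of manifold point s, in the order of temp:coordinates."""
    return z[s], rho[s], theta[s]


def cartesian(s):
    """The same point converted from cylindrical to Cartesian (x, y, z)."""
    return rho[s] * math.cos(theta[s]), rho[s] * math.sin(theta[s]), z[s]
```

A single manifold index yields three base coordinates, which is the
case the 1-D coordinate variable convention cannot express.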

    I think the key concept is the distinction between manifold
coordinate space (in which the phenomenon is defined) and base coordinate
space (in which we want to work).  Only in the simplest cases are they
identical.

DISCLAIMER: I work at the Unidata Program Center -- sometimes on the
netCDF package -- but I most definitely do not speak for the netCDF
team.

Regards,
Steve Emmerson   <http://www.unidata.ucar.edu>