Re: coordinate systems in netcdf (again)

John Caron (
Fri, 06 Jun 1997 17:26:52 -0600

Attached is a long attempt at defining coordinate systems in a
formalized way, along with proposals for (what else?) netcdf conventions
on coordinate variables, and generalized coordinate systems.

I'm a bit rusty at this sort of thing, so I'm hoping others might have a
look at it and give me some feedback.  Perhaps someone somewhere else
has made a formalized specification in a more succinct way.  If so,
I'd appreciate a pointer to it.

Anyway, I'm muddling around trying to capture what a coordinate system
is in a precise way, trying to make it as general as possible.  I might
be wrong on some fundamental level, and if so I'd appreciate it if you
could explain why.  Thanks!

(I couldn't read that attachment, so I'll just resend it here again.
Sorry for the 

   A _dimension_ is a named range of integers {0, 1, ..., size-1}. A dimension
is completely specified by the pair (name, size). You can substitute {1, ..., size}
in what follows if you prefer 1-based indexing.
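A minimal sketch of this definition in Python, using a hypothetical `Dimension` class (not part of any netCDF library): only the pair (name, size) is stored, and the index set is derived from it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dimension:
    """A named range of integers {0, 1, ..., size-1}, fully given by (name, size)."""
    name: str
    size: int

    def indices(self, one_based: bool = False) -> range:
        # 0-based by default; one_based=True gives {1, ..., size}
        start = 1 if one_based else 0
        return range(start, start + self.size)

lat = Dimension("lat", 3)
print(list(lat.indices()))                # [0, 1, 2]
print(list(lat.indices(one_based=True)))  # [1, 2, 3]
```

Because the class is frozen, two dimensions with the same (name, size) compare equal, matching the statement that the pair specifies the dimension completely.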


   A _variable_ is a function whose domain is D = D0 x D1 x D2 x ... x Dn,
where D1, ..., Dn are the dimensions of the variable and n is its _rank_.
The extra factor D0 = {0} is a one-point set that changes nothing for n > 0,
but gives scalar variables (rank 0) the one-point domain D = D0.
We can thus write a variable v in functional form as v = f(D) -> R,
where f denotes the function and R is its range. In what follows we treat
v and f as identical.
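To make the functional view concrete, here is a small sketch assuming a variable is represented as a plain Python function of its index tuple (the helper names `domain` and `v` are hypothetical). It builds D = D0 x D1 x ... x Dn with D0 = {0}, so a scalar variable gets the one-point domain {(0,)}:

```python
from itertools import product

def domain(*sizes):
    """D = D0 x D1 x ... x Dn, with the trivial factor D0 = {0} prepended
    so that a rank-0 (scalar) variable still has a one-point domain."""
    return list(product(range(1), *(range(s) for s in sizes)))

def v(d0, i, j):
    """A hypothetical rank-2 variable v(D) -> R; d0 is always 0."""
    return 10.0 * i + j

D = domain(2, 3)              # D1 = {0, 1}, D2 = {0, 1, 2}
print(len(D))                 # 6
print([v(*d) for d in D])     # [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
print(domain())               # [(0,)]  the domain of a scalar
```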

   In the context of netcdf files, we represent functions as scalar arrays,
and so are limited to directly representing only scalar functions; some further
convention is needed for vector functions.

Coordinate Variable

   A _coordinate variable_ is a variable that assigns physical values to a dimension.
Its domain consists of a single dimension, CVi(Di) -> Ri, and CVi is then said to be
a coordinate variable for dimension Di. It must be a strictly increasing or strictly
decreasing function.
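The monotonicity requirement is easy to check mechanically. A minimal sketch, assuming the coordinate variable's values are available as a Python list (`is_coordinate_variable` is a hypothetical helper):

```python
def is_coordinate_variable(values):
    """True if the 1-D sequence is strictly increasing or strictly
    decreasing, as required of a coordinate variable CVi(Di) -> Ri."""
    pairs = list(zip(values, values[1:]))
    increasing = all(a < b for a, b in pairs)
    decreasing = all(a > b for a, b in pairs)
    return increasing or decreasing

print(is_coordinate_variable([-90.0, -45.0, 0.0, 45.0, 90.0]))  # True
print(is_coordinate_variable([1000, 850, 500, 250]))             # True, decreasing
print(is_coordinate_variable([0, 1, 1, 2]))                      # False, repeated value
```

Sequences of length 0 or 1 pass trivially, since there are no adjacent pairs to compare.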

Coordinate System

   If V is a vector space, a _coordinate system_ for V is a set of basis vectors for V, 
along with units to give each coordinate physical meaning. A _coordinate_ here is a synonym
for basis vector.
   Let D be a domain, D = D1 x D2 x ... x Dn, and define a set of scalar _coordinate functions_
fi(D) -> Ri.  Let V be the vector space R1 x R2 x ... x Rn.  Then the vector function
Fcs = (f1, f2, ..., fn) is said to be a coordinate system for D, Fcs(D) -> V, if Fcs is
invertible. Given the discrete nature of D, Fcs is invertible (onto its image) exactly when
it is one-to-one, meaning that no two distinct points of D are mapped to the same point of V.
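Because D is finite, the one-to-one condition can be tested by brute force: evaluate Fcs at every index point and look for a repeated position vector. A sketch, assuming each coordinate function is a Python function of the full index tuple (`is_coordinate_system`, `x` and `y` are hypothetical):

```python
from itertools import product

def is_coordinate_system(coord_funcs, sizes):
    """True if Fcs = (f1, ..., fk) is one-to-one on D = D1 x ... x Dn,
    where `sizes` gives the size of each dimension Di."""
    seen = set()
    for d in product(*(range(s) for s in sizes)):
        point = tuple(f(*d) for f in coord_funcs)  # position vector Fcs(d)
        if point in seen:
            return False                           # two index points collide
        seen.add(point)
    return True

# Hypothetical coordinate functions on a 2 x 3 domain, indexed by (i, j)
def x(i, j):
    return 100.0 * j

def y(i, j):
    return 50.0 * i

print(is_coordinate_system([x, y], [2, 3]))  # True
print(is_coordinate_system([x], [2, 3]))     # False: x alone ignores i
```

Comparing floating-point coordinates for exact equality is a simplification; real data might need a tolerance.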

   Given a coordinate system Fcs for domain Dc and a variable v with domain Dv, where Dc
is a subset of Dv, Fcs is a coordinate system for v. If Dc = Dv, then Fcs is a _complete_
coordinate system for v.  The value Fcs(di) = vi for a particular point di in the domain
is the _position vector_ for di, and the variable is said to be located at vi for point di
with respect to the coordinate system Fcs.  (I think "Dc is a subset of Dv" is not quite
right; I probably want to restrict Dc = D1 x D2 x ... x Dk to be the product of some of the
dimensions of Dv = D1 x D2 x ... x Dn, i.e. Dv with just some dimensions Di missing.)
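Under the restricted reading in the parenthetical (Dc built from some of the dimensions of Dv), whether Fcs applies to v, and whether it is complete, can be decided from dimension names alone. A sketch, assuming dimensions are identified by name (`classify` is a hypothetical helper):

```python
def classify(cs_dims, var_dims):
    """Relate a coordinate system's domain Dc to a variable's domain Dv,
    both given as collections of dimension names."""
    dc, dv = set(cs_dims), set(var_dims)
    if not dc <= dv:
        return "not a coordinate system for this variable"
    return "complete" if dc == dv else "partial"

print(classify(["lat", "lon"], ["time", "lat", "lon"]))          # partial
print(classify(["time", "lat", "lon"], ["time", "lat", "lon"]))  # complete
print(classify(["level"], ["time", "lat", "lon"]))
```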

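   A special case of a coordinate system is one where the coordinate functions are
coordinate variables, and so each depends on a single dimension Di.  Then
Fcs(D1 x D2 x ... x Dn) = (f1(D1), f2(D2), ..., fn(Dn)), and Fcs is said to be an
_independent_ coordinate system.  Because each coordinate variable is strictly monotonic,
and hence one-to-one, an independent coordinate system is automatically invertible.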
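A sketch of an independent coordinate system assembled from 1-D coordinate variables, assuming they are plain Python lists (the `lat`/`lon` values and `independent_cs` are hypothetical). Evaluating Fcs at an index point gives its position vector:

```python
def independent_cs(*coord_vars):
    """Fcs(i1, ..., in) = (cv1[i1], ..., cvn[in]): each coordinate
    variable is indexed only by its own dimension."""
    def fcs(*idx):
        return tuple(cv[i] for cv, i in zip(coord_vars, idx))
    return fcs

lat = [-30.0, 0.0, 30.0]          # hypothetical coordinate variable lat(lat)
lon = [0.0, 90.0, 180.0, 270.0]   # hypothetical coordinate variable lon(lon)
fcs = independent_cs(lat, lon)
print(fcs(2, 1))   # (30.0, 90.0), the position vector of index point (2, 1)
```

Since each list is strictly monotonic, distinct index tuples always give distinct position vectors, so no separate invertibility check is needed here.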

Coordinate Transformations

A coordinate transformation is an invertible mapping M between two coordinate systems
Fcs1 and Fcs2:
        Fcs1 = M * Fcs2,  M-1 * Fcs1 = Fcs2.
Here * is functional composition, and M-1 indicates the inverse of M.
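As an illustration (not taken from any netcdf convention), suppose a hypothetical Fcs2 maps index points to polar coordinates (r, theta), and M converts polar to Cartesian coordinates. Restricting r > 0 and theta to (-pi, pi] keeps M invertible, so applying M-1 to Fcs1 = M * Fcs2 recovers Fcs2 up to rounding:

```python
import math

def compose(f, g):
    """Functional composition: (f * g)(d) = f(g(d))."""
    return lambda *d: f(*g(*d))

def fcs2(i, j):
    """Hypothetical Fcs2: index point (i, j) -> polar (r, theta)."""
    return (1.0 + i, -math.pi / 2 + j * math.pi / 4)

def M(r, theta):
    """Polar -> Cartesian; invertible for r > 0, theta in (-pi, pi]."""
    return (r * math.cos(theta), r * math.sin(theta))

def M_inv(x, y):
    """Cartesian -> polar; atan2 returns an angle in [-pi, pi]."""
    return (math.hypot(x, y), math.atan2(y, x))

fcs1 = compose(M, fcs2)       # Fcs1 = M * Fcs2
back = compose(M_inv, fcs1)   # M-1 * Fcs1, which should equal Fcs2

for i in range(3):
    for j in range(4):
        assert all(math.isclose(a, b, abs_tol=1e-12)
                   for a, b in zip(back(i, j), fcs2(i, j)))
print("M-1 * Fcs1 reproduces Fcs2 on the whole domain")
```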

Georeferencing Coordinate System

   In a georeferencing coordinate system, or GCS for short, there are 3 spatial
coordinates x, y, z, which correspond as closely as possible to the directions "east/west",
"north/south" and "up/down", respectively.  A GCS is therefore a function
	Fgcs(D) -> (x,y,z)
where x, y, z describe the variable's position or spatial extent in each of these directions.
Note that describing spatial extent needs two values for each direction, e.g.
x = (xleft, xright) or z = (zhigh, zlow).

Specifying Coordinate Systems in netcdf files

   We have seen that a general coordinate system is specified by a domain
D = D1 x D2 x ... x Dn, a vector space V (with physical units for each of its basis
vectors), and an invertible function Fcs(D) -> V.  Netcdf semantics map domains to
named dimensions, and units for coordinates are already handled very well.  Variable arrays
are fine for describing single-valued functions.  All that's really missing are vector valued
functions.

   Here is a proposal for a netcdf convention for specifying coordinate systems. 
The goals are to:
	1) build on existing practices.
	2) keep simple things simple.
	3) make it flexible enough to handle any coordinate system.

   So the proposal is:

	1) coordinate variables remain an elegant way to define the coordinate system when 

	2) allow the natural extension of coordinate variables to higher dimensions.
	  "A variable with the same name as a dimension is the coordinate variable for that
	dimension. If V is a variable with domain D = D1 x D2 x ... x Dn, let Dc be the
	product of those dimensions of D that have coordinate variables defined. Then a
	coordinate system is defined on Dc with the function
		Fcs(Dc) = (cv1(D1), cv2(D2), ...)
        where the cvi's are the defined coordinate variables, and each Di is one of
	the dimensions of D. For any such Dc, Fcs must be invertible."

	Notice that coordinate variables are restricted to mapping a single dimension Di
	(in index space) to a single coordinate Ri (in physical coordinate space).  This is
	a Good Thing, and we try hard to define our dimensions so that we can do exactly that.

	3) more generally, allow the specification of coordinate systems using attributes:

	    "A coordinate system can be defined by an attribute whose name starts with the
	string 'coordinates' (case insensitive, with an optional trailing description) and
	whose value is a (comma or blank delimited) list of variable names in the same file
	that define the coordinate functions.  The domain Dc of the coordinate system is the
	product of all dimensions Di that appear in the domains of the coordinate functions.
	The coordinate system is defined by the function
                Fcs(Dc) = (cv1, cv2, ...)
        where the cvi's are the named coordinate functions, each evaluated on the
	dimensions of Dc that make up its own domain."

	This is meant to cover William Weibel's case of:
	           npoints = 541;
	           geopotential:coordinates = "lon lat";

	and presumably any other coordinate system (?). It seems likely that the case
	var(dim, dim), i.e. using the same dimension twice in a variable declaration,
	would have to be excluded (?).

	4) allow vector valued coordinates, to cover the famous (gen_time, valid_time)
	from NUWG:

           "A vector valued coordinate function can be specified by enclosing in
	parentheses a list of variables in the same file that define each component of
	the coordinate function." E.g.:
               geopotential:coordinates = "lon lat (gen_time, valid_time)";
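The rules in items 2 through 4 can be sketched against a toy in-memory stand-in for a netcdf file: a dict mapping each variable name to its dimension names and attributes. This is an assumption for illustration, not a real netCDF API, and the `record` dimension given to gen_time and valid_time is likewise made up. `coordinate_dims` applies rule 2, `coordinate_systems` and `parse_coordinates` apply rules 3 and 4, and `coordinate_domain` collects the dimensions of Dc:

```python
import re

# Toy stand-in for a netcdf file (hypothetical, not a real netCDF API):
# variable name -> (dimension names, attributes)
variables = {
    "level":        (("level",), {}),
    "temp":         (("time", "level"), {}),  # "time" has no coordinate variable
    "lon":          (("npoints",), {}),
    "lat":          (("npoints",), {}),
    "gen_time":     (("record",), {}),
    "valid_time":   (("record",), {}),
    "geopotential": (("record", "npoints"),
                     {"coordinates": "lon lat (gen_time, valid_time)"}),
}

def coordinate_dims(name):
    """Rule 2: the dimensions of `name` that have a coordinate variable,
    i.e. a 1-D variable with the same name as the dimension."""
    dims, _ = variables[name]
    return [d for d in dims if d in variables and variables[d][0] == (d,)]

def parse_coordinates(value):
    """Rules 3 and 4: split a comma- or blank-delimited list of names. A
    parenthesized group is one vector valued function, returned as a tuple."""
    funcs = []
    for group, single in re.findall(r"\(([^)]*)\)|([^\s,()]+)", value):
        if single:
            funcs.append(single)
        else:
            funcs.append(tuple(t for t in re.split(r"[,\s]+", group) if t))
    return funcs

def coordinate_systems(name):
    """Rule 3: every attribute whose name starts with 'coordinates'
    (case insensitive) defines one coordinate system."""
    _, attrs = variables[name]
    return {a: parse_coordinates(v) for a, v in attrs.items()
            if a.lower().startswith("coordinates")}

def coordinate_domain(funcs):
    """Dc: the dimensions found in the domains of the coordinate functions
    (including each component of a vector function), in first-seen order."""
    dc = []
    for f in funcs:
        for comp in (f if isinstance(f, tuple) else (f,)):
            for d in variables[comp][0]:
                if d not in dc:
                    dc.append(d)
    return dc

print(coordinate_dims("temp"))               # ['level']
cs = coordinate_systems("geopotential")
print(cs)  # {'coordinates': ['lon', 'lat', ('gen_time', 'valid_time')]}
print(coordinate_domain(cs["coordinates"]))  # ['npoints', 'record']
```

The parser does no error checking (an unmatched parenthesis is silently skipped), and it does not test whether the resulting Fcs is invertible.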

	I still want to:
	   5) allow the specification of extents, as well as point positions, for a
	coordinate function.
	   6) clarify a number of special things about georeferencing coordinate systems,
	but I'm running out of gas, and I'm not totally sure this whole thing is solid.
        So I'll stop and see if anyone can give me feedback one way or the other.