 

Mathematical formalism for Coordinate Systems

John Caron

3/18/98

1. Mathematical Concepts

For our purposes, we are interested only in functions of Rn, the product spaces of the real numbers, and we need only two simple concepts from differential geometry, namely manifold and field.

Manifold. Let w < Rm, W < Rn (< means "is a subset of"), and j : w -> W a continuous function which has a continuous inverse [1]. Then (j, W) is an m-dimensional manifold embedded in Rn. In Figure 1, we show a two-dimensional manifold embedded in R3, namely an octant of the surface of a sphere. In this case R2 is the manifold space and R3 is the base space.

(There is some ambiguity whether "the manifold" is w or W. A key point to understand is that the function j makes w and W "topologically equivalent", or "homeomorphic", so informally you can use them interchangeably. To visualize homeomorphic surfaces, think of a rubber sheet that you can stretch and distort, but cannot tear or fold over on itself. )

Field. If for every point on the manifold we have a function Y : w -> D then (j, Y) is a field on the manifold w. D is called the dependent variable space. In Figure 1, we show Y as a scalar function called Temperature; if W represents the surface of the earth, then (j, Y) defines the field "earth surface temperature".

(You might also wonder "is the domain of Y w or W?" Again, due to the homeomorphism, you can use one or the other, depending on context. Formally, we will use w as the domain, which emphasizes that Y is a function of "independent variables". To emphasize where the temperature is located, formally we use Y o j^-1 : W -> D.)
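The two definitions can be made concrete with a small sketch of the Figure 1 setting. It assumes a unit sphere, takes w to be the open square 0 < theta, lam < pi/2 (latitude and longitude in radians), and uses a made-up temperature formula; the names j_inverse and Y_on_W and the numbers are illustrative assumptions, not part of the formalism.

```python
import math

def j(theta, lam):
    """Map (theta, lam) in w < R2 to a point of W < R3 on the unit sphere.

    theta is latitude and lam is longitude, both in radians and both
    assumed to lie strictly between 0 and pi/2 (one octant).
    """
    return (math.cos(theta) * math.cos(lam),
            math.cos(theta) * math.sin(lam),
            math.sin(theta))

def j_inverse(x, y, z):
    """Continuous inverse of j on the open octant."""
    return (math.asin(z), math.atan2(y, x))

def Y(theta, lam):
    """Hypothetical scalar field on w: a temperature in kelvin."""
    return 300.0 - 60.0 * math.sin(theta) ** 2

def Y_on_W(x, y, z):
    """The same field located on W, that is, Y o j^-1."""
    return Y(*j_inverse(x, y, z))

p = (math.radians(40.0), math.radians(75.0))  # a point of w
q = j(*p)                                      # the matching point of W
```

Evaluating Y(*p) and Y_on_W(*q) gives the same temperature, which is the sense in which w and W can be used interchangeably.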

2. Some NetCDF examples

1. Variables without coordinates

  dimensions:

      lon = 128, lat = 64;

  variables:

      float SurfaceTemp(lon, lat);

Although we have named our dimensions lon and lat, we have not defined a manifold, and so this is not a field by the definition above.

2. Variables with coordinate variables[2]

  dimensions:

      col = 128, row = 64;

  variables:

      float SurfaceTemp(col, row);

      float lon(col);

      float lat(row);

Now we have a manifold w defined by the product set {lon} X {lat}. The function j is defined by (lat, lon) -> (lat, lon, R). The variable SurfaceTemp is simply the function Y: w -> R.
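As a rough illustration of example 2, the sketch below builds the two coordinate variables as plain Python lists and treats SurfaceTemp as a function on their product set. The grid spacing and the temperature values are invented for the example, and field_at is a hypothetical helper name.

```python
ncol, nrow = 128, 64

# Coordinate variables: one value per index along each dimension.
lon = [-180.0 + 360.0 * c / ncol for c in range(ncol)]
lat = [-90.0 + 180.0 * (r + 0.5) / nrow for r in range(nrow)]

# Data variable SurfaceTemp(col, row), filled with made-up values in kelvin.
surface_temp = [[300.0 - 0.5 * abs(lat[r]) for r in range(nrow)]
                for c in range(ncol)]

def field_at(c, r):
    """Return the point of w = {lon} X {lat} and the value of Y there."""
    return (lon[c], lat[r]), surface_temp[c][r]

point, value = field_at(0, 0)
```

Without the lon and lat lists, the indices (c, r) would only name a position in the array, which is the situation in example 1.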

3. Variables on projective geometry surfaces.

  dimensions:

      col = 128, row = 64;

  variables:

      float SurfaceTemp(col, row);

      float x(col), y(row);

      float lat(x,y);

      float lon(x,y);

Now w is defined by the product set {x} X {y} which is the projection plane. We have a non-trivial j = (x,y) -> (lat(x,y), lon(x,y), R), where lat(x,y), lon(x,y) are the projective geometry coordinate transformation functions.
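The text does not name a particular projection. As one assumed choice, the sketch below uses the inverse of a spherical Mercator projection for the transformation functions; the earth radius, the units (kilometres) and the function names lat_of and lon_of are illustrative assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed spherical earth

def lat_of(x, y):
    """Latitude in degrees of projection-plane point (x, y), in km."""
    return math.degrees(2.0 * math.atan(math.exp(y / EARTH_RADIUS_KM))
                        - math.pi / 2.0)

def lon_of(x, y):
    """Longitude in degrees of projection-plane point (x, y), in km."""
    return math.degrees(x / EARTH_RADIUS_KM)

def j(x, y):
    """Non-trivial j: (x, y) -> (lat(x, y), lon(x, y), R)."""
    return (lat_of(x, y), lon_of(x, y), EARTH_RADIUS_KM)

# Forward Mercator y for latitude 45 degrees, used to check the inverse.
y45 = EARTH_RADIUS_KM * math.log(math.tan(math.pi / 4.0 + math.pi / 8.0))
origin = j(0.0, 0.0)
```

In a file, lat(x,y) and lon(x,y) would hold these functions sampled at the grid points rather than the formulas themselves.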

4. Trajectories

  dimensions:

      sample = UNLIMITED;

  variables:

      float Radiance(sample);

      double secs(sample);

      float lat(sample), lon(sample), height(sample);

We can think of this as a 1-D manifold embedded in 4D space-time, j: {sample} -> (lat, lon, height, secs), or as an embedding into a 3D space, j: secs -> (lat, lon, height).
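Both readings of the trajectory can be sketched in a few lines. The sample values below are invented, position_at is a hypothetical helper, and linear interpolation between samples is an assumption; the second reading only works if secs is strictly increasing.

```python
from bisect import bisect_right

# Made-up trajectory samples.
secs = [0.0, 10.0, 20.0, 30.0]
lat = [40.0, 40.1, 40.2, 40.3]
lon = [-105.0, -105.0, -104.9, -104.8]
height = [1600.0, 1700.0, 1800.0, 1900.0]

def j(sample):
    """First reading: sample index -> point in 4D space-time."""
    return (lat[sample], lon[sample], height[sample], secs[sample])

def position_at(t):
    """Second reading: secs -> (lat, lon, height), linearly interpolated."""
    if not secs[0] <= t <= secs[-1]:
        raise ValueError("time outside the sampled trajectory")
    # Index of the segment containing t; the last sample uses the last segment.
    i = min(bisect_right(secs, t) - 1, len(secs) - 2)
    f = (t - secs[i]) / (secs[i + 1] - secs[i])

    def lerp(v):
        return v[i] + f * (v[i + 1] - v[i])

    return (lerp(lat), lerp(lon), lerp(height))

mid = position_at(15.0)
```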

 

3. Coordinate Systems

A Coordinate System is a way of specifying elements in a set. A coordinate is a way of specifying an individual set element.

General Sets. For a general set S, you can name each element of the set, so the set of names is a coordinate system. You can also represent the set as an array, and specify the element by index, so this array constitutes a coordinate system, and the indices are coordinates. We will call such a coordinate system an index coordinate system.

Product Sets. To specify an element of a product set, you specify a tuple of the coordinates of the individual sets. This is a product coordinate system.

Vector Sets. Rn is a vector space, and a vector can be written as a unique linear combination of basis vectors, v = a1 x1 + a2 x2 + ... + an xn. A set of basis vectors (x1, x2, ..., xn) is therefore a coordinate system, and the tuple of coefficients (a1, a2, ..., an) is the coordinate for v in that coordinate system. Call this a vector coordinate system.
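For R2 the coefficients can be computed directly. The following minimal sketch solves v = a1 x1 + a2 x2 by Cramer's rule; the function name and the example basis are illustrative assumptions.

```python
def coordinates_in_basis(v, x1, x2):
    """Return (a1, a2) such that v = a1*x1 + a2*x2 in R2."""
    det = x1[0] * x2[1] - x2[0] * x1[1]
    if det == 0:
        raise ValueError("x1 and x2 are not linearly independent")
    a1 = (v[0] * x2[1] - x2[0] * v[1]) / det
    a2 = (x1[0] * v[1] - v[0] * x1[1]) / det
    return (a1, a2)

v = (3.0, 1.0)
standard = coordinates_in_basis(v, (1.0, 0.0), (0.0, 1.0))
rotated = coordinates_in_basis(v, (1.0, 1.0), (-1.0, 1.0))
```

The same vector v has coordinate (3, 1) in the standard basis and (2, -1) in the second basis, since 2*(1, 1) - 1*(-1, 1) = (3, 1).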

 

4. Coordinate Systems on Manifolds

One of the more important properties of a manifold j : w -> W is that a coordinate system on w creates a coordinate system on W, by associating with each coordinate x in w the coordinate j(x) in W. Every element y in W can therefore be named by naming its corresponding element j^-1(y) in w. We call this a manifold coordinate system for W.

Typically w is chosen exactly because it has a convenient and natural coordinate system (typically a product set) to name the elements of the data. We can then understand the manifold space as the "natural" coordinate space from the data provider's point of view.

W is therefore likely a coordinate system useful to the data consumer. For a display system that wants to show the data in its location on the earth, W is likely to be something like (lat, lon, height). Another possibility is the need to simultaneously display multiple datasets that have different coordinate systems. Figure 2 is a schema for a display of 2D projections of datasets. It is possible that the display would use the function a to transform the data in the coordinate system projection1 directly into display coordinates. It's also quite likely that it has to convert to a "canonical" coordinate system like lat/lon, in order to show a map overlay, or to display multiple datasets.

 

5. Coordinate Systems and NetCDF Variables as Sampled Functions

NetCDF variables are stored in files as multidimensional arrays. The indexing for those arrays is an index coordinate system for the data values. We can think of a netCDF variable as a function whose domain I = "index space" is a product set of dimensions (a dimension is a named range of integers). It's possible to think of netCDF variables as being simple finite sets of elements, namely the data values. It is generally more useful to think of variables as finite samplings of a continuous function, since this permits us to interpolate between points.
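Treating a variable as a sampling of a continuous function means it can be evaluated at a real-valued index. A minimal one-dimensional sketch, assuming linear interpolation (the text does not prescribe an interpolation method) and a hypothetical helper name sample_at:

```python
def sample_at(values, i):
    """Linearly interpolate a 1-D variable at a real-valued index i.

    Assumes values has at least two elements.
    """
    if not 0 <= i <= len(values) - 1:
        raise IndexError("index outside the sampled range")
    lo = min(int(i), len(values) - 2)  # left end of the enclosing segment
    f = i - lo                         # fractional position in the segment
    return values[lo] + f * (values[lo + 1] - values[lo])

values = [10.0, 20.0, 40.0]
halfway = sample_at(values, 0.5)
```

At integer indices the function returns the stored data values, so the finite-set view is recovered as a special case.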

Define a "netCDF coordinate system" s as a tuple of scalar netCDF variables (s1, s2, ... , sn) with domain I and range w. In order to use our concepts of manifolds, which require continuous, invertible functions, we first embed I into Rm (m the number of dimensions in I), by the trivial map which takes an integer to its equivalent real number in R. With that mapping we can think of I as a subset of Rm, and (s, w) is a manifold from Rm to Rn. In this case, we must think of the netCDF variables s1, s2, etc. as samplings of a continuous function.

When we add a coordinate system to netCDF files, we are adding a manifold coordinate system for the data. To make this work, the manifold coordinate system must use the same index coordinate system as the data. This means that if t is a netCDF variable, and s is the coordinate system for t, then t = Y o s (see Figure 3). In fact, this is always the case, since in netCDF files we have no way to specify Y except as Y = t o s^-1. Although this notation does emphasize that s must be invertible, it's obviously not needed to specify Y, because we just use t directly to access the data. The value of this formalism is to specify the meaning in a netCDF file of the coordinate system s, namely as a manifold coordinate system for the data.
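The relation between t, s and Y can be checked on a toy one-dimensional example. The values are invented, and the inverse of s is built as a lookup table, so this sketch only defines Y at the sampled coordinates.

```python
s = [0.0, 2.5, 5.0, 7.5]          # coordinate variable: index -> w
t = [280.0, 282.0, 285.0, 283.0]  # data variable: index -> D

# s must be invertible: each coordinate value identifies one index.
s_inverse = {coord: index for index, coord in enumerate(s)}

def Y(coord):
    """Y = t o s^-1, defined only at the sampled coordinates."""
    return t[s_inverse[coord]]

# t = Y o s holds at every index.
recovered = [Y(s[i]) for i in range(len(t))]
```

If two entries of s were equal, the lookup table would silently keep only one index, which is one way to see why s has to be invertible.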

By making I a subset of Rm, and s a homeomorphism between I and w, we are specifying a very important property of w, namely connection between adjacent points. This means that a grid cell g in I (defined by adjacent points in each dimension of I) maps to a grid cell G in w. Furthermore, interior points in G can only be mapped by interior points in g. This restricts the possible values that the netCDF variables si can take and still represent a sampling of a homeomorphism[3].
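For a one-dimensional coordinate variable, Footnote 3 states the restriction as monotonicity. A minimal check, assuming strict monotonicity is wanted so that no two indices share a coordinate value (the function name is hypothetical):

```python
def is_strictly_monotonic(values):
    """True if values are strictly increasing or strictly decreasing."""
    pairs = list(zip(values, values[1:]))
    return (all(a < b for a, b in pairs)
            or all(a > b for a, b in pairs))

increasing = is_strictly_monotonic([0.0, 1.0, 2.5])
repeated = is_strictly_monotonic([0.0, 1.0, 1.0])
```

This covers only the R -> R case; the footnote leaves the higher-dimensional condition as a conjecture, and no check for it is attempted here.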

This mapping of I to Rm, and seeing s as a homeomorphism implies a certain kind of connectedness, which I will call a gridded topology. Other topologies are possible, and are useful in 3D modeling applications. However, I theorize that gridded topology is sufficient for describing the connectedness of meteorological datasets.

 

6. Georeferencing and Time Referencing Coordinate Systems

Coordinate systems that specify spatial and temporal location of data typically have special meaning to data consumers and especially to display systems. It is likely that data producers will need to factor those coordinate systems in the following way:

Let E be the set of 3D space "close to" the earth, and G be a coordinate system for E that is fixed relative to the earth's surface. Let F be a framework for data providers and data consumers. Then G is a georeferencing coordinate system in F if F knows how to convert G into its canonical georeferencing coordinate system. For our purposes, we will consider the canonical georeferencing coordinate system as (lat, lon, altitude). Similarly, F defines a canonical time referencing coordinate system, and T is a time referencing coordinate system in F if F can convert T to its canonical time coordinate.
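One way to read this definition is as a registry: a framework F recognizes G as georeferencing exactly when it holds a conversion from G to its canonical system. The sketch below is a hypothetical illustration of that reading, not an existing API; the registry, the function names and the coordinate system name "lon_lat_height_km" are all invented.

```python
# Conversions known to the (hypothetical) framework F, keyed by the name of
# the coordinate system G. Each maps a coordinate in G to the canonical
# (lat, lon, altitude) tuple, here with altitude in metres.
converters = {}

def register(name, to_canonical):
    """Tell F how to convert coordinate system `name` to canonical form."""
    converters[name] = to_canonical

def is_georeferencing(name):
    """G is a georeferencing coordinate system in F if F can convert it."""
    return name in converters

def to_canonical(name, coord):
    """Convert a coordinate in system `name` to (lat, lon, altitude)."""
    return converters[name](coord)

# Example system: (lon, lat, height in km) -> (lat, lon, altitude in m).
register("lon_lat_height_km", lambda c: (c[1], c[0], c[2] * 1000.0))

canonical = to_canonical("lon_lat_height_km", (-105.0, 40.0, 1.5))
```

A time referencing coordinate system T could be handled with a second registry of the same shape.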

A field is a georeferenced field if its base set can be written as a product set Wgeo X Wother, and Wgeo is a georeferencing coordinate system.

 

Conclusions

The concepts of manifolds from Rm -> Rn and fields are sufficient mathematical frameworks for our data model.

Coordinate systems are a way of specifying elements in a set.

The manifold coordinate system is the "natural" coordinate system for the data provider. The base coordinate system is chosen for its usefulness to the data consumer.

Coordinate systems in netCDF files can be specified by tuples of netCDF variables. By embedding the index set into Rm, a coordinate system represents a homeomorphism and defines a manifold coordinate system. This implies that the manifold has a gridded topology, and restricts the possible values of the coordinate system variables.

Georeferencing and time referencing coordinate systems likely should be factored out and presented explicitly by data providers.

 

Notes

Footnote 1. A continuous function with a continuous inverse is a homeomorphism, and W and w are said to be homeomorphic. When the function and its inverse are also required to be differentiable, it is called a diffeomorphism.

Footnote 2. You will note I am ignoring coordinate variable conventions, partly to disambiguate the notation.

Footnote 3. In the simple case of a map from R -> R, this means that the values must be monotonically increasing or decreasing. In the higher dimensional case, I don't know of a simpler way to say it than "no other points can map to the interior of a grid cell". I suspect that that requirement is both necessary and sufficient for s to be a homeomorphism.

 
 