[cf-pointobsconvention] Draft 2

NOTE: The cf-pointobsconvention mailing list is no longer active. The list archives are made available for historical reasons.

To: cf-pointobsconvention@xxxxxxxxxxxxxxxx
Subject: [cf-pointobsconvention] Draft 2
From: Jonathan Gregory <j.m.gregory@xxxxxxxxxxxxx>
Date: Wed, 19 Sep 2007 22:22:00 +0100

Dear John

(1) Thank you for your careful analysis and examples of the needs for this
kind of data. I've been wondering how one can characterise it in
general. Although most cases are "point" data of some kind, perhaps it would
be right to describe all these kinds of data as "ungridded". That's a word
which is used in various ways but it seems most apt to me for this
situation. In terms of structure, what you are describing are kinds of data
where the size of one dimension may vary as a function of index along another
dimension. That is "ungridded" in a deeper sense than data which is not evenly
arranged in x and y but is still contained within a rectangular array, for
instance.

(2) While the analogy to tables and SQL is interesting, personally I find the
CDL expression most obvious. Moreover, it would be a fairly small extension to
CF to include this kind of indirection. It is rather like the method described
for compression by gathering in section 8.2 of the CF conventions:

dimensions:
 lat=73;
 lon=96;
 landpoint=2381;
 depth=4;
variables:
 int landpoint(landpoint);
   landpoint:compress="lat lon";
 float landsoilt(depth,landpoint);
   landsoilt:long_name="soil temperature";
   landsoilt:units="K";
 float depth(depth);
 float lat(lat);
 float lon(lon);

Here the coordinate variable of the "gather" dimension (landpoint) is an index
into the two dimensions which were jointly compressed by the gathering. As you
mention, we could represent the ungridded case in a very inefficient way by
constructing a coordinate variable which contains all possible values of the
variable-size dimension, for instance:

dimensions:
 station=10;
 pressure=11;
 allpossibletimes=6289; // for instance
variables:
 double allpossibletimes(allpossibletimes);
 float pressure(pressure);
 float latitude(station);
 float humidity(pressure,allpossibletimes,station);

and then compress it to eliminate the (time,station) combinations which don't
occur:

dimensions:
 station=10;
 pressure=11;
 allpossibletimes=6289;
 record=7478; // for instance
variables:
 double allpossibletimes(allpossibletimes);
 float pressure(pressure);
 float latitude(station);
 int record(record);
   record:compress="allpossibletimes station";
 float humidity(pressure,record);
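
A minimal Python sketch of the index arithmetic behind such a compress attribute, assuming (as in CF 8.2) zero-based indices with the last listed dimension varying fastest. The sizes, the observed pairs and the function names are hypothetical, chosen only for illustration:

```python
# Hypothetical, much smaller than the CDL above: 3 stations.
n_station = 3

# (time index, station index) combinations that actually occur.
observed = [(0, 0), (0, 2), (1, 1), (3, 0), (3, 2)]


def gather(pairs, n_station):
    """Flatten (time, station) pairs into 1D indices, station varying fastest."""
    return [t * n_station + s for t, s in pairs]


def ungather(record, n_station):
    """Recover the (time, station) pair encoded by each record index."""
    return [divmod(r, n_station) for r in record]


# These values play the role of the "record" coordinate variable.
record = gather(observed, n_station)
print(record)                       # [0, 2, 4, 9, 11]
print(ungather(record, n_station))  # the original pairs
```

Only the combinations that occur are stored, and each stored value can be decoded back to its time and station.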

That would be workable for the ungridded case. It can even be more efficient
than the schemes you describe, as it allows reuse of times that are common to
more than one station, but it doesn't seem natural, as you don't really regard
ungridded data as a compression of a huge sparse array. Instead of combining
indices to station and time, you prefer to keep them separate:

dimensions:
 station=10;
 pressure=11;
 record=7478;
variables:
 float latitude(station);
 int station_index(record);
   station_index:compress="station";
 double times(record);
 float pressure(pressure);
 float humidity(pressure,record);
   humidity:coordinates="station_index";

This is not the purpose for which the compress attribute was defined, but what
we need here is similar. The compress attribute indicates that the value of
its variable is an index into the dimensions listed, and if only one dimension
is listed, it must be a 1D index. In this application the index will have many
repeated values, because it maps few->many by duplication (each station is
referenced by many records) rather than many->few by eliminating unused
entries, as it does when gathering. We could
give the attribute a different name, since it's being used for a different
purpose, and because it's being attached to an auxiliary coordinate variable
rather than a coordinate variable.
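
The few->many use of the index can be sketched in Python as follows. This is only an illustration under assumed data: the values are invented, the index is taken to be zero-based, and the helper names are hypothetical rather than part of any CF or netCDF API:

```python
# One value per station (hypothetical data).
latitude = [51.5, 48.9, 40.7]

# One value per record; repeated values are expected here.
station_index = [0, 0, 1, 2, 2, 2]
times = [0.0, 6.0, 0.0, 0.0, 6.0, 12.0]


def expand(per_station, index):
    """Duplicate per-station values so that there is one per record."""
    return [per_station[i] for i in index]


def records_for_station(index, station):
    """Return the record positions that belong to one station."""
    return [r for r, s in enumerate(index) if s == station]


record_latitude = expand(latitude, station_index)
print(record_latitude)   # [51.5, 51.5, 48.9, 40.7, 40.7, 40.7]

# Times reported by station 2.
print([times[r] for r in records_for_station(station_index, 2)])
```

Unlike gathering, nothing is eliminated: the station metadata is stored once and looked up for every record.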

(3) In your examples you have auxiliary coordinate variables such as
z(sample,z). In CF we recommend against giving an aux coord var the same name
as a dimension, because this could confuse any software that was looking for
(Unidata) coord vars but didn't check how many dimensions they had.
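
A small Python sketch of the kind of check this implies, assuming a dataset is described simply by a set of dimension names and a mapping from variable name to its dimensions. Both the representation and the function name are hypothetical:

```python
def misnamed_aux_coords(variables, dimensions):
    """Return variables that share a dimension's name but are not 1D
    variables defined on exactly that dimension."""
    return sorted(
        name
        for name, dims in variables.items()
        if name in dimensions and tuple(dims) != (name,)
    )


dimensions = {"sample", "z"}
variables = {
    "z": ("sample", "z"),            # aux coord var named like a dimension
    "sample": ("sample",),           # a genuine coordinate variable
    "temperature": ("sample", "z"),  # ordinary data variable
}

print(misnamed_aux_coords(variables, dimensions))  # ['z']
```

Software that looks up coordinate variables by name alone would pick up z here, which is the confusion the recommendation is meant to avoid.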

(4) Much of the subsequent discussion has been about your proposed dataset
classification. I think that the quantity of discussion indicates that the
distinction is hard to draw, because it's one of interpretation and purpose
rather than structure. I believe you intend this attribute as discovery
metadata, don't you? Is it possible you could store such a description in one
of the existing global attributes whose contents aren't standardised by CF?

Best wishes

Jonathan

Follow-Ups:
- Re: [cf-pointobsconvention] Draft 2
  - From: John Caron
