[cf-pointobsconvention] [Fwd: Re: [CF-metadata] Seeking example program for storing surface obs in CF convention]

NOTE: The cf-pointobsconvention mailing list is no longer active. The list archives are made available for historical reasons.

To: cf-pointobsconvention@xxxxxxxxxxxxxxxx
Subject: [cf-pointobsconvention] [Fwd: Re: [CF-metadata] Seeking example program for storing surface obs in CF convention]
From: John Caron <caron@xxxxxxxxxxxxxxxx>
Date: Tue, 11 Sep 2007 11:14:09 -0600

Below is an earlier conversation with Jonathan Gregory that would be good to 
review.

-------- Original Message --------
Subject: Re: [CF-metadata] Seeking example program for storing surface obs in CF convention
Date: Thu, 09 Aug 2007 11:11:06 -0600
From: John Caron <caron@xxxxxxxxxxxxxxxx>
To: Jonathan Gregory <j.m.gregory@xxxxxxxxxxxxx>
CC: cf-metadata@xxxxxxxxxxxx
References: <20070808080651.GA11219@xxxxxxxxxxxxxxxxx>

Hi Jonathan:

Thanks for taking the time to look at this. Comments are inline.

Jonathan Gregory wrote:

Dear John

My own opinion is that CF is not currently adequate for writing observational
data to NetCDF. The basic limitation in section 5.4 is that

  float humidity(time,pressure,station);
  float pressure(pressure);
  double time(time);

requires the same number and values of the time and pressure coordinates at
each station.


Yes, this is wasteful of space if you make all the stations share the
coordinate variables but they don't all have info at all (time,pressure)
points. Alternatively you have to create separate coordinate variables for
each station, which may be inconvenient.
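To put a number on the waste described above, here is a minimal sketch (the station reports are invented for illustration, and the function name is hypothetical) that counts how many cells of a shared (time, pressure, station) array would actually hold data when every station must use the union of all time and pressure values as its coordinates.

```python
# Sketch: cost of the orthogonal (time, pressure, station) layout when
# stations report at different times and levels. Data are invented.

def orthogonal_fill(reports):
    """reports maps station name -> set of (time, pressure) pairs observed.

    Returns (total_cells, cells_with_data) for a shared rectangular array.
    """
    # Shared coordinate variables must be the union over all stations.
    times = sorted({t for obs in reports.values() for t, _ in obs})
    pressures = sorted({p for obs in reports.values() for _, p in obs})
    total = len(times) * len(pressures) * len(reports)
    used = sum(len(obs) for obs in reports.values())
    return total, used

reports = {
    "A": {(0, 1000.0), (0, 850.0)},
    "B": {(6, 1000.0)},
    "C": {(12, 500.0), (12, 850.0)},
}
total, used = orthogonal_fill(reports)
print(f"{used} of {total} cells hold data")  # 5 of 27 cells hold data
```

The remaining cells would have to be filled with missing values, which is the space cost that the record-oriented layouts discussed next avoid.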

If we put them in common variables, if I have understood your proposal, I
prefer the contiguous arrangement, something like this:

dimensions:
  record=UNLIMITED;
  station=5;
  stringlen;
variables:
  char station_name(station,stringlen);
  float latitude(station);
  float longitude(station);
  double time(record);
  float humidity(record);
    humidity:coordinates="time";
  float temperature(record);
    temperature:coordinates="time";

where the individual stations are contiguous in the humidity and temperature
variables. Then the question is how to indicate the range of records which
belongs to each station. One way, as in your example, is to provide an array
of start or end pointers into the records. Another way, which takes up a bit
more space but could be more convenient for using the data, would be to include

  int whichstation(record);
    whichstation:coordinate_index="station";

where the presence of the coordinate_index attribute indicates that the value
of whichstation is an index into the station coordinate dimension. whichstation
could be identified as an auxiliary coordinate variable by naming it in the
coordinates attribute:

  float humidity(record);
    humidity:coordinates="time whichstation";

E.g. if you have two timeseries, one with temperature data (1.1, 1.2, 1.3) and
the other with data (2.1, 2.2), you would have:

data:
  temperature=1.1, 1.2, 1.3, 2.1, 2.2;
  whichstation=0, 0, 0, 1, 1;

If it is done this way, rather than with start pointers, the individual
timeseries actually do not have to be stored contiguously, so any of them can
be appended to at any time. That might be a useful feature.
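A minimal sketch of reading the whichstation layout, using the temperature/whichstation example data from this message (the helper name is hypothetical). Each record carries its station index, so a late observation for station 0 can simply be appended after station 1's records, and a reader recovers each time series by scanning all records.

```python
# Sketch of the "whichstation" layout: each record stores the index of
# its owning station, so records need not be stored contiguously.
from collections import defaultdict

def group_by_station(values, whichstation):
    """Return {station_index: [values in record order]} via a full scan."""
    series = defaultdict(list)
    for value, station in zip(values, whichstation):
        series[station].append(value)
    return dict(series)

# The two time series from the example, in one record dimension.
temperature = [1.1, 1.2, 1.3, 2.1, 2.2]
whichstation = [0, 0, 0, 1, 1]

# Appending to station 0 after station 1's data is just another record;
# there are no start/end pointers to update.
temperature.append(1.4)
whichstation.append(0)

print(group_by_station(temperature, whichstation))
# {0: [1.1, 1.2, 1.3, 1.4], 1: [2.1, 2.2]}
```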

Yes, I think it's a good alternative to just have each record refer to its owning station, and not have to maintain the links. The parent/child linked (and contiguous array) variant is useful to make finding the data fast; otherwise you have to read through all of the data when you want to find data for one station (or a small subset of stations).
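The speed argument can be sketched as follows, assuming each station's records are stored contiguously. A per-station (start, count) pair, which is one possible form of the start/end pointers mentioned earlier (the names here are hypothetical), turns "read one station" into a single slice instead of a scan over every record. The same sketch also shows the cost: the pointers are only valid while contiguity holds.

```python
# Sketch: per-station (start, count) pointers for a contiguous layout.
# Function and variable names are hypothetical.

def build_pointers(whichstation, nstations):
    """Derive (start, count) lists; requires each station to be contiguous."""
    start = [-1] * nstations
    count = [0] * nstations
    for rec, station in enumerate(whichstation):
        if start[station] == -1:
            start[station] = rec
        elif whichstation[rec - 1] != station:
            raise ValueError(f"station {station} is not contiguous")
        count[station] += 1
    return start, count

def read_station(values, start, count, station):
    """Read one station's records with a single slice (no full scan)."""
    if count[station] == 0:
        return []
    s = start[station]
    return values[s:s + count[station]]

temperature = [1.1, 1.2, 1.3, 2.1, 2.2]
whichstation = [0, 0, 0, 1, 1]
start, count = build_pointers(whichstation, nstations=2)
print(start, count)                                # [0, 3] [3, 2]
print(read_station(temperature, start, count, 1))  # [2.1, 2.2]
```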

The reference to the station could be either by index or by name; in our typical files of this type, a few extra bytes won't matter much.


Your proposal appears to me to introduce several extra features which are
redundant with, or duplicate, other CF attributes. The _CoordinateAxisType attr
has the same function as the CF axis attribute. I don't see the need for the
global attributes latitude_coordinate etc., since the lat etc. coordinates can
be identified by units and by standard_name; also, having a *global* attr
restricts the file to having only *one* coord variable of each type. The
attributes giving the max and min of each of the coordinates contain info
which can be deduced from the coord variables themselves, of course; is that
an important kind of discovery metadata? I'd be worried about it because it
is almost certain to be wrong some of the time, i.e. inconsistent with the
coord variables. The cdm_datatype attribute implies a distinction between
various kinds of data which are formally not really different and would be
processed in the same way, so I don't see why this is useful.

The Convention wasn't intended to be a proposal for CF, just a stand-alone Convention for this type of data, so we made it rather broad to cover several existing data formats. So there is likely to be some redundancy, and I guess the next step is to decide which parts should be added to CF.

The _CoordinateAxisType enumeration is intended to be a complete listing of georeferencing axis types. We use it instead of parsing the units, looking for "positive", looking for standard names, and the other ways of identifying coordinate axes that have evolved out of COARDS/CF. It is certainly redundant with all of that.
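The two approaches being compared can be sketched side by side: a single explicit attribute versus a chain of CF-style heuristics (axis, standard_name, units, positive). The lookup tables are a small illustrative subset, not the full rules of either convention, and the returned names ("Lat", "Lon", "Time", "Pressure", "Height", "GeoZ") are assumed to match Unidata's axis-type names.

```python
# Sketch: identify a coordinate axis from an explicit _CoordinateAxisType
# attribute, falling back to CF-style clues. Tables are an illustrative
# subset only; the axis-type names are assumptions.

CF_UNITS = {"degrees_north": "Lat", "degrees_east": "Lon"}
CF_STANDARD_NAMES = {
    "latitude": "Lat",
    "longitude": "Lon",
    "time": "Time",
    "air_pressure": "Pressure",
    "height": "Height",
}
# X/Y/Z alone do not say whether an axis is lat/lon or projected, so only T.
CF_AXIS = {"T": "Time"}

def axis_type(attrs):
    """attrs maps attribute name -> value for one coordinate variable."""
    explicit = attrs.get("_CoordinateAxisType")
    if explicit:
        return explicit  # one attribute, no guessing
    if attrs.get("axis") in CF_AXIS:
        return CF_AXIS[attrs["axis"]]
    if attrs.get("standard_name") in CF_STANDARD_NAMES:
        return CF_STANDARD_NAMES[attrs["standard_name"]]
    if attrs.get("units") in CF_UNITS:
        return CF_UNITS[attrs["units"]]
    if "since" in attrs.get("units", ""):
        return "Time"  # e.g. "hours since 2007-08-09 00:00:00"
    if attrs.get("positive") in ("up", "down"):
        return "GeoZ"  # some vertical coordinate; kind left unresolved
    return None

print(axis_type({"_CoordinateAxisType": "Lat"}))             # Lat
print(axis_type({"units": "degrees_north"}))                 # Lat
print(axis_type({"units": "hours since 2007-08-09 00:00:00"}))  # Time
print(axis_type({"units": "hPa", "positive": "down"}))       # GeoZ
```

The fallback chain is order-dependent and grows with every clue a convention adds, which is the motivation given for an explicit enumeration; the counter-argument is that it duplicates information CF already carries.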

The min/max values are a kind of discovery metadata. We also use them to tell the user what the valid space/time queries on this dataset are. Again, this is an optimization for reading/serving data that avoids having to read through the entire file.
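Both sides of the min/max argument fit in one small sketch: the bounds can be precomputed once so a server need not scan the file, and they can be recomputed from the coordinate values to detect the staleness Jonathan worries about. The attribute names (lat_min, lat_max, ...) are hypothetical placeholders, not the names used by either convention.

```python
# Sketch: min/max discovery attributes derived from coordinate values,
# plus a consistency check. Attribute names are hypothetical.

def coordinate_bounds(coords):
    """coords maps a coordinate name to its list of values."""
    bounds = {}
    for name, values in coords.items():
        bounds[f"{name}_min"] = min(values)
        bounds[f"{name}_max"] = max(values)
    return bounds

def stale_attributes(stored, coords):
    """Return the stored min/max attributes that disagree with the data."""
    actual = coordinate_bounds(coords)
    return sorted(k for k, v in stored.items() if k in actual and actual[k] != v)

coords = {"lat": [40.0, 39.5, 41.2], "lon": [-105.3, -104.9, -106.0]}
stored = {"lat_min": 39.5, "lat_max": 41.0, "lon_min": -106.0, "lon_max": -104.9}
print(coordinate_bounds(coords))
# {'lat_min': 39.5, 'lat_max': 41.2, 'lon_min': -106.0, 'lon_max': -104.9}
print(stale_attributes(stored, coords))  # ['lat_max']
```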

The cdm_datatype reflects our experience in how to describe kinds of data ("scientific data types"). This has been a long and ongoing evolution of our understanding. For example, the coordinate system for a "time series of point data" looks just like "trajectory" data, so we use the cdm_datatype to disambiguate. It essentially describes the connectivity of the points. It's needed by visualizers, and useful for discovery.
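To illustrate the "connectivity" point, here is a sketch in which the same (time, lat, lon) records are read two ways. Nothing in the coordinates says whether the points are one moving platform or two fixed stations; only a datatype label decides. The values "Station" and "Trajectory" are used as assumed examples of cdm_datatype values, and the function name is hypothetical.

```python
# Sketch: identical (time, lat, lon) records connected two different ways
# depending on a datatype label. Labels and names are assumptions.

def connect(records, datatype):
    """records: list of (time, lat, lon). Returns a list of connected series."""
    if datatype == "Trajectory":
        # One moving platform: every point is part of a single time-ordered path.
        return [sorted(records)]
    if datatype == "Station":
        # Fixed platforms: points at the same location form one time series.
        series = {}
        for rec in sorted(records):
            series.setdefault((rec[1], rec[2]), []).append(rec)
        return list(series.values())
    raise ValueError(f"unsupported datatype: {datatype}")

records = [(0, 40.0, -105.0), (1, 41.0, -104.0), (2, 40.0, -105.0), (3, 41.0, -104.0)]
print(len(connect(records, "Trajectory")))            # 1
print([len(s) for s in connect(records, "Station")])  # [2, 2]
```

As a trajectory these four points form one back-and-forth path; as station data they are two stations with two observations each.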

Our "Observation Convention" introduces the notion of grouping variables into "Structures" by specifying that all variables with a common outer dimension are part of the structure. This works especially well for the record dimension, where the variables really are a Structure (that is, all record variables are stored contiguously for record 0, then record 1, etc.). It's also useful for non-record dimensions, e.g. all variables whose outer dimension is "station" comprise the "Station Structure".
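The grouping rule is simple enough to sketch directly: variables sharing the same outer (first) dimension form one Structure. The variable shapes below are taken from the example CDL earlier in this thread. The function name is hypothetical, and reading a real file would need a netCDF library, which is left out here.

```python
# Sketch of the grouping rule: variables that share an outer (first)
# dimension form one "Structure". Shapes follow the example CDL.
from collections import defaultdict

def group_structures(variables):
    """variables maps name -> tuple of dimension names."""
    structures = defaultdict(list)
    for name, dims in variables.items():
        if dims:  # scalar variables belong to no structure
            structures[dims[0]].append(name)
    return dict(structures)

variables = {
    "station_name": ("station", "stringlen"),
    "latitude": ("station",),
    "longitude": ("station",),
    "time": ("record",),
    "humidity": ("record",),
    "temperature": ("record",),
}
print(group_structures(variables))
# {'station': ['station_name', 'latitude', 'longitude'], 'record': ['time', 'humidity', 'temperature']}
```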

Anyway, it would be great to get some other heads onto this, especially those who have written or need to write this kind of point observation data. If we can get 3 or 4 interested parties, we could put together a real proposal for CF.


Thanks again, Jonathan!

John
_______________________________________________
CF-metadata mailing list
CF-metadata@xxxxxxxxxxxx
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
