Re: Replies to comments on GDT

Harvey DAVIES (hld@dit.csiro.au)
Thu, 24 Jul 1997 23:20:32 +1000 (EST)

On Wed, 23 Jul 1997, Jonathan Gregory wrote:

> Section 6: Variable names
> 
> > I think there should be a recommendation that names consist of whole words
> > unless there is some strong reason to do otherwise.  So 'latitude' would be
> > preferred to 'lat'.  Note that such full-word variable names often obviate
> > the need for a 'long_name' attribute. [HD]
>
> I would be happy with such a recommendation. I do not think it would reduce
> the need for long_name, though. The long_name might really be quite detailed,
> for instance "volumetric soil moisture content at wilting point" (or this
> might be the quantity in GDT).

But surely the variable name 'latitude' is adequate!

> > I suppose the ability to define 0-dimensional variables could come in handy,
> > though such a quantity is probably more appropriately stored as a global
> > attribute. [JS]

There seems to be some confusion about what is meant by 0-dimensional.  I
would assume it means rank=0.  In other words an ordinary scalar value
(i.e. no dimensions).  JS seems to mean an array with a dimension of
size 0.
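
The two readings can be told apart by the shape alone.  The following Python
sketch is only an illustration; the helper names rank and element_count are
hypothetical and are not part of netCDF.  A rank-0 variable has an empty
shape and holds exactly one value, whereas an array with a dimension of
size 0 has rank 1 and holds no values at all.

```python
import math


def rank(shape):
    """Number of dimensions of a variable with the given shape."""
    return len(shape)


def element_count(shape):
    """Number of values stored: the product of the dimension sizes.

    The product over an empty shape is 1, so a scalar holds one value.
    """
    return math.prod(shape)


scalar_shape = ()    # rank 0: an ordinary scalar, no dimensions
empty_shape = (0,)   # rank 1: one dimension, of size 0

print(rank(scalar_shape), element_count(scalar_shape))
print(rank(empty_shape), element_count(empty_shape))
```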

> > I wish to propose allowing missing (invalid) values in coordinate variables.
> > All corresponding data in the main variable would also have to be missing.
> > In particular this would simplify the problem of calendar dimensions which
> > GDT discuss.  You could simply allocate 31 days to every month and set data
> > for illegal dates (e.g. 30 Feb) to a missing value. [HD]
> 
> I am not happy about this idea, myself. To me it would imply that the data
> existed in principle, but was simply unavailable. See also Section 24.

I would argue strongly for a much broader concept of 'missing' or 'invalid'.
I see no reason why some of the missing values specified in the missing_value
vector should not mean things like 'meaningless' and 'undefined'.  This is
very similar to having missing values in the ocean for land-only variables
like soil-moisture.  How else can such values be represented?

I would also argue strongly for the above proposal to allow missing (invalid)
values in coordinate variables.  I feel it is a neat solution to the date
problem and is likely to be useful in other contexts.
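
As a rough illustration of the proposal, the Python sketch below builds a
12 x 31 day grid for one year and marks the dates that do not exist as
missing.  The function name padded_days, the use of None as the missing
marker and the choice of day-of-year as the coordinate value are all
assumptions made for this example; a real file would use a numeric missing
value.

```python
import calendar

MISSING = None  # stand-in for a missing value (assumption for this sketch)


def padded_days(year):
    """Return 12 rows of 31 entries for the Gregorian calendar.

    Each entry is the day of the year (1-based), or MISSING for a date
    that does not exist in that month (e.g. 30 Feb).
    """
    rows = []
    day_of_year = 0
    for month in range(1, 13):
        days_in_month = calendar.monthrange(year, month)[1]
        row = []
        for day in range(1, 32):
            if day <= days_in_month:
                day_of_year += 1
                row.append(day_of_year)
            else:
                row.append(MISSING)
        rows.append(row)
    return rows


grid = padded_days(1997)
february = grid[1]
# Entries for 28 Feb, 29 Feb and 30 Feb in a non-leap year.
print(february[27], february[28], february[29])
```

Every month then has the same length in the file, and the number of real
days is simply the count of non-missing coordinate values.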

> Section 11: Units
> 
> > I would like to see "none" added as a legitimate characterization, as it
> > would serve as a definite affirmation that the variable really does have no
> > units. [JS]
> 
> Good idea. Perhaps "one" or "unity" would be acceptable, since this could
> perhaps be inserted comfortably into the udunits "constants" section?

You ignored my comment that the required functionality is already provided
by udunits, which allows units=" " for this purpose.  If you do not like
using a blank then udunits also allows units="1".

> I'm afraid I do not understand Harvey Davies's "measurement level" proposal.

Measurement level (measurement scale) describes the valid operations on a
variable and thus determines what statistics are valid.  The four levels are:

1. NOMINAL: Only valid operation is '='.  A measure of location is the 
   MODE (most frequent value).

2. ORDINAL: Comparisons are possible using operations '<' and '>'.  
   Non-parametric statistics can be used.  The usual measure of location is
   the MEDIAN (value with 50% of cases above & 50% below).

3. INTERVAL: Addition and subtraction are allowed.  So the ordinary
   ARITHMETIC-MEAN can be calculated and most standard statistical techniques
   can be used.

4. RATIO: Multiplication and division are allowed.  So the 
   GEOMETRIC-MEAN can be calculated.  Most physical and chemical measurements
   are at this level.

Here are some meteorological examples:

1. NOMINAL: Cloud Type (e.g. 1=cirrus, 2=nimbus, etc.)

2. ORDINAL: Beaufort Wind Scale (from 0=calm to 12=hurricane).

3. INTERVAL: Temperature in Celsius. 

4. RATIO: Temperature in Kelvin. It makes sense to say that 200K is twice
   the temperature of 100K.

I am trying to think of a better example of an INTERVAL variable.  The above
temperature example is confusing in that it is the unit which makes it
INTERVAL, not the nature of the variable itself.  Perhaps a better example
would be altitude measured relative to an arbitrary datum whose absolute
altitude (height above standard sea-level) is unknown.
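
To show how a generic application might use such an attribute, here is a
minimal Python sketch.  The table VALID_LOCATION_MEASURES and the function
location are hypothetical names invented for this example, and the sample
data are made up.  The sketch assumes that each level also permits the
statistics of the levels below it.

```python
import statistics

# Measures of location that are meaningful at each measurement level.
# Each level inherits the measures of the levels below it.
VALID_LOCATION_MEASURES = {
    "nominal": ["mode"],
    "ordinal": ["mode", "median"],
    "interval": ["mode", "median", "arithmetic_mean"],
    "ratio": ["mode", "median", "arithmetic_mean", "geometric_mean"],
}

MEASURES = {
    "mode": statistics.mode,
    "median": statistics.median,
    "arithmetic_mean": statistics.mean,
    "geometric_mean": statistics.geometric_mean,
}


def location(values, level, measure):
    """Compute a measure of location, refusing ones invalid for the level."""
    if measure not in VALID_LOCATION_MEASURES[level]:
        raise ValueError(f"{measure} is not meaningful for {level} data")
    return MEASURES[measure](values)


cloud_type = [1, 2, 2, 1, 2]        # nominal codes (made-up sample)
beaufort = [0, 3, 4, 4, 12]         # ordinal scale values (made-up sample)
kelvin = [100.0, 200.0, 400.0]      # ratio data (made-up sample)

print(location(cloud_type, "nominal", "mode"))
print(location(beaufort, "ordinal", "median"))
print(location(kelvin, "ratio", "geometric_mean"))
```

Asking for the arithmetic mean of the cloud type codes raises an error,
which is the kind of protection the attribute is meant to give.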

> Section 24: Time axes

> > If the unit is a day then there should be a fixed number (31 for 'normal'
> > calendars such as Gregorian) days in each month.  The time coordinate
> > variable should have a missing value for each day which does not exist in the
> > calendar used.  I think this obviates the need for the 'calendar' global
> > attribute and allows for most kinds of calendars without having to hard-code
> > them into a standard. [HD]
> 
> This would deal with the particular case of calculating the interval between
> two dates when a time axis at daily intervals is provided. I am not sure that
> counting the non-missing days between two points in a vector would be more
> convenient than working it out using a calendar-dependent algorithm, although
> it would be more general, I agree. However, it would not help if you did not
> wish to provide time coordinates at daily intervals. What if I have time
> coordinates at monthly intervals? To indicate the lengths of the months, would
> I have to pad out the coordinate vector, and presumably the data too, with
> missing data values at daily intervals i.e. approximately 30 times more missing
> data than genuine data? Not only would wasted space be added to the file, but
> it could easily be misunderstood, no matter how explicit the convention is
> made.

I suggest storing monthly data as follows:

dimensions:
    month = 120;
variables:
    length(month);
        length:units = "days";
    temperature(month);
data:
    length = 31, 28, 31, 30, 31, ...
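
The values of such a length variable are easy to generate and to use.  The
Python sketch below is an illustration only: the function names
month_lengths and days_between are hypothetical, and the start year 1990 is
an assumption (any non-leap year starting in January gives the first five
values shown above).

```python
import calendar


def month_lengths(start_year, n_months):
    """Lengths in days of n_months consecutive Gregorian months,
    beginning in January of start_year."""
    lengths = []
    for i in range(n_months):
        year = start_year + i // 12
        month = i % 12 + 1
        lengths.append(calendar.monthrange(year, month)[1])
    return lengths


def days_between(lengths, first, last):
    """Days from the start of month index first to the start of month
    index last (0-based indices along the month dimension)."""
    return sum(lengths[first:last])


length = month_lengths(1990, 120)   # 1990 is an assumed start year
print(length[:5])
print(days_between(length, 0, 12))
```

The interval between any two months is then a sum over the length variable,
with no calendar algorithm needed by the reader of the file.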

> Certainly, monthly and yearly mean data are among the most important types of
> climate data, so it is crucial to keep the representation of such data as
> simple and natural as possible, while representing them.  But I think it is
> good to avoid units of months and years. Although the udunits unit of "months"
> has a precise meaning (30.4368 days), this is probably not what you intend, and
> could lead applications to make mistakes if they do not check carefully what
> the intention is.

I do not see what the problem is with 'year' and 'month' in this context.  All
that matters is that there are 12 months in a year, a fact with which udunits
agrees!
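
For reference, the quoted figure of 30.4368 days follows from that same
relationship.  The short Python check below assumes that udunits takes a
year to be 365.242198781 days and a month to be one twelfth of a year; the
constant names are made up for this example.

```python
import math

UDUNITS_YEAR_DAYS = 365.242198781   # assumed udunits definition of a year
MONTHS_PER_YEAR = 12

month_days = UDUNITS_YEAR_DAYS / MONTHS_PER_YEAR
print(round(month_days, 4))

# Twelve such months make up exactly one such year (to rounding error).
print(math.isclose(MONTHS_PER_YEAR * month_days, UDUNITS_YEAR_DAYS))
```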

> Section 32: Missing values in a data variable
> 
> > I think that the data should be checked against the "missing_value" *before*
> > unpacking. [JS]
> 
> Yes, you may well be correct. Thanks.

Of course you can use missing_value however you like in SPECIFIC
applications.  But the netCDF User's Guide now states that GENERIC
applications should use the valid range (as defined by valid_range or
valid_min/valid_max), not missing_value.  (I confess that you have me to
blame for this change. You may want to throw something in the direction of
Australia, so I am donning my helmet as follows:  [(:-) )
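
A generic application might apply the valid range along the following
lines.  This Python sketch is not taken from the netCDF library: the
function names is_valid and unpack are hypothetical, the sample numbers are
made up, and it assumes that the valid range is expressed in packed units
and is therefore tested before scale_factor and add_offset are applied.

```python
def is_valid(packed_value, valid_min=None, valid_max=None):
    """True if the packed value lies within the valid range.

    A bound of None means that side of the range is unbounded.
    """
    if valid_min is not None and packed_value < valid_min:
        return False
    if valid_max is not None and packed_value > valid_max:
        return False
    return True


def unpack(packed_values, scale_factor=1.0, add_offset=0.0,
           valid_min=None, valid_max=None):
    """Unpack values, testing the valid range BEFORE unpacking.

    Values outside the range are returned as None (treated as missing).
    """
    result = []
    for value in packed_values:
        if is_valid(value, valid_min, valid_max):
            result.append(value * scale_factor + add_offset)
        else:
            result.append(None)
    return result


packed = [-32768, 0, 150, 32767]    # made-up packed 16-bit values
temperature = unpack(packed, scale_factor=0.1, add_offset=273.15,
                     valid_min=-1000, valid_max=1000)
print(temperature)
```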

Harvey Davies, CSIRO Mathematical and Information Sciences,
723 Swanston Street, Carlton, Victoria 3053, Australia            
Email: harvey.davies@cmis.csiro.au
Phone: +61 3 9282 2623 or +61 3 9239 4556
  Fax: +61 3 9282 2600