Replies to comments on GDT

Jonathan Gregory (jmgregory@meto.gov.uk)
Wed, 23 Jul 1997 15:57:47 +0100 (BST)

John Sheldon (JS) and Harvey Davies (HD) both posted lengthy comments on the
netCDF conventions proposed by Gregory, Drach and Tett (GDT). We were very
grateful for these comments. Here are some replies. In the following, "I" means
either Jonathan Gregory or Bob Drach; we are largely in agreement.

General points:

> 1. Conventions ought to be as simple and undemanding as possible, to make the
> use of netCDF as easy as possible. [JS]

> And we need to be careful not to make it even harder for the writers of
> netCDF files by defining so many conventions that they need to be 'netCDF
> lawyers'. [HD]

I agree with this. It is not a platitude. I am aware that GDT have added a lot
of new material, and it is important to consider carefully whether any of this
is an unnecessary burden.

> 3. As I mentioned in my previous mail, I am, in general, opposed to the use
> of external tables. [JS]

I am also opposed to them in general. However, as you acknowledge, our
reference to an external table from which the quantity should be selected is
not a translation, but a restriction on the possible values. We do not aim to
make the file less self-describing, which would be the case if the quantity
string were translated into a code that you had to look up. The aim is to
make files produced by different people more readily comparable. Suppose I have
a program which works out the albedo from quantities "downward shortwave
radiation" and "upward shortwave radiation". If you supply a data file which
contains these quantities but calls them "total downward sw radiative flux" and
"upward sw radiative flux", my program will not be able to find the inputs it
needs.  Codes avoid this kind of problem, of course, since our corresponding
strings would translate to the same code. But we can also avoid it by agreeing
on the strings we will use. That is the purpose of our convention.
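
For example (a sketch; the variable names are arbitrary, which is the point):

  variables:
    float swdown(lat,lon);
      swdown:quantity="downward shortwave radiation";
      swdown:units="W m-2";
    float swup(lat,lon);
      swup:quantity="upward shortwave radiation";
      swup:units="W m-2";

My albedo program can then find its inputs by searching the quantity
attributes, whatever the variables themselves are called.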

We recognise, however, that it will not be workable if it is too inflexible.
If I want to produce a new quantity which does not have a standard name, I need
a name to be defined very quickly - within a very few days - or I shall just
ignore the convention and go my own way. We think that such a system will
therefore have to be set up.

> 4. There seems to be a preference in your proposal for associating additional
> qualities with the axes/coordinate-variables themselves.
> 5. Your proposal does not rule out the use of referential attributes,
> but neither does it endorse or exploit them. [JS]

We propose several different kinds of ancillary coordinate (associate,
component and boundary). A standard referential attribute could attach only one
kind to a variable, I believe. I have a personal preference for attributes with
defined names, as I mentioned in my previous posting about coordinate systems.
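
To illustrate the three kinds (a sketch using the attribute names proposed in
GDT; the variable names are invented):

  variables:
    float lev(lev);
      lev:associate="model_level_number";
      lev:component="pressure_level,sigma_level";
      lev:bounds="bounds_lev";

A single referential attribute could carry only one of these attachments.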

Section 3: Data types

Our convention does not forbid the "byte" type, so no existing programs need to
be changed for GDT. We do not recommend it because of the statement in the
NetCDF User's Guide (version 2.4), "It is currently possible to interpret byte
data as either signed (-128 to 127) or unsigned (0 to 255). However, the
addition of packed data in a future version of netCDF will require arithmetic
operations on values, and for that purpose byte data will be interpreted as
signed." A data type whose signedness is not well defined does not seem such a
safe choice for exchanging data. If it is important to include the "byte" type,
perhaps we could instead have a convention that it must be interpreted as
signed.
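
Under such a convention, a byte variable would be declared and read like this
(a sketch; the variable name and values are invented):

  variables:
    byte surface_type(lat,lon);
      surface_type:valid_range = 0b, 9b; // byte values always interpreted as signed, -128 to 127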

Section 5: Global attributes

> I prefer the name 'source' (as in CSM Conventions) in place of 'institution'
> and 'production'. [HD]

Fair enough.

> The 'conventions' attribute should include any version numbers, etc. rather
> than having additional attributes such as 'appendices'. [HD]

Having a separate appendices attribute would allow applications to read in this
attribute without having to parse a longer string. This information may be
useful because it allows the application to know the list of quantities which
were available when the file was created (see section 12).
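
For example (the attribute values here are purely illustrative):

  // global attributes:
    :conventions="GDT";
    :appendices="A";

An application can then read the list of appendices directly, without parsing
the conventions string.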

Section 6: Variable names

> I think there should be a recommendation that names consist of whole words
> unless there is some strong reason to do otherwise.  So 'latitude' would be
> preferred to 'lat'.  Note that such full-word variable names often obviate
> the need for a 'long_name' attribute. [HD]

I would be happy with such a recommendation. I do not think it would reduce
the need for long_name, though. The long_name might really be quite detailed,
for instance "volumetric soil moisture content at wilting point" (in GDT, this
detailed string might instead be the value of the quantity attribute).
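
For example (a sketch; the variable name is invented):

  variables:
    float soil_moisture(lat,lon);
      soil_moisture:long_name="volumetric soil moisture content at wilting point";

The whole-word variable name is readable on its own, but the long_name still
supplies detail which would make an unwieldy name.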

> GDT suggest avoiding case sensitivity.  I do not think this is the kind of
> thing which should be standardised in a convention.  Instead it should be
> recommended as good practice. [HD]

It is only a recommendation in GDT, but I think it should be a strong one. If
someone puts two variables in a file distinguished only by case, it may work
OK for them while they remember, but I think that there is a considerable
chance it may confuse or waste the time of someone else trying to use the file
who is not warned sufficiently about it!

Harvey Davies suggests we could use the long_name as the variable name, in
effect. But if they are to be as detailed as the long_name above, they would be
very cumbersome variable names. To get round having more than one variable of
a given quantity, you need schemes for adding suffixes, such as those Harvey
Davies suggests. I feel this is more awkward than having an attribute.

Section 8: Axes and dimensionality of a data variable

> In the spirit of simplicity, I don't think I would make storage of coordinate
> variables mandatory if they are simply 1,2,3 [JS]

The main reason for mandatory coordinate variables, even if they are mere
indices, is so that they can have associated coordinate variables. This is not
a very strong reason, I admit. If there are associated coordinate variables,
one of them could be made the main coordinate variable instead. I do not feel
strongly about this.

> I suppose the ability to define 0-dimensional variables could come in handy,
> though such a quantity is probably more appropriately stored as a global
> attribute. [JS]

I think global attributes are appropriate for information about the file or
other variables, whereas 0-dimensional variables would be more natural for
scalar "physical" data.  For instance, a global-average temperature might be a
0-dimensional variable.  In a file conforming to GDT, it is likely that a
single-element variable of this kind would have various singleton dimensions,
giving the time it applies to or showing that it was a mean over latitude and
longitude, for example. However, I don't think this should be a requirement.
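
Such a variable might look like this (a sketch; the contraction attributes of
section 23 would record the averaging over latitude and longitude):

  dimensions:
    time=1;
    lat=1;
    lon=1;

  variables:
    float temperature(time,lat,lon); // a global-average temperature
    double time(time);
    float lat(lat);
    float lon(lon);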

I'm glad John likes singleton dimensions.  An alternative approach would be to
use attributes instead for such metadata, but I don't like that so much because
it does not show the correspondence between the single-valued and multi-valued
cases. I feel that a data variable which applies to only one vertical level,
for instance, should record that level using the same mechanism as one which
has a multi-valued vertical axis.

> I can see that there might be some use for variables with more than 4
> dimensions, but this is likely to frustrate some existing utilities. [JS]

COARDS does not forbid more than four dimensions, although it does not
recommend it. Software which conforms to COARDS hopefully can handle more than
four.

Section 9: Coordinate variables

> I disagree with GDT's suggestion that every dimension have a coordinate 
> variable.  This would triple the space required for [certain] timeseries. [HD]

The main reason for this requirement was to provide something to which
attributes could be attached. That is not necessarily a very good reason, I
agree. However, I think the case Harvey Davies mentions is untypical. Files of
one-dimensional fields such as timeseries are unusual for climate model
data. They arise for observational station data, but in that case it is very
likely that several timeseries will share the time coordinate variable (though
they might have missing data for some days), so the overhead will be
proportionately less.

> It would be nice to have an efficient way of specifying a coordinate variable
> with constant steps i.e. an arithmetic progression (AP).  I propose doing
> this by replacing the rule that the shape of a coordinate variable consist of
> just a dimension with the same name. The new rule should allow any single
> dimension with any size (including 0).  Then any trailing undefined elements
> of a coordinate variable would be defined as an AP. [HD]

I think that this is an issue of compression rather than representation - how
to make the file smaller. When in memory, you will probably want a vector
explicitly containing the values. It certainly makes programs more complicated
if they have to handle a coordinate variable which might be either a set of
explicit values or constants defining an arithmetic progression. For the sake
of compression, one could do this, but I doubt that it is worthwhile. Except in
unusual cases like Harvey Davies's timeseries example, the space taken by
coordinate variables is relatively small. Although it would work, I do not
think the scheme he suggests would be wise to adopt, as it is not backward
compatible with existing applications.

> I wish to propose allowing missing (invalid) values in coordinate variables.
> All corresponding data in the main variable would also have to be missing.
> In particular this would simplify the problem of calendar dimensions which
> GDT discuss.  You could simply allocate 31 days to every month and set data
> for illegal dates (e.g. 30 Feb) to a missing value. [HD]

I am not happy about this idea, myself. To me it would imply that the data
existed in principle, but was simply unavailable. See also Section 24.

Section 11: Units

> I would like to see "none" added as a legitimate characterization, as it
> would serve as a definite affirmation that the variable really does have no
> units. [JS]

Good idea. Perhaps "one" or "unity" would be acceptable, since it could be
inserted comfortably into the udunits "constants" section.

Since udunits is publicly available, I do not see that it will be a particular
problem for an application to support its syntax for offset and scale. In some
cases, this will avoid having to introduce new quantities or units. For
example, in our version of the GFDL-Cox ocean model, salinity is in
psu/1000-0.035. This "unit" can be represented as "1000 psu @ 0.035". (This
is not a perfect example because psu is not a unit in udunits, but presumably
could be added as a dimensionless unit if it is really necessary.)
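
In CDL (a sketch; the dimensions are invented, and psu itself would first need
to be added to udunits):

  variables:
    float salinity(depth,lat,lon);
      salinity:units="1000 psu @ 0.035";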

Section 12: Physical quantity of a variable

> I am unhappy with this proposed 'quantity' attribute. ... Why not simply
> standardise the name of the variable? ... But I do agree that there is a need
> for something more than just the 'units' attribute to give information about
> the nature of a variable. [HD]

We feel that standardised variable names are a less elegant solution than the
quantity attribute, especially given the suffixes or other devices needed to
distinguish different variables having the same quantity. Since
variables must have names, requiring use of standard variable names presents a
problem when the standard table does not have a suitable value - what would you
call the variable? You would need another attribute to tell if the name were
standard! The quantity attribute is optional in most cases, as it should be, so
valuable information need be added only when appropriate.

I'm afraid I do not understand Harvey Davies's "measurement level" proposal.

> I have never been happy with having both the FORTRAN_format and the C_format
> giving essentially the same information.  (Although it is usually possible to
> derive one from the other.) It might be better to replace [them] by some
> language-independent attributes. [HD]

The language-independent representation proposed by Harvey Davies seems more
cumbersome to me than either the Fortran or the C format. If the convention
could pick one or the other as standard (we recommended Fortran), it will be
convenient for programs in that language, which can use it directly, and it
can be fairly easily parsed and converted into a format for any other language.
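
For example (the format value is illustrative):

  variables:
    float temperature(lat,lon);
      temperature:FORTRAN_format="f8.2"; // easily converted to the C format "%8.2f"

A C program can derive its format by a simple textual rearrangement of the
Fortran one.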

Section 13: Topology

> float lon(lon);
>     lon:long_name = "longitude";
>     lon:quantity = "longitude";
>     lon:topology = "circular";
>     lon:modulo = 360.0f;
>     lon:units = "degrees_east";
> 
> There is a lot of redundancy here, especially if 'lon' is the standard name
> for longitude. I would prefer to replace the above by:
> 
> float longitude(longitude);
>     longitude:modulo = 360.0f;
>     longitude:units = "degrees_east";
>
> The ... proposed attributes 'quantity' and 'topology' do not appear to
> provide any useful additional information. [HD]

I prefer the use of the quantity attribute to depending on the name of a
variable or the units (see above on Section 6 regarding variable names).
Although in this case the units do almost certainly imply the quantity, this is
not true in most cases. For instance, units of "kg m^-2 s^-1" tell you very
little about the quantity - there are a great many things it could be.  The
quantity attribute is proposed as an unambiguous indication. It will be most
convenient for software if it can be relied upon always to be present.

The "topology" and "modulo" attributes do convey different information. For
instance, a longitude coordinate variable *limited* to values in the eastern
hemisphere between the Greenwich meridian and the date-line (e.g. 0E, 25E,
120E, 130E, 180E) does not have circular topology. (This might be from a model
of a limited area of the world.) If you make a contour map of a field with such
a longitude axis, you can interpolate anywhere between 0 and 180 to draw the
contours, but it is not legitimate to interpolate over the western hemisphere
and draw the rest of the world. The rest of the world is simply missing. The
implication of circular topology would be that you could put any longitude you
like on the left-hand side of the map, which is not the case here. However,
this coordinate variable *does* have a modulo, of 360, since you can *label*
the points in any way which is equivalent under the modulo to the coordinates
in the file. The coordinates 0,25,120,130,180 can equally well be labelled as
-360,-335,-240,-230,-180. Thus, modulo does not imply circular.
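
In CDL, the limited-area axis of this example would look like this (a sketch):

  dimensions:
    lon=5;

  variables:
    float lon(lon);
      lon:quantity="longitude";
      lon:units="degrees_east";
      lon:modulo=360.0f; // no topology="circular" attribute

  data:
    lon=0,25,120,130,180;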

> I suggest the monotonicity requirement should be relaxed if modulo is
> specified. [HD]

In principle, a circular axis does not require a modulo either, but GDT say
that it should have one. This is to give a means of preserving monotonicity if
you rotate the axis. I do not think that the monotonicity requirement should
be relaxed because it is a very useful assumption for software to be able to
make. I expect that existing software does make it.

Section 16: Vertical (height or depth) dimension

In GDT we proposed that the vertical axis should have a quantity defined. We made
this suggestion because we were unhappy about requiring units to be defined for
the vertical axis (as COARDS does), especially in case the quantity was
dimensionless.  Our quantity proposal means that all vertical axes are treated
the same way, and no units have a special status.
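
For example, a dimensionless sigma axis needs no units attribute (a sketch):

  variables:
    float sigma(sigma);
      sigma:quantity="sigma"; // identifies this as a vertical coordinate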

In GDT we mention the positive attribute of COARDS, and John Sheldon argues
that this attribute is essential. I think that it should either always be
included, or be assumed to have a particular value (probably "up") when it is
absent; I would be happy with either. John notes that it
defines the handedness of the system. This is so, but only if you know the
senses of the other two spatial axes. I think it is interesting that John feels
happier to assume which way these go. GDT make a comment that the vertical
dimension can be identified from the order of dimensions. In the case of a
zonal-mean wind (Y-Z), Y will be the last dimension and Z the next-to-last in
the CDL declaration. The quantity and units of Y make clear that it is a
horizontal dimension, and of Z that it is not. Since Z is next to Y, it must be
the vertical dimension (assuming there is one). This is rather weak, though, I
agree.

Perhaps it would be a good idea to adopt John's local solution and attach
attributes pointing out which "Cartesian" axis a coordinate variable
corresponds to. I would argue that one should also say, as for up/down, what
its sense is, in that case. However, I *do* think this is an issue for display
of the data, rather than of the data itself.

> Is there any reason why one could not simply adopt the single standard
> variable name 'height' and handle depths as negative heights? [HD]

I would prefer quantities to variable names. But even so, the main reason why
the convention cannot insist on either height or depth is that some climate
datasets and climate models use one, and some the other. A convention that
prescribes which quantities can be represented will probably not be popular.

Section 18: Component coordinate variables

> But I disagree with Russ Rew on allowing 2D NUMERIC coordinate variables for
> such things as dates. [HD]

GDT propose component coordinate variables as a way of storing coordinates that
require several numbers at each point to specify them. (This is a separate
issue from multidimensional coordinates; it refers to "composite numbers" and
the example given is a one-dimensional hybrid vertical coordinate.) However, we
do not recommend this for time because schemes for encoding time unambiguously
into simple numbers can be specified in the convention.

Section 21: Boundary coordinate variables

> Is there any particular reason why you made the additional dimension the
> slowest varying dimension? [JS]

I have found that I more often want to access all the lower or all the upper
boundary values at once than to get both boundaries at once for a particular
level. Moreover, doing it this way makes the boundary vectors just like two
coordinate vectors end-to-end.
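
That is (a sketch; the dimension size is invented):

  dimensions:
    lev=19;

  variables:
    float lev(lev);
      lev:bounds="bounds_lev";
    float bounds_lev(2,lev); // all lower boundaries first, then all upper ones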

> This has some appeal, but it does not seem basic enough to justify
> generalising coordinate variables to 2D. [HD]

We do not regard boundary coordinates as "ordinary" 2D coordinates. They
are extra information attached to 1D coordinate variables. In GDT, the
dimension of size 2 is hard-coded, not a dimension of the netCDF file.

Section 22. Point values versus average values

> I agree that this distinction is important. The rainfall example suggests a
> third alternative - a value integrated (accumulated) over intervals along one
> or more axes. [HD]

I don't think this is necessary because the quantity will make this clear e.g.
quantities of "rainfall rate" and "accumulated rainfall". These have to have
different quantities because they have different units, of kg m^-2 s^-1 and
kg m^-2 respectively (or perhaps mm s^-1 and mm, which again would be different
quantities).
The subcell attribute is proposed simply to tell you whether the value applies
to the whole cell or a point within the cell. It indicates whether a rainfall
rate is an instantaneous value, or averaged over a time interval, for instance.

In general, there is another possibility. If you want explicitly to record that
something has been summed over one or more axes, this is a contraction (next
section) and should be recorded as such. In a sense, the subcell attribute is
doing a similar job at a smaller scale: it indicates how the data relate to
structure below the scale of the cell, about which we have no information at
all.
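
For example (a sketch; I assume "point" and "cell" as the attribute values,
"cell" being the value used in the section 29 example below):

  variables:
    float rainfall_rate(time,lat,lon);
      rainfall_rate:quantity="rainfall rate";
      rainfall_rate:subcell="cell"; // a mean over each time interval
      // subcell="point" would instead denote an instantaneous value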

Section 23: Contracted dimensions

> I definitely like the idea of a "contraction" attribute to document, in a
> general way, the operation that was performed. [JS]

Good. You are right to say, "in a general way". You have to decide how much
information you really want to store. We felt that the contraction approach
would be sufficient to distinguish the variables we need to distinguish, which
is the practical criterion.

> We should agree on a set of valid strings (e.g. "min" vs. "minimum"). [JS]

Yes. We propose to define the valid strings in an appendix to the standard.

> How would I store, and document, say, a time-series of monthly means? [JS]

See below.

The example given for a simultaneous contraction is of a standard deviation.
For instance, suppose you have

  dimensions:
    lat=72;
    lon=96;

  variables:
    float temperature(lat,lon);
    float lon(lon);
      lon:quantity="longitude";
      lon:bounds="bounds_lon";
    float bounds_lon(2,lon); // E and W limits of each cell in longitude
    float lat(lat); // Probably would have bounds_lat as well
      lat:quantity="latitude";

A zonal mean of this can be recorded as

  dimensions:
    lat=72;
    con_lon=1;

  variables:
    float zm_temperature(lat,con_lon); // lat is unchanged
    float con_lon(con_lon);
      con_lon:quantity="longitude";
      con_lon:bounds="bounds_con_lon";
      con_lon:contraction="mean";
    float bounds_con_lon(2,con_lon); // E and W limits of the zonal mean

A global mean can be recorded by contracting the latitude axis in the same
way. It doesn't matter which order you do it in (except for missing data -
that is something we decided not to try to record in the metadata). Standard
deviation is a non-linear operation, though. The meridional SD of the zonal
SDs will not equal the zonal SD of the meridional SDs, and neither equals the
SD calculated by considering all the boxes at once. This last case we have
to represent as a "simultaneous contraction", thus:

  dimensions:
    area=1;

  variables:
    float gsd_temperature(area);
    short area(area); // a dummy coordinate variable
      area:associate="con_lat,con_lon";
      area:contraction="sd";
    float con_lon(area); // The mean longitude of the region
      con_lon:bounds="bounds_con_lon";
    float bounds_con_lon(2,area); // E and W limits of the region
    float con_lat(area); // The mean latitude of the region
      con_lat:bounds="bounds_con_lat";
    float bounds_con_lat(2,area); // N and S limits of the region

> Take a vertical average (mean) over pressure from some pressure level (say,
> 50mb) down to the earth's surface.  Now, since the surface pressure varies
> two-dimensionally, it seems that a dimension, being 1-D, will not be adequate
> to store the information about the integration bounds. [JS]

I agree, this is a problem. You really want to be able to store a contracted
pressure coordinate with "bounds_pressure=surface,50 mb;". One way - rather
contrived - I can see to do this is suggested by the use of a hybrid vertical
pressure-sigma coordinate, where the pressure at level i is given as
"pressure=pressure_level(i)+sigma_level(i)*surface_pressure". Your upper level
is a pure pressure level, with pressure_level=50 mb, sigma_level=0. Your lower
level is a pure sigma level with pressure_level=0 mb, sigma_level=1.  This
could be recorded in our conventions using a component attribute, thus:

  dimensions:
    con_eta=1;
    lat=72;
    lon=96;

  variables:
    float ke(con_eta,lat,lon);
    float con_eta(con_eta);
      con_eta:component="pressure_level,sigma_level";
      con_eta:quantity="hybrid sigma-pressure";
      con_eta:contraction="mean";
    float pressure_level(con_eta);
      pressure_level:quantity="pressure";
      pressure_level:units="hPa"; // 1 hPa = 1 mb
      pressure_level:bounds="bounds_pressure_level";
    float bounds_pressure_level(2,con_eta);
    float sigma_level(con_eta);
      sigma_level:quantity="sigma";
      sigma_level:bounds="bounds_sigma_level";
    float bounds_sigma_level(2,con_eta);

  data:
    bounds_pressure_level=0,50;
    bounds_sigma_level=1,0;

This approach does require the application to know how to work out the
pressure, if it needs to, from the pressure_level and the sigma_level.

> The 'history' attribute should provide a complete audit-trail of every task
> (e.g. input, copy, reduction, other arithmetic) which created and modified
> the variable. [HD]

Such detailed information could be useful, but would be more difficult to
process. The
aim of the contraction attribute is to provide a systematic and simple scheme
for recording the common ways whereby a dimension is contracted. Harvey Davies
lists some further useful examples of such contractions. I think it is much
more convenient to record this information, ready-parsed, as a separate
attribute than as part of a variable name.
      
Section 24: Time axes

> Suppose I have an idealized model that is not associated with any calendar -
> it just ticks off hour after hour.  How would I be allowed to specify time
> in this case? [JS]

I think you specify it as a pure interval of time, with units="hours". There
is no problem with this, because if it is really not associated with any
calendar you will not want to add it to a date, which is when the problems
arise.
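
That is (a sketch):

  variables:
    double time(time);
      time:quantity="time";
      time:units="hours";

with no calendar attribute and no reference date in the units string.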

> If the unit is a day then there should be a fixed number (31 for 'normal'
> calendars such as Gregorian) days in each month.  The time coordinate
> variable should have a missing value for each day which does not exist in the
> calendar used.  I think this obviates the need for the 'calendar' global
> attribute and allows for most kinds of calendars without having to hard-code
> them into a standard. [HD]

This would deal with the particular case of calculating the interval between
two dates when a time axis at daily intervals is provided. I am not sure that
counting the non-missing days between two points in a vector would be more
convenient than working it out using a calendar-dependent algorithm, although
it would be more general, I agree. However, it would not help if you did not
wish to provide time coordinates at daily intervals. What if I have time
coordinates at monthly intervals? To indicate the lengths of the months, would
I have to pad out the coordinate vector, and presumably the data too, with
missing data values at daily intervals i.e. approximately 30 times more missing
data than genuine data? Not only would wasted space be added to the file, but
it could easily be misunderstood, no matter how explicit the convention is
made.

Section 26: Non-Gregorian Calendars

> UDUNITS already has "common_year" defined to indicate a calendar with 365
> days. [JS]
> [JS]

We used "noleap" for compatibility with LATS.

"perpetual" and "model" would be OK for calendar. Anything is allowed; all
the convention says is that generic applications may not know how to handle
it. I would have no problem with these if the ways to handle them were defined
generically.

Section 27: Unimonth calendar

The problem we are trying to address here is that of applications trying to
convert between different representations of time, where

* one representation uses months or years, and the other uses days, hours,
minutes or seconds, or

* they use different choices from the variety of calendars currently in use
within climate models.

> I see the attraction of being able to convert between any calendar and
> "unitime".  This is the same thing many of us do when we store date/time as
> YYYYMMDDHHNNSS. [JS]

Yes, it is similar.

> I agree that date/time should be represented by a single number.  I suggest
> the form YYYYMMDD.d where d is a fraction of a day.  So 19970310.5 represents
> noon on March 10, 1997.  Similarly year/month is represented by YYYYMM. [HD]

This representation has the advantage of human readability, I agree. However,
Harvey Davies is apparently suggesting storing both this form and the time in
some other form with which one can do calculations. Is this really necessary
when you can always do a conversion from one to the other easily enough?

> I'm not opposed to this, but I wouldn't want to use it in place of a UDUNITS-
> type representation. Hopefully, UDUNITS will someday handle additional
> calendars. [JS]

We propose it as a way of encoding time into numbers. We would like an
extension to udunits to do this encoding. However, the translation from
year, month, day into unitime (given in GDT) is very easy and can be done in
one line; much easier than for the Gregorian calendar! So if unitime were in
use, you would have something like this:

  dimensions:
    time=2;

  variables:
    double time(time);
      time:quantity="unitime";
      time:units="days since 1-1-1";

  data:
    time=0,102.5;

The times encoded here are "1-1-1 0:0:0" and "1-2-3 12:0:0" (noon on 3rd
February in year 1). It would be nice to have a translator for this, but the
difficulty is only in parsing the strings (just as for any other calendar), not
calculating the days.
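
(For reference, the conversion implied by these examples is
"unitime = 1200*(year-1) + 100*(month-1) + (day-1) + fraction_of_day",
each month being allotted 100 days and each year 1200.)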

> One difficulty with this sort of an axis is that an otherwise continuous
> coordinate now becomes non-continuous; i.e., there are lots of what appear to
> be "breaks" in the timeline.  Utilities can be made to realize that some
> processing is needed, but this will require more work. [JS]

Yes, it is now more difficult to calculate some intervals. To work out the
interval between the two dates above, for example, I have to convert them back
from unitime into components, and then into days in their native calendar. If
this is Gregorian calendar, the interval above is 33.5 days; if the 360-day
calendar, it is 32.5 days. This complexity is not necessary if the two dates
fall within the same month (i.e. the hundreds are the same).

However, some intervals are easier than in the Gregorian calendar. If the
interval is an exact multiple of 100 (e.g. 12202.5 and 24202.5) you can work
out how many years and months it is without having to do any conversions.  In
this case, the interval is 12000 days = 120 months = 10 years. If you stored
the days in Gregorian time, you would have to convert to components and note
that the "day in month" was the same in both dates in order to come to this
conclusion. I would suggest that this kind of interval is rather common in
handling climate data, so it may be a significant convenience.

What is more, the representation will be the same regardless of the native
calendar. 102.5 days always means "1-2-3 12:0:0" and an interval of 12000 days
is always 10 years for both model and observed data, if both are in unitime.
This may help a bit with John Sheldon's issue of comparing calendars, although
it is still a thorny problem.

> Also, unless I'm missing something, storing "monthly" data by defining a
> "unitime" axis with units of "days" doesn't necessarily buy us more than a
> "time" axis with units of "months". [JS]

> Climate data should normally use a time axis with a unit of a day, month or
> year (or some multiple of these). [HD]

Certainly, monthly and yearly mean data are among the most important types of
climate data, so it is crucial to keep their representation as simple and
natural as possible.  But I think it is
good to avoid units of months and years. Although the udunits unit of "months"
has a precise meaning (30.4368 days), this is probably not what you intend, and
could lead applications to make mistakes if they do not check carefully what
the intention is.

Section 29: Multiple time axes and climatological time

> What would a time series of June means look like with and without "wrt"? [JS]

Yes, this is rather complicated. We would label the temperatures for 1st, 2nd
and 3rd June averaged over 1990 to 1994 inclusive like this:

  dimensions:
    con_year=1;
    day=3;

  variables:
    float temperature(con_year,day);
    double con_year(con_year);
      con_year:quantity="unitime";
      con_year:units="days since 1-1-1";
      con_year:contraction="mean";
      con_year:min_interval=1200;
      con_year:max_interval=1200;
      con_year:bounds="bounds_con_year";
    double bounds_con_year(2,con_year);
    float day(day);
      day:wrt="con_year";
      day:quantity="time";
      day:units="day";
      day:subcell="cell";
      day:bounds="bounds_day";
    float bounds_day(2,day);

  data:
    con_year=2387300;
    bounds_con_year=2387300,2392100;
    day=0.5,1.5,2.5;
    bounds_day=0,1,2, 1,2,3;

2387300 is 1 June 1990 in the unimonth calendar, and 2392100 is 1 June 1994. The
interval between these is 4800 days = 48 months = 4 years. The min_interval and
max_interval tell us that the periods meaned together were separated by 1200
days = 1 year. The use of unitime here means that we avoid problems with leap
years - we are clear that the interval is exactly one year. Since the absolute
time is always interpreted as a "point" coordinate, we infer that the periods
meaned together were at intervals of 0,1,2,3,4 years.  The day dimension is
marked as an offset by the wrt attribute. This tells us we can add it to the
con_year coordinate to obtain the result that the periods meaned were between
midnights on 1st-2nd June, 2nd-3rd June and 3rd-4th June in each of the five
years.

If the wrt attribute were omitted, these two dimensions would be unrelated.
This implies two separate time dimensions, which is a much less likely case.
Two separate dimensions could be verification time and forecast time in a
numerical weather prediction model; this example might be that it was a mean of
forecasts made at midnight on 1 June in each of the five years of the mean
temperature over forecast periods 1,2,3 days ahead, but in such a case the
quantities on the time axes should probably be more precise.

Section 30: Special surfaces

This section is appropriate when the surface can be described just by being
named e.g. "top of atmosphere". The point of doing this is that you then need
only one quantity of "net downward shortwave radiation", for instance - not
separate quantities for "surface", "top of atmosphere" and "tropopause". If a
scheme for multidimensional coordinates is agreed, that will take over the
case where the surface has to be specified explicitly.

Section 32: Missing values in a data variable

> I think that the data should be checked against the "missing_value" *before*
> unpacking. [JS]

Yes, you may well be correct. Thanks.

Section 33: Compression by gathering

John Sheldon points out that the mask used for gathering could be recorded,
rather than the indices. These approaches are equivalent. We preferred the
indices because it seemed likely to us that they are quicker to use for
gathering and scattering, and take up less space than the mask. But either
approach could be allowed.
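
For instance, a field defined only at land points might be stored like this (a
sketch; the names and sizes are invented, and I assume the index list is
attached by a compress-style attribute):

  dimensions:
    lat=72;
    lon=96;
    landpoint=2000;

  variables:
    int landpoint(landpoint);
      landpoint:compress="lat lon"; // values are indices into the lat-lon grid
    float soil_temperature(landpoint);

The equivalent mask would be a lat-lon array of zeros and ones occupying the
whole grid.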