A late reply to John Sheldon's comments on multidimensional and
contracted time-axes, as proposed by GDT:

> First, the easy case (by way of example):
> -----
> ** A time series (3) of January average temperature, where each **
> ** monthly mean is derived from daily means.                    **
>
> Our local approach uses up to 6 additional quantities to store the
> required information.  The first 3 of these are used in all cases.
> The last 3 can be used to describe the items comprising the average.
>
>   dimensions:
>       time = 3;
>       day = 31;
>
>   variables:
>       float Tavg(time);
>           Tavg:long_name="Average monthly temperature" ;
>           Tavg:units="deg_K"
>           Tavg:average_info="T1, T2, nitems, time_of_item, \
>                              item_is_avg, dt_item";
>       double time(time);
>           time:units="days since 1-1-1990";
>           time:calendar="common_year";
>
>       double T1(time);
>           T1:long_name="starting time of average";
>           T1:units="days since 1-1-1990";
>           T1:calendar="common_year";
>
>       double T2(time);
>           T2:long_name="ending time of average";
>           T2:units="days since 1-1-1990";
>           T2:calendar="common_year";
>
>       long nitems(time);
>           nitems:long_name="Number of items in average";
>
>       float time_of_item(day,time);
>           time_of_item:long_name="time of individual items comprising average"
>           time_of_item:units="days since 1-1-1990";
>
>       short item_is_avg(day,time);
>           item_is_avg:long_name="flag indicating whether item in average is itself an average";
>
>       double dt_item(day,time);
>           dt_item:long_name="length of time over which the items comprising average are representative";
>           dt_item:units="days";
>
>   data:
>       time = 15.5, 380.5, 745.5 ;
>
>       T1 = 0., 365., 730. ;
>       T2 = 31., 396., 761. ;
>
>       nitems = 31, 31, 31;
>
>       time_of_item = 0.5, 1.5, 2.5, ... 30.5,
>                      365.5, 366.5, 367.5, ... 395.5,
>                      730.5, 731.5, 732.5, ... 760.5 ;
>
>       item_is_avg = 1, 1, 1, ... 1,
>                     1, 1, 1, ... 1,
>                     1, 1, 1, ... 1 ;
>
>       dt_item = 1., 1., 1., ... 1.,
>                 1., 1., 1., ... 1.,
>                 1., 1., 1., ... 1.
>                 ;
>
>
> This works fine, because each mean is taken over a continuous span of
> time (ie, all of a January).  "T1" and "T2" bracket the period.  The
> "time" value is only somewhat arbitrary.  (It seems logical to me
> that it be the midpoint of the averaging period, but I've heard
> others argue for assigning it a time equal to the starting or ending
> time of each period.)  It is flexible enough to handle disparate
> items included in the average.  And, "time" stays 1-D.
>
>  * How would you handle this case using "contraction" and "wrt"?

The simplest way you could represent this using the conventions of GDT
would be without the information about the items making up the average.
In this case, as with your scheme, time is one-dimensional. T1 and T2
are recorded as boundary coordinates, and time as the main coordinate.
I agree with you that it is logical that time should be the midpoints,
but it is arbitrary.

dimensions:
  time = 3;

variables:
  float Tavg(time);
    Tavg:quantity="temperature";
    Tavg:units="deg_K";
  double time(time);
    time:quantity="time";
    time:subcell="cell"; // indicates these are not instantaneous values
    time:units="days since 1-1-1990";
    time:bounds="bounds_time";
  double bounds_time(2,time);

data:
  time = 15.5, 380.5, 745.5 ;
  bounds_time=0., 365., 730.,
              31., 396., 761.;

To arrive at the idea of a contracted axis, consider the original 3*31
days organised into two dimensions of time. The first dimension is over
the "major" time interval of months, the second over the "minor"
interval of days within the month.
This gives:

dimensions:
  months=3;
  days=31;

variables:
  float Tday(months,days);
  double months(months);
    months:quantity="time";
    months:units="days since 1-1-1990";
  float days(days);
    days:quantity="time";
    days:subcell="cell";
    days:units="days";
    days:wrt="months";
    days:bounds="bounds_days";
  float bounds_days(2,days);

data:
  months=0., 365., 730.;
  days= 0.5, 1.5, 2.5, ..., 30.5;
  bounds_days=0.0, 1.0, 2.0, ..., 30.0,
              1.0, 2.0, 3.0, ..., 31.0;

The way to interpret the time coordinates here is to add the offset
times (marked with wrt) to the absolute times. Thus, the element
Tday[1][2] has a time coordinate 365.0+2.5, with boundaries 365.0+2.0
and 365.0+3.0. Now we contract the days axis, to produce

dimensions:
  months=3;
  con_days=1;

variables:
  float Tday(months,con_days);
  double months(months);
    months:quantity="time";
    months:units="days since 1-1-1990";
  float con_days(con_days);
    con_days:quantity="time";
    con_days:subcell="cell";
    con_days:units="days";
    con_days:wrt="months";
    con_days:bounds="bounds_con_days";
    con_days:contraction="mean";
    con_days:interval=1.0;
  float bounds_con_days(2,con_days);

data:
  months=0., 365., 730.;
  con_days= 15.5;
  bounds_con_days=0.0, 31.0;

(Here I have departed slightly from GDT, by showing an "interval"
attribute instead of "max_interval" and "min_interval". This is because
I have a further suggestion to make below.)

The contracted axis, with a dimension of unity, tells us that the data
value for each of the three months was derived by averaging values
applying to times separated by 1 day and covering a period of 31 days.
The subcell attribute tells us, further, that these values were
initially representative of their time cells, rather than instantaneous
measurements.

It is possible that this last piece of information is not sufficiently
precise. Suppose the original daily values were daily maxima. In this
case we consider a notional sub-daily time axis, containing an
indefinitely large number of times within the daily cycle.
This axis is then contracted by finding the *maximum* value rather than
the mean. The sub-daily interval is not defined or needed. We thus
record the information that the monthly value is the mean of 31 daily
maxima by appending a third time axis with bounds of 0.0 and 1.0 day,
contraction="max", wrt="con_days".

This might seem a bit excessive. I am not entirely sure whether this
information might in fact be better conveyed by allowing subcell="max"
instead. However, I think it would be sensible to include an extra
contracted axis if there *was* an interval you wanted to record. For
example, suppose you wanted to record that the daily value was the
maximum of pressure measurements made at 3-h intervals through the day.

> NOW, the hard case (again, by way of example):
> ---
> ** 5-year average of the daily avg Temperature for each **
> ** of January 1,2,3 (ignoring any 2-D location)         **
>
> ... The principal
> problems are in specifying the "time" coordinate to assign to each
> point, and how to specify the boundaries of the period over which the
> average was taken.  And these are only problems because we've
> picked items out of a continuous stream and processed only them.
>
> The decomposition of the time axis into 2 dimensions using the "wrt"
> approach seems to solve the latter problem to a large extent (at the
> expense of added complexity (IMHO:-) and the necessity of dealing
> with a 2-D time axis).  The starting and ending "times" of the
> average are (effectively) "1990" and "1994".  But we still have the
> problem of what "time" coordinate value to assign to the data...
>
> **
> ** What we lack is a way to express the fact the we have "extracted"
> ** certain points out of a continuum and averaged only those points.
> ** ie, the average was not truly *along* the "time" axis!
> **
> ** Again, there are 2 problems associated with this type of average:
> **   1. *where* to "locate" the data along the contracted axis;
> **   2.
> **      how to document the span of coordinate values over which
> **      the average was taken (since part of the total span
> **      isn't actually used in the calculation)
> **

These are hard questions, by which I also have been tormented! I cannot
deny of course that the contracted multidimensional time axes are
complex; I hope I can persuade you that the complexity is worthwhile.
The approach aims principally to deal with point (2).

To repeat in words what GDT suggest: There are two time axes. The first
is a contracted years axis, with boundaries of 1st Jan 1990 and 1st Jan
1994, interval of 1 year, contraction of "mean". The second is a
3-element days axis, wrt the contracted years axis, coordinates
0.5,1.5,2.5 days, lower boundaries 0.0,1.0,2.0 days, upper boundaries
1.0,2.0,3.0 days.

This means that the second data value, for instance, represents a
period of 1 day, and was obtained by averaging corresponding periods
spaced a year apart. The first of these periods is definitely located
as from 1st Jan 1990 + 1.0 days to 1st Jan 1990 + 2.0 days, with a
representative value of 1st Jan 1990 + 1.5 days. The last, similarly,
is in 1994. I think this gives enough information to enable one
automatically to produce a description of what this value applies to. I
would label it "00:00 2nd Jan - 00:00 3rd Jan, meaned over 1990-1994".

Although this can be represented by a brief phrase, the GDT scheme is
not limited to cycles that can be easily related to the calendar. We
could use exactly the same method to describe a value which applied to
an average of periods of 23.93 h spaced 365.3 days apart, for example.

The answer to your question (1) is not really well defined. The best
answer I can give to where to "locate" the value in time is the label I
suggest above, which is a translation of all the available information.
It doesn't really belong anywhere in particular on the contracted axis
alone.
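The reading of the two axes described above can be sketched in a few
lines of Python. This is only an illustration of how I read the scheme,
not anything defined by GDT: the function name expand_periods is
hypothetical, times are taken as days since 1-1-1990, and a 365-day
year is assumed (as in the "common_year" calendar of the quoted
example).

```python
# Illustrative sketch only; expand_periods is a hypothetical helper,
# not part of GDT or of any netCDF library.
DAYS_PER_YEAR = 365.0  # assumption: no-leap ("common_year") calendar


def expand_periods(year_bounds, interval, day_bounds):
    """List the (start, end) periods, in days since 1-1-1990, that were
    averaged to give one value on the days axis.

    year_bounds -- (first, last) origins of the contracted years axis
    interval    -- spacing between successive origins, in days
    day_bounds  -- (lower, upper) bounds of one day cell, wrt the years axis
    """
    first, last = year_bounds
    n_origins = int(round((last - first) / interval)) + 1
    origins = [first + i * interval for i in range(n_origins)]
    lower, upper = day_bounds
    # Offsets marked with wrt are added to each absolute origin.
    return [(origin + lower, origin + upper) for origin in origins]


# Contracted years axis: boundaries 1 Jan 1990 and 1 Jan 1994, interval 1 year.
year_bounds = (0.0, 4 * DAYS_PER_YEAR)
# Second element of the days axis: boundaries 1.0 and 2.0 days.
periods = expand_periods(year_bounds, DAYS_PER_YEAR, (1.0, 2.0))

for start, end in periods:
    print(start, end)
```

The first pair printed corresponds to 1st Jan 1990 + 1.0 days to 1st Jan
1990 + 2.0 days, and the last pair to the same day cell in 1994. Nothing
in the sketch depends on the interval being a calendar year, so the same
arithmetic would serve for periods spaced 365.3 days apart.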
However, if I had to produce a single time coordinate, for the sake of
plotting, I would probably go for 1 Jan 1990 + 1.5 days. I would label
the point just "12:00 2 Jan" on the plot, if possible, omitting the
year.

An advantage of the multidimensional approach is that you can have as
many of these axes as you like without straining the scheme. At the end
of the discussion of the easy case I gave a three-dimensional example.
I think this is appealingly flexible. It is easy, for example, to label
an average as applying to 10:00-12:00 on all days in JJA in a range of
years. Since each contraction has a separate contraction attribute, it
is possible to record that a quantity is a maximum over a number of
years of the March mean of daily minima, for example.

Unlike yours, our scheme does not indicate how many points there were
before averaging, or what their coordinates were. This is because our
aim was to provide enough metadata to distinguish quantities which are
likely to need to be distinguished. We did not try to include all
possibly useful information. I think it is *unlikely* that you would
have two different quantities, both being means for an average 2nd Jan,
one for the years (1990,1991,1992,1993,1994) and the other for
(1990,1991,1994). These could not be distinguished in GDT, but they
could in your scheme, which records the number and values of the
original coordinates. Perhaps you feel that this distinction may need
to be drawn? You have further examples of this:

>
>    a) mean zonal wind within the two principal storm tracks
>       (longitude=160-240E and 280-350E)
>
>    b) mean combined cloudiness for the two layers 200-400mb and
>       700-850mb

I agree that you might well want to record such information in the
metadata. GDT does not handle this, although we did think about it.
My preferred extension to GDT would be to allow upon a contracted axis
an attribute "expand", naming a variable which provides the coordinates
of the original uncontracted axis, including boundaries if appropriate.
In case (a), for instance, we might have

dimensions:
  con_lon=1;
  lon=2;

variables:
  float uwind(con_lon);
  float con_lon(con_lon);
    con_lon:expand="lon";
  float lon(lon);
    lon:bounds="bounds_lon";
  float bounds_lon(2,lon);

data:
  con_lon = 180.0 ; // a purely nominal value
  lon = 200.0, 315.0;
  bounds_lon = 160.0, 280.0,
               240.0, 350.0;

Best wishes,

Jonathan