Conventions for time in netCDF files?

Russ Rew (russ@unidata.ucar.edu)
Tue, 30 Oct 90 13:14:27 -0700

Hi,

Jon Corbet writes:
> In my continuing attempt to use netCDF in a real time environment, I find
> myself wondering how to represent the time of each sample in a netCDF file.
> This seems to me like a fairly common problem -- I can't be the first to
> have thought about it. Yet I can't find anything in the manual about how
> this should be done.

One of the items on our TODO list has been writing a chapter on netCDF
conventions, an expanded version of the section on conventional attributes.
We have some strawman proposals for various conventions that should be
supported by generic applications and netCDF operators, but want to write a
couple of netCDF operator examples first to determine whether the
conventions are usable, before we promulgate something we'll later regret.
We have Steve Emmerson's "units" library (not ready for release yet) that
specifies an acceptable syntax for units, and includes functions to
determine whether any two units strings represent conformable units, to
convert values with units into a canonical internal form, and to convert
between conformable units. Dave Fulker has written a strawman proposal for
conventional ways to store earth-referencing data in netCDF files that
represents the use of different coordinate systems.

> So, I'm curious: what sort of conventions are y'all using to keep track of
> times in netCDF files? I would really like to create something which will
> work with the rest of the world.

For real time data I would think you'd want a base_time for all the
records in one set of observations, and a time_offset from that base time
for each separate record of data. On the other hand, if the data records
will become separated in processing, you would want a complete time stamp on
each record. But assuming the first case, there are two questions: how to
represent the base_time, and how to represent the time_offsets. The
second question is the easier one: use any units you want and specify the
units, for example

    dimensions:
        observation = unlimited;
    variables:
        float time_offset(observation);
            time_offset:units = "sec";

Or you could use type "double" if you record times to the nearest
picosecond. (The units library actually accepts any of "s", "sec", or
"second" for seconds.)
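
As a rough illustration of the base_time plus time_offset layout, here is a
small Python sketch using only the standard library. The function names and
sample values are made up for illustration and are not part of any netCDF
interface. It rebuilds absolute times from one base time and a list of
offsets, and shows how much resolution a 32-bit "float" offset keeps one day
into a run.

```python
import struct
from datetime import datetime, timedelta, timezone


def to_float32(x):
    """Round a Python float (a double) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def absolute_times(base_time, offsets_sec):
    """Combine one base time with per-record offsets given in seconds."""
    return [base_time + timedelta(seconds=s) for s in offsets_sec]


# Hypothetical sample values, for illustration only.
base_time = datetime(1990, 10, 30, 18, 0, 0, tzinfo=timezone.utc)
time_offset = [0.0, 0.5, 1.0, 1.5]

stamps = absolute_times(base_time, time_offset)

# Between 65536 and 131072 seconds, adjacent 32-bit floats are 1/128 s apart,
# so an offset of 86400.123 s is stored as the nearest multiple of 1/128.
stored = to_float32(86400.123)
```

If sub-millisecond offsets matter over runs longer than a few hours, this
suggests declaring time_offset as "double" well before picosecond precision
is at stake.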

We have several proposed conventions for representing time, and are
currently using still other conventions in the netCDF files we are writing
(an undesirable side-effect of delaying the conventions document until we
can test it with netCDF operators). NASA's original CDF insisted on a
single convention that time would be represented in a double precision
variable named EPOCH representing the number of milliseconds since January
1, 0 A.D., which may be appropriate for astronomical data. We propose
permitting several time conventions and using a library of conversion
functions (extensions to the existing units library) to make conversion
among them easy.

The convention for time that is most useful when you need to do arithmetic
on times (e.g. subtract two times to obtain an interval) is storing it in
seconds since the Epoch (ours is 1 Jan 1970 00:00:00 UTC following the UNIX
and MSDOS conventions, and allowing times until 2038 to be stored in 32-bit
integers). We propose that besides the "units" attribute, variables can
have an "origin" attribute, so that the base time in your example is
represented as something like:

    long base_time;
        base_time:units = "sec";
        base_time:origin = "Epoch";
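
To make the arithmetic argument concrete, here is a short Python sketch
(standard library only; the helper names are mine, not from the units
library). It converts two times to seconds since 1 Jan 1970 00:00:00 UTC,
subtracts them to get an interval, and finds the last instant a signed
32-bit count of seconds can hold.

```python
from datetime import datetime, timedelta, timezone

# The Epoch described above: 1 Jan 1970 00:00:00 UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_seconds(t):
    """Whole seconds from the Epoch to the timezone-aware datetime t."""
    return int((t - EPOCH).total_seconds())


def from_epoch_seconds(s):
    """Inverse of to_epoch_seconds."""
    return EPOCH + timedelta(seconds=s)


# Hypothetical sample times, for illustration only.
t1 = datetime(1990, 10, 30, 18, 0, 0, tzinfo=timezone.utc)
t2 = datetime(1990, 10, 30, 20, 14, 27, tzinfo=timezone.utc)

# An interval is a plain integer subtraction.
interval = to_epoch_seconds(t2) - to_epoch_seconds(t1)

# Largest value of a signed 32-bit integer, and the time it represents.
INT32_MAX = 2**31 - 1
last_representable = from_epoch_seconds(INT32_MAX)
```

Here interval is 8067 seconds (2 h 14 min 27 s), and last_representable
falls on 19 Jan 2038 at 03:14:07 UTC.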

If you don't need to do arithmetic on times, a text-string representation
that is understandable by humans is a better choice. In this case, one
proposal is illustrated in the following example:

    char base_time(time_string); // yyyy mm dd hh:mm:ss.ssss ZZZ
        base_time:units = "text_time";
    ...
    base_time = "1990 10 30 18:00";

We want to use four-digit years instead of the two-digit year of the century
because we believe our data to be one of our best hopes for immortality :-).
We propose using the two-digit "mm" month instead of the three-character "mmm"
month abbreviation because it is more international and it permits
determining the order of dates with a simple string comparison (but we
should support the use of the "mmm" month abbreviation also because it is
more readable and less ambiguous). By default, times will refer to UTC
time, but a time-zone indicator may be appended (losing the dubious
possibility of time comparisons by string comparisons for times in different
time zones). Also, only as much as is needed of the "hh:mm:ss.ssss" need be
given; zeros will be assumed for unspecified components.
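
A minimal Python sketch of how a reader of such strings might behave follows.
It is only one possible reading of the proposal: the function name is
hypothetical, it handles numeric months only, and it returns the time-zone
indicator without applying it. Missing "hh:mm:ss.ssss" components default to
zero and a missing zone is taken to be UTC.

```python
from datetime import datetime, timedelta


def parse_text_time(text):
    """Parse 'yyyy mm dd [hh[:mm[:ss[.ssss]]]] [ZZZ]'.

    Returns (datetime, zone). The datetime is naive: the zone indicator is
    returned as given ("UTC" if absent) and no conversion is attempted.
    """
    fields = text.split()
    year, month, day = (int(f) for f in fields[:3])
    rest = fields[3:]

    hh, mm, ss = 0, 0, 0.0
    if rest and rest[0][0].isdigit():
        parts = rest.pop(0).split(":")
        hh = int(parts[0])
        if len(parts) > 1:
            mm = int(parts[1])
        if len(parts) > 2:
            ss = float(parts[2])

    zone = rest[0] if rest else "UTC"
    # Add the seconds as a timedelta so fractional seconds are handled.
    return datetime(year, month, day, hh, mm) + timedelta(seconds=ss), zone


# Hypothetical sample strings, all in the default (UTC) zone.
samples = ["1990 11 02 06:00", "1990 10 30 18:00", "1990 10 30 06:30"]

# Plain string sorting and sorting by parsed time give the same order.
by_string = sorted(samples)
by_time = sorted(samples, key=lambda s: parse_text_time(s)[0])
```

The string ordering only matches the time ordering when every field is
zero-padded to a fixed width ("09", not "9") and all strings use the same
zone.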

Another way of representing this information uses a single "time" variable
rather than two possibly unrelated "base_time" and "time_offset" variables.
It relies on another potential convention, illustrated in the following
example:

    float time(observation);
        time:origin = "=base_time";
        time:units = "sec";

where the "=" as the first character of an attribute value indicates another
netCDF variable (or, more generally, an expression involving other netCDF
variables or attributes). Thus, an associated variable would be expected to
be defined, as before, but this time it is explicitly linked to the time
variable:

    char base_time(time_string);
        base_time:units = "text_time";
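
To show what a thin layer above netCDF might do with such a reference, here
is a Python sketch. Everything in it is an assumption made for illustration:
the dataset is modeled as plain dictionaries rather than read through the
netCDF library, the function name is invented, and only the simple form
"=name" is followed, not general expressions.

```python
def resolve_attribute(variables, var_name, att_name):
    """Return an attribute value, following a leading '=' to another variable.

    `variables` maps variable names to {"attributes": {...}, "data": ...}.
    A string value starting with '=' is treated as the name of another
    variable, whose data is returned in its place.
    """
    value = variables[var_name]["attributes"][att_name]
    if isinstance(value, str) and value.startswith("="):
        target = value[1:].strip()
        if target not in variables:
            raise KeyError(
                f"{var_name}:{att_name} refers to missing variable {target!r}"
            )
        return variables[target]["data"]
    return value


# Hypothetical in-memory stand-in for the two variables declared above.
dataset = {
    "time": {
        "attributes": {"origin": "=base_time", "units": "sec"},
        "data": [0.0, 0.5, 1.0],
    },
    "base_time": {
        "attributes": {"units": "text_time"},
        "data": "1990 10 30 18:00",
    },
}

origin = resolve_attribute(dataset, "time", "origin")  # follows the reference
units = resolve_attribute(dataset, "time", "units")    # ordinary attribute
```

With this stand-in, origin is the text time stored in base_time and units is
returned unchanged as "sec".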

We would appreciate comments on these proposed conventions. The proposal on
how "virtual" variables and attributes may be defined by referring to other
variables or attributes in expressions such as

    var1:att1 = "= var2:att2 + var3";

is very preliminary, and implies no changes in the netCDF library, only in a
thin layer on top of netCDF to support netCDF operators and generic
applications.

--Russ

"I could have done it in a much more complicated way", said the Red Queen, immensely proud.
[authentic sounding but apocryphal quote erroneously attributed to Lewis Carroll]