1. INTRODUCTION
2. PURPOSE
3. BACKGROUND
4. METHODOLOGY
5. THE LOGICAL MODEL AND PHYSICAL ASPECTS
6. THE MODEL
6.1 Topology
6.2 Purpose, or presentation process, or receiving process
6.3 Generating Process
6.4 Co-ordinates
7. MORE FORMAL METHODS
1.1 It is common practice in the computer industry to define data and processes in relatively abstract terms. This abstract description is known as a logical or conceptual model.
2.1 The initial purpose of this model is to supply a sound foundation for data naming conventions for use within the Office, and hopefully, outside the Office. The naming conventions will enable efficient systems to support effective searching and retrieval of data across the whole Office, or wider.
2.2 The model will be a useful and convenient reference document for understanding and discussing solutions attempting to address the shortcomings of existing systems. It can also help establish common terminology across the organisation.
2.3 Such a model can also help identify which areas have been tackled successfully, which not so successfully, and which have been ignored, by highlighting those assumptions that have been made and need revisiting. Consequently, it should also help in achieving consensus for a consistent direction to follow (or avoid!).
2.4 It is also important that the model reflects the complete end-to-end life-cycle of the data, from source to sink, because, at any one time, the solution that is optimal overall may be sub-optimal in any one limited area.
2.5 In the longer term, such a model will also help explain to industry our needs, and help make a successful transition to using object-oriented technology by making explicit our everyday assumptions.
3.1 Such a model is possible because the logical structure of meteorological data has been remarkably stable for decades - encompassing observational reports, satellite and radar imagery, NWP model dumps, 2D gridded fields, plotted data, contoured data, thermodynamic diagrams. What has changed is the resolution, the quantities exchanged, the amount of processing involved, and the balance between manual and automated processing.
3.2 It is important to define the model in terms independent of technology, wherever possible, to ensure a robust and stable model, and to enable optimal choices of appropriate technologies when they become readily available. It should also be largely independent of current solutions, which may be too closely tied to a technology. For example, all SYNOP messages have an attribute of date and time. However, a SYNOP message does not actually specify the year or month of the data, as it was designed within the context of the WMO Global Telecommunications System which is supposed to contain only observations less than 24 hours old. This decision to save the transmission of 6 extra bytes over slow lines now looks dated compared to the cost of all the software that must check and add the (correct) month and year for display, processing and archiving.
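The check-and-add step described above can be sketched in a few lines. This is a minimal illustration, not the software used by any real system: the function name `complete_synop_time` and the rule "take the most recent matching date not later than the time of receipt" are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def complete_synop_time(day: int, hour: int, received: datetime) -> datetime:
    """Attach a year and month to a (day-of-month, hour) pair.

    Assumption: the report is never newer than `received`, so the answer is
    the most recent UTC date whose day-of-month equals `day`.
    """
    probe = received.date()
    # Day 31 can lie up to two months back, so search a little over 60 days.
    for _ in range(62):
        if probe.day == day:
            candidate = datetime(probe.year, probe.month, probe.day, hour,
                                 tzinfo=timezone.utc)
            if candidate <= received:
                return candidate
        probe -= timedelta(days=1)
    raise ValueError(f"no date with day-of-month {day} found before {received}")


# A report stamped "day 31, 23 UTC" received just after midnight on 1 January
# belongs to the previous month and the previous year.
received = datetime(2024, 1, 1, 0, 30, tzinfo=timezone.utc)
observed = complete_synop_time(31, 23, received)
print(observed.isoformat())
```

Every consumer of the message has to repeat some version of this inference, which is the cost the paragraph above refers to.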
4.1 The model is presented as an Entity-Relationship-Attribute (ERA) Model. Items of interest are identified (entities), their attributes listed, and the relationships between entities described in a hierarchical fashion.
4.2 The ERA approach is flexible and this is useful because some attributes may be entities in their own right in other contexts. For example, a synoptic surface observation of different variables (SYNOP) has an attribute of date and time, but the date and time itself may be considered an entity when transforming from one form to another. In fact, if an attribute could be subject to processing and transforming, it is probably a good candidate to be an entity.
4.3 A more specific and rigorous methodology can be adopted when designing physical systems. This is important, because some attributes may be assumed for efficiency reasons, and if the data is transmitted out of the original context, those assumed attributes need to be made explicit (e.g. date and time in a SYNOP).
4.4 A more stringent methodology should also be adopted if the Office migrates to proper object oriented technologies, and then this model will be a useful starting point to agree on common classes, instances and methods. In this case, some of the attributes would be inherited as part of a class, and some attributes would be objects in their own right. A simple object oriented approach would not suffice, because multiple inheritance would be needed to capture the basic features of the model.
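The need for multiple inheritance can be illustrated with a small sketch. The class names and attributes below are hypothetical and are not taken from any agreed class library; the point is only that a forecast field draws on two independent aspects of the model (its topology and its generating process) at once.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Gridded2D:
    """Topology aspect: a 2D horizontal grid (hypothetical attributes)."""
    nx: int
    ny: int

    def point_count(self) -> int:
        return self.nx * self.ny


@dataclass
class Forecast:
    """Generating-process aspect: a forecast (hypothetical attributes)."""
    data_time: datetime
    length: timedelta

    def validity_time(self) -> datetime:
        return self.data_time + self.length


@dataclass
class ForecastField(Gridded2D, Forecast):
    """A forecast field inherits from both aspects at once."""
    parameter: str


# Keyword arguments avoid depending on the inherited field order.
field = ForecastField(
    nx=4, ny=3,
    data_time=datetime(2024, 1, 1, 0), length=timedelta(hours=24),
    parameter="surface pressure",
)
print(field.point_count(), field.validity_time())
```

With single inheritance, one of the two aspects would have to be demoted to an ordinary attribute, which may or may not be acceptable in a physical design.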
4.5 Firstly, a descriptive version of the model is presented, followed by a more concise, abstract version more amenable to processing. Simple, obvious, components of the data are described first.
4.6 It is recognised that some of the detail of the model may be questioned as inappropriate for specific areas of interest or presented in a different manner from current preconceptions, but the intention is to provide a coherent model rich enough to capture the major features of the data. Consequently, some details have also been omitted as not necessary for the immediate goals.
5.1 Data structures used in meteorology vary from a single value at an arbitrary location to 4D gridded multivariate model data. Other data structures have mixtures of random and gridded dimensions. Some data structures emphasise the locations rather than the variables (e.g. fronts, contours, graphical products), whereas others may use non-physical co-ordinate systems.
5.2 The presence of explicit co-ordinate values with the data may increase the apparent dimensionality of the data. E.g. a sounding (1D data) may need a physical 2D array to store the variables and co-ordinates. An extra array dimension may be used to associate different variables for a given co-ordinate.
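The sounding example can be made concrete as follows. The values are invented, and the column layout (pressure, temperature, dew point) is an assumption chosen for illustration.

```python
# One row per level; columns are (pressure hPa, temperature degC, dew point degC).
# The values are invented for illustration.
sounding = [
    (1000.0, 15.0, 10.0),
    (850.0, 5.0, 1.0),
    (500.0, -20.0, -35.0),
]

# Logically the data is 1D: a single index (the level number) selects a record.
logical_dimensions = 1
# Physically it occupies a 2D array: levels x (co-ordinate + variables).
physical_shape = (len(sounding), len(sounding[0]))

pressures = [row[0] for row in sounding]      # the explicit co-ordinate
temperatures = [row[1] for row in sounding]   # one variable along the line
print(physical_shape, pressures, temperatures)
```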
5.3 At the lowest level, it is envisaged that each 'atomic' entity is one of the following abstract types:
It is recognised that there may be no definite boundary between integer and real types. It could be just a matter of relative ranges and precisions of the numbers. The Boolean type (true or false) is a special limiting case of an enumerated type. Also, confusingly, integers are often used to represent enumerated or Boolean types in physical systems. These types would be mapped to physical encodings in a physical system.
More complex entities can be constructed by using arrays, unordered collections or mixed aggregates of the above abstract types.
This version neglects missing data type or bad data indicators, but these should be addressed in physical rather than logical models.
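As a rough sketch of how these abstract types might map onto a programming language, the example below uses only the types named in the text above (integer, real, enumerated and Boolean). The enumeration `CloudCover` and its members are hypothetical.

```python
from enum import Enum
from typing import Union


class CloudCover(Enum):
    """A hypothetical enumerated type."""
    CLEAR = 0
    PARTLY_CLOUDY = 1
    OVERCAST = 2


class Flag(Enum):
    """Boolean as the limiting case: an enumeration with exactly two members."""
    FALSE = 0
    TRUE = 1


# Only the atomic types named in the text; the full list may contain more.
Atomic = Union[int, float, CloudCover, Flag]

# Physical systems often store enumerated values as integers; decoding maps back.
stored = 2
decoded = CloudCover(stored)

# A more complex entity: a mixed aggregate holding an array and atomic values.
report = {"temperatures": [15.0, 5.0, -20.0], "cloud": decoded, "valid": Flag.TRUE}
print(decoded, len(Flag))
```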
5.4 The low level encoding, whether into characters, or a binary representation, the use of packing or compression, and the data format are part of the physical domain, and consequently, the size and length of the data also. Whether check-sums are included is also part of the physical domain.
5.5 There is a 'granularity' in the specification of data: a user may be interested in a field of surface pressure for the North Atlantic, but is not too concerned whether the field is on a polar stereographic grid or a latitude-longitude (Plate Carrée) grid. However, any processing software, such as a display package, does need such detail. It is envisaged such distinctions are part of a summary process, with more specific detail at a lower level hidden from higher levels. The higher levels must contain enough information to meet simple, broad, requests, or requests must be able to 'drill down' to the level of detail needed. This approach has been used to constrain some of the freedom of choice.
5.6 Certain aspects are excluded from this model, in particular the administration details relevant to any storage or transmission systems involved. The owner, sender, recipient, routing list, security, encryption, priority and unique identifier are all considered attributes of the data that should be handled by those systems. The users' administrative information, such as experiment number, is also considered out of scope.
The major high level attributes that will be addressed in conjunction with other data types must include the data topology or structure and the co-ordinate structure. These are considered more fundamental, from the computing point of view, than the traditional observational data vs. forecast categorisations.
The major high level attributes are: Topology, Purpose, Generating Process, Co-ordinates and Parameter.
6.1 Topology refers to the space-time distribution of the data, rather than any data structures. Aggregates of the following components will be addressed later.
0D (point) data.
This covers the conventional synoptic observation: a collection of associated observed variables at a single time and surface location (SYNOP). However, a forecast for a specific location and time is also point data.
1D (line) data.
Typically this covers a line of associated observations, in space, such as a sounding, trajectory or a time series.
Line data may be a vertical sounding for a fixed horizontal location and time, or may be on a finer scale, the vertical co-ordinates having times associated with them. Even more accurately, it may be considered a trajectory through x, y, z and t. However it is highly unlikely to occur with only one of the horizontal co-ordinates fixed. Such a 4D trajectory is similar in structure to a random collection of points that have been sorted vertically or in time.
A trajectory may be purely horizontal at a fixed level, or may be three dimensional.
A time series (or a vertical sounding) may occur at regular or irregular intervals. In the latter case, the times (or levels) must be specified explicitly, whereas in the former, the relevant co-ordinate may be calculated by counting values.
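The distinction between regular and irregular intervals can be sketched as two ways of describing an axis. The class names `RegularAxis` and `IrregularAxis` are hypothetical; the sketch only shows that a regular axis needs three numbers, whereas an irregular axis must carry every co-ordinate.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RegularAxis:
    """Regular intervals: co-ordinates are calculated by counting values."""
    start: float
    step: float
    count: int

    def coordinate(self, index: int) -> float:
        if not 0 <= index < self.count:
            raise IndexError(index)
        return self.start + index * self.step


@dataclass
class IrregularAxis:
    """Irregular intervals: every co-ordinate must be stored explicitly."""
    values: List[float]

    def coordinate(self, index: int) -> float:
        if not 0 <= index < len(self.values):
            raise IndexError(index)
        return self.values[index]


hourly = RegularAxis(start=0.0, step=1.0, count=24)         # hours after the first report
significant_levels = IrregularAxis([1000.0, 925.0, 700.0])  # hPa, invented values
print(hourly.coordinate(6), significant_levels.coordinate(1))
```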
2D horizontal gridded data.
These are usually known as fields or images. They may be observed or forecast, and are usually quasi-horizontal. The main difference between a field and an image is that the former usually has floating point values and the latter small integers. A multi-channel image is an example with a vector of variables.
If the variables on the grid are components of a vector (such as u and v), it may be necessary to indicate that they may occur only on a subset of the grid, such as every other grid point, with indications of how the edges are handled.
2D vertical gridded data.
Vertical cross-sections, for a given time, occur, as do vertical time-space cross-sections, usually called Hovmueller diagrams. These will be addressed later.
3D gridded data
These are usually full model dumps for a given time, or collections of model output fields. However, note that a time series (animation) of 2D horizontal fields can also be a 3D gridded structure. Also, a Doppler radar can generate a 3D gridded observation.
If multiple variables, or components of a vector, occur on a grid, the subgrids on which each variable occurs, both horizontally and vertically, may need specifying.
4D gridded data
These encompass a full sequence of model dumps from a forecast, or collections of model output fields. Again if multiple variables, or components of a vector, occur on a grid, the subgrids on which each variable occurs, both horizontally and vertically, may need specifying.
Graphical data
These will be addressed later.
6.2 Purpose may be a high level categorisation such as the sphere of interest: meteorology or oceanography or hydrology. An oceanographer would consider sea surface temperature differently from a meteorologist. Much data in the Office has been named using the purpose for which the data is to be used (e.g. a forecast to update a subsequent analysis). In this case, perhaps calling it a short forecast based on an analysis with a very late data cut-off time is more precise, if verbose, and does not preclude other possibilities.
The purpose often has influenced the design of complete systems. For example, data needed solely for viewing as an image can still be useful even if it contains random errors, or even completely missing scan lines, whereas data intended for automatic processing usually is not.
Purpose may include test status, warning or advisory or legal status, 'action required' or restrictions on use or dissemination, such as time of use or validity period.
The data may be another version intended as a correction or replacement of earlier data. A sequence number is needed as there are often multiple replacements.
6.3 The generating process is usually so obvious to meteorologists that they never specify it. The major categories are: Observation, Analysis, Forecast, Graphical generation. More detailed information may be about specific processes, such as the data cut-off time for the analysis, or whether there was interpolation or initialisation, or the length of the forecast, or the location of an instrument for remotely sensed data. Version numbers and run numbers are needed.
The spatial and temporal characteristics, such as representativeness and error details, are also considered an attribute of the generating process.
As more detail is incorporated, there may be multiple, sequential generating processes, including quality control steps. The Levels categories as defined by GCOS, FGGE, CEOS, etc. reflect this (see Annex). This model attempts to address all of these 4 categories of data, but is biased towards fully geo-referenced and formatted data.
Some processes may reduce the dimensionality of the data (e.g. by averaging to a single number). Some processes may increase it (e.g. a forecast).
6.4 Co-ordinates. Generally, rectilinear, earth based, co-ordinates are assumed, but this may not be true for data early in the observation process chain.
Time (and date) appropriate for, or characteristic of, the data is usually a single nominal time often with an associated range. There are a number of different times (and dates), such as nominal data time, validity time (sometimes called verification time) and a true validity period (i.e. this 24 hour forecast should only be used for the next 9 hours).
The appropriate time for a forecast is the validity time, not the data time. The data time, and length of the forecast, are attributes of the generating process. The true validity period is considered an aspect of the Purpose rather than the data.
The characteristic range for an analysis or image is the spread of data around the nominal data time. The range for a forecast could be the period over which an ensemble has been averaged. The boundaries of a range usually use the same units.
It is usual and efficient to specify a reference time and then couch all other times as a displacement with respect to this. This reference time may be the nominal data time (for a forecast), the beginning of this century, the beginning of the Christian era, or some other epochal time (especially for an image). This is part of specifying the temporal co-ordinate system. Some such systems may not have 365 days per year.
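The reference-plus-displacement scheme can be sketched as below. The class `TimeCoordinate` and its field names are hypothetical, and the example assumes that the bounds of a range are also expressed as displacements from the reference time, which the text does not state explicitly.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


@dataclass
class TimeCoordinate:
    reference: datetime      # e.g. nominal data time or an epochal time
    displacement: timedelta  # offset of the data from the reference time
    # Assumption: both range bounds are displacements from `reference`.
    time_range: Optional[Tuple[timedelta, timedelta]] = None

    def time(self) -> datetime:
        return self.reference + self.displacement

    def bounds(self) -> Optional[Tuple[datetime, datetime]]:
        if self.time_range is None:
            return None
        start, end = self.time_range
        return (self.reference + start, self.reference + end)


# A 24 hour forecast, referenced to its nominal data time.
data_time = datetime(2024, 1, 1, 0, tzinfo=timezone.utc)
forecast = TimeCoordinate(reference=data_time, displacement=timedelta(hours=24))

# The same instant, referenced to an epochal time instead.
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
same_instant = TimeCoordinate(reference=epoch, displacement=forecast.time() - epoch)
print(forecast.time() == same_instant.time())
```

Both descriptions identify the same validity time; only the choice of reference, which belongs to the temporal co-ordinate system, differs.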
Other aspects of time include the temporal characteristics of the data (e.g. the 10 minute mean wind). These should be under the appropriate attribute (e.g. for winds, it is a sub-attribute of the generating process, as are the data time range and cut-off time for an analysis, or the generating time of a forecast). These are different from the accuracy of measurement of the co-ordinate.
Horizontal Locations covered by the data may be regular areas, such as octants of the globe, with a relatively concise mathematical description or irregular areas that are easy to recognise but difficult to specify, such as countries. These latter are appropriate to request data, but not to define it.
For point data, the horizontal co-ordinates must be specified, along with the co-ordinate system and units. Locations may be specified explicitly or via a Look-up Table, called a station list.
For gridded data, it must include the map projection (and assumptions about the shape of the earth or other bodies) and the grid, and the scanning pattern of the grid. The scanning pattern also includes Arakawa type details linked to the variables. Many systems in the Office mix up the grid and projection details. The details can be specified either in earth co-ordinates or in the grid co-ordinates. Some projections can be specified concisely and mathematically, others, such as images from polar orbiting satellites, have to use numerical approximations.
Other aspects of horizontal location or area could include the spatial characteristics of the data (e.g. the temperature is a point value, but the rainfall rate is a mean for a 10 km square). These should be under the appropriate attribute (e.g. in this case, they are attributes of the generating process). These are different from the accuracy of measurement of the co-ordinates.
Level consists of the vertical co-ordinate and the vertical co-ordinate type appropriate for the data. There may be a single level or a layer. Some layers encompass the whole atmosphere or ocean. Layers may have their upper and lower co-ordinates of different types. Co-ordinates may be specified directly or via a look-up table. The vertical co-ordinate type is normally specified from the units used.
Another aspect of vertical co-ordinate could include the spatial characteristics of the data (e.g. the temperature is either a point value or a layer mean). In this model, they are attributes of the generating process. These are different from the accuracy of measurement of the co-ordinate. If layers are specified to indicate the representativeness, they do not have to correspond to the levels or layers used for co-ordinates (i.e. the layers may be overlapping or non-contiguous).
6.5 Parameter is usually the commonly accepted name for the data. Also whether the data is a scalar, vector or tensor must be specified. The dimensional units of the parameter must be specified. A number of parameters for a given co-ordinate may be associated together.
Most of the other common parameter attributes are actually attributes of the generating process. These include instrumental, platform and calibration details.
Vectors, or groups of scalars, on grids may need to specify the arrangement on subgrids, both horizontally and vertically, and temporally. Sometimes it may be more convenient to consider a variable that occurs on differing subgrids to actually be different variables, so that variables at different temporal subgrid points (timesteps) are considered as intrinsically different.
However, many parameters are often scaled to other units for convenience using a Look-Up Table (LUT). Strictly, this is another generating or receiving process, but is so frequent (ubiquitous for imagery) that it is, on balance, best to consider it here. It is also often used to specify the vertical levels of data (e.g. WMO Standard Pressure Levels). A LUT is a set of pairs of numbers, mapping from integers to either other integers or reals. The mapping may be valid only for the specified values (pointwise), or may map onto ranges of values. The ranges could be left or right 'handed' (e.g. a<=b<c or a<b<=c).
An entry may be reserved to indicate missing or undefined values.
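The pointwise and interval forms of a LUT, including the two 'handed' conventions, can be sketched with the standard `bisect` module. The function names, the table values and the use of `None` for the reserved missing entry are all assumptions for illustration.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, Optional, Sequence

MISSING = None  # stands in for the reserved "missing or undefined" entry


def decode_pointwise(table: Dict[int, float], code: int) -> Optional[float]:
    """Pointwise LUT: valid only for the codes that are listed."""
    return table.get(code, MISSING)


def encode_interval(bounds: Sequence[float], value: float,
                    left_closed: bool = True) -> Optional[int]:
    """Interval LUT: return the index i of the interval containing `value`.

    `bounds` must be sorted ascending; interval i lies between bounds[i] and
    bounds[i + 1]. left_closed=True means a <= b < c, False means a < b <= c.
    """
    if left_closed:
        index = bisect_right(bounds, value) - 1
    else:
        index = bisect_left(bounds, value) - 1
    if 0 <= index < len(bounds) - 1:
        return index
    return MISSING


counts_to_kelvin = {0: 200.0, 1: 200.5, 2: 201.0}  # invented image calibration
bounds = [0.0, 10.0, 20.0]                         # two intervals

print(decode_pointwise(counts_to_kelvin, 1))
print(encode_interval(bounds, 10.0, left_closed=True))
print(encode_interval(bounds, 10.0, left_closed=False))
```

A value that falls exactly on a boundary is assigned to the upper interval under the first convention and to the lower interval under the second, which is why the range type has to be recorded with the table.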
Rather than adopt esoteric notation such as < >::= < >|< >*< >+, it has been decided to use
Meteorological data: Topology, Purpose, Generating process, Co-ordinates, Parameters.
Data Topology: 0D (point) or 1D (line) or 2D gridded or 3D gridded or 4D gridded or Graphical.
1D (line): instantaneous sounding or extended sounding or horizontal trajectory or 3D trajectory or time series.
2D gridded: 2D horizontal gridded or 2D vertical gridded.
Purpose: Domain, Processing, Status, Version, Time.
Domain: meteorology or oceanography or terrestrial.
Processing: manual/display or automated.
Status: warning/'action required' or advisory or test.
Version: original or correction.
Time: Time of use or validity period.
meteorology: atmospheric or climatic or cryospheric.
oceanography: to be defined.
terrestrial: ecological or hydrology or seismic or limnographic.
correction: part replacement or full replacement, sequence number.
Generating Process: Process category, Process owner, Process version number, Process run identifier, Temporal characteristics, Horizontal Spatial characteristics, Vertical Spatial characteristics.
Process category: Observation (Level 0/1/2 data) or Analysis (Level 3 data) or Forecast (Level 4 data) or Graphical generation.
Temporal characteristics: 10 minute mean or frequency of occurrence or synoptic or asynoptic.
Horizontal Spatial characteristics: horizontal point value or area around point.
area around point: bounding area, horizontal co-ordinate system.
Vertical Spatial characteristics: representative layer or vertical point value.
representative layer: level 1, level 2.
level: number, vertical co-ordinate units.
Observation: instrumental details, including channel details, and whether visible or infra red, radar, sferics, etc.
platform details: mobile, stationary, etc.
platform location, calibration details: Level 0 for unprocessed instrument data in engineering/telemetry units, Level 1 in physical units and geo-referenced, Level 2 when converted to geo-physical parameters.
spatial characteristics: upper air, surface, land, sea, satellite orbit data, ice/snow, etc.
Analysis (Level 3 data): data cut-off time; interpolation or re-sampling details.
interpolation or re-sampling details: no of samples, method, temporal and spatial range of samples, temporal and spatial range of output data.
Forecast (Level 4 data): model type, resolution, owner, interpolation or initialisation details.
model type: atmospheric, ocean, wave.
interpolation or initialisation details; Forecast length/periods or data time.
Co-ordinates: Time (and date), Spatial co-ordinate system.
Spatial co-ordinate system: Cartesian or cylindrical or 3D polar.
Cartesian: Horizontal co-ordinates, Level.
Cylindrical: Horizontal co-ordinates, Level.
Time (and date): Reference time (UTC), Displacement, Range.
Reference time: UTC origin or epochal time or nominal data time for forecast.
Displacement: period, time units (verification time for a forecast or nominal data time for observations/analysis).
Range: period 1, period 2, time units.
Horizontal locations: Bounding area, Co-ordinate system, Random points or rectangular grid.
Bounding area: regular area or irregular area.
regular area: lat/long rectangle or circle.
lat/long rectangle: whole globe or hemisphere or quadrant or octant or Marsden square or other lat/long rectangle.
other lat/long rectangle: lat 1, lon 1, lat 2, lon 2.
circle: radius, units.
irregular area: to be defined (e.g. countries, administrative regions etc.).
Co-ordinate system: map projection, reference co-ordinates, scale, map units, axis orientation.
map projection: projection type, projection orientation, size and shape of the earth or other body.
projection type: lat/long or instantaneous space view or polar orbiter or polar stereographic or other.
projection orientation: normal or transverse or oblique.
reference co-ordinates: to be defined.
Rectangular grid: scanning pattern, rotation, nx ny, extent, grid length/resolution, aspect ratio.
Random point: x, y.
Level: Direct vertical co-ordinate or vertical co-ordinate LUT reference.
Direct vertical co-ordinate: layer or level.
layer: level 1, level 2.
level: number, vertical co-ordinate units.
Parameter: scalar or vector or tensor.
scalar: value or parameter LUT reference.
value: number or invalid number, dimensional units, other attributes (????accuracy/error range).
number: integer or real.
invalid number: missing number or undefined number.
vector: 2D vector or 3D vector or arbitrary vector.
2D vector: scalar 1, scalar 2.
3D vector: scalar 1, scalar 2, scalar 3.
arbitrary vector: component count, scalar 1, scalar 2, ...
tensor: to be defined????
LUT reference: enumerated type.
LUT: LUT pairs, dimensional units, range type.
LUT pair: LUT reference, number or ????Invalid number for parameter LUTs????.
range type: pointwise or interval is after first value or interval is before first value.
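Because the definitions above follow a regular pattern, they are amenable to simple processing. The sketch below parses one definition into its parts. It assumes a reading of the notation that the text does not state outright: ':' introduces the definition, ',' separates the components of an aggregate, and 'or' separates the alternatives for one component. The function name `parse_definition` is hypothetical.

```python
from typing import List, Tuple


def parse_definition(text: str) -> Tuple[str, List[List[str]]]:
    """Split 'name: a or b, c' into ('name', [['a', 'b'], ['c']]).

    Assumed reading of the notation: ',' separates the components of an
    aggregate and 'or' separates the alternatives for one component.
    """
    name, _, body = text.partition(":")
    components = []
    for part in body.strip().rstrip(".;").split(","):
        alternatives = [alt.strip() for alt in part.split(" or ")]
        alternatives = [alt for alt in alternatives if alt]
        if alternatives:
            components.append(alternatives)
    return name.strip(), components


print(parse_definition("2D gridded: 2D horizontal gridded or 2D vertical gridded."))
print(parse_definition("correction: part replacement or full replacement, sequence number."))
```

Under this reading, the first definition is a choice between two alternatives, and the second is an aggregate whose first component is a choice and whose second is a sequence number. Definitions that mix free text with the notation would need manual attention.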
The definitions for FGGE, CEOS, EUMETSAT, ESA etc., are similar but not identical.
The following conventions for defining levels of data and products have been widely used by the atmospheric research community since the Global Atmospheric Research Programme and are adopted for use by GCOS. When applied to data from satellites, these conventions should comply with the definitions used by CEOS which are very similar to the definitions given below.
LEVEL 0: Unprocessed instrument data at full space-time resolution with all available supplemental information to be used in subsequent processing appended.
LEVEL I: Instrument readings at full instrument resolution expressed in appropriate units and referred to earth co-ordinates, e.g. radiances, positions of constant level balloons, electrical current, etc.
LEVEL II: Geophysical parameters or environmental observations obtained directly from instruments or converted from Level I data.
LEVEL III: Gridded analyses (spatially or temporally resampled data) prepared from Level II data. The resampling may include averaging and compositing.
LEVEL IV: Model output products produced from Level II or Level III data. These often include model-derived parameters such as fluxes.
Levels II, III and IV are often sub-divided into A and B sublevels:
LEVEL A: These data are usually available in real or near-real time and are subject to strict operational cut-off times. Level A data and analyses are usually subject to operationally driven quality control and are useful for operational purposes and preliminary research.
LEVEL B: These data are normally subjected to rigorous quality assurance procedures and often contain observations which were not available within the operational cut-off times applied to Level A. They are generally more accurate and complete than Level A data because they have the benefit of more data, freedom from rigid production schedules, and utilisation of more elaborate analysis techniques.