To: caron@dilbert.acd.ucar.edu (John Caron)
Cc: nuwg@comet.ucar.edu
Subject: Re: More comments on netcdf "gridded" data conventions.
Date: Thu, 20 Apr 1995 13:48:31 -0600
From: Russ Rew
Organization: UCAR Unidata Program

Hi John,

Here are my comments on your ideas for improving the NUWG conventions. I'd like to keep this discussion going on the nuwg mailing list, so please feel free to jump in.

> ---------------------------------------
> General NetCDF Conventions and Definitions
>
> Coordinate Variables
> A variable with the same name as a dimension is a "coordinate
> variable", and its data values are the coordinate values for that dimension.
> The variable must be indexed by the dimension.
>
> Coordinate Reference
> Coordinate systems cannot always be described by coordinate
> variables which are single valued and have the same cardinality as
> a dimension. A "coordinate reference" is one or more variables that
> also describe a coordinate system, and may have different
> sizes than the dimension it describes.
> A "coordinate reference variable" is a variable whose data values
> describe the coordinate values for a dimension. Its cardinality is a
> function of the size of the dimension. It must have an attribute named
> "dimension" whose value is the name of the dimension.
> A coordinate reference is defined by an attribute with
> the same name as a dimension. The value of the attribute is the
> name(s) of coordinate reference variable(s) that describe the coordinate
> values for that dimension. An attribute that is global becomes the default
> description for that dimension in the file, and may be overridden by a
> variable's attribute.

I don't understand the necessity or desirability of having an attribute named "dimension" for coordinate reference variables. If the coordinate attribute names the coordinate reference variables and has the same name as the dimension, why do the coordinate variables also have to have attributes that name the dimension? This seems to be a redundant representation of the same information. Problems with requiring redundant representation include the possibility that the two representations are inconsistent, and the probability that data providers will neglect to provide the information in one or the other form.

To make this clearer, here's an example we are currently using in our ruc.cdl for the output of the MAPS/RUC model:

    dimensions:
            record = UNLIMITED ;    // (reference time, forecast time)
            ...
    variables:
            double  reftime(record);        // reference time of the model
                    reftime:long_name = "reference time";
                    reftime:units = "hours since 1992-1-1";

            double  valtime(record);        // forecast time ("valid" time)
                    valtime:long_name = "valid time";
                    valtime:units = "hours since 1992-1-1";
            ...
            :record = "reftime, valtime" ;  // "dimension attribute" -- means
                                            // (reftime, valtime) uniquely
                                            // determine record

Here "reftime" and "valtime" are like what you have described as coordinate reference variables for the "record" dimension, and so should have attributes named "dimension" with value "record". But the record dimension has a global coordinate attribute "record" that names the two coordinate reference variables, so this information is already represented.

The idea here was to be able to use the "record" dimension for representing an ordered pair of times, (reference time, forecast time), so you could model outputs for multiple reference times and multiple forecast times in a single netCDF file with only one unlimited dimension. Perhaps this isn't the use you had in mind for coordinate reference variables, but we do need this capability.

> ---------------------------------------
> > (1) Time
> >
> > 1) Time variables are always type double
>
> Not always adequate or possible. Should have several approved
> types of time coordinates.

For NUWG conventions, we decided that type double is adequate, especially since the time units can be anything from picoseconds to eons and can specify a base time offset, according to the udunits library standard. I don't remember the rationale for not permitting offsets from a base time to use a short type, which would seem to be desirable if the bulk of data in a file were relatively low-resolution time offsets.

We also agreed that data providers can include other representations for times as well (e.g. human readable). This is a general property of the NUWG conventions, that they specify a minimum of what should be in a netCDF file so that the generic applications we are developing will be able to deal with it. We don't proscribe adding extra information to netCDF data; data providers are free to include additional attributes, variables, and dimensions for other purposes, but our applications won't use that information. Furthermore, data providers are free to develop a more restrictive set of conventions (e.g. requiring that a time variable always be named "time") that our generic applications will be able to handle.

> > 2) Time variables are indexed by another variable (can
> > be the unlimited dimension)
> > 3) The names given to the time variables and the indexing variables
> > are not subject to convention
>
> How do you know which dimension is time? Also, if you want to use
> "coordinate variables" (often handy), then the variable name must =
> dimension name, by convention.

Applications that need to know the name of the time variable will have to be provided that information as an input, either in a table of associated variable names, as an argument, as a clickable selection, etc. We decided very early that we wanted NUWG conventions to avoid explicit variable names, where possible; the French name for the time variable should be acceptable for French data, and applications that require specific variable names will have to support a mapping between file variable names and names used in the application. Again, more restrictive conventions that require explicit variable names should work fine with this set of conventions.

> > 4) This convention supports grids requiring any number of times
> > to fully describe the data
>
> Replace all above with Gridded Data Convention:
> 1) The time dimension is always named "time". It may or may not be
> an unlimited dimension.
> 2) A coordinate variable or coordinate reference must be defined
> to describe the time coordinates for all fields with the time dimension.
> 3) The following ways of specifying the time coordinate values are agreed
> to, with extensions possible in the future:
> 3a) a double variable, units "secs since "

Since the time variable must have a "units" attribute by our conventions, this restriction is unnecessary; any units acceptable to the udunits package are OK for time variables, including "years since " and "picoseconds since ". Applications should use the udunits library to deal with such differences.

> 3b) a character string in the form specified by the FGDC "Content
> Standards for Digital Geospatial Metadata", unit = "FGDC". (?)
> 3c) a character array, suitable for displaying to the user. Define
> attributes to the coordinate variable or coordinate reference
> variable(s) as needed to disambiguate.

These are certainly OK as auxiliary representations for time, but because comparing or doing arithmetic with such times can be unwieldy, the NUWG conventions also require a time representation that applications can easily compare and compute with.
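
To illustrate why a numeric time is easier to compare and compute with than a character string, here is a minimal Python sketch. It is not part of the NUWG conventions: the helper name decode_time and the small unit table are assumptions made for illustration, and it only handles a date-only base time such as the one in the ruc.cdl example. A real application would use the udunits library, which supports many more units and base-time forms.

```python
from datetime import datetime, timedelta

# Hypothetical subset of fixed-length units; udunits itself knows many more.
SECONDS_PER_UNIT = {
    "seconds": 1.0,
    "minutes": 60.0,
    "hours": 3600.0,
    "days": 86400.0,
}


def decode_time(value, units):
    """Convert a double offset plus a "<unit> since <Y-M-D>" string to a datetime."""
    unit, keyword, base = units.split(None, 2)
    if keyword != "since":
        raise ValueError("expected units of the form '<unit> since <date>'")
    year, month, day = (int(part) for part in base.split("-"))
    offset = timedelta(seconds=value * SECONDS_PER_UNIT[unit])
    return datetime(year, month, day) + offset


# Values in the style of the reftime/valtime variables from ruc.cdl.
units = "hours since 1992-1-1"
reftime = decode_time(24.0, units)
valtime = decode_time(30.0, units)

# Ordering and arithmetic are direct once times are numeric offsets.
forecast_length = valtime - reftime
```

With times stored this way, sorting records or computing a forecast length is a subtraction, which is the property the double time variable is meant to guarantee.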

> Comment: possible problem: do multiple coordinate reference variables
> create unique instances, such as ref_time/valid_time, or do they
> describe alternate possibilities, such as a "secs since..." and a
> descriptive string?

I assumed the former. Alternate human-readable representations can appear in auxiliary variables or attributes (e.g. "text_time").

> A better example, from the Summary:
> > ...
> > This specifies that for grid point u(1,2,1,1), the value of the
> > "z" dimension (vertical level) can be found in p(1,2,1,1) or in
> > vpt(1,2,1,1).
>
> Does this mean that both p and vpt are needed, or just one? Are they
> both valid?

You're right that this is not clear from the Summary. My assumption is that it means that the values of p and vpt are used to uniquely determine the value of z at each point of the grid. How z is to be determined from p and vpt or even how this information is to be represented in the file is not clear to me.

> ---------------------------------------
> > (2) Vertical Coordinate Systems/Levels
> >
> > 1) When necessary, a referential variable can be used as an index
> > into associated variables
> > 2) This referential indexing is indicated by a variable or global
> > attribute with the same name as the dimension
>
> This is covered implicitly by the "Coordinate Reference"
> convention. A good set of examples would help, though.

Yes, and that is what we ultimately hope to provide. Example NUWG-approved CDL files are intended to be an important product of our deliberations, and in some cases a more practical way to present details of our consensus than a formal specification. These also give people a set of examples against which to test their generic applications.

> ---------------------------------------
> > (3) Navigation
>
> > Of all the special topics, the conventions concerning navigation are
> > the least mature. Thus far, we have agreed that the navigation information
> > associated with a grid will be stored in a suite of navigation variables.
> > These variables are defined by the GRIB Edition 1 document by John Stackpole
> > in the section on the GDS (Grid Description Section) octets 7-44. The
> > actual set of variables stored for any given navigation will depend on
> > that navigation. For example, the variables needed to describe a polar
> > stereographic grid are different than the variables needed to describe a
> > simple lat/lon grid.
> >
> > Each suite of navigation variables must contain a numeric ID containing
> > the grid identification number, and an indication of the originating
> > center, both assigned by the GRIB Edition 1 document. Missing values
> > may be used if the particular grid is not described by the GRIB document.
> > For example, the navigation variables "grid_number" and "center_id"
> > may be used.
> >
> > Each variable that is defined on the grid must have the "navigation_dim"
> > variable attribute associated with it. The string defined in this
> > attribute is the name of the dimension by which all navigation variables
> > are dimensioned. In this way, the "navigation_dim" groups all the
> > navigation variables together (in the same sense that a structure groups
> > quantities of varying types together). The "navigation_dim" attribute
> > also indicates which variables in a netCDF file are actually defined
> > on the grid.
> ...
> > Summary of Conventions:
> >
> > 1) Navigation information stored in variables and dimensioned
> > by the value of the variable attribute "navigation_dim"
>
> You seem to actually be trying to define a structure. Why tie
> it together with an artificial dimension called "nav"?

Because the netCDF model is incapable of directly representing structures; it can only directly represent scalars and arrays and has no nesting capabilities. But an artificial dimension can be used to associate a collection of named variables into something isomorphic with a structure. The dimension name can then be used much like a structure name, to identify the cluster of variables that use it. This still doesn't support nested structures, because a dimension can't have a dimension, but we think it's adequate so far. The limitations of supporting a fully-functional Fortran77 interface prevent adding nested structures to the netCDF model (though netCDF actually supports types such as short and byte that can't be used in strictly conforming Fortran77 programs).

> > 2) All grid variables have the "navigation_dim" attribute
>
> Why not just a global attribute? Are you trying to
> separate the real data from the navigation data? It seems to me
> that a more natural way to do it is through the dimensions. If a
> field looks like var(lon, lat, level) then it's "defined on the grid".

navigation_dim can't be a global attribute because we want to be able to store variables defined on multiple grids in the same file. For example, we want to be able to store a satellite image and model output data for the same time in the same netCDF file, requiring two different navigation dimensions for two very different sets of georeferencing parameters.

> > 3) Which variables (content, not variable names) defined in
> > GRIB edition 1 document GDS octets 7-44 "Grid description".
> > (Table C). Content determined by which grid navigation described.
> > 4) A numeric ID listing the Grid Identification number and an originating
> > center ID from the GRIB edition 1 document must be included in the
> > navigation variables. Missing data values are OK for grids not
> > described by a GRIB document.
>
> I disagree:
> 1) makes file not self describing
> 2) GRIB variables are predefined. None of the variables I work
> with are GRIB variables.
> 3) too "GRIB-centric"
>
> I don't mind GRIB ID being one of the ways to specify a variable or
> its navigation. I'm sure it's quite convenient if you're translating GRIB
> to netcdf.

Right. Actually we agreed on another navigation variable, "nav_model", associated with each navigation that provides the context in which to interpret all the other variable names:

            char    nav_model(nav, nav_len) ;   // navigation parameterization
                    nav_model:long_name = "navigation model name";
            ...

and for GRIB-centric data, its value is:

            nav_model = "GRIB1" ;

but for parameterizations based on the Federal Geographic Data Committee Content Standards for Digital Geospatial Metadata it could be:

            nav_model = "FGDC-1994" ;

and for parameterizations based on the geo-TIFF model, it might be

            nav_model = "geo-TIFF version 1" ;

Notice that it is possible to use multiple navigation parameterizations within the same netCDF file with this mechanism.

We haven't yet agreed on how many values of nav_model our generic applications should support, but we want to support at least "GRIB1". Yesterday you gave us the idea of having a very simple nav_model used when coordinate variables suffice to specify the georeferencing, for example with simple lat/lon grids. Some suggestions for this parameterization were nav_model = "simple" or "BDN" or "" or even the default interpretation when a variable has no navigation dimension and hence no nav_model variable. I think I like this last convention best, but we need to try it out. I'll be proposing a simple lat/lon gridded file soon that uses one of these conventions.

> > 5) Ordering or naming of grid dimensions not subject to convention.
> > Dimensions defining grid variables defined by "x_dim" and "y_dim"
> > navigation variables
>
> How do you know what means what? Why not make some naming conventions
> for the dimensions?

The GRIB1 georeferencing model assumes an "x" and "y" dimension for *some* projections, so we require that the netCDF file make clear which netCDF indices correspond to which GRIB1 indices in such cases. We try to avoid requiring particular names when we can avoid it with indirection.

> Well, I guess I'm not very happy with this.
> A few thoughts:
> * Are we ok with the notion that gridded data == "geo referenced" data.
> If not, then we are into a specialization of "gridded data". I think we
> might change the name to "Conventions for GeoReferenced Gridded Data".

I agree, since there are lots of examples of gridded netCDF data that are not georeferenced.

> * Given that, I think that referencing grids comes down to two
> (always orthogonal in my possibly limited experience) parts:
> 1) specifying the grid in its natural projection plane, and
> 2) specifying the projection function.
> The first part can (almost) entirely be done with coordinate variables or
> coordinate reference variables. The second part involves enumerating each
> projection and its parameters. There's no obvious reason not to adopt the
> FGDC's work on this enumeration, with extensions for GRIB or other formats.

Our approach permits using either the FGDC or GRIB approach, as well as others, with the nav_model variable. One obvious reason for us not to adopt the FGDC approach is that we have megabytes of GRIB1 data pouring into our machines every hour, and most of us lack familiarity or experience with the FGDC standard. This might be a good place to note where it's available:

    ftp://fgdc.er.usgs.gov/pub/metadata/meta.6894.ps    (PostScript)
    ftp://fgdc.er.usgs.gov/pub/metadata/meta.6894.wp5   (Wordperfect 5.0)
    http://fgdc.er.usgs.gov/                            (FGDC Home Page)

> * There is a reasonably big payoff to getting this geo referencing
> right. I assume it's been driven so far by the "real time" RUC feeds,
> obviously "GRIB-centric". The modelers are all in their own private Idaho,
> but with a new generation of models poised to define their data formats,
> we might get lucky if we "do it right". Does anyone have any opinions on the
> technical merit / politics surrounding the FGDC stuff? I agree with the
> skepticism at the UCAR data conference about "one size fits all" data formats.
> Nonetheless, we should do what we can when we can. I think gridded data
> can be standardized along the lines we are discussing. I think a few more
> iterations....

Ultimately, if a great data fusion framework or "killer" applications are developed that require an FGDC parameterization, data will be put in that form, but right now NUWG feels that most of our constituency is more familiar with the GRIB georeferencing, and we already are getting huge volumes of data that use it. When everyone is more familiar with the FGDC standard, we should consider whether it's a good candidate for an acceptable value for our "nav_model" variable. But right now it seems somewhat incomplete and GIS- and USGS-centric. For example, GRIB and BUFR are not mentioned, but DEM (USGS Digital Elevation Model format) and DTED (Digital Terrain Elevation Data format) are. But on page 39 netCDF is included, so maybe it's complete enough :-).

Sorry for the length of this ...

--Russ
______________________________________________________________________________

Russ Rew                                           UCAR Unidata Program
russ@unidata.ucar.edu                              http://www.unidata.ucar.edu


Organization: NCAR ACD
From: caron@dilbert.acd.ucar.edu (John Caron)
Subject: Re: Navigation Information Query
To: nuwg@comet.ucar.edu
Date: Wed, 26 Apr 1995 10:03:58 -0600 (MDT)

Here's how I would think about this problem using the principle that the dimensions describe the grid via coordinate variables and coordinate reference variables, and the "nav" structure defines the mapping of the grid to world coordinates.

For reference, here is a repeat of my proposal for "Coordinate References", which are an extension of "Coordinate Variables". (Actually I would like to reword it some to make this case more explicitly covered, as I wasn't thinking of non-independent coordinate systems at the time.)

Coordinate Variables
    A variable with the same name as a dimension is a "coordinate variable", and its data values are the coordinate values for that dimension. The variable must be indexed by the dimension.

Coordinate Reference
    Coordinate systems cannot always be described by coordinate variables which are single valued and have the same cardinality as a dimension. A "coordinate reference" is one or more variables that also describe a coordinate system, and may have different sizes than the dimension it describes.
    A "coordinate reference variable" is a variable whose data values describe the coordinate values for a dimension. Its cardinality is a function of the size of the dimension.
    A coordinate reference is defined by an attribute with the same name as a dimension. The value of the attribute is the name(s) of coordinate reference variable(s). An attribute that is global becomes the default description for that dimension in the file, and may be overridden by a variable's attribute.

> // everything starts out normally
>
>       byte Z(elevs, radials, refs);
>               Z:long_name = "Reflectivity";
>               Z:units = "dBZ";
>               Z:navigation = "navZ";
>
>       byte V(elevs, radials, vels);
>               V:long_name = "Velocity";
>               V:units = "meters / second";
>               V:navigation = "navV";

So we have a 3D grid. The two variables share two of the dimensions. The third is different.
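
The lookup rule in the "Coordinate Reference" proposal above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the file is modeled as plain dictionaries rather than read through the netCDF library, the function name describe_dimension and the variables T, T_alt and level are hypothetical, and the choice to consult a coordinate reference before falling back to a coordinate variable is my reading rather than something the proposal states.

```python
def split_names(value):
    """Split an attribute value such as "reftime, valtime" into variable names."""
    return [name.strip() for name in value.split(",") if name.strip()]


def describe_dimension(dataset, var_name, dim_name):
    """Return the names of the variables that describe dim_name for var_name."""
    var_attrs = dataset["variables"][var_name]["attrs"]
    # 1. A variable attribute named after the dimension overrides the default.
    if dim_name in var_attrs:
        return split_names(var_attrs[dim_name])
    # 2. A global attribute named after the dimension is the file-wide default.
    if dim_name in dataset["global_attrs"]:
        return split_names(dataset["global_attrs"][dim_name])
    # 3. Otherwise fall back to a coordinate variable: same name as the
    #    dimension and indexed by it (assumed precedence, see lead-in).
    coord = dataset["variables"].get(dim_name)
    if coord is not None and dim_name in coord["dims"]:
        return [dim_name]
    return []


# Dictionary model of a file in the style of the ruc.cdl example.
dataset = {
    "global_attrs": {"record": "reftime, valtime"},
    "variables": {
        "reftime": {"dims": ("record",), "attrs": {}},
        "valtime": {"dims": ("record",), "attrs": {}},
        "level": {"dims": ("level",), "attrs": {}},
        "T": {"dims": ("record", "level"), "attrs": {}},
        "T_alt": {"dims": ("record", "level"), "attrs": {"record": "valtime"}},
    },
}

record_coords = describe_dimension(dataset, "T", "record")
level_coords = describe_dimension(dataset, "T", "level")
override_coords = describe_dimension(dataset, "T_alt", "record")
```

Here T picks up the global default for "record", uses the ordinary coordinate variable for "level", and T_alt shows a variable attribute overriding the global one.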

In the simplest case, we define 4 coordinate variables:

            float elevs(elevs);
                    elevs:long_name = "elevation angles";
                    elevs:units = "radians";
            float radials(radials);
                    radials:long_name = "radial angles";                // or something
                    radials:units = "radians";
            float refs(refs);
                    refs:long_name = "reflectivity range gates";        // or something
                    refs:units = "meters";
            float vels(vels);
                    vels:long_name = "velocity range gates";            // or something
                    vels:units = "meters";

This assumes that the coordinates are independent of each other. In the post, it appears they are not independent, so how do we handle that case? The first two coord. vars are probably correct:

            float elevs(elevs);
                    elevs:long_name = "elevation angles";
                    elevs:units = "radians";
            float radials(radials);
                    radials:long_name = "radial angles";                // or something
                    radials:units = "radians";

And we need a way to specify the other coord as a function of elev angle:

            float rangeZ(elevs, refs);
                    rangeZ:long_name = "Radial reflectivity range";
                    rangeZ:units = "meters";
            float rangeV(elevs, vels);
                    rangeV:long_name = "Radial velocity range";
                    rangeV:units = "meters";

So here's the coordinate reference, defined as a global or variable-specific attribute:

            :refs = "rangeZ";
            :vels = "rangeV";

Now, it's the job of the nav structure to map the grid coordinate system (defined by the coord vars. "elevs", "radials", and the coord reference vars. "rangeZ", "rangeV") to world coords, say (lat, lon, altitude). So we imagine we have a transformation function t(azi, zen, rho) -> (lat, lon, z). What does it need to know? Probably just the lat, lon position of the radar. Then you just feed it azi = radials(i), zen = elevs(j), and rho = rangeZ(j,k) or rangeV(j,k). Note t() can be a very general function in this way.
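
To make the idea of t() concrete, here is one possible Python sketch. It is a simplification under several assumptions that are not in the post: the Earth is a sphere of fixed radius, the beam travels in a straight line (no refraction, no curvature correction to the height), azi is measured clockwise from north, and "zen" is the elevation angle above the horizon as stored in "elevs". The site altitude argument is also an addition. An operational radar transformation would be more careful on all of these points.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; spherical-Earth assumption


def t(azi, zen, rho, site_lat, site_lon, site_alt=0.0):
    """Map radar coordinates to (lat, lon, z).

    azi: azimuth in radians, clockwise from north (assumed convention)
    zen: elevation angle in radians above the horizon (the "elevs" values)
    rho: slant range in meters (a rangeZ or rangeV value)
    site_lat, site_lon: radar position in degrees; site_alt in meters
    """
    ground = rho * math.cos(zen)        # distance along the surface
    z = site_alt + rho * math.sin(zen)  # straight-beam height
    d = ground / EARTH_RADIUS_M         # angular distance on the sphere
    lat1 = math.radians(site_lat)
    lon1 = math.radians(site_lon)
    # Great-circle destination point from (lat1, lon1) along bearing azi.
    lat2 = math.asin(math.sin(lat1) * math.cos(d)
                     + math.cos(lat1) * math.sin(d) * math.cos(azi))
    lon2 = lon1 + math.atan2(math.sin(azi) * math.sin(d) * math.cos(lat1),
                             math.cos(d) - math.sin(lat1) * math.sin(lat2))
    return math.degrees(lat2), math.degrees(lon2), z


# Hypothetical coordinate data, indexed as in the text:
# azi = radials[i], zen = elevs[j], rho = rangeZ[j][k].
radials = [0.0, math.pi / 2]
elevs = [0.0, math.radians(1.5)]
rangeZ = [[1000.0, 2000.0], [1000.0, 2000.0]]

i, j, k = 0, 0, 1
lat, lon, z = t(radials[i], elevs[j], rangeZ[j][k], 40.0, -105.0, 1600.0)
```

Because the site position is passed in separately from the grid coordinates, the same function serves both the reflectivity and velocity fields; only the range array changes.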

So you just need, for the nav structure:

            char    nav_model(nav, nav_len) ;   // navigation parameterization
                    nav_model:long_name = "navigation model name";
            char    projection_type(nav, nav_len) ;

            float   siteLat(nav);
                    siteLat:long_name = "Latitude of site";
                    siteLat:units = "degrees_north";
            float   siteLon(nav);
                    siteLon:long_name = "Longitude of site";
                    siteLon:units = "degrees_east";
            float   siteAlt(nav);
                    siteAlt:long_name = "Altitude of site above mean sea level";
                    siteAlt:units = "meters";

    data:
            nav_model = "Unidata projection library" ;
            projection_type = "radar spherical coordinates";

Now, if there were some parameters to the projection function that depended on whether you were transforming the reflectivity or the velocity, you might come back to either separate nav structures (navZ and navV as in the original post) or to Russ' hierarchy. In this case, I'm guessing there aren't any.

In any case, the advantage of this approach might be in reducing the complexity of the nav structure. The overall complexity of the problem is not obviously reduced, other than obviating the need for 2 nav structures or a hierarchy of nav structures. I would argue that the real advantage, however, is making explicit the separation of the grid description from the world mapping. Each is straightforward in itself, and a little bit confusing when munged together, especially from the perspective of an automatic file reader.

Looking at Stackpole's summary of GRIB grids, I can see how tempting it is to use their system. After all, there it all is, nicely packaged and described. All the work is apparently done for you. Its glaring weakness is that it describes very specific grids, rather than a family of grids based on user-settable parameters. Such a family would be exactly what I mean by a projection function. Note that the projection function maps any x,y in the projection plane to lat,lon, so it can handle any grid topology. The number of projections in common use is reasonably small, but the number of different possible grids is infinite.

I am going to continue these thoughts more abstractly in the email thread to "Georeferenced gridded data conventions", but it would probably be useful to include this example in that thread.


Organization: NCAR ACD
From: caron@dilbert.acd.ucar.edu (John Caron)
Subject: Re: More comments on netcdf "gridded" data conventions.
To: nuwg@comet.ucar.edu
Date: Wed, 26 Apr 1995 13:51:14 -0600 (MDT)

Georeferenced gridded data conventions, continued...

...

> I don't understand the necessity or desirability of having an attribute
> named "dimension" for coordinate reference variables. If the coordinate
> attribute names the coordinate reference variables and has the same name as
> the dimension, why do the coordinate variables also have to have attributes
> that name the dimension? This seems to be a redundant representation of the
> same information. Problems with requiring redundant representation include
> the possibility that the two representations are inconsistent, and the
> probability that data providers will neglect to provide the information in
> one or the other form.
>

Yes, you're right, we don't need the attribute named dimension, since the coordinate reference properly associates the dimension with the coordinate reference variable(s).

...

> To make this clearer, here's an example we are currently using in our
> ruc.cdl for the output of the MAPS/RUC model:
>
>
> dimensions:
>       record = UNLIMITED ;    // (reference time, forecast time)
>       ...
> variables:
>       double  reftime(record);        // reference time of the model
>               reftime:long_name = "reference time";
>               reftime:units = "hours since 1992-1-1";
>
>       double  valtime(record);        // forecast time ("valid" time)
>               valtime:long_name = "valid time";
>               valtime:units = "hours since 1992-1-1";
>       ...
>       :record = "reftime, valtime" ;  // "dimension attribute" -- means
>                                       // (reftime, valtime) uniquely
>                                       // determine record
>
> Here "reftime" and "valtime" are like what you have described as coordinate
> reference variables for the "record" dimension, and so should have
> attributes named "dimension" with value "record". But the record dimension
> has a global coordinate attribute "record" that names the two coordinate
> reference variables, so this information is already represented.
>
> The idea here was to be able to use the "record" dimension for representing
> an ordered pair of times, (reference time, forecast time), so you could
> model outputs for multiple reference times and multiple forecast times in a
> single netCDF file with only one unlimited dimension. Perhaps this isn't
> the use you had in mind for coordinate reference variables, but we do need
> this capability.

No, I was intending to include this case. The fact that the example here is the unlimited dimension is immaterial, so I wanted to state the general rule by which such coordinate descriptions are made.

...

> > ---------------------------------------
> > > (1) Time
> > >
> > > 1) Time variables are always type double
> >
> > Not always adequate or possible. Should have several approved
> > types of time coordinates.
>
> For NUWG conventions, we decided that type double is adequate, especially
> since the time units can be anything from picoseconds to eons and can
> specify a base time offset, according to the udunits library standard. I
> don't remember the rationale for not permitting offsets from a base time to
> use a short type, which would seem to be desirable if the bulk of data
> in a file were relatively low-resolution time offsets.
>
> We also agreed that data providers can include other representations for
> times as well (e.g. human readable). This is a general property of the NUWG
> conventions, that they specify a minimum of what should be in a netCDF file
> so that the generic applications we are developing will be able to deal with
> it. We don't proscribe adding extra information to netCDF data; data
> providers are free to include additional attributes, variables, and
> dimensions for other purposes, but our applications won't use that
> information. Furthermore, data providers are free to develop a more
> restrictive set of conventions (e.g. requiring that a time variable always
> be named "time") that our generic applications will be able to handle.

So if I understand you, the convention is to always include a type double time variable, and optionally other variables that describe the time. So if I have monthly averaged data, my time variable could be month number, with units "months"? Then I provide an alternate description which is the month name? A generic application would provide both descriptions to the user? The motivation for this presumably is to define a time ordering of the data? If these are true, then I agree with this convention.

> > > 2) Time variables are indexed by another variable (can
> > > be the unlimited dimension)

I presume this means "you must define a coordinate variable for the time coordinate"?

> > > 3) The names given to the time variables and the indexing variables
> > > are not subject to convention
> >
> > How do you know which dimension is time? Also, if you want to use
> > "coordinate variables" (often handy), then the variable name must =
> > dimension name, by convention.
>
> Applications that need to know the name of the time variable will have to be
> provided that information as an input, either in a table of associated
> variable names, as an argument, as a clickable selection, etc. We decided
> very early that we wanted NUWG conventions to avoid explicit variable names,
> where possible; the French name for the time variable should be acceptable
> for French data, and applications that require specific variable names will
> have to support a mapping between file variable names and names used in the
> application. Again, more restrictive conventions that require explicit
> variable names should work fine with this set of conventions.

Here I continue to disagree. Time is important enough that we have a few conventions about it, such as requiring a coordinate variable of type double. So you know that the generic application needs to figure out which coordinate is time. So you live in Quebec and know you better damn well not assume English. So you define a lookup table that specifies what the real name of the time variable is. The way that you do that is you specify the keyword "TIME", and the name of the time variable. Oops! There we are, using English again. Anyway the point is, you need to tell a generic application which is the time coordinate, and you might as well make some convention about it. How about a global attribute named time, whose value is the name of the time dimension? That way all of the file is in the native language of the user, and only one place has this conventional name. (By the way, I don't care if it's English, but it has to be fixed for all users, no matter what language.) You already have English keywords like "data" and "variables" etc. in the netcdf file.

...

> > Comment: possible problem: do multiple coordinate reference variables
> > create unique instances, such as ref_time/valid_time, or do they
> > describe alternate possibilities, such as a "secs since..." and a
> > descriptive string?
>
> I assumed the former. Alternate human-readable representations can appear
> in auxiliary variables or attributes (e.g. "text_time").
>
> > A better example, from the Summary:
> > > ...
> > > This specifies that for grid point u(1,2,1,1), the value of the
> > > "z" dimension (vertical level) can be found in p(1,2,1,1) or in
> > > vpt(1,2,1,1).
> >
> > does this mean that both p and vpt are needed, or just one? are they
> > both valid?
>
> You're right that this is not clear from the Summary. My assumption is that
> it means that the values of p and vpt are used to uniquely determine the
> value of z at each point of the grid. How z is to be determined from p and
> vpt or even how this information is to be represented in the file is not
> clear to me.

I think it will work to assume that both are needed, and their combination is unique. Even when they are really synonyms, I think that interpretation is valid. I would say in this example, that the "z coordinate" IS the combination (p, vpt). The projection function would be responsible for mapping (p, vpt) to altitude in km if that was desired. This also works for the hybrid coordinate system, which is the most complicated vertical coordinate I have seen.

...

> > You seem to actually be trying to define a structure. Why tie
> > it together with an artificial dimension called "nav"?
>
> Because the netCDF model is incapable of directly representing structures;
> it can only directly represent scalars and arrays and has no nesting
> capabilities. But an artificial dimension can be used to associate a
> collection of named variables into something isomorphic with a structure.
> The dimension name can then be used much like a structure name, to identify
> the cluster of variables that use it. This still doesn't support nested
> structures, because a dimension can't have a dimension, but we think it's
> adequate so far. The limitations of supporting a fully-functional Fortran77
> interface prevent adding nested structures to the netCDF model (though
> netCDF actually supports types such as short and byte that can't be used in
> strictly conforming Fortran77 programs).

ok, I get it, artificial dimensions are really structures.

> > > 2) All grid variables have the "navigation_dim" attribute
> >
> > Why not just a global attribute?  Are you trying to
> > separate the real data from the navigation data?  It seems to me
> > that a more natural way to do it is through the dimensions.  If a
> > field looks like var(lon, lat, level) then it's "defined on the grid".
>
> navigation_dim can't be a global attribute because we want to be able to
> store variables defined on multiple grids in the same file.  For example,
> we want to be able to store a satellite image and model output data for the
> same time in the same netCDF file, requiring two different navigation
> dimensions for two very different sets of georeferencing parameters.

I propose global attribute ok, with variable override, since in the majority
of files there will be only one navigation_dim.

> > > 3) Which variables (content, not variable names) defined in
> > >    GRIB edition 1 document GDS octets 7-44 "Grid description".
> > >    (Table C).  Content determined by which grid navigation described.
> > > 4) A numeric ID listing the Grid Identification number and an originating
> > >    center ID from the GRIB edition 1 document must be included in the
> > >    navigation variables.  Missing data values are OK for grids not
> > >    described by a GRIB document.
> >
> > I disagree:
> >   1) makes file not self describing
> >   2) GRIB variables are predefined.  None of the variables I work
> >      with are GRIB variables.
> >   3) too "GRIB-centric"
> >
> > I don't mind GRIB ID being one of the ways to specify a variable or
> > its navigation.  I'm sure it's quite convenient if you're translating GRIB
> > to netcdf.
>
> Right.
> Actually we agreed on another navigation variable, "nav_model",
> associated with each navigation that provides the context in which to
> interpret all the other variable names:
>
>     char nav_model(nav, nav_len) ; // navigation parameterization
>         nav_model:long_name = "navigation model name";
>     ...
>
> and for GRIB-centric data, its value is:
>
>     nav_model = "GRIB1" ;
>
> but for parameterizations based on the Federal Geographic Data Committee
> Content Standards for Digital Geospatial Metadata it could be:
>
>     nav_model = "FGDC-1994" ;
>
> and for parameterizations based on the geo-TIFF model, it might be
>
>     nav_model = "geo-TIFF version 1" ;
>
> Notice that it is possible to use multiple navigation parameterizations
> within the same netCDF file with this mechanism.

Yes, I think this makes sense, and allows future growth without breaking
existing conventions. I withdraw my objections now that I understand what
you are doing. Thanks for updating the WWW document to reflect your current
convention. I don't have any problem with making GRIB one of the ways to
describe grids, as long as it's not the only way.

> We haven't yet agreed on how many values of nav_model our generic
> applications should support, but we want to support at least "GRIB1".
> Yesterday you gave us the idea of having a very simple nav_model used when
> coordinate variables suffice to specify the georeferencing, for example with
> simple lat/lon grids.  Some suggestions for this parameterization were
> nav_model = "simple" or "BDN" or "" or even the default interpretation when
> a variable has no navigation dimension and hence no nav_model variable.  I
> think I like this last convention best, but we need to try it out.  I'll be
> proposing a simple lat/lon gridded file soon that uses one of these
> conventions.
>
> > > 5) Ordering or naming of grid dimensions not subject to convention.
> > >    Dimensions defining grid variables defined by "x_dim" and "y_dim"
> > >    navigation variables
> >
> > How do you know what means what?  Why not make some naming conventions
> > for the dimensions?
>
> The GRIB1 georeferencing model assumes an "x" and "y" dimension for *some*
> projections, so we require that the netCDF file make clear which netCDF
> indices correspond to which GRIB1 indices in such cases.  We try to avoid
> requiring particular names when we can avoid it with indirection.

Same problem as with time. You simply have to know which dimension is x, y
and z, if you want to reference the data to the earth. Clearly these are
part of the nav structure, and have to be clarified. Thinking of the recent
radar example, which was using spherical coordinates, perhaps you want to
define nav variables "dimen1", "dimen2", etc. whose values are the names of
the dimensions. It's OK with me if you want to stick with "x_dim", "y_dim",
and add "z_dim" to accomplish this, but you also have to say what order the
projection function expects its arguments. So let's just define x,y,z as the
order in which the projection function expects its arguments.

A related GRIB question: knowing the GRIB nav structure, how do you find out
what the world coords are for each grid cell? Is there a standard library, a
lookup table or what? Because whatever technique you use, there is probably
an implicit ordering of the coordinates that you just have to know. So let's
make it explicit (by convention), so that the file is truly self-describing.

...

> Ultimately, if a great data fusion framework or "killer" applications are
> developed that require an FGDC parameterization, data will be put in that
> form, but right now NUWG feels that most of our constituency is more
> familiar with the GRIB georeferencing, and we already are getting huge
> volumes of data that use it.
>
> When everyone is more familiar with it, we should consider whether it's a
> good candidate for an acceptable value for our "nav_model" variable.  But
> right now it seems somewhat incomplete and GIS- and USGS-centric.  For
> example, GRIB and BUFR are not mentioned, but DEM (USGS Digital Elevation
> Model format) and DTED (Digital Terrain Elevation Data format) are.  But on
> page 39 netCDF is included, so maybe it's complete enough :-).

The FGDC "Spatial Reference Information" (section 4) seems to be the
relevant subset for "nav" information. I would tentatively suggest we ignore
the rest, including section 6 that lists data formats such as netCDF etc.
There are a lot of fields in section 4 I don't completely understand. It has
some satellite projections (Landsat), complete Map Projections, but
apparently no radar spherical projections. Apparently it describes both the
grid and the projection, but the projections are parameterized, and thus
describe classes of projections.

If I read Stackpole's GRIB document right, the grids described are fixed and
not parameterized. (Does anyone understand differently?) If so, it can't be
used for general georeferenced data. The FGDC clearly parameterizes their
projections, and also separates the grid description from the projection. I
would prefer that the grid be described by coordinate and coordinate
reference variables, and the projection only be described by the nav
variable. However, if in order to support FGDC we had to put also the grid
description into FGDC form, I think I could live with it. Since I'm not
sure of the details of implementing the FGDC, it's possible that the
coordinate and coordinate reference variables might even satisfy what the
FGDC means by "content standards for metadata".

In summary, I think my proposals fall into three categories:

  1) Make a general definition of how to create "coordinate reference
     variables", of which the (ref_time, valid_time) and hybrid vertical
     coordinates are instances.
  2) Add enough conventions to make generic applications be able to
     georeference gridded data.
  3) Separate the grid description from the projection
     description.  Parameterize the projection description and encode
     in the nav structure.  Use coordinate reference variables to
     describe the grid.

I think that the nuwg intends to do the first two, and perhaps my
ideas are merely to clarify that. The third is somewhat different from what
exists now.


To: caron@dilbert.acd.ucar.edu (John Caron)
Cc: nuwg@comet.ucar.edu
Subject: Re: Navigation Information Query
Organization: UCAR Unidata Program
Date: Thu, 27 Apr 1995 09:51:26 -0600
From: Russ Rew

John,

Before considering the rest of your proposal, I have to correct what I think
may be a misunderstanding:

You wrote:

> ... Looking at Stackpole's summary of GRIB grids, I can
> see how tempting it is to use their system.  After all, there it all is,
> nicely packaged and described.  All the work is apparently done for
> you.  Its glaring weakness is that it describes very specific grids, rather
> than a family of grids based on user-settable parameters.  Such a family
> would be exactly what I mean by a projection function.  Note that the
> projection function maps any x,y in the projection plane to lat,lon, so it
> can handle any grid topology.  The number of projections in common use is
> reasonably small, but the number of different possible grids is infinite.

and in a second posting:

> If I read Stackpole's GRIB document right, the grids described are fixed and
> not parameterized.  (Does anyone understand differently?)  If so, it can't be
> used for general georeferenced data.  The FGDC clearly parameterizes their
> projections, and also separates the grid description from the projection.  I
> would prefer that the grid be described by coordinate and coordinate
> reference variables, and the projection only be described by the nav
> variable.
> However, if in order to support FGDC we had to put also the grid
> description into FGDC form, I think I could live with it.  Since I'm not
> sure of the details of implementing the FGDC, it's possible that the
> coordinate and coordinate reference variables might even satisfy what the
> FGDC means by "content standards for metadata".

Stackpole's summary of GRIB grids copies the WMO standard specification of a
GRIB Grid Description Section (GDS) and adds some particular frequently-used
NMC-specific grids that NMC identifies with a small integer grid ID. The GDS
provides a way to describe infinite families of grids based on user-settable
parameters. It's the parameterizations in the GDS for GRIB that I intended
to use for grid descriptions. That's one of the reasons I stated at our last
meeting that I wanted to change our primary reference to the WMO GRIB
document rather than Stackpole's version of it with all the NMC-specific
additions.

The netCDF files we will produce from GRIB products have a grid
parameterization based on the WMO GRIB GDS, even when an NMC grid ID is also
available. There are some "International Exchange Grids" for which grid IDs
have been assigned that are independent of the originating center, but my
GRIB decoder always manufactures a GDS parameterization for these even when
it's not sent with the GRIB product. Also for any of the infinite number of
non-cataloged grids, a grid ID of 255 is used which specifically means the
grid is specified in the GDS.

As far as I can see, the GRIB GDS is very similar to the FGDC Content
Standards for Geospatial Data parameterizations for grids. Each
parameterization includes some families of grids not included by the other,
but they both include all common projections. The GRIB1 GDS standard
includes some unique parameterizations, such as the "quasi-regular" grids
that are arguably not even grids, but they are already so commonly used
that we have to deal with them.
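
The fallback described above (use the GDS when one is sent with the product,
otherwise manufacture one from the grid ID, with 255 meaning "specified in
the GDS") might be sketched roughly as follows. This is only an illustration
in Python, not the GRIB decoder mentioned above: the function name, the
dictionary layout, and the catalog are all assumptions, and the catalog entry
holds placeholder values rather than the definition of any real cataloged
grid.

```python
# Illustrative sketch only; not the actual GRIB decoder discussed above.
# Grid ID 255 means "the grid is specified in the GDS".

GDS_ONLY_GRID_ID = 255

# Hypothetical catalog: grid ID -> GDS-style parameters.
# Placeholder values, NOT the real definition of any cataloged grid.
EXAMPLE_CATALOG = {
    1: {"projection": "latlon", "Ni": 4, "Nj": 3},
}


def grid_parameters(grid_id, gds=None, catalog=EXAMPLE_CATALOG):
    """Return a GDS-style parameterization for a GRIB product.

    A GDS sent with the product is used as-is.  Otherwise one is
    "manufactured" from the catalog entry for the grid ID.
    """
    if gds is not None:
        return dict(gds)
    if grid_id == GDS_ONLY_GRID_ID:
        raise ValueError("grid ID 255 requires a GDS in the product")
    if grid_id not in catalog:
        raise KeyError(f"no catalog entry for grid ID {grid_id}")
    return dict(catalog[grid_id])
```

With this shape, every product ends up with a parameterized grid description
whether or not a grid ID was also available, which is the property the
netCDF files described above rely on.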

If you want to see an on-line version of the WMO GRIB1 GDS specification
look at:

    ftp://nic.fb4.noaa.gov/pub/nws/nmc/docs/gribguide/guide.txt

which is a version of Stackpole's document without most of the NMC-specific
additions. I don't know of any on-line copy of the original WMO standard.

This is all independent of your suggestion to separate the description of
the grid from the description of the projection. I'm still looking at
that ...

--Russ


Organization: NCAR ACD
From: caron@dilbert.acd.ucar.edu (John Caron)
Subject: Re: Navigation Information Query
To: russ@unidata.ucar.edu (Russ Rew)
Date: Thu, 27 Apr 1995 10:31:20 -0600 (MDT)
Cc: nuwg@comet.ucar.edu

Thanks for clarifying that, that makes GRIB conventions infinitely more
useable.

Do you have an answer to this:

> A related GRIB question: knowing the GRIB nav structure, how do you find out what the
> world coords are for each grid cell? Is there a standard library, a lookup table or what?
> Because whatever technique you use, there is probably an implicit ordering of the
> coordinates that you just have to know. So let's make it explicit (by convention), so that


Organization: UCAR Unidata Program
From: Russ Rew
To: caron@dilbert.acd.ucar.edu (John Caron)
Cc: nuwg@comet.ucar.edu
Subject: Re: Navigation Information Query
Date: Thu, 27 Apr 1995 11:38:07 -0600

John,

> A related GRIB question: knowing the GRIB nav structure, how do you find
> out what the world coords are for each grid cell?  Is there a standard
> library, a lookup table or what?  Because whatever technique you use,
> there is probably an implicit ordering of the coordinates that you just
> have to know.  So let's make it explicit (by convention), so that

I don't know of any freely-available standard library for finding out world
coordinates based on GRIB1 GDS navigation. Maybe ATD/RDP's zebra package has
dealt with this problem?

This is what the mythical "udgeo" library is intended to handle.
Given a navigation (in the form of an open netCDF file id and a navigation
dimension id), the udgeo library would provide a mapping between netCDF
variable indices (in the form of a netCDF variable id and a corresponding
vector of indices) and world coordinates (in some convenient form, e.g. UTM
coordinates, perhaps plus time). Various convenience functions would permit
mapping cross-sections of indices all at once to arrays of world
coordinates, much like netCDF hyperslab access. Presumably it could be
initialized to understand navigations in other forms than the navigation
dimension of an open netCDF file, for example by parsing descriptions from a
small navigation language that could be stored as a string. Steve Emmerson
has proposed such a language and shown some of its advantages.

Development of the udgeo library is on a list of possible future Unidata
projects, but plans are not very far along and resources haven't yet been
committed. I imagine that large parts of the necessary code are already
available as components of USGS Fortran projection libraries, geoTIFF
software, various large application packages, etc. We would consider
adopting/adapting someone else's library or classes that already solved the
problem, if we knew of any good candidates.

As an aside, someone should probably be looking at the zebra conventions for
storing "Meteorological DataChunk" objects and "Nspace DataChunk" objects in
netCDF files, to see how they have dealt with coordinates and georeferencing
conventions.

--Russ


To: caron@dilbert.acd.ucar.edu (John Caron)
Cc: nuwg@comet.ucar.edu
Subject: Re: More comments on netcdf "gridded" data conventions.
Organization: UCAR Unidata Program
Date: Mon, 01 May 1995 14:11:47 -0600
From: Russ Rew

Still more on georeferenced gridded data conventions, responding to John
Caron's response ...

> So if I understand you, the convention is to always include a type double
> time variable, and optionally other variables that describe the time.
> So if I have monthly averaged data, my time variable could be month
> number, with units "months"?  Then I provide an alternate description
> which is the month name?  Generic application would provide both
> descriptions to the user?  The motivation for this presumably is to
> define a time ordering of the data?  If these are true, then I agree
> with this convention.

Yes, mostly, except that NUWG time variables are supposed to use units that
are understood by the udunits library (see ). "months" is not a legal time
unit for udunits, since it can't be converted into an exact number of
seconds, like "fortnight" can. I don't think NUWG conventions have dealt
with time variables such as "months" for monthly averaged data.

I don't think the NUWG conventions should require that you store a month
number as a double. In fact since type conversions such as integer to double
are more easily handled than units conversions, I agree with you that it
doesn't seem necessary to require by convention that time variables must be
of type double, but that's currently the NUWG proposal.

> > > > 2) Time variables are indexed by another variable (can
> > > >    be the unlimited dimension)
>
> I presume this means "you must define a coordinate variable for the time
> coordinate"?

Within reason, but I can imagine defining a dimension with the purpose of
specifying only a time ordering of events, without specifying the time at
which each event happened, in which case a time coordinate would be
unnecessary. I think the NUWG conventions are intended for certain kinds of
meteorological and oceanographic data, but that the NUWG conventions are
only appropriate for a subset of all the data that might be stored in
netCDF.

> > > > 3) The names given to the time variables and the indexing variables
> > > >    are not subject to convention
> > >
> > > How do you know which dimension is time?
> > > Also, if you want to use
> > > "coordinate variables" (often handy), then the variable name must =
> > > dimension name, by convention.
> >
> > Applications that need to know the name of the time variable will have
> > to be provided that information as an input, either in a table of
> > associated variable names, as an argument, as a clickable selection,
> > etc.  We decided very early that we wanted NUWG conventions to avoid
> > explicit variable names, where possible; the French name for the time
> > variable should be acceptable for French data, and applications that
> > require specific variable names will have to support a mapping between
> > file variable names and names used in the application.  Again, more
> > restrictive conventions that require explicit variable names should work
> > fine with this set of conventions.
>
> Here I continue to disagree. Time is important enough that we have a few
> conventions about it, such as requiring a coordinate variable of type
> double. So you know that the generic application needs to figure out
> which coordinate is time. So you live in Quebec and know you better
> damn well not assume English. So you define a lookup table that specifies
> what the real name of the time variable is. The way that you do that is
> you specify the keyword "TIME", and the name of the time variable. Oops!
> There we are, using English again. Anyway the point is, you need to tell
> a generic application which is the time coordinate, and you might as well
> make some convention about it. How about a global attribute named time,
> whose value is the name of the time dimension? That way all of the file
> is in the native language of the user, and only one place has this
> conventional name? (By the way, I don't care if it's English, but it has to
> be fixed for all users, no matter what language.) You already have
> English keywords like "data" and "variables" etc. in the netcdf file.

Good points.  I think my example of French vs.
English missed the original motivation for not having rigid variable names
in the NUWG conventions. NUWG wants multiple existing application packages
that already have their own conventions for variable names to be able to
access netCDF data. Rather than requiring existing applications and existing
data archives to change, NUWG chose to avoid contention about names and just
specify that tables would be used to specify variable name mappings. The
idea was application-independence, not language-independence. If data from
different source files is to be "fused" in an application, a table may be
necessary for each input file, since they could use different names for the
same variables.

One problem with specifying the time variable with a global attribute is
that there may be multiple time variables (e.g. the time of the 3-hour
forecast for a 12Z model run may be the same as the time of the 12-hour
forecast of a 3Z model run, but these must be distinguished if the data for
both are stored together in the same file). The COARDS conventions at
similarly state that

    The names of coordinate variables are not standardized by these
    conventions (since data sets may in general contain multiple
    coordinate variables of the same orientation).

By the way, the English keywords "data" and "variables" are not stored in
the netCDF file and have no special significance in the netCDF library. They
are used in the netCDF utilities ncdump and ncgen built on top of the
library, but need not be used by data providers conforming to the NUWG
conventions. But your point is valid because we have had to sacrifice
language-independence in the library in a few other cases, e.g. the reserved
attribute name "_FillValue".

...

> > navigation_dim can't be a global attribute because we want to be able to
> > store variables defined on multiple grids in the same file.
> > For example, we want to be able to store a satellite image and model
> > output data for the same time in the same netCDF file, requiring two
> > different navigation dimensions for two very different sets of
> > georeferencing parameters.
>
> I propose global attribute ok, with variable override,
> since in the majority of files there will be only one navigation_dim.

I think that would work OK.

...

> > > How do you know what means what?  Why not make some naming conventions
> > > for the dimensions?
> >
> > The GRIB1 georeferencing model assumes an "x" and "y" dimension for *some*
> > projections, so we require that the netCDF file make clear which netCDF
> > indices correspond to which GRIB1 indices in such cases.  We try to avoid
> > requiring particular names when we can avoid it with indirection.
>
> Same problem as with time. You simply have to know which dimension is x, y
> and z, if you want to reference the data to the earth. Clearly these are
> part of the nav structure, and have to be clarified. Thinking of the recent
> radar example, which was using spherical coordinates, perhaps you want to
> define nav variables "dimen1", "dimen2", etc. whose values are the names of
> the dimensions. It's OK with me if you want to stick with "x_dim", "y_dim",
> and add "z_dim" to accomplish this, but you also have to say what order the
> projection function expects its arguments. So let's just define x,y,z as the
> order in which the projection function expects its arguments.

We use "x" and "y" when the GRIB GDS specification uses "x" and "y" (e.g.
for polar stereographic projections, parameterized in terms of Nx, Ny, Dx,
Dy, ...) and "i" and "j" when the GRIB GDS specification uses "i" and "j"
(e.g. latitude/longitude grids, parameterized in terms of Ni, Nj, Di, ...).
For a different georeferencing model, we would presumably name the
parameters using names specific to a standard description of that model.
The use of parameter names from a standard reference avoids specifying their
order by a NUWG-specific convention.

...

> In summary, I think my proposals fall into three categories:
>
>   1) Make a general definition of how to create "coordinate reference
>      variables", of which the (ref_time, valid_time) and hybrid vertical
>      coordinates are instances.
>   2) Add enough conventions to make generic applications be able to
>      georeference gridded data.
>   3) Separate the grid description from the projection
>      description.  Parameterize the projection description and encode
>      in the nav structure.  Use coordinate reference variables to
>      describe the grid.
>
> I think that the nuwg intends to do the first two, and perhaps my
> ideas are merely to clarify that.  The third is somewhat different from what
> exists now.

Yes, and I think I'm convinced that separation of grid descriptions from
projection descriptions can simplify things. However, if we are relying on a
standard parameterization such as the GRIB edition 1 standard that doesn't
separate these, we end up with a much more complicated task because we then
have to develop the mechanisms ourselves instead of pointing to the external
reference. I think the FGDC standard doesn't separate these much better than
the GRIB parameterization, but I may be wrong about that. If there is a
standard for grid descriptions and projections that has a clean separation,
then you may be right that the benefits would be worthwhile.

--Russ
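
The "global attribute with variable override" rule for navigation_dim that
the exchange above settles on can be sketched as follows. This is only an
illustration in Python under stated assumptions: attributes are modelled as
plain dictionaries rather than read through a netCDF library, and the
function name and the variable names ("temperature", "sat_image", "nav",
"sat_nav") are hypothetical.

```python
# Illustrative sketch: attributes are plain dicts, not a netCDF API, and
# every name used here is a hypothetical example.

def navigation_dim_for(var_name, global_attrs, var_attrs):
    """Return the navigation dimension that applies to one variable.

    A variable's own "navigation_dim" attribute overrides the global
    default.  None means the variable has no navigation dimension at all.
    """
    own = var_attrs.get(var_name, {})
    if "navigation_dim" in own:
        return own["navigation_dim"]
    return global_attrs.get("navigation_dim")


# File-wide default, as in the common case of a single grid per file.
global_attrs = {"navigation_dim": "nav"}

# Per-variable attributes; only the satellite image needs its own navigation.
var_attrs = {
    "temperature": {},
    "sat_image": {"navigation_dim": "sat_nav"},
}
```

Here navigation_dim_for("temperature", global_attrs, var_attrs) falls back to
the global "nav", while "sat_image" resolves to its own "sat_nav", so a
satellite image and model output can still share one file.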