Re: [netcdf-java] GRIB variable name changes in 4.3

To: John Caron <caron@xxxxxxxxxxxxxxxx>
Subject: Re: [netcdf-java] GRIB variable name changes in 4.3
From: Don Murray <don.murray@xxxxxxxx>
Date: Mon, 27 Feb 2012 15:43:35 -0700

Hi John and Ethan-

As I have discussed with you at length privately, I am not in favor of this change. This will break every IDV bundle that points to GRIB data in a local file or on a TDS server. This will also affect users of the TDS on the NCDC NOMADS servers who access data either through scripts or the IDV. It's not a simple matter of users just picking new names and resaving the bundles when the bundles are stored on remote servers or used in a classroom setting.

Below, for the benefit of the list, are my arguments for using the human readable variable names in the previous netCDF-Java 4.3 beta release:


<quote>

I believe keeping the human readable variable names (as in the previous 4.3 release - with slight modifications) is much preferable and backward compatible. I understand your reasons for wanting to change, but while that makes the programmer's life easier, it makes the user's (and other programmers') life harder.

For example, from a user perspective, with your changes, I'm going to have to modify 50 or more bundles that are on my local machines (including the NOAA viz wall) or stored on RAMADDA servers, which will take several days. I'm also going to have to modify the customizations to my IDV parameter tables that I've made over the past 7 years.

From a programmer's perspective, here are the impacts of your changes to the IDV:


 - bundles which use the variable name for lookup
 - data aliases used for derived quantities
 - parameter aliases used for automatically assigning color tables, contour intervals and units
 - User guide and workshop documentation and examples will need to be updated

For the past 7 or so years, IDV users have been able to access realtime GRIB datasets and have had stability in using and interchanging those datasets. For example, I have a bundle:


http://motherlode.ucar.edu/repository/entry/get/GFS%2080%20km.xidv?entryid=9f77ca66-2264-4f8b-a460-e02fb42606ea

which has displays of 500 hPa geopotential heights, sea level pressure and precipitation from the GFS 80km data. These are simple, commonly used parameters. The IDV has a DataAlias table that equates the variable name Geopotential_height with a canonical name of HGT, which is used to present the derived quantities of thickness and geostrophic wind to the user. It also uses this name to assign a color table, unit and contour levels for any display created for the variable Geopotential_height. Same idea goes for Pressure_reduced_to_MSL and Total_precipitation. It doesn't matter whether I go to the GFS 80 km (grib1) or the GFS .5 degree global (grib2), or even a NAM 80km dataset. I can apply the bundle and use the same information to get the same type of display.

Under the scheme in the previous version of 4.3 beta, Geopotential_height will change to Geopotential_height_Pressure, Pressure_reduced_to_MSL will change to Pressure_reduced_to_MSL_Msl and Total_precipitation will change to one of:


Total_precipitation_Surface_12_Hour_Accumulation
Total_precipitation_Surface_1_Hour_Accumulation
Total_precipitation_Surface_3_Hour_Accumulation
Total_precipitation_Surface_6_Hour_Accumulation
Total_precipitation_Surface_Mixed_intervals_Accumulation

From the IDV perspective, the DataAlias and ParameterDefaults use patterns and are case insensitive, so this should not be a problem because the old names would match into the new names. For the bundles, this will be a problem, but one that can be dealt with on the IDV or netCDF-Java side with a parameter lookup as discussed at the recent IDV Developers teleconference and which is outlined from the IDV perspective here:


https://mcidasv.ssec.wisc.edu/issues/11
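
As a rough illustration of why pattern-based, case-insensitive aliases would still match the longer names, here is a minimal Python sketch. The alias table, its patterns and the canonical_name helper are hypothetical stand-ins, not the IDV's actual DataAlias entries or API; only the variable names and the canonical name HGT come from the discussion above.

```python
import re

# Hypothetical alias table: (pattern, canonical name). Only "HGT" is taken
# from the message; the other canonical names are illustrative placeholders.
ALIASES = [
    (r"Geopotential_height", "HGT"),
    (r"Pressure_reduced_to_MSL", "PMSL"),
    (r"Total_precipitation", "PRECIP"),
]

def canonical_name(variable_name):
    """Return the canonical name of the first alias pattern found in the name."""
    for pattern, canonical in ALIASES:
        # re.search matches anywhere in the string, so an added level or
        # interval suffix does not break the match.
        if re.search(pattern, variable_name, re.IGNORECASE):
            return canonical
    return None

for name in [
    "Geopotential_height",
    "Geopotential_height_Pressure",
    "total_precipitation_surface_6_hour_accumulation",
    "VAR_0-0-0_L6_I6_Hour_S194",
]:
    print(name, "->", canonical_name(name))
```

Names that only gain a suffix still resolve, while an opaque VAR_ name matches none of the patterns and would need a separate lookup.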

With the new naming:

VAR_%d-%d-%d[_error][_L%d][_layer][_I%s_S%d][_D%d][_Prob_%s]

The three variables would have different names depending on whether they came from a grib1 or grib2 dataset. This would require the Unidata IDV programmers to redo all the alias and parameter default tables and require a more complicated lookup just to find the 500 hPa geopotential height, sea level pressure and total precipitation fields depending on the dataset used. I think providing consistency between grib1 and grib2 datasets at the very least is an important consideration - in the end, it's all GRIB. GEMPAK and McIDAS (as well as wgrib2 and NCL) create the same names for their variables independent of whether they came from grib1 or grib2.
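
To make the template concrete, the following sketch splits a name of that form into its optional pieces with a regular expression. It is only a reading of the template string quoted above, not the netCDF-Java implementation. The group names (id1, level, interval and so on) are illustrative labels, since the template itself does not say what each field means, and the sketch assumes the %s interval field may contain underscores, as in 6_Hour.

```python
import re

# Regex mirroring VAR_%d-%d-%d[_error][_L%d][_layer][_I%s_S%d][_D%d][_Prob_%s].
# Group names are illustrative labels, not official field names.
VAR_NAME = re.compile(
    r"^VAR_(?P<id1>\d+)-(?P<id2>\d+)-(?P<id3>\d+)"
    r"(?P<error>_error)?"
    r"(?:_L(?P<level>\d+))?"
    r"(?P<layer>_layer)?"
    r"(?:_I(?P<interval>.+?)_S(?P<stat>\d+))?"
    r"(?:_D(?P<d>\d+))?"
    r"(?:_Prob_(?P<prob>.+))?$"
)

def parse_var_name(name):
    """Return the fields present in a VAR_ style name, or None if it does not fit."""
    match = VAR_NAME.match(name)
    if match is None:
        return None
    # Keep only the optional pieces that actually appear in this name.
    return {key: value for key, value in match.groupdict().items() if value is not None}

print(parse_var_name("VAR_0-0-0_L6_I6_Hour_S194"))
print(parse_var_name("Geopotential_height"))
```

Nothing in the parsed fields says "temperature" or "geopotential height" without consulting a table, which is the extra lookup step the paragraph above is concerned about.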

I fully support the notion of adding in the level information to the variable name as is the case for Geopotential_height. I know that variables like Temperature in the 4.2 scheme can provide different results depending on whether your grib files had a mixture of 2D and 3D variables (Temperature = the one on pressure levels) or just 2D variables (Temperature = the one on height above ground level). I understand the problems it creates on both the netCDF-Java/TDS side and sometimes the IDV side (e.g. creating derived quantities) and think that this change can be handled pretty well on the IDV side.

I support adding the accumulation interval for parameters like Total_precipitation above because now some variables have a mixture of the different types of intervals.

One of your arguments is that over time, names change and it's difficult to maintain tables. While that may be true for lesser variables, I would suggest that the most commonly used variable names rarely change (Temperature, geopotential height, relative humidity, u and v wind components, etc). Unidata has always been in the business of maintaining tables and that's part of the job it does to support the user community. While it's not easy, it is a necessary function of the services that Unidata provides. And, changing the names just pushes the work off to others at Unidata. Perhaps Unidata could look at having common tables used by all its software for consistency. Or perhaps Unidata could work with the NCL group and use their lookup tables?

In the end, I would like to see the netCDF-Java library evolve to suit the needs of the data providers, while also maintaining as much backward compatibility as possible for the end users and software developers who rely on it. I think a lot of the ancillary information can be provided through variable attributes as it is in 4.2 (description, table number, Discipline/Category/Parameter, GRIB GDS/PDS information) as NCL does, but leave human readable variable names.

</quote>

Outside the IDV, I have been using the netCDF-Java library in conjunction with PyNIO to convert grib2 data to netCDF. I use the human-readable netCDF-Java 4.2 variable names on my output files instead of the PyNIO names because I believe that the users of my output would prefer to see those rather than something like VAR_0-0-0_L6_I6_Hour_S194.


Don Murray


On 2/27/12 11:17 AM, John Caron wrote:

To all:

The CDM / netCDF-Java library version 4.3 (and also TDS version 4.3) is
considering a radical change in the way that GRIB variables are named.
Instead of nice human readable names like

float Temperature(time=1, lat=361, lon=720);

they are now like

float VAR_0-0-0_L6_I6_Hour_S194(time=1, lat=361, lon=720);

with "human readable names" in the long_name:

:long_name = "Temperature (6_Hour Average) @ Maximum wind level";

The reasons for this change are that the "nice human readable names"
come from external GRIB tables, that is, the names are not in the files
themselves. GRIB table parameter names have no requirement to be unique
nor simple nor unchanging, i.e. they have no requirement to be suitable
as netCDF variable names. Maintainers of GRIB tables often make minor
changes to GRIB names, correcting typos or otherwise improving the
readability of the name. In some cases, the GRIB names are completely
changed. When the CDM starts to use new versions of the tables, the
variable names can and do change. Since calls to access data use the
name of the variable, many things break if the name changes.

Any GRIB to netCDF translation software is in the position of either
hand-maintaining the tables to prevent names from changing (and fixing
duplicates or unsuitable names), or doing something else. Hand
maintaining GRIB tables is not a viable option due to resource
constraints. The something else is to give variables unique names based
only on the information actually in the file. The NCL package has
adopted a similar scheme:

http://www.ncl.ucar.edu/Document/Manuals/Ref_Manual/NclFormatSupport.shtml#GRIB


More background on this problem is here:

http://www.unidata.ucar.edu/staff/caron/papers/GRIBarchivals.pdf

Another aspect of this problem is that errors were found in version 4.2
with GRIB tables, with handling GRIB time intervals and ensemble data,
as well as with the algorithm for generating names when multiple
variables from the same parameter are in the same file. About 1 in 5
variable names (in the NCEP IDD data) need to change to fix these
problems. In reviewing how variable names are created, and how GRIB
tables are handled, these other problems became clear. Rather than
fixing the problem piecemeal, we are trying to make one big change all
at once, then do our best to not let this happen again.

The main impact this will have is probably on:
1) scripts or IDV bundles that have a GRIB variable name in them;
hopefully a one-time change will fix this.
2) interactive applications that are built on top of the CDM. For GRIB,
users will need to see the long_name, not the variable name, to know
what they want. However, the CDM presents a uniform interface for all
files, not just GRIB, so the application can't assume that the long_name
is even present. So the application should present both the variable
name and the long_name (if it exists) to the user when selecting variables.
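
A minimal sketch of that last recommendation, using plain dictionaries as stand-ins for variables rather than any real CDM API: the label always shows the variable name and appends the long_name only when the attribute exists. The display_label helper and the dictionary layout are hypothetical.

```python
def display_label(name, attributes):
    """Build a selection label from a variable name and its optional long_name."""
    long_name = attributes.get("long_name")
    if long_name:
        return f"{name} ({long_name})"
    # No long_name attribute: fall back to the bare variable name.
    return name

# Stand-in variables: one GRIB-style entry with a long_name, one without.
variables = {
    "VAR_0-0-0_L6_I6_Hour_S194": {
        "long_name": "Temperature (6_Hour Average) @ Maximum wind level",
    },
    "Temperature": {},
}

for name, attrs in variables.items():
    print(display_label(name, attrs))
```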

We think that this change, though painful, is a necessary way forward,
but we want to get input from users, and especially application
developers. The latest 4.3 snapshot has these changes, please try it out
and let us know what you think, and how it will affect you. Post your
comments to these email lists so the entire discussion can be public.

thanks

John, Ethan


--
Don Murray
NOAA/ESRL/PSD and CIRES
303-497-3596
http://www.esrl.noaa.gov/psd/people/don.murray/
