Re: [thredds] [netcdf-java] GRIB variable name changes in 4.3

To: John Caron <caron@xxxxxxxxxxxxxxxx>
Subject: Re: [thredds] [netcdf-java] GRIB variable name changes in 4.3
From: Don Murray <don.murray@xxxxxxxx>
Date: Tue, 28 Feb 2012 08:45:59 -0700

From John's responses it seems that this new naming convention has been decided. Is there any point in more discussion? If it has been decided, what facilities are going to be provided in the GridDataType API to look up a variable based on the description and other attributes so someone can ALWAYS get back the same variable? How stable will the descriptions be, or will they always be changing? What are the attributes that can be used to definitively give back the same variable each time?

- Is there a TDS server that is running with 4.3 that we can look at to see visually what the changes are in the various output options?

- Beyond IDV, this will also affect RAMADDA since names harvested under 4.2 will no longer be valid. It will also require programming effort for the subsetting facilities in RAMADDA to present a more human face.


Don

On 2/27/12 4:51 PM, John Caron wrote:

Hi Don:

On 2/27/2012 3:43 PM, Don Murray wrote:

Hi John and Ethan-

As I have discussed with you at length privately, I am not in favor of
this change. This will break every IDV bundle that points to GRIB data
in a local file or on a TDS server. This will also affect users of the
TDS on the NCDC NOMADS servers who access data either through scripts
or the IDV. It's not a simple matter of users just picking new names
and resaving the bundles when the bundles are stored on remote servers
or used in a classroom setting.


I realize it's a deep problem for the IDV, but it's also an opportunity to
figure out how to gracefully evolve bundles when things change, which
they do.


Below, for the benefit of the list, are my arguments for using the
human readable variable names in the previous netCDF-Java 4.3 beta
release:

<quote>
I believe keeping the human readable variable names (as in the
previous 4.3 release - with slight modifications) is much preferable
and backward compatible. I understand your reasons for wanting to
change, but while that makes the programmer's life easier, it makes
the user's (and other programmers') life harder.

In the long-term, if we get the fundamentals right, everyone's life gets
easier.


For example, from a user perspective, with your changes, I'm going to
have to modify 50 or more bundles that are on my local machines
(including the NOAA viz wall) or stored on RAMADDA servers which will
take several days. I'm also going to have to modify the customizations
to my IDV parameter tables that I've made over the past 7 years.

From a programmer's perspective, here are the impacts of your changes
to the IDV:

- bundles which use the variable name for lookup
- data aliases used for derived quantities
- parameter aliases used for automatically assigning color tables,
contour intervals and units
- User guide and workshop documentation and examples will need to be
updated

For the past 7 or so years, IDV users have been able to access
realtime GRIB datasets and have had stability in using and
interchanging those datasets. For example, I have a bundle:

http://motherlode.ucar.edu/repository/entry/get/GFS%2080%20km.xidv?entryid=9f77ca66-2264-4f8b-a460-e02fb42606ea


which has displays of 500 hPa geopotential heights, sea level pressure
and precipitation from the GFS 80km data. These are simple, commonly
used parameters. The IDV has a DataAlias table that equates the
variable name Geopotential_height with a canonical name of HGT which
is used to present derived quantities to the user of thickness and
geostrophic wind. It also uses this name to assign a color table, unit
and contour levels for any display created for the variable
Geopotential height. Same idea goes for Pressure_reduced_to_MSL and
Total_precipitation. It doesn't matter whether I go to the GFS 80 km
(grib1) or the GFS .5 degree global (grib2), or even a NAM 80km
dataset. I can apply the bundle and use the same information to get
the same type of display.

Under the scheme in the previous version of 4.3beta,
Geopotential_height will change to Geopotential_height_Pressure,
Pressure_reduced_to_MSL will change to Pressure_reduced_to_MSL_Msl and
Total_precipitation will change to one of:

Total_precipitation_Surface_12_Hour_Accumulation
Total_precipitation_Surface_1_Hour_Accumulation
Total_precipitation_Surface_3_Hour_Accumulation
Total_precipitation_Surface_6_Hour_Accumulation
Total_precipitation_Surface_Mixed_intervals_Accumulation

From the IDV perspective, the DataAlias and ParameterDefaults use
patterns and are case insensitive, so this should not be a problem because
the old names would match into the new names. For the bundles, this
will be a problem, but one that can be dealt with on the IDV or
netCDF-Java side with a parameter lookup as discussed at the recent
IDV Developers teleconference and which is outlined from the IDV
perspective here:

https://mcidasv.ssec.wisc.edu/issues/11
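
To illustrate the point about pattern-based, case-insensitive aliases, here is a minimal Python sketch. It is not the IDV implementation. The table layout and the `canonical_name` helper are hypothetical, and only the Geopotential_height to HGT pairing comes from this message; the other canonical names (`PMSL`, `PRECIP`) are placeholders chosen for illustration.

```python
import re

# Hypothetical alias table: (pattern, canonical name).
# Only Geopotential_height -> HGT is taken from the message above;
# PMSL and PRECIP are illustrative placeholders.
ALIASES = [
    (re.compile(r"Geopotential_height", re.IGNORECASE), "HGT"),
    (re.compile(r"Pressure_reduced_to_MSL", re.IGNORECASE), "PMSL"),
    (re.compile(r"Total_precipitation", re.IGNORECASE), "PRECIP"),
]


def canonical_name(variable_name):
    """Return the canonical name whose pattern matches the start of
    variable_name, or None if no alias pattern applies."""
    for pattern, canonical in ALIASES:
        if pattern.match(variable_name):
            return canonical
    return None


# Old-style names and the suffixed names both resolve to the same alias.
for name in (
    "Geopotential_height",
    "Geopotential_height_Pressure",
    "Total_precipitation_Surface_6_Hour_Accumulation",
    "VAR_0-3-5_L100",
):
    print(name, "->", canonical_name(name))
```

Because the patterns only anchor at the start of the name, a suffix such as `_Pressure` or `_Surface_6_Hour_Accumulation` does not disturb the lookup, while a name in the `VAR_...` style matches nothing and would need a different mechanism.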

With the new naming:

VAR_%d-%d-%d[_error][_L%d][_layer][_I%s_S%d][_D%d][_Prob_%s]

The three variables would have different names depending on whether
they came from a grib1 or grib2 dataset. This would require the
Unidata IDV programmers to redo all the alias and parameter default
tables and require a more complicated lookup just to find the 500 hPa
geopotential height, sea level pressure and total_precipitation field
depending on the dataset used. I think providing consistency between
grib1 and grib2 datasets at the very least is an important
consideration - in the end, it's all GRIB. GEMPAK and McIDAS (as well
as wgrib2 and NCL) create the same names for their variables
independent of whether they came from Grib1 or 2.

There is simply no way to maintain grib1 and grib2 name compatibility,
because of the table-driven nature of GRIB, and the fact that they use
different tables.

Again, along with the problem, it's also an opportunity to rethink how
the aliases and color tables etc. are done. It's possible I can add other
attributes that will make this easier.

I do apologize for this fiasco. I've just spent most of the last 4-6
months trying to dig our way out of this hole.


I fully support the notion of adding in the level information to the
variable name as is the case for Geopotential_height. I know that
variables like Temperature in the 4.2 scheme can provide different
results depending on whether your grib files had a mixture of 2D and
3D variables (Temperature = the one on pressure levels) or just 2D
variables (Temperature = the one on height above ground level). I
understand the problems it creates on both the netCDF-Java/TDS side
and sometimes the IDV side (e.g. creating derived quantities) and
think that this change can be handled pretty well on the IDV side.

I support adding the accumulation interval for parameters like
Total_precipitation above because now some variables have a mixture of
the different types of intervals.

One of your arguments is that over time, names change and it's
difficult to maintain tables. While that may be true for lesser
variables, I would suggest that the most commonly used variable names
rarely change (Temperature, geopotential height, relative humidity, u
and v wind components, etc). Unidata has always been in the business
of maintaining tables and that's part of the job it does to support
the user community. While it's not easy, it is a necessary function of
the services that Unidata provides. And, changing the names just
pushes the work off to others at Unidata. Perhaps Unidata could look
at having common tables used by all its software for consistency. Or
perhaps Unidata could work with the NCL group and use their lookup
tables?


We can't maintain tables for all centers. We could try to do so for
just NCEP, but it's probably not the right thing to do. It sucks
resources that we don't have. It makes NCEP GRIB files different from
non-NCEP GRIB files. Really, we have to rethink this, not hack in
lookup tables that will never be 100% right.

NCL has adopted a similar variable naming scheme for similar reasons.


In the end, I would like to see the netCDF-Java library evolve to suit
the needs of the data providers, while also maintaining as much
backward compatibility for the end users and software developers who
rely on it. I think a lot of the ancillary information can be provided
through variable attributes as it is in 4.2 (description, table
number, Discipline/Category/Parameter, GRIB GDS/PDS information) as NCL
does, but leave human readable variable names.
</quote>

Outside the IDV, I have been using the netCDF-Java library in
conjunction with PyNIO to convert grib2 data to netCDF. I use the
human-readable netCDF-Java 4.2 variable names on my output files
instead of the PyNIO names because I believe that the users of my
output would prefer to see those than something like
VAR_0-0-0_L6_I6_Hour_S194.
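
For readers trying to decode names of this form, the following Python sketch splits a name according to the template quoted earlier, `VAR_%d-%d-%d[_error][_L%d][_layer][_I%s_S%d][_D%d][_Prob_%s]`. The regular expression and the `parse_var_name` helper are assumptions built only from that template, not netCDF-Java code, and the result keys simply reuse the template's own tokens without asserting what each one means.

```python
import re

# Regex derived from the template quoted in this thread (an assumption,
# not taken from the netCDF-Java source).
NAME_RE = re.compile(
    r"^VAR_(?P<a>\d+)-(?P<b>\d+)-(?P<c>\d+)"
    r"(?P<error>_error)?"
    r"(?:_L(?P<L>\d+))?"
    r"(?P<layer>_layer)?"
    r"(?:_I(?P<I>.+?)_S(?P<S>\d+))?"
    r"(?:_D(?P<D>\d+))?"
    r"(?:_Prob_(?P<Prob>.+))?$"
)


def _to_int(text):
    """Convert an optional captured group to int, keeping None as None."""
    return int(text) if text is not None else None


def parse_var_name(name):
    """Split a VAR_... style name into its template fields.

    Returns None when the name does not follow the template."""
    m = NAME_RE.match(name)
    if m is None:
        return None
    g = m.groupdict()
    return {
        "ids": (int(g["a"]), int(g["b"]), int(g["c"])),
        "error": g["error"] is not None,
        "L": _to_int(g["L"]),
        "layer": g["layer"] is not None,
        "I": g["I"],
        "S": _to_int(g["S"]),
        "D": _to_int(g["D"]),
        "Prob": g["Prob"],
    }


print(parse_var_name("VAR_0-0-0_L6_I6_Hour_S194"))
```

A human-readable name such as `Geopotential_height` does not match the template, so `parse_var_name` returns `None` for it.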


A very nice (but not unchanging) human readable string is in the
long_name. I understand it's a pain to change to using that, but once you
make that change, I think your objections above should be resolved. The
trick will be to have both the long_name and the (unchanging) variable
name.

I'll be glad to work with the IDV team to help wherever I can.

Once again, I apologize for this trouble.

John


--
Don Murray
NOAA/ESRL/PSD and CIRES
303-497-3596
http://www.esrl.noaa.gov/psd/people/don.murray/

