Re: [netcdf-java] [thredds] GRIB variable name changes in 4.3

To: John Caron <caron@xxxxxxxxxxxxxxxx>, THREDDS community <thredds@xxxxxxxxxxxxxxxx>
Subject: Re: [netcdf-java] [thredds] GRIB variable name changes in 4.3
From: Glenn Rutledge <glenn.rutledge@xxxxxxxx>
Date: Tue, 28 Feb 2012 13:43:51 -0500
John and Community-
While I do not represent the NCDC Archive, for the NCDC NOMADS systems and
our users, I must agree that the changes John is proposing will facilitate
the long-term use of GRIB data.  While painful for existing client software
and decoders, the proposed change will give our users a more scalable way
to better find and use our data.  I'll suggest that if this is adopted,
NOMADS servers could provide both 4.2 and 4.3 versions to give software
developers time to adapt on the client side.

GRIB tables change very little: once a value is in a WMO master table, it
stays.  New additions and local tables/extensions seem to be the cause of
most of these issues.  I agree that WMO needs to have a master registry, a
web service, and a ban on local tables or 'custom entries' in all but the
most extreme cases.  NOAA needs to elevate this issue to WMO, far above my
pay grade.  Not everything I've read in these exchanges about GRIB appears
100% accurate IMO.  There are many issues at play here: GRIB1-to-GRIB2
conversion problems, bad headers, and scarce documentation, among others.
This is where the archive tries to address such issues and provide value.
Currently, NCDC does not officially archive the decoders or tables (we do
have them in our Subversion repository).  BTW, the table version is stored
in the GRIB metadata, and the table version applies to all the tables (a
recent WMO stance).  Local tables are indexed by version and site.

I suspect the impact of this is not yet fully known (it would be similar
to the GRIB1-to-GRIB2 transition), and in a sense it pushes this problem
away from the servers to the developers/clients.  The NOMADS Team has a
consensus: the long-term benefit here is clear, and we recommend the
proposed path forward.  However, we don't have the burden of many code
changes either.

When the code is available we could stand up a TDS v.4.3 for parallel
testing.
Glenn R and the NOMADS Team.


On Tue, Feb 28, 2012 at 10:45 AM, Don Murray <don.murray@xxxxxxxx> wrote:

> From John's responses it seems like this new naming convention has been
> decided.  Is there any point in more discussion?  If it has been decided,
> what facilities are going to be provided in the GridDataType API to look up
> a variable based on the description and other attributes so someone can
> ALWAYS get back the same variable?  How stable will the descriptions be, or
> will they always be changing?  What are the attributes that can be used to
> definitively give back the same variable each time?
>
> - Is there a TDS server that is running with 4.3 that we can look at to
> see visually what the changes are in the various output options?
>
> - beyond IDV, this will also affect RAMADDA since names harvested under
> 4.2 will no longer be valid.  It will also require programming effort for
> the subsetting facilities in RAMADDA to present a more human face.
>
> Don
>
>
> On 2/27/12 4:51 PM, John Caron wrote:
>
>> Hi Don:
>>
>> On 2/27/2012 3:43 PM, Don Murray wrote:
>>
>>> Hi John and Ethan-
>>>
>>> As I have discussed with you at length privately, I am not in favor of
>>> this change. This will break every IDV bundle that points to GRIB data
>>> in a local file or on a TDS server. This will also affect users of the
>>> TDS on the NCDC NOMADS servers who access data either through scripts
>>> or the IDV. It's not a simple matter of users just picking new names
>>> and resaving the bundles when the bundles are stored on remote servers
>>> or used in a classroom setting.
>>>
>>
>> I realize it's a deep problem for the IDV, but it's also an opportunity to
>> figure out how to gracefully evolve bundles when things change, which
>> they do.
>>
>>
>>> Below, for the benefit of the list, are my arguments for using the
>>> human readable variable names in the previous netCDF-Java 4.3 beta
>>> release:
>>>
>>> <quote>
>>> I believe keeping the human readable variable names (as in the
>>> previous 4.3 release - with slight modifications) is much preferable
>>> and backward compatible. I understand your reasons for wanting to
>>> change, but while that makes the programmer's life easier, it makes
>>> the user's (and other programmers') life harder.
>>>
>> In the long-term, if we get the fundamentals right, everyone's life gets
>> easier.
>>
>>
>>
>>> For example, from a user perspective, with your changes, I'm going to
>>> have to modify 50 or more bundles that are on my local machines
>>> (including the NOAA viz wall) or stored on RAMADDA servers which will
>>> take several days. I'm also going to have to modify the customizations
>>> to my IDV parameter tables that I've made over the past 7 years.
>>>
>>> From a programmer's perspective, here are the impacts of your changes
>>> to the IDV:
>>>
>>> - bundles which use the variable name for lookup
>>> - data aliases used for derived quantities
>>> - parameter aliases used for automatically assigning color tables,
>>> contour intervals and units
>>> - User guide and workshop documentation and examples will need to be
>>> updated
>>>
>>> For the past 7 or so years, IDV users have been able to access
>>> realtime GRIB datasets and have had stability in using and
>>> interchanging those datasets. For example, I have a bundle:
>>>
>>> http://motherlode.ucar.edu/repository/entry/get/GFS%2080%20km.xidv?entryid=9f77ca66-2264-4f8b-a460-e02fb42606ea
>>>
>>>
>>> which has displays of 500 hPa geopotential heights, sea level pressure
>>> and precipitation from the GFS 80km data. These are simple, commonly
>>> used parameters. The IDV has a DataAlias table that equates the
>>> variable name Geopotential_height with a canonical name of HGT which
>>> is used to present derived quantities to the user of thickness and
>>> geostrophic wind. It also uses this name to assign a color table, unit
>>> and contour levels for any display created for the variable
>>> Geopotential_height. The same idea goes for Pressure_reduced_to_MSL and
>>> Total_precipitation. It doesn't matter whether I go to the GFS 80 km
>>> (grib1) or the GFS .5 degree global (grib2), or even a NAM 80km
>>> dataset. I can apply the bundle and use the same information to get
>>> the same type of display.
>>>
>>> Under the scheme in the previous version of 4.3beta,
>>> Geopotential_height will change to Geopotential_height_Pressure,
>>> Pressure_reduced_to_MSL will change to Pressure_reduced_to_MSL_Msl and
>>> Total_precipitation will change to one of:
>>>
>>> Total_precipitation_Surface_12_Hour_Accumulation
>>> Total_precipitation_Surface_1_Hour_Accumulation
>>> Total_precipitation_Surface_3_Hour_Accumulation
>>> Total_precipitation_Surface_6_Hour_Accumulation
>>> Total_precipitation_Surface_Mixed_intervals_Accumulation
>>>
>>> From the IDV perspective, the DataAlias and ParameterDefaults use
>>> patterns and are case insensitive, so this should not be a problem because
>>> the old names would match into the new names. For the bundles, this
>>> will be a problem, but one that can be dealt with on the IDV or
>>> netCDF-Java side with a parameter lookup as discussed at the recent
>>> IDV Developers teleconference and which is outlined from the IDV
>>> perspective here:
>>>
>>> https://mcidasv.ssec.wisc.edu/issues/11
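
The pattern-based, case-insensitive matching described above can be sketched roughly as follows. This is a minimal illustration and not the IDV's actual DataAlias implementation: the table layout and the function `canonical_name` are hypothetical, and only the pairing of Geopotential_height with HGT comes from this thread (the canonical names PMSL and PRECIP are invented for the example).

```python
import re

# Hypothetical alias table: (pattern, canonical name).
# Only Geopotential_height -> HGT is taken from the message; PMSL and
# PRECIP are made-up canonical names used purely for illustration.
ALIASES = [
    (re.compile(r"^Geopotential_height", re.IGNORECASE), "HGT"),
    (re.compile(r"^Pressure_reduced_to_MSL", re.IGNORECASE), "PMSL"),
    (re.compile(r"^Total_precipitation", re.IGNORECASE), "PRECIP"),
]


def canonical_name(variable_name):
    """Return the canonical name of the first matching pattern, or None."""
    for pattern, canonical in ALIASES:
        if pattern.search(variable_name):
            return canonical
    return None
```

Because each pattern is a case-insensitive prefix match, an old name such as Geopotential_height and a longer name such as Geopotential_height_Pressure resolve to the same canonical entry, while a name of the VAR_... form matches nothing in this table.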
>>>
>>> With the new naming:
>>>
>>> VAR_%d-%d-%d[_error][_L%d][_layer][_I%s_S%d][_D%d][_Prob_%s]
>>>
>>> The three variables would have different names depending on whether
>>> they came from a grib1 or grib2 dataset. This would require the
>>> Unidata IDV programmers to redo all the alias and parameter default
>>> tables and require a more complicated lookup just to find the 500 hPa
>>> geopotential height, sea level pressure and total_precipitation field
>>> depending on the dataset used. I think providing consistency between
>>> grib1 and grib2 datasets at the very least is an important
>>> consideration - in the end, it's all GRIB. GEMPAK and McIDAS (as well
>>> as wgrib2 and NCL) create the same names for their variables
>>> independent of whether they came from Grib1 or 2.
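
For readers who want to see what a client would have to do with names in the new scheme, here is a rough sketch that transcribes the template quoted above into a regular expression. It is not code from netCDF-Java: the function `parse_name` and the field names in the returned dictionary are hypothetical, and the three leading integers are kept as an uninterpreted tuple because the template itself does not say what they mean.

```python
import re

# Regex transcription of the template quoted in the message:
#   VAR_%d-%d-%d[_error][_L%d][_layer][_I%s_S%d][_D%d][_Prob_%s]
# Group names are hypothetical labels chosen for this sketch.
NAME_RE = re.compile(
    r"^VAR_(?P<a>\d+)-(?P<b>\d+)-(?P<c>\d+)"
    r"(?P<error>_error)?"
    r"(?:_L(?P<level>\d+))?"
    r"(?P<layer>_layer)?"
    r"(?:_I(?P<interval>.+?)_S(?P<stat>\d+))?"
    r"(?:_D(?P<derived>\d+))?"
    r"(?:_Prob_(?P<prob>.+))?$"
)


def parse_name(name):
    """Split a templated variable name into its parts; None if no match."""
    m = NAME_RE.match(name)
    if m is None:
        return None
    g = m.groupdict()
    return {
        "ids": (int(g["a"]), int(g["b"]), int(g["c"])),
        "error": g["error"] is not None,
        "level": int(g["level"]) if g["level"] else None,
        "layer": g["layer"] is not None,
        "interval": g["interval"],
        "stat": int(g["stat"]) if g["stat"] else None,
        "derived": int(g["derived"]) if g["derived"] else None,
        "prob": g["prob"],
    }
```

Applied to the name VAR_0-0-0_L6_I6_Hour_S194 that appears later in this thread, the sketch yields ids (0, 0, 0), level 6, interval "6_Hour" and stat 194; a 4.2-style name such as Geopotential_height does not match and returns None.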
>>>
>> There is simply no way to maintain grib1 and grib2 name compatibility,
>> because of the table-driven nature of GRIB, and the fact that they use
>> different tables.
>>
>> Again, along with the problem, it's also an opportunity to rethink how
>> the aliases and color tables etc. are done. It's possible I can add other
>> attributes that will make this easier.
>>
>> I do apologize for this fiasco. I've just spent most of the last 4-6
>> months trying to dig our way out of this hole.
>>
>>
>>
>>> I fully support the notion of adding in the level information to the
>>> variable name as is the case for Geopotential_height. I know that
>>> variables like Temperature in the 4.2 scheme can provide different
>>> results depending on whether your grib files had a mixture of 2D and
>>> 3D variables (Temperature = the one on pressure levels) or just 2D
>>> variables (Temperature = the one on height above ground level). I
>>> understand the problems it creates on both the netCDF-Java/TDS side
>>> and sometimes the IDV side (e.g. creating derived quantities) and
>>> think that this change can be handled pretty well on the IDV side.
>>>
>>> I support adding the accumulation interval for parameters like
>>> Total_precipitation above because now some variables have a mixture of
>>> the different types of intervals.
>>>
>>> One of your arguments is that over time, names change and it's
>>> difficult to maintain tables. While that may be true for lesser
>>> variables, I would suggest that the most commonly used variable names
>>> rarely change (Temperature, geopotential height, relative humidity, u
>>> and v wind components, etc). Unidata has always been in the business
>>> of maintaining tables and that's part of the job it does to support
>>> the user community. While it's not easy, it is a necessary function of
>>> the services that Unidata provides. And, changing the names just
>>> pushes the work off to others at Unidata. Perhaps Unidata could look
>>> at having common tables used by all its software for consistency. Or
>>> perhaps Unidata could work with the NCL group and use their lookup
>>> tables?
>>>
>>
>> We can't maintain tables for all centers. We could try to do so for
>> just NCEP, but it's probably not the right thing to do. It sucks
>> resources that we don't have. It makes NCEP GRIB files different from
>> non-NCEP GRIB files. Really, we have to rethink this, not hack in
>> lookup tables that will never be 100% right.
>>
>> NCL has adopted a similar variable naming scheme for similar reasons.
>>
>>
>>
>>> In the end, I would like to see the netCDF-Java library evolve to suit
>>> the needs of the data providers, while also maintaining as much
>>> backward compatibility for the end users and software developers who
>>> rely on it. I think a lot of the ancillary information can be provided
>>> through variable attributes as it is in 4.2 (description, table
>>> number, Discipline/Category/Parameter, GRIB GDS/PDS information) as NCL
>>> does, but leave human readable variable names.
>>> </quote>
>>>
>>> Outside the IDV, I have been using the netCDF-Java library in
>>> conjunction with PyNIO to convert grib2 data to netCDF. I use the
>>> human-readable netCDF-Java 4.2 variable names on my output files
>>> instead of the PyNIO names because I believe that the users of my
>>> output would prefer to see those than something like
>>> VAR_0-0-0_L6_I6_Hour_S194.
>>>
>>
>> A very nice (but not unchanging) human readable string is in the
>> long_name. I understand it's a pain to change to using that, but once you
>> make that change, I think your objections above should be resolved. The
>> trick will be to have both the long_name and the (unchanging) variable
>> name.
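
The idea of pairing a stable variable name with a readable long_name can be sketched as a lookup from one to the other. This is only an illustration under stated assumptions: the dictionary below is a hypothetical stand-in for a dataset's variables, the variable names and long_name strings in it are invented, and `find_by_long_name` is not part of any netCDF-Java or IDV API.

```python
# Hypothetical stand-in for a dataset: stable variable name -> attributes.
# Both the names and the long_name text are invented for illustration.
VARIABLES = {
    "VAR_0-3-5_L100": {"long_name": "Geopotential height @ Isobaric surface"},
    "VAR_0-3-1_L101": {"long_name": "Pressure reduced to MSL @ Mean sea level"},
}


def find_by_long_name(variables, text):
    """Return stable names whose long_name contains text (case-insensitive)."""
    needle = text.lower()
    return sorted(
        name
        for name, attrs in variables.items()
        if needle in attrs.get("long_name", "").lower()
    )
```

A client could present the long_name to users and store the stable name it resolves to, so that saved references survive if the readable text is later reworded.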
>>
>> I'll be glad to work with the IDV team to help wherever I can.
>>
>> Once again, I apologize for this trouble.
>>
>> John
>>
>>
> --
> Don Murray
> NOAA/ESRL/PSD and CIRES
> 303-497-3596
> http://www.esrl.noaa.gov/psd/people/don.murray/
>
> _______________________________________________
> thredds mailing list
> thredds@xxxxxxxxxxxxxxxx
> For list information or to unsubscribe, visit:
> http://www.unidata.ucar.edu/mailing_lists/
>



-- 
Glenn K. Rutledge
Meteorologist/Physical Scientist
NOMADS Team Leader
National Climatic Data Center
Asheville, NC 28801
(828) 271-4097
nomads.ncdc.noaa.gov
Follow-Ups:
- Re: [netcdf-java] [thredds] GRIB variable name changes in 4.3
  - From: Don Murray
References:
- [netcdf-java] GRIB variable name changes in 4.3
  - From: John Caron
- Re: [netcdf-java] GRIB variable name changes in 4.3
  - From: Don Murray
- Re: [netcdf-java] GRIB variable name changes in 4.3
  - From: John Caron
- Re: [netcdf-java] GRIB variable name changes in 4.3
  - From: Don Murray