Re: [bufrtables] Summary report on the suitability of GRIB/BUFR for archiving data

Hello Everyone,

To expand on some previous points...

   * The WMO does maintain machine-readable versions of the tables, for
     both BUFR and GRIB, at
     http://www.wmo.int/pages/prog/www/WMOCodes/TDCFtables.html  Thanks
     to Paolo for pointing this out in his earlier email.  Different
     versions of the tables can be downloaded from this site.  For BUFR
     at least, table version 13 is a superset of all previous versions,
     so it can be used to decode BUFR messages from all previous
     versions.  This is no longer true beginning with version 14, where
     it's possible that some deprecated items have now been removed or
     that some descriptor characteristics have been modified from
     previous versions of the table.  So for BUFR, decoding centers
     should maintain copies of tables 13, 14 and onward in whatever
     format(s) are required by their local processing software.
   * John rightly points out that the proper use of this table version
     number by message originators would eliminate the problem outlined
     in his paper.  In my experience, this problem stems mostly from
     originators taking a casual attitude toward ensuring they've used the
     proper version number in their messages.  Many originators use software
     where this number is often hardcoded and so becomes, at best, an
     afterthought.  This is an education issue that WMO is working hard
     to address among its members.
   * There's also a concerted effort among members to develop BUFR
     templates for certain types of commonly-reported data such as
     SYNOP, BUOY, TEMP/PILOT, CLIMAT, etc.  This is a by-product of
     WMO's ongoing migration from these old alphanumeric fixed-field
     formats to BUFR.  The list of templates is available at
     http://www.wmo.int/pages/prog/www/WMOCodes/TemplateExamples.html#Regulations
     Their use isn't mandatory, but it does make things a lot simpler
     for downstream codes which have to interpret the decoded output,
     which is another point that John made in his paper.
   * If anyone has a requirement for a new BUFR or GRIB2 descriptor
     which they feel would be reasonable to propose as a new official
     WMO descriptor (vs. just using their own local descriptor number),
     please let me know.  I represent the U.S. to the WMO codes group
     which reviews and approves these types of requests.  Depending on
     the nature of the request, there are fast-track procedures
     available which can lead to formal approval within a matter of 2-3
     months.
   * As originally envisioned, BUFR and GRIB2 weren't designed to be
     formats for archive storage, but rather for efficient real-time
     exchange of meteorological data.  Nevertheless, this doesn't mean
     they can't be used as archive formats.  We do this here at NCEP,
     and the approach we use involves storing a copy of the applicable
     table with each archived dataset.  Note that, for BUFR at least,
     table information can be encoded into BUFR messages using
     descriptors from Class 0 of Table B.  When this is done, the
     necessary table information can be easily retained alongside the
     data in a very compact and efficient manner, using one or two
     additional BUFR messages at the head of each archived file.  Such
     an approach could even be used when exchanging real-time data sets
     between centers, at the cost of one or two additional BUFR
     messages.  This would eliminate the problem of receiving centers
     having to "guess" whether the table version number in each data
     message was encoded properly.
   * In my opinion, when everyone follows the rules (e.g. using
     official descriptors with proper table version numbers), the
     process works very well.  The trick of course is to get everyone
     (and their software) to pay attention to the rules.  But this is
     true of any format and is not unique to BUFR and GRIB.
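
As a rough sketch of the table-selection rule from the first point above
(use version 13 for anything older, the exact version from 14 onward),
something like the following could work.  The function name and the set
of locally held versions are hypothetical, and Python is used only for
illustration; this is not how any particular decoder does it.

```python
def table_version_for(message_version, available):
    """Pick which locally stored BUFR table version to decode with.

    message_version: master table version number read from the message.
    available: set of table versions held locally (hypothetical store).
    """
    if message_version <= 13:
        # Version 13 is described above as a superset of earlier versions.
        wanted = 13
    else:
        # From 14 onward entries may have been removed or changed,
        # so the exact version named in the message is needed.
        wanted = message_version
    if wanted not in available:
        raise LookupError("no local copy of table version %d" % wanted)
    return wanted


local_versions = {13, 14, 15}
print(table_version_for(9, local_versions))
print(table_version_for(14, local_versions))
```

This of course only helps if the version number in the message was
encoded correctly in the first place.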

With best regards,
-Jeff

On 3/29/2011 6:26 AM, Enrico Zini wrote:
[resending because the first attempt apparently did not make it to the
list]

On Wed, Mar 09, 2011 at 12:10:31PM -0700, John Caron wrote:

Apologies for the long hiatus on this list.

I have written a brief report about BUFR/GRIB with a (possibly
controversial) recommendation. Feel free to forward to anyone who
might be interested.

http://www.unidata.ucar.edu/staff/caron/bufr/Summary.html

Hello,

from the experience[1][2][3] I have with BUFR messages, I see a few
problems with your proposal:

  1. it would imply that BUFR decoding can only happen when/where there
     is network connectivity and the central server is working. I am not
     comfortable tying a long-lived archive to the existence of a
     third-party server;
  2. alternatively, the archive needs to store and keep up to date an
     entire mirror of all the tables referenced by all the BUFRs it
     contains, and that is more or less what we already have, barring
     the proposal to standardise a file format for storing tables.
     But if you retrofit the system that we have now with a standard file
     format for tables and a working central repository, you basically
     fix it without the need for hash codes;
  3. 16 bits (0-65535) are IMO not that big a hash space: when you allow
     everyone to create new tables at will, things may degenerate
     quickly.
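
To put a rough number on point 3, here is a minimal sketch of the
birthday-problem estimate.  It assumes table hashes are spread uniformly
over the 16-bit space, which is my assumption; the hash actually
proposed may behave differently.  The function name is made up.

```python
import math


def collision_probability(n_tables, hash_bits=16):
    """Chance that at least two of n_tables share a hash value,
    assuming hashes are uniformly distributed (an assumption)."""
    space = 2 ** hash_bits
    if n_tables > space:
        return 1.0  # pigeonhole: more tables than hash values
    # P(all distinct) = product over i of (1 - i/space); sum logs for stability.
    log_p_distinct = sum(math.log1p(-i / space) for i in range(n_tables))
    return 1.0 - math.exp(log_p_distinct)


for n in (50, 100, 300, 1000):
    print(n, round(collision_probability(n), 3))
```

Under that assumption, the chance of at least one clash reaches about
one half at around 300 distinct tables.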

But the biggest problem I have is this: you do need to maximise reuse of
BUFR table codes, otherwise the problem of making sense of the decoded
data is not machine computable anymore.

I am maintaining software that not only decodes BUFR bulletins, but also
tries to make sense of them: for example, it can understand that a given
decoded value is a temperature, that it is sampled at a given vertical
level and that it went through a given kind of statistical processing.
That is, it can decode a bulletin and say:

   "There is a temperature reading at 2 meters above ground, maximum over
   12 hours."

This interpreted information can be used by meteorologists without
their having to be aware that temperatures can come as B12001, B12101,
B12111, B12112, B12114..B12119 or whatever else. Where I work, the
ability to do this is considered a very valuable resource, as it allows
readings from different sources to be compared uniformly.
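
A toy sketch of what that kind of interpretation relies on: a fixed
lookup keyed on shared Table B codes. The code list is just the one
quoted above; the names are made up for illustration, and a real
interpreter also has to look at vertical levels and statistical
processing.

```python
# Table B codes listed above as carrying temperatures.
TEMPERATURE_CODES = {"B12001", "B12101", "B12111", "B12112"} | {
    "B121%d" % n for n in range(14, 20)  # B12114..B12119
}


def quantity_of(code):
    """Return the physical quantity for a B code, or "unknown" when the
    code is not in this (illustrative, incomplete) lookup."""
    return "temperature" if code in TEMPERATURE_CODES else "unknown"


print(quantity_of("B12101"))
print(quantity_of("B12116"))
print(quantity_of("B13003"))
```

The lookup only works because the codes mean the same thing for every
originator; a code that is not in a shared table falls through to
"unknown".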

If you have a process where data sharing across centers has to use some
well standardised, well known tables (as well as some reasonable
standards, or even just practices, for laying out BUFR templates), you
can code (I have coded) that sort of interpretation in software. If
instead anyone can at any point start distributing BUFRs that can use
any B code they want to represent temperature, then the only way to make
sense of a decoded bulletin is to have it personally read by an
experienced meteorologist.

Even if you don't want machine interpretation of the bulletins, if the
lifetime of the archive is long enough then its data can potentially
outlive the availability of experienced meteorologists who can remember
how to make sense of them.

To have a long-lived archive, IMO what is needed are pervasive
standards, stable over time. Instead of designing for chaos, I'd rather
see how to make coordination work: propose a standard file format for
distributing tables; propose the creation of a repository from which to
download the WMO standard tables; propose a process for submission of
new table entries, akin to what happens with submissions of new code
points to Unicode, or new locales to ISO. My feeling is that something
like Unicode is more the kind of thing to model BUFR tables on.

Of course chaos should still be supported, because scientists have to
have full freedom of experimentation. But there are already local table
numbers that can be used for that, and after the experiments are
successful the new entries can be submitted for inclusion in a new
version of the shared tables, so that the shared language can grow.


[1] http://www.arpa.emr.it/dettaglio_documento.asp?id=2927&idlivello=64
[2] http://www.arpa.emr.it/dettaglio_documento.asp?id=514&idlivello=64
[3] http://www.arpa.emr.it/dettaglio_documento.asp?id=1172&idlivello=64

Ciao,

Enrico