
[TIGGE #UHK-328521]: Corrupted product at ECMWF via LDM ?



Manuel, et al.,

Many apologies for not responding to your inquiry sooner.  Your original
message came at a difficult time for us: during our training workshop
series, which was followed by vacations for some of us.

> Don't worry, the faulty GRIB was not at encoding. Doug sent his version
> of the field and it was correct (same as yours).

This implies that the field was created correctly at CMA and transferred
correctly by the LDM from CMA to NCAR.  It is most likely, therefore,
that the field was also transmitted correctly to ECMWF.  It would seem
that any problem with this particular GRIB message occurred locally,
either because of a problem with the local LDM queue or in the local
processing of the product.  Over the 12 years that the LDM has been in
use in the Unidata community, we have had only a couple of reports of
corrupted data products.  In both cases, the cause turned out to be local
to the receiving machine, not the LDM.  The main reason for this is that
the LDM does nothing with the contents of a product other than send them
as a byte stream from the upstream host to the downstream host.  Delivery
of the product relies on the underlying TCP transport working correctly,
which it should given the error detection schemes in place.
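
Since the product should arrive as the same byte stream that was sent,
one way to check a suspect file is to compare a digest of the local copy
against a digest of a copy held upstream.  The following is only a rough
sketch of that idea, not part of the LDM: the function names and the
choice of SHA-256 are assumptions made for illustration.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of the file at `path`, reading it in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_bytes(path_a, path_b):
    """True if the two files have identical contents (by digest)."""
    return file_digest(path_a) == file_digest(path_b)
```

If the digests computed at two sites differ, the copies are not the same
byte stream, and the site whose copy fails to decode is the place to look.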

> Therefore, I have only two candidates for causing such corruption:
> either LDM or the disk. Corruption could have happened when copying data
> from the network into the LDM product queue or copying data from the LDM
> product queue to disk. We scanned the logs but couldn't find any problem
> with the disk subsystem...

A network glitch (router, etc.) that corrupted a byte in the GRIB message
is possible, but highly unlikely.
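
If both a known-good copy and the corrupted copy of the field are still
available, listing the byte offsets at which they differ can help show
whether the damage is a single byte or a larger block.  This is a minimal
sketch under that assumption; the function name and the sample bytes are
hypothetical and are not taken from the actual product.

```python
def differing_offsets(good: bytes, suspect: bytes, limit: int = 20):
    """Return up to `limit` (offset, good_byte, suspect_byte) tuples.

    Only the overlapping part of the two inputs is compared; compare
    len(good) and len(suspect) separately to detect truncation.
    """
    diffs = []
    for offset, (a, b) in enumerate(zip(good, suspect)):
        if a != b:
            diffs.append((offset, a, b))
            if len(diffs) >= limit:
                break
    return diffs


# Made-up example: one byte differs at offset 5.
good = b"GRIB\x00\x00\x00\x02"
suspect = b"GRIB\x00\x01\x00\x02"
print(differing_offsets(good, suspect))  # [(5, 0, 1)]
```

A single differing offset would be consistent with an isolated byte error,
while a long run of differences points more toward a queue or disk problem.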

> I did report the problem to Unidata support
> (address@hidden), but have not heard anything from
> them. I did not get the automatic e-mail that assigns a token for
> incident reports. I copy them this e-mail again.

Again, I apologize for the silence.

Questions:

- Have you seen other instances of corrupted data products since the one
  you reported?

- Did you take remedial action to fix/avoid the problem (e.g., remaking
  the LDM queue)?

Cheers,

Tom
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: UHK-328521
Department: Support IDD TIGGE
Priority: Emergency
Status: Closed