Hi Gilbert, Ben, et al.,
First, there were problems with the NEXRAD2 feed out of idd.unidata.ucar.edu
yesterday. That problem should have been resolved yesterday evening.
re:
>Thanks, Ben, for the legwork. Keep checking...I have a hunch someone's
>clock is off and they're spitting out a duplicate and late feed, or
>there's bad latency somewhere.
I agree that the machines running the LDM at one or more Nexrads could
have clocks that are off. As for a Nexrad sending out duplicate
products, however, I think that is unlikely. What is more likely is
that there is a "recirculation" of some NEXRAD2 products in the IDD,
and the products with very high latencies are ones that had been sent
previously but were no longer in a relay node's LDM queue when they
arrived again (duplicate product detection and rejection relies on a
product still being in the LDM queue).
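To illustrate the recirculation scenario, here is a toy sketch in
Python. It is not the LDM's actual code: ToyQueue and its capacity
parameter are made up for illustration, and the real queue is limited
by size and product age rather than by a simple product count. The only
thing assumed about the LDM is that a duplicate is recognized by a
signature (an MD5 checksum) of a product that is still in the queue.

```python
import hashlib
from collections import OrderedDict


class ToyQueue:
    """Hypothetical stand-in for a relay node's product queue."""

    def __init__(self, capacity):
        self.capacity = capacity          # max number of products held (assumption)
        self._products = OrderedDict()    # signature -> data, oldest first

    def insert(self, data: bytes) -> bool:
        """Return True if the product is accepted, False if rejected as a duplicate."""
        signature = hashlib.md5(data).hexdigest()
        if signature in self._products:
            return False                  # still in the queue: duplicate rejected
        self._products[signature] = data
        while len(self._products) > self.capacity:
            self._products.popitem(last=False)   # evict the oldest product
        return True


queue = ToyQueue(capacity=2)
print(queue.insert(b"volume scan A"))   # accepted
print(queue.insert(b"volume scan A"))   # rejected: still in the queue
queue.insert(b"volume scan B")
queue.insert(b"volume scan C")          # pushes A out of the queue
print(queue.insert(b"volume scan A"))   # accepted again, now with high latency
```

Once product A has aged out of the queue, a late copy of it looks new
and is relayed downstream again, which is how a recirculated product
can show up with a very high latency.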
>I can't check UNIDATA's latency graphs
>because they are down right now with "500" errors on their website:
>
>http://www.unidata.ucar.edu/cgi-bin/rtstats/topoindex?tree
>
>Then click on "NEXRAD2" and wait forever for the 500/misconfiguration
>error to show.
We will look into this.
>So here's a request for LDM-6.7.2 and all future versions: That upon an
>ldmadmin start, by default (though it can be turned off, in, say
>ldmadmin-pl.conf, with a stern warning that this is a bad idea, but can
>accommodate those having serious issues that they cannot rectify at the
>time), it pings a NIST time server(s). If the server clock is off
>more than 10 seconds, the LDM aborts and returns with a message that your
>clock is X hours, minutes and seconds off, and needs to be adjusted.
>Additionally, as a secondary option, it should check once a day at a time
>you determine that it is less than 10 seconds (or under a minute) off. If
>it is, the LDM can fire off an email message to root (or other user(s)),
>letting you know the problem. I'll be happy to test it. A commercial
>group of radar-display programs will exit with an error message if your
>clock is off by 10 seconds or more. I really like that, too. If your
>radar images or data are being delayed by clocks that are off, that's a
>very bad thing(tm).
I am not sure that this is a good idea, especially the part about the
LDM aborting. Even if one's clock is off, products will still flow
eventually. Having the LDM exit or refuse to start would keep one from
getting products altogether. A better approach is for the local LDM
administrator to review the contents of their LDM log files
periodically. When the clock on an upstream host is running ahead,
products inserted into its LDM queue will trigger log messages noting
that the product is in the future.
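The idea behind the "product is in the future" check can be sketched in
a few lines of Python. This is only an illustration of the comparison,
not what the LDM does internally; the function names and the
tolerance_s parameter are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def latency_seconds(created: datetime, received: datetime) -> float:
    """Latency = local receipt time minus the product's creation time."""
    return (received - created).total_seconds()


def is_from_future(created: datetime, received: datetime,
                   tolerance_s: float = 0.0) -> bool:
    """True if the creation time is later than the local receipt time.

    A negative latency means the clock that stamped the product is ahead
    of the local clock (or the local clock is behind).
    """
    return latency_seconds(created, received) < -tolerance_s


now = datetime.now(timezone.utc)
stamped_ahead = now + timedelta(seconds=30)   # upstream clock 30 s fast
print(is_from_future(stamped_ahead, now))     # flagged, but nothing aborts
```

Note that the check only flags the product; data keep flowing while the
administrator tracks down which clock is wrong.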
>There's no excuse not to run NTP on *nix, or do an "ntpdate" in crontab
>once or twice a day if you are running an LDM or display software.
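Setting up and running NTP is so easy that there is no reason not to
use it on every machine on which the LDM is running. ntpdate works, but
it is a poor substitute for ntpd: it only corrects the clock at the
moment it is run, whereas ntpd keeps the clock disciplined continuously.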
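For anyone who wants a periodic sanity check that warns rather than
aborts, the arithmetic is simple. The sketch below uses the standard
NTP offset formula on the four timestamps of one request/response
exchange; the function names and the 10-second default limit (taken
from the suggestion above) are hypothetical, and obtaining the
timestamps from a real time server is left out.

```python
def clock_offset(t1: float, t2: float, t3: float, t4: float) -> float:
    """Estimate the server-minus-client clock offset in seconds.

    t1: client send time      (client clock)
    t2: server receive time   (server clock)
    t3: server transmit time  (server clock)
    t4: client receive time   (client clock)
    """
    return ((t2 - t1) + (t3 - t4)) / 2.0


def clock_warning(offset_s: float, limit_s: float = 10.0):
    """Return a warning string if the offset exceeds the limit, else None."""
    if abs(offset_s) > limit_s:
        return f"WARNING: time server differs from local clock by {offset_s:+.1f} s"
    return None


# Example: a 2-second round trip to a server whose clock is 30 s ahead.
offset = clock_offset(t1=0.0, t2=31.0, t3=31.0, t4=2.0)
print(clock_warning(offset))
```

A positive offset means the server's clock is ahead of the local one.
The warning could be mailed or logged from cron, but this is no
replacement for just running ntpd.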
>Also, let me be clear: I am not saying that *this* is the problem we're
>seeing, but it would help make this issue easier to solve.
Agreed.
>Back to your regularly scheduled panic attack, already in progress.
:-)
Cheers,
Tom
--
+----------------------------------------------------------------------------+
* Tom Yoksas UCAR Unidata Program *
* (303) 497-8642 (last resort) P.O. Box 3000 *
* yoksas@xxxxxxxxxxxxxxxx Boulder, CO 80307 *
* Unidata WWW Service http://www.unidata.ucar.edu/*
+----------------------------------------------------------------------------+