
20030828: NNEXRAD Latency?

>From: Gerry Creager N5JXS <address@hidden>
>Organization: Texas A&M University -- AATLT
>Keywords: 200308281358.h7SDw0Ld016206 IDD latency pqact

Hi Gerry,

>I'm still seeing significant latencies on LDM data.  Looks like I'm 
>seeing 30 min-2 hr latencies on NIDS at this time on mesodata.cs.tamu.edu

I just took a look at the real-time IDS|DDPLUS, FSL3, and NNEXRAD
latency statistics that mesodata.cs.tamu.edu is reporting back to the
UPC:

[latency plot links omitted]
Aside from what look like some network dings (on all plots), where the
latencies did spike above 1000, 2000, and 3000 seconds, the latencies
being reported are nowhere near 30 minutes (1800 seconds) to 2 hours
(7200 seconds).  The question is what happened during those network
dings?

>Am I nuts, or is this really happening?  I'm basing my thoughts on the 
>timestamps on the various files.  I can arrange access to mesodata for 
>you if that's needed, to try to resolve this...

The latency reported is the difference between the time a product was
injected into the IDD and the wall-clock time at reception on the
receiving machine.
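For concreteness, the arithmetic can be sketched in shell.  The
timestamps below are made-up examples (not from your feed), chosen so
the result lands on the 30-minute end of the range you mentioned:

```shell
# Hypothetical example: a product injected into the IDD at 13:58 UTC
# and received at 14:28 UTC shows a latency of 1800 seconds.
inject=$(date -u -d '2003-08-28 13:58:00' +%s)
receive=$(date -u -d '2003-08-28 14:28:00' +%s)
echo "latency: $(( receive - inject )) seconds"   # latency: 1800 seconds
```

(GNU date's -d option is assumed here; BSD date spells this differently.)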

>I'm probably going to get bigbird.tamushsc.edu up today.  I'll bring it 
>up feeding, for now, from atm.geo.nsf.gov, and check on the issues that 
>may arise with feeding from LSU, if any.

I will ensure that LSU's relay machine, seistan.srcc.lsu.edu, is set up
to allow you to request as many feeds as they have available.  This
will _not_ include WSI, NLDN, PCWS, CONDUIT, or CRAFT.

>bigbird will be a relay LDM 
>server, not caching data... so I can feed mesodata from that and see if 
>we're able to diminish latencies that way.

OK.  The impression I got from looking at the latency plots listed
above is that things are working reasonably well on mesodata for
all times except when there is some sort of network glitch.

>Right now, my near-real-time radar scripting isn't working because 
>nothing is even close to being current when the scripts run.  So I'm 
>looking for ways to speed up things.

Since the latency plots indicate that you are receiving the data in
a timely manner, the other possibility is that pqact is running way
behind on getting the products out of the queue.  You can test this
by sending two USR2 signals to pqact to put it into debug logging
mode and then watching the output in ~ldm/logs/ldmd.log.  One of the
things you will see listed is how long it is taking pqact to get a
product out of the queue.  If this number is high, and I am betting
that it is, you should create a separate pqact process for the
NNEXRAD data.  We run multiple pqact invocations on an internal
machine (dual Athlon 2400+) that is ingesting and processing _all_
of the feeds it receives; its ldmd.conf includes:
exec    "pqact -f NNEXRAD /usr/local/ldm/etc/GEMPAK/pqact.gempak_nexrad"
exec    "pqact -f ANY-NNEXRAD-CRAFT-NIMAGE /local/ldm/etc/GEMPAK/pqact.gempak_de
exec    "pqact -f MCIDAS /local/ldm/etc/GEMPAK/pqact.gempak_images"
exec    "pqact -f WMO /local/ldm/etc/GEMPAK/pqact.gempak_nwx"
exec    "pqact -f WMO|SPARE|CONDUIT /local/ldm/etc/GEMPAK/pqact.gempak_upc"
exec    "pqact -f CRAFT /local/ldm/etc/GEMPAK/pqact.gempak_craft"

What I am suggesting -- if your pqact is running way behind -- is
to add an exec line for the NNEXRAD feed.  This would bring your 
'exec "pqact"' entries in ldmd.conf up to two:

exec    "pqact -f ANY-NNEXRAD"
exec    "pqact -f NNEXRAD /usr/local/ldm/etc/pqact.conf.nexrad"

You would then take all NNEXRAD processing actions from your 
~ldm/etc/pqact.conf file and move them to ~ldm/etc/pqact.conf.nexrad.
You should ensure that you remove the NNEXRAD feed processing from
your original ldmd.conf pqact action; otherwise, it will continue to
process the data as it is doing now.
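One way to carve the NNEXRAD entries out of an existing pqact.conf is
a small awk pass.  This is only a sketch: it assumes the standard
layout (each entry's feedtype starts at column 0, continuation lines
are tab-indented), and the demo input below is hypothetical, so review
the result by hand before trusting it:

```shell
# Demo input (hypothetical entries; a real pqact.conf is much larger).
printf 'IDS|DDPLUS\t^pattern\n\tFILE\tdata/other\nNNEXRAD\t^SDUS5.\n\tFILE\tdata/nexrad\nWMO\t^pattern\n\tFILE\tdata/wmo\n' > pqact.conf

# Keep every entry whose first line starts with NNEXRAD, including its
# tab-indented continuation lines; drop everything else.
awk '/^NNEXRAD/ { keep = 1; print; next }
     /^[^ \t]/  { keep = 0 }
     keep       { print }' pqact.conf > pqact.conf.nexrad

cat pqact.conf.nexrad
```

Then delete the same NNEXRAD entries from the original pqact.conf so
only one pqact processes that feed.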

WARNING:  if you send one or two USR2 signals to pqact to increase
the verbosity of its logging, you _must_ remember to send an
additional USR2 signal to it to return to regular logging.  Of
course, stopping and restarting the LDM will accomplish the same
thing.
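As a sketch, the signal juggling can be wrapped in a tiny helper.  The
pgrep usage and the single-pqact-process assumption are mine, not part
of the LDM, so adapt as needed:

```shell
# toggle_debug PID COUNT -- send COUNT USR2 signals to process PID.
toggle_debug() {
    pid=$1; count=$2; i=0
    while [ "$i" -lt "$count" ]; do
        kill -USR2 "$pid"
        i=$((i + 1))
    done
}

# Intended use against a live pqact (commented out; do not run blindly):
#   pid=$(pgrep -u ldm -x pqact)   # assumes exactly one pqact process
#   toggle_debug "$pid" 2          # bump verbosity up to debug logging
#   tail -f ~ldm/logs/ldmd.log     # watch the queue-drain times
#   toggle_debug "$pid" 1          # cycle back toward regular logging
```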

>I'm going to move mesodata, as well, closer to the campus core network 
>today, if things go well.  That *may* help, but I don't think it'll be 
>of much impact.

Given that your latencies are not bad (aside from those network
glitch periods), I am betting that the problem you are seeing is
related to pqact running behind.

>I'm also going to start running MRTG against the 
>routers nearest mesodata, so that I can keep up with what's going on on 
>our campus.

Sounds good.

>Any help you can offer would be most appreciated.

Take a look at the latency plots, and then take a look at the debug
pqact logging.  I think that we will see that pqact is laboring to
keep up with the data, so adding a separate NNEXRAD pqact invocation
will help.


>From address@hidden Thu Aug 28 08:44:55 2003


re: network dings
>Stupid Network Administrator tricks when trying to upgrade a piece of 
>Cisco hardware.  He had an extended down time when the gear failed and 
>didn't have a backup plan.  And, no it wasn't me!

re: latency is difference in inject and reception times

re: make sure seistan is setup to feed tamu machines
>OK.  Thanks.

re: latencies appear to be small outside of network dings

re: run a separate pqact for NNEXRAD data
>OK.  I'll add that early afternoon.  I've just been informed of a 
>meeting that can potentially kill the rest of my morning.

re: moving mesodata
>OK.  Moving the system (and probably renaming it) is still in the cards, 
>both logistically and politically.

re: up the logging level for pqact

>Turns out I've gotta debug my logging, too.  Last logs posted to 
>~ldm/data/logs are from January!  Shows how long things have been pretty 

>Gerry Creager -- address@hidden
>Network Engineering -- AATLT, Texas A&M University     
>Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578
>Page: 979.228.0173
>Office: 903A Eller Bldg, TAMU, College Station, TX 77843