
20020730: LDM latency problem - followup (cont.)



>From: Mike Voss <address@hidden>
>Organization: SJSU
>Keywords: 200207302019.g6UKJL905705 IDD latency

Mike,

>No smoke here :-)

I am relieved ;-)

>Well, at first I suspected that aeolus was having latency issues and so I
>sent this email to Larry last Thursday:

  >---------- Forwarded message ---------- 
  >Date: Thu, 25 Jul 2002 07:59:52 -0700 (PDT) 
  >From: Mike Voss <address@hidden> 
  >To: address@hidden 
  >Subject: ldm feed
  >Hi Larry, 
  >My primary feed is Arizona, but I have been feeding from your machine 
  >recently. Since about Sunday (7-21) the HDS and MCIDAS data has been 
  >inconsistent...I believe because of latency issues. I've been trying to 
  >figure out if my machine is not handling the load or what. When I do a 
  >notifyme on aeolus I get the following: 
  >----snip---- 
  >Jul 25 14:45:13 notifyme[25457]: NOTIFYME(aeolus.ucsd.edu): OK 
  >Jul 25 14:45:14 notifyme[25457]: 925 20020725144514.924 IDS|DDPLUS 
  >171 FPUS73 KEAX 251444 /pNOWMCI 
  >Jul 25 14:45:14 notifyme[25457]: 13639 20020725144514.409 NNEXRAD 
  >40018202 SDUS54 KEWX 251444 /pN0VEWX 
  >Jul 25 14:45:14 notifyme[25457]: 15347 20020725144514.509 NNEXRAD 
  >40018203 SDUS54 KEWX 251444 /pN0SEWX 
  >Jul 25 14:45:14 notifyme[25457]: 616 20020725144514.530 NNEXRAD 
  >40018205 SDUS52 KJAX 251439 /pNVLVAX 
  >Jul 25 14:45:14 notifyme[25457]: 7051 20020725144514.580 NNEXRAD 
  >40018207 SDUS72 KJAX 251439 /pN1VJAX 
  >Jul 25 14:45:14 notifyme[25457]: 9776 20020725144514.661 NNEXRAD 
  >40018210 SDUS24 KLUB 251439 /pN2RLBB 
  >Jul 25 14:45:14 notifyme[25457]: 6560 20020725144514.708 NNEXRAD 
  >40018212 SDUS76 KLOX 251440 /pN1VVTX 
  >Jul 25 14:45:14 notifyme[25457]: 109 20020725144515.146 IDS|DDPLUS 
  >181 SAUS45 KCYS 251444 /pMTRBFU 
  >Jul 25 14:45:14 notifyme[25457]: 8559 20020725144516.069 HDS 184 
  >SDUS83 KGRB 251441 /pDPAGRB 
  >----snip---
  >....which looks good. But when I add a time offset, i.e.: 
  >---snip----- 
  >rossby:~>notifyme -vl - -h aeolus.ucsd.edu -o 11000 
  >Jul 25 14:54:16 notifyme[25479]: Starting Up: aeolus.ucsd.edu: 
  >20020725115056.124 TS_ENDT {{ANY, ".*"}} 
  >Jul 25 14:54:16 notifyme[25479]: NOTIFYME(aeolus.ucsd.edu): OK 
  >Jul 25 14:54:16 notifyme[25479]: 5926 20020725115056.237 IDS|DDPLUS 
  >699 SXUS56 KSCS 251100 /pSCNGA 
  >Jul 25 14:54:16 notifyme[25479]: 3478 20020725115056.249 IDS|DDPLUS 
  >700 SXUS56 KSCS 251100 /pSCNIA 
  >Jul 25 14:54:16 notifyme[25479]: 7484 20020725115056.284 IDS|DDPLUS 
  >702 SXUS86 KSGX 251150 /pOMRSGX 
  >Jul 25 14:54:17 notifyme[25479]: 6520 20020725115056.141 NNEXRAD 
  >39914748 SDUS35 KGGW 251148 /pN3SGGW 
  >Jul 25 14:54:17 notifyme[25479]: 6652 20020725115056.183 NNEXRAD 
  >39914749 SDUS23 KFSD 251148 /pN2SFSD 
  >Jul 25 14:54:17 notifyme[25479]: 2537 20020725115056.202 NNEXRAD 
  >39914750 SDUS35 KMSO 251146 /pNVWMSX 
  >Jul 25 14:54:17 notifyme[25479]: 612 20020725115056.457 HDS 704 
  >SFUS41 KWBC 251149 
  >-----snip--- 

  >..which tells me there is data over three hours old in your queue, maybe 
  >that is on purpose.
  >Anyway, I just thought I would check and see if all your data is coming in 
  >in a timely manner? 
  >Thanks, 
  >Mike
  >----end of forwarded message

I agree with your reasoning in asking Larry if it was by design that his
queue has HRS data that is 3 hours old.
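
The "three hours" figure follows directly from the -o 11000 offset in the
forwarded notifyme command. A minimal sketch of that arithmetic, assuming
the syslog-style timestamps in the output are UTC like the product
timestamps (the helper name start_time is hypothetical, not part of the LDM):

```python
from datetime import datetime, timedelta

def start_time(log_time: str, offset_s: int) -> str:
    """Return log_time minus offset_s seconds, both as YYYYMMDDhhmmss."""
    t = datetime.strptime(log_time, "%Y%m%d%H%M%S") - timedelta(seconds=offset_s)
    return t.strftime("%Y%m%d%H%M%S")

# notifyme was started at Jul 25 14:54:16 with "-o 11000"
print(start_time("20020725145416", 11000))   # 20020725115056
print(timedelta(seconds=11000))              # 3:03:20
```

The result matches the 20020725115056.124 in the "Starting Up" line, so the
products listed are a little over three hours old.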

>Larry responded that things looked good on his end and he copied support:
>http://www.unidata.ucar.edu/glimpse/idd/5776

Like I said, I wasn't in the loop until I sent you my first email.  Right
now we are holding workshops, so lots of folks are involved elsewhere
(mine starts on this coming Monday).

>I was on vacation last week, and didn't pursue this any further. Now here we
>are. 
>
>LDM access to rossby:
>
>First ssh to metsun1.met.sjsu.edu as "ldm" and xxx
>then ssh to rossby from metsun1 as "ldm" and xxx

OK, I'm on.

>Sorry for the sorry state of my config files, I've been blasting away on
>them trying different options.

No problem.

>Notes:
>
>-I did change the ldmd.conf on rossby to allow all the HDS in.

I see this.

>- the RPC errors in ldmd.conf are recent...I believe since I upgraded to
>  LDM-5.2 this morning.

OK.

>- notifyme does not seem to work right now on the local host...gives an RPC
>  error in the log. "ldmadmin watch" works fine....

Hmm...  Not good.

>- I'll be looking at this stuff from home tonight to see how the 00Z HDS flood
>  is handled

I see LOTS of RECLASS messages in your ~ldm/logs/ldmd.log file.

>cheers, and thanks for the help!

You have to hold up the thanks until I do something.

Tom

>From address@hidden Wed Jul 31 09:29:27 2002
>Subject: Re: 20020730: 20020730: LDM latency problem - more followup

Tom,

Here is some notifyme output from this morning, I think it sheds light
on the problem. When I do a notifyme on aeolus.ucsd.edu I only need to
go out to an offset of 200 seconds to get data:

rossby:~>notifyme -vl - -h aeolus.ucsd.edu -f HDS -o 200
Jul 31 15:19:09 notifyme[10452]: Starting Up: aeolus.ucsd.edu: 
20020731151549.730 TS_ENDT {{HDS,  ".*"}}
Jul 31 15:19:10 notifyme[10452]: NOTIFYME(aeolus.ucsd.edu): OK
Jul 31 15:19:11 notifyme[10452]:     9102 20020731151550.668     HDS 396  
JUSA42 KWNO 311400
Jul 31 15:19:11 notifyme[10452]:     5969 20020731151550.769     HDS 404  
SDUS84 KMOB 311512 /pDPAEVX
Jul 31 15:19:11 notifyme[10452]:     5126 20020731151550.868     HDS 406  
SDUS82 KTBW 311513 /pDPATBW


But on rossby.met.sjsu.edu I need to go out to 3700 seconds before I
get anything:

rossby:~>notifyme -vl - -f HDS -o 3400 -T 30
Jul 31 15:25:13 notifyme[10771]: Starting Up: localhost: 20020731142833.144 
TS_ENDT {{HDS,  ".*"}}
Jul 31 15:25:13 notifyme[10771]: NOTIFYME(localhost): OK
Jul 31 15:25:44 notifyme[10771]: Timed out after 30 seconds inactivity
Jul 31 15:25:44 notifyme[10771]: Disconnect


rossby:~>notifyme -vl - -f HDS -o 3700 -T 30
Jul 31 15:26:12 notifyme[10784]: Starting Up: localhost: 20020731142432.116 
TS_ENDT {{HDS,  ".*"}}
Jul 31 15:26:12 notifyme[10784]: NOTIFYME(localhost): OK
Jul 31 15:26:12 notifyme[10784]:     3264 20020731142436.272     HDS 045  
YPID91 KWBF 311200 /mNGM
Jul 31 15:26:12 notifyme[10784]:     6930 20020731142436.289     HDS 046  
YPQD91 KWBF 311200 /mNGM
Jul 31 15:26:12 notifyme[10784]:     2050 20020731142436.301     HDS 047  
YPND91 KWBF 311200 /mNGM

This all tells me the latency is between me and aeolus....correct?
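
The comparison above can be put in numbers by subtracting each product's
timestamp from the time notifyme logged it. What follows is only a sketch:
it assumes both times are UTC, takes the year from the product timestamp
(the log prefix has none), and expects the unquoted, unwrapped form of the
line. The name product_age_seconds and the regular expression are
illustrative, not part of the LDM.

```python
import re
from datetime import datetime

# Matches the first line of a notifyme product entry, e.g.
# "Jul 31 15:26:12 notifyme[10784]:     3264 20020731142436.272     HDS 045"
LINE_RE = re.compile(
    r"^(?P<mon>\w{3})\s+(?P<day>\d{1,2})\s+(?P<hms>\d{2}:\d{2}:\d{2})\s+"
    r"notifyme\[\d+\]:\s+(?P<size>\d+)\s+(?P<stamp>\d{14}\.\d{3})\s+(?P<feed>\S+)"
)

def product_age_seconds(line):
    """Seconds between the product timestamp and the log time, or None
    for lines that are not product entries (e.g. "Starting Up")."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    created = datetime.strptime(m["stamp"], "%Y%m%d%H%M%S.%f")
    # Assumption: the log line falls in the same year as the product.
    logged = datetime.strptime(
        f"{created.year} {m['mon']} {m['day']} {m['hms']}", "%Y %b %d %H:%M:%S"
    )
    return (logged - created).total_seconds()

aeolus = "Jul 31 15:19:11 notifyme[10452]:     9102 20020731151550.668     HDS 396"
rossby = "Jul 31 15:26:12 notifyme[10784]:     3264 20020731142436.272     HDS 045"
print(product_age_seconds(aeolus))   # 200.332
print(product_age_seconds(rossby))   # 3695.728
```

On these two sample lines the HDS product seen on aeolus is about 200
seconds old, while the one seen on rossby is about 3700 seconds old.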

I will be in a meeting all morning. Feel free to log in to rossby as
indicated in the prior email.

Cheers,
Mike

>From address@hidden Wed Jul 31 13:40:07 2002
>Subject: Re: 20020730: LDM latency problem - followup (cont.)

Tom,

Sorry to keep spamming you all; I know you're busy with the workshops. I
found something interesting (I don't know why this didn't jump out at
me before). I have our ingest machine on MRTG, which clearly shows a
big drop-off in data flow 10 days ago. Look at the "monthly" graph:
http://130.65.81.201/mrtg/130.65.80.62.4007.html

This may indicate a router problem on our end....I have our network
folks investigating this now.  cheers,

Mike