Re: hung downstream LDM



Hey Justin.  Sorry to hear from you, again.  :-)

>Date: Fri, 28 Oct 2005 08:51:03 -0400
>From: Justin Cooke <address@hidden>
>Organization: NOAA
>To: Steve Emmerson <address@hidden>
>Subject: Re: hung downstream LDM

The above message contained the following:

> TOC upgraded their version of LDM to 6.4.2 on Wednesday morning 
> (10/26/05). 
> 
> We were receiving NEXRAD2 data without any issues until 7:55Z this 
> morning (10/28/05). Here are the last entries in the ldmd.log related to 
> NEXRAD2:
> 
> Oct 28 07:55:51 b2n1 pqact[1503298] INFO:    27657 20051028075446.421 NEXRAD2 572016  L2-BZIP2/KGWX/20051028074951/572/16
> Oct 28 07:55:51 b2n1 140.90.85.102[1052760] INFO:    10231 20051028075550.951 NEXRAD2 373002  L2-BZIP2/KLSX/20051028075531/373/2
> Oct 28 07:55:51 b2n1 pqact[1503298] INFO:    10231 20051028075550.951 NEXRAD2 373002  L2-BZIP2/KLSX/20051028075531/373/2
> Oct 28 07:55:53 b2n1 140.90.85.102[1052760] INFO:    14491 20051028075551.234 NEXRAD2 106037  L2-BZIP2/KNKX/20051028075329/106/37
> Oct 28 07:55:53 b2n1 pqact[1503298] INFO:    14491 20051028075551.234 NEXRAD2 106037  L2-BZIP2/KNKX/20051028075329/106/37
> 
> I restarted our LDM at 11:10Z and the NEXRAD2 feed resumed:
> 
> Oct 28 11:10:52 b2n1 140.90.85.102[1052762] NOTE: Starting Up(6.4.2.4): 140.90.85.102:388 20051028101052.996 TS_ENDT {{NEXRAD2,  ".*"}}
> Oct 28 11:10:53 b2n1 140.90.85.102[1052762] INFO: No matching data-product in product-queue
> Oct 28 11:10:53 b2n1 140.90.85.102[1052762] NOTE: LDM-6 desired product-class: 20051028101053.061 TS_ENDT {{NEXRAD2,  ".*"}}
> Oct 28 11:10:53 b2n1 140.90.85.102[1052762] INFO: Connected to upstream LDM-6 on host 140.90.85.102 using port 388
> Oct 28 11:10:53 b2n1 140.90.85.102[1052762] NOTE: Upstream LDM-6 on 140.90.85.102 is willing to be a primary feeder
> Oct 28 11:10:54 b2n1 140.90.85.102[1052762] INFO:    32578 20051028101054.444 NEXRAD2 954036  L2-BZIP2/KBLX/20051028100834/954/36
> Oct 28 11:10:54 b2n1 140.90.85.102[1052762] INFO:    10097 20051028101054.727 NEXRAD2 454015  L2-BZIP2/KICT/20051028100624/454/15
> Oct 28 11:10:54 b2n1 140.90.85.102[1052762] INFO:    23493 20051028101055.084 NEXRAD2 550014  L2-BZIP2/KCLE/20051028100907/550/14
> Oct 28 11:10:54 b2n1 140.90.85.102[1052762] INFO:     6310 20051028101055.248 NEXRAD2 945023  L2-BZIP2/KMSX/20051028100927/945/23
> ....
> 
> Once we saw that we had stopped receiving NEXRAD2 from the upstream site,
> I ran "ps -ef | grep PID" using the PID of our downstream rpc.ldmd
> process for NEXRAD2. Again there was a "<defunct>" process whose parent
> PID was that of the process I searched for. The same thing has occurred
> each time we have stopped receiving NEXRAD2 data through our rpc.ldmd
> process when it is in verbose mode.
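
A "<defunct>" entry in ps output is a child process that has exited but whose parent has not yet collected its exit status with wait(2) or waitpid(2); the entry stays in the process table until that happens. The following is a minimal POSIX sketch of that lifecycle, assuming a POSIX.1-2008 system. The helper names (spawn_exiting_child, first_zombie_child, reap_child) are hypothetical and are not part of the LDM. The check uses waitid(2) with WNOWAIT so that looking for a zombie does not itself reap it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical: fork a child that exits at once, then block until it has
 * exited WITHOUT reaping it, leaving it as a zombie ("<defunct>").
 * Returns the child's PID, or -1 on error. */
pid_t spawn_exiting_child(void)
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(0);                          /* child: terminate immediately */
    if (pid > 0) {
        siginfo_t info;
        /* WNOWAIT: wait for the exit but leave the child waitable. */
        if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) == -1)
            return -1;
    }
    return pid;                            /* -1 if fork() failed */
}

/* Hypothetical: return the PID of an exited-but-unreaped child, 0 if no
 * child has exited yet, or -1 on error (ECHILD when there are no children). */
pid_t first_zombie_child(void)
{
    siginfo_t info;
    memset(&info, 0, sizeof info);         /* si_pid stays 0 if none is waitable */
    if (waitid(P_ALL, 0, &info, WEXITED | WNOHANG | WNOWAIT) == -1)
        return -1;
    return info.si_pid;
}

/* Reap one specific child, releasing its process-table entry.
 * Returns 0 on success, -1 on error. */
int reap_child(pid_t pid)
{
    int status;
    assert(pid > 0);                       /* pid <= 0 means "any child/group" to waitpid */
    return waitpid(pid, &status, 0) == pid ? 0 : -1;
}
```

Calling first_zombie_child() repeatedly keeps returning the same PID because WNOWAIT leaves the child unreaped; only the waitpid() in reap_child() makes the "<defunct>" entry go away.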

I just recalled that a downstream LDM will fork(2) itself when trying to
write a log message under some rather unusual circumstances (i.e.,
normally, it shouldn't try to do this).  I'll investigate.

Unfortunately, I'm sick at the moment, so my thinking's not the
greatest.  If you want to have a look, search for "fork" in the file
"src/ulog/ulog.c".
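
One way a logging routine can leave a "<defunct>" child behind is by forking a helper to write the message and then not waiting for it. Older BSD-derived syslog(3) implementations had such a console fallback, and <syslog.h> on many systems still defines LOG_NOWAIT, documented as "don't wait for console forks". Whether src/ulog/ulog.c does exactly this has not been confirmed here. The sketch below is a hypothetical illustration only (log_via_child and its wait_for_child flag are made up), assuming POSIX fork(2), write(2) and waitpid(2).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical fallback logger: fork a child that writes `msg` to file
 * descriptor `fd` and exits. If `wait_for_child` is nonzero the parent reaps
 * the child; otherwise the child is left for a later wait call.
 * Returns the child's PID, or -1 on error. */
pid_t log_via_child(int fd, const char *msg, int wait_for_child)
{
    assert(msg != NULL);
    size_t len = strlen(msg);
    pid_t pid = fork();
    if (pid == -1)
        return -1;                          /* fork failed */
    if (pid == 0) {
        /* Child: write the message, then _exit() so stdio buffers and
         * atexit() handlers inherited from the parent are not run twice. */
        ssize_t n = write(fd, msg, len);
        _exit(n == (ssize_t)len ? 0 : 1);
    }
    if (wait_for_child) {
        int status;
        /* Parent: reap the child so it never lingers as "<defunct>". */
        if (waitpid(pid, &status, 0) != pid)
            return -1;
    }
    /* With wait_for_child == 0 the child becomes a zombie once it exits
     * and stays one until the parent waits for it. */
    return pid;
}
```

With wait_for_child set to 0, ps would show the child as "<defunct>" from the moment it exits until the parent calls waitpid() for it, which is the same symptom described above.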

> When I performed the LDM restart, the defunct process was cleared.
> 
> TOC is continuing to use version 6.4.2 of LDM.

It can't hurt (knock on wood :-).

> Any thoughts on where to go from here?

Can you get me the logfile entries from 140.90.85.102 for the same time
period?

> Thanks,
> 
> Justin

Regards,
Steve Emmerson