
[LDM #WOM-274153]: Reproducible bug in LDM 6.13.10



Gilbert,

Thanks for sending this in.

> Last night, NCEP had a problem with the fiber MRMS feed and failed over
> from NCEP to Boulder. When this happened, LDM 6.13.10 on NFS01 dropped the
> EXP feed, which the MRMS feed comes across as. NCEP got it back working
> again, but the moment NCEP went down, the EXP feed stopped writing to disk
> on NFS01, even though the feed came back up shortly thereafter.
> 
> All of the other feeds (IDSDDPLUSNEXRAD3|NEXRAD2, etc) on NFS01 were fine.
> When we restarted the LDM after remaking the queue on NFS01, the EXP feed
> came back. For now, we had to switch NFS01 back to LDM 6.13.6, which is NOT
> experiencing this issue. We didn't deal with doing a gdb, sorry. We wanted
> to get back on the air.

I don't blame you.

> I am going to test something. I am going to shut down the LDM  on LDM01,
> which also causes everything to go down on NFS01, AFTER I change NFS01's
> ldmd.conf, to feed the "old" way, with a primary and backup, instead of
> with two primaries, and see if that crashes the feeds.
> 
> ****Does test****
> 
> OK, bingo, I have something. So, before, on NFS01, using LDM 6.13.10, we
> were requesting feeds like this in ldmd.conf (I'll call this "method 1"):
> 
> request ANYFEED .* internalserverldm01
> request ANYFEED .* externalserver
> 
> Then, based on your and Darrell's statements, I changed everything to do it
> this way (I'll call this "method 2"):
> 
> request ANYFEED ".*" internalserverldm01
> request ANYFEED .* externalserver

Uh... these two methods are effectively identical, actually. The quotation 
marks around the pattern in the "internalserverldm01" request don't do 
anything: they're superfluous. The scanner for the LDM configuration file 
uses them to delimit the pattern field and then discards them.

Did you mean to have a request line like this?

    request ANYFEED (.*) internalserverldm01

Parentheses aren't ignored; consequently, the two patterns will be considered 
different and both feeds will start and stay in primary transfer mode.
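
To illustrate the point about quotation marks versus parentheses, here is a 
rough Python sketch. It is not the LDM scanner: it uses Python's shlex module 
as a stand-in for the configuration-file tokenizer, and the helper name 
request_fields() is made up for this example. It also assumes that request 
patterns are compared as text, which is how "considered different" is read 
here.

```python
import re
import shlex


def request_fields(line):
    """Split an ldmd.conf-style REQUEST line into (feedtype, pattern, host).

    Hypothetical helper: shlex only approximates the LDM scanner by
    using quotation marks as field delimiters and then dropping them.
    """
    keyword, feedtype, pattern, host = shlex.split(line)
    if keyword.lower() != "request":
        raise ValueError("not a REQUEST entry: " + line)
    return feedtype, pattern, host


unquoted = request_fields('request ANYFEED .* internalserverldm01')
quoted = request_fields('request ANYFEED ".*" internalserverldm01')
grouped = request_fields('request ANYFEED (.*) internalserverldm01')

# The quotation marks are gone after tokenizing, so the patterns are equal.
print("quoted vs. unquoted identical:", unquoted[1] == quoted[1])

# The parentheses survive, so the pattern text differs...
print("grouped vs. unquoted identical:", unquoted[1] == grouped[1])

# ...even though both patterns match every product identifier.
sample_id = "any product identifier"
print("both match:", bool(re.search(unquoted[1], sample_id))
      and bool(re.search(grouped[1], sample_id)))
```

Under those assumptions, ".*" and "(.*)" select the same products but are 
textually distinct, which is what lets two otherwise identical requests be 
treated as separate.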

> But, using method 2, if an upstream server (either one) from NFS01 went
> down completely for any reason, any and all feedtypes on NFS01 that used to be
> received from the dead server would stop writing to disk,
> even though the other server was working and feeding just fine. It
> apparently switches the feeds, but it just won't write the products to disk.
> 
> So: I just restarted LDM01 after I changed everything in ldmd.conf to using
> method 1. Guess what: NFS01's FEEDS DIDN'T CRASH! Everything remained up,
> and writing to disk as expected.
> 
> Can you reproduce this?

If the patterns in the request entries are like what you wrote, then it won't 
make any difference because they're effectively the same.

The problem doesn't appear to be the feeds crashing but, rather, the relevant 
pqact(1) process stops processing data-products. Unfortunately, we're not 
seeing that here.

I'll see about looking into the differences between 6.13.6 and 6.13.10, however.

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: WOM-274153
Department: Support LDM
Priority: High
Status: Closed
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.