
[LDM #WOM-274153]: Reproducible bug in LDM 6.13.10



Gilbert,

> >> ldmadmin stop
> >> ldmadmin delqueue
> >> ldmadmin clean
> >
> > I take it there were no problems with these commands.
> 
> Nope.

Good.

> > There were no messages from the downstream LDM processes on nfs01 about the 
> > connection to the upstream LDM process being broken?
> 
> No. Is that supposed to happen? I would assume so, but I've never done a dual 
> primary feed request until you told me to do so.

When a downstream LDM process has to reconnect, it logs a NOTICE-level message.

> > What are the pqact(1) actions that aren't writing to disk? FILE or PIPE?
> 
> FILE for sure. Given that I get alerts that all the feeds are down, I'm 
> thinking the PIPEs are dead too.

OK.

> >> There are NO log entries on nfs01 showing anything out of the
> >> ordinary; everything appears to be normal. But, nothing is writing to disk,
> >> even though an "ldmadmin watch" on nfs01 shows all feeds apparently coming
> >> in just fine.
> >
> > Including IDS|DDPLUS?
> 
> Yep!
> 
> >> But, this command in LDM's crontab on nfs01:
> >>
> >> 1,6,11,17,21,26,32,36,41,47,51,56 * * * * /bin/bash -l -c 'wasReceived -f
> >> "WMO|NIMAGE|NGRID|NEXRAD3" -o 180' || /bin/mail -s 'NOAAPORT data has not
> >> been received in the last 3 minutes on nfs01' address@hidden,
> >> address@hidden,address@hidden,address@hidden
> >> </dev/null
> >>
> >> Gets me the dreaded alert via text and emails:
> >>
> >> NOAAPORT data has not been received in the last 3 minutes on nfs01
> >
> > This is inconsistent with your assertion that an "ldmadmin watch" on nfs01 
> > shows all feeds continuing to arrive.
> 
> > Correct, which is why this is so bizarre.

I'd say impossible.

> > So the downstream LDM on nfs01 that requests IDS|DDPLUS from 
> > idd.aos.wisc.edu stops inserting such products into the queue?
> 
> Apparently so. I can see them come across, but they just don't write into the 
> queue anymore.

But, you said an "ldmadmin watch" shows IDS|DDPLUS products. That command just 
watches the end of the queue for newly-inserted products. If it shows them, 
then they were inserted.

> Now, if I tell the LDM to stop on nfs01, and then I tell it to start again, 
> it says the queue is corrupt and it won't start.

The LDM stops cleanly?

> > Are the clocks on all the systems correct?
> 
> Yep.
> 
> > When this happens, can you attach gdb(1) to one of the pqact(1) processes 
> > that isn't writing anything to disk and send me a stack trace?
> 
> Can you give me a command line example, please?

gdb -p <pid>, where <pid> is the process identifier of the pqact(1) process. 
Then execute the command "where" followed by "q" to exit.
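
If an interactive session is awkward while the problem is occurring, the same 
trace can probably be captured non-interactively. The following is only a 
sketch: it assumes the stuck processes are named "pqact", that they are owned 
by the user running the script, and that gdb(1) and pgrep(1) are installed. 
The function name trace_pids and the output file pqact-traces.txt are made up 
for illustration.

```shell
#!/bin/sh
# Print a stack trace for each process ID given as an argument.
# -batch makes gdb exit after running its commands; -ex where runs "where".
trace_pids() {
    for pid in "$@"; do
        echo "=== stack trace for PID $pid ==="
        gdb -batch -ex where -p "$pid" 2>&1
    done
}

# Assumption: the processes of interest are named exactly "pqact" and belong
# to the current user. pgrep prints nothing if there are none.
pids=$(pgrep -u "$(id -un)" -x pqact || true)

if [ -n "$pids" ]; then
    # $pids is deliberately unquoted so that each PID becomes one argument.
    trace_pids $pids > pqact-traces.txt
fi
```

The resulting file can then be attached to a reply. On some systems, attaching 
to a running process requires extra privileges, so the script may need to be 
run as the user that owns the pqact(1) processes.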

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: WOM-274153
Department: Support LDM
Priority: High
Status: Closed
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.