
[LDM #NHG-751993]: Weird LDM 6.13.6 issue on a NOAAport ingest server



Hey Gilbert,

The fact that an "ldmadmin watch" showed NNEXRAD products means that those
products were being inserted into the product-queue. That command uses the
pqutil(1) utility to watch the end of the queue and log matching products.
So NNEXRAD products were being successfully received and inserted.

If they weren't being relayed, then the downstream LDM process must have
terminated. If the upstream LDM process had crashed, then the LDM server would
have logged that fact. Also, if the upstream LDM process had failed, the
downstream LDM process would have reconnected and a new upstream LDM process
would have been created.

Check the downstream LDM log file.
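If it helps, here is a minimal sketch for pulling the relevant lines out of
that log. It only does a substring search. The log path, the upstream host
name, and the keyword list are hypothetical placeholders; substitute the
downstream system's actual log path and your ingestor's host name.

```python
"""Minimal sketch: print downstream LDM log lines that mention the NNEXRAD
feed or the upstream host. LOG_PATH and KEYWORDS are hypothetical
placeholders; adjust them for your site."""
from typing import Iterable, Iterator, List

# Hypothetical values: replace with the downstream log path and upstream host.
LOG_PATH = "/home/ldm/var/logs/ldmd.log"
KEYWORDS = ("NNEXRAD", "noaaport-ingest.example.edu")


def matching_lines(lines: Iterable[str], keywords: Iterable[str]) -> Iterator[str]:
    """Yield each line (trailing newline stripped) that contains any keyword."""
    keys = tuple(keywords)
    for line in lines:
        if any(k in line for k in keys):
            yield line.rstrip("\n")


def scan_log(path: str = LOG_PATH, keywords: Iterable[str] = KEYWORDS) -> List[str]:
    """Return the matching lines of the log file at `path`."""
    # errors="replace" keeps a stray non-UTF-8 byte from aborting the scan.
    with open(path, encoding="utf-8", errors="replace") as log:
        return list(matching_lines(log, keywords))
```

Calling `for line in scan_log(): print(line)` on the downstream system prints
the matches in log order, which makes it easier to see what the downstream
LDM process logged just before the feed stopped.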

If this happens again, use the uldbutil(1) utility on the upstream system
to list all the upstream LDM processes. If there isn't one for the
NNEXRAD products, then the downstream LDM process terminated.
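The check above can be sketched as a small script. This is a minimal sketch
under two assumptions you should verify against the uldbutil(1) manual page
for your LDM version: (1) running uldbutil with no arguments lists the
upstream LDM entries, and (2) each entry names its feedtype as text, so a
substring search for "NNEXRAD" finds it. If the downstream LDM requests a
broader feed (for example, ANY), its entry won't literally say NNEXRAD and
this check will give a false "terminated".

```python
"""Minimal sketch: apply the uldbutil(1) check on the upstream system.
Assumes uldbutil is on the PATH and that its listing names each entry's
feedtype as plain text (both assumptions; verify locally)."""
import subprocess

FEED = "NNEXRAD"


def has_upstream_for(uldb_output: str, feed: str = FEED) -> bool:
    """Return True if any line of the uldbutil listing mentions `feed`."""
    return any(feed in line for line in uldb_output.splitlines())


def diagnose(uldb_output: str, feed: str = FEED) -> str:
    """Turn the listing into the conclusion described above."""
    if has_upstream_for(uldb_output, feed):
        return f"an upstream LDM process for {feed} is still listed"
    return f"no upstream LDM process for {feed}: the downstream LDM process terminated"


def run_uldbutil() -> str:
    """Run uldbutil with no arguments and return everything it printed."""
    result = subprocess.run(["uldbutil"], capture_output=True, text=True, check=True)
    return result.stdout + result.stderr
```

On the upstream system, `print(diagnose(run_uldbutil()))` prints the
conclusion. Keeping the parsing separate from the subprocess call makes it
easy to adjust `has_upstream_for()` once you've seen what your uldbutil
actually prints.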

> I have a vexing problem that I hope you can help me with. Twice over the
> last few days, the LDM stopped relaying the NNEXRAD feed from our NOAAport
> satellite dish server. Running ldmadmin watch showed that the LDM was clearly
> working on our ingestor, and the WMO headers for the NNEXRAD feed were still
> flying by. I investigated and found:
> 
> 1. No error or noteworthy messages in /home/ldm/var/logs/ldmd.log
> 2. No error or noteworthy messages in /var/spool/messages
> 3. No issues on our other LDM servers, which switched to a backup when it
> stopped relaying
> 4. No core dump files (the other data besides NNEXRAD kept being relayed,
> so the LDM didn't crash; it's like the process that handled it just died)
> 5. We are losing about 500 packets per day off our dish
> 6. The Novra receiver has the latest firmware on it
> 
> I am mildly suspecting two things:
> 
> 1. The LDM NOAAport ingester received a garbled Level3 file, and that
> caused an ingestor process to hang
> 2. We changed to using a ramdisk (vs. storing them in /dev/shm) a few weeks
> ago
> 
> This stinks because I have no more information, no other data to show you.
> There simply isn't any. Should we recompile the LDM on the ingestor in
> debug mode (I need to recall how to do that), and then let it run until it
> chokes again?

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: NHG-751993
Department: Support LDM
Priority: Normal
Status: Closed
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.