
[LDM #TCW-702523]: downstream LDM server not receiving all data (e.g. nexrad) from local LDM/NOAAPORT ingest systems



Hi Gregg,

First, I moved your email into our inquiry tracking system, where
multiple people can see the exchanges and comment if/when needed.

Second, I was writing the comments below at the same time that Steve
was composing his response.  I am sending this reply anyway, since there
are some things that I touch on that Steve did not...

re:
> I have an odd LDM quirk, a downstream LDM server (i.e. rhesrv22.spc.noaa.gov)
> is requesting everything from two upstream LDM servers ingesting NOAAPORT
> data (i.e. sbn1.spc.noaa.gov and sbn2.spc.noaa.gov).  However, the
> downstream LDM server is NOT receiving any of the NEXRAD/nids products.

First comment:

The REQUEST lines you show below are not for "everything"; they are for
everything in the compound feed type 'WMO'.  'WMO' is defined as:

WMO == HDS|IDS|DDPLUS

In particular, it does NOT include the NEXRAD3 feed, which is the feed
into which the NOAAPort ingest routines put the great majority of
NEXRAD Level 3 products.
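A quick way to confirm this on the upstreams is a feed-specific
'notifyme' (a sketch; the hostname is from your setup):

  # ask SBN2 what it would send for just the NEXRAD3 feed
  notifyme -v -l- -f NEXRAD3 -h sbn2.spc.noaa.gov

If the Level 3 products scroll by there but not in a run with '-f WMO',
that confirms they are in NEXRAD3, not WMO.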

re:
> Yes the upstream sbn1/sbn2 servers are the ones we worked on earlier
> this summer.

OK, this is good to know.

re:
> When I run ldmadmin watch on sbn1/sbn2 I see single site
> radar data entries, when I "kill -USR2" the ldmd process I see entries
> (like pasted farther below) where the upstream LDM is "sending" products
> to the downstream (i.e. rhesrv22) LDM server.  However, the downstream
> rhesrv22 LDM server never receives/sees the single site radar data.

OK.

re:
> I've tried several different types of "REQUEST" entries (see immediately
> below).  Also, just a tad farther down you can see the upstream LDM server,
> SBN2 in this case, sees the request for WMO ..
> 
> Downstream RHESRV22 LDM server, ldmd.conf entries with "set list" to show
> tab characters:
> 
> # NEW SPC SBN Ingest systems SBN1 and SBN2 using Unidata noaaportIngester$
> # replacing the legacy Northrup Grumman Acq_server/Acq_client software.$
> #REQUEST^IWMO^I".*"^Isbn1.spc.noaa.gov$
> #REQUEST^IWMO^I".*"^Isbn2.spc.noaa.gov$
> #REQUEST^IWMO ".*" sbn1.spc.noaa.gov$
> #REQUEST^IWMO ".*" sbn2.spc.noaa.gov$
> REQUEST^IWMO^I.*^Isbn1.spc.noaa.gov$
> REQUEST^IWMO^I.*^Isbn2.spc.noaa.gov$

These are the REQUEST lines that I was commenting about above; they do not
include the NEXRAD3 feed.
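Assuming you want the Level 3 products on rhesrv22, the fix should be
as simple as also REQUESTing the NEXRAD3 feed, either on its own lines
or folded into a compound feed set, e.g.:

  REQUEST WMO|NEXRAD3 ".*" sbn1.spc.noaa.gov
  REQUEST WMO|NEXRAD3 ".*" sbn2.spc.noaa.gov

followed by an 'ldmadmin restart' on rhesrv22 so that the new REQUESTs
take effect.  (Separate REQUEST lines per feed also work, and give each
feed its own connection.)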

re:
> Snippet from SBN2 LDMD.LOG showing connection from rhesrv22 (140.90.173.93):
> 
> 20200901T194154.391586Z rhesrv22.spc.noaa.gov(feed)[25081] up6.c:up6_run:445
>                  NOTE  Starting Up(6.13.11/6): 20200901184153.378583 TS_ENDT
> {{WMO, ".*"}}, SIG=35a37ec4a8013024e4ca12da7ab6949f, Primary
> 20200901T194154.391644Z rhesrv22.spc.noaa.gov(feed)[25081] up6.c:up6_run:448
>                  NOTE  topo:  rhesrv22.spc.noaa.gov {{WMO, (.*)}}
> 20200901T213625.633357Z 140.90.173.93(noti)[8632] forn5_svc.c:forn_5_svc:468
>         NOTE  Starting Up(6.13.11/5): 20200901213625.624379 TS_ENDT {{ANY,
> ".*"}}
> 20200901T213625.633413Z 140.90.173.93(noti)[8632] forn5_svc.c:forn_5_svc:471
>         NOTE  topo:  140.90.173.93 ANY
> 20200901T213732.634086Z 140.90.173.93(noti)[8632]   forn5_svc.c:noti5_sqf:273 
> ERROR SDUS62 KCHS 012133 /pNCZCLX: RPC: Unable to receive
> 20200901T213732.634148Z 140.90.173.93(noti)[8632] forn5_svc.c:forn_5_svc:554  
>         ERROR pq_sequence failed: Input/output error (errno = 5)
> 20200901T213732.634197Z 140.90.173.93(noti)[8632]   ldmd.c:cleanup:192 NOTE  
> Exiting
> 20200901T213732.635402Z ldmd[1778]                  ldmd.c:reap:177 NOTE  
> child 8632 exited with status 1

OK, the 'topo' message for {{WMO, (.*)}} shows that SBN2's LDM should send
all products in the WMO datastream to the downstream machine, rhesrv22.

re:
> [ldmcp@sbn2 ~/logs]$
> 
> When I run notifyme on rhesrv22 to an upstream LDM server (e.g. sbn2) it
> does see the radar data in the sbn2 queue:
> 
> notifyme -v -x -l- -h sbn2.spc.noaa.gov

Hmm...

re:
> ...
> 
> Sep 01 22:00:42 notifyme[156591] INFO:
> f863de65721c3e1167cd0dcd503bf0a5    31694 20200901220023.641 NEXRAD3 
> 157442609  SDUS22 KTAE 012157 /pN2UTLH !nids/
> 
> ...
> 
> I'm not sure what the complete invocation of uldbutil is supposed to look
> like, but when I run notifyme on rhesrv22 and uldbutil on sbn2 I get the
> following:
> 
> [ldmcp@sbn2 ~/logs]$ uldbutil
> 
> 25081 6 feeder rhesrv22.spc.noaa.gov 20200901184153.378583 TS_ENDT {{WMO, 
> ".*"}} primary
> 11789 5 notifier rhesrv22.spc.noaa.gov 20200901220001.137216 TS_ENDT {{ANY, 
> ".*"}} alternate

This looks OK.

re:
> [ldmcp@sbn2 ~/logs]$
> 
> and when I run it moments later on sbn1 I get the following; note the "Is
> the LDM running?"
> 
> [ldmcp@sbn1 ~/logs]$ uldbutil
> 
> 20200901T220221.109751Z uldbutil[17665]             uldb.c:sm_setShmId:1069 
> NOTE  No such file or directory
> 20200901T220221.109925Z uldbutil[17665]             uldbutil.c:main:99 NOTE  
> The upstream LDM database doesn't exist. Is the LDM running?

This indicates that 'uldbutil' is not finding the shared memory
segment that is created and used by the LDM.  I think that you would
also see that it does not exist by running 'ipcs'.
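For instance:

  # list the System V shared memory segments; the LDM's segment should
  # appear here, owned by the account the LDM runs under
  ipcs -m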

I suggest restarting the LDM on SBN1 and then immediately checking to
make sure that the shared memory segment used by the LDM exists (via an
'uldbutil' invocation and by running 'ipcs').
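That is, something like the following on SBN1:

  ldmadmin restart   # stop and restart the LDM
  uldbutil           # should no longer complain that the database doesn't exist
  ipcs -m            # the LDM-owned segment should now be listed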

The question that immediately comes to mind is why the shared memory
segment that the LDM creates at startup is nowhere to be found.

Question:

Is it possible that some process was run that cleared out shared
memory segments?  The "classic" case of this happening was on systems
where GEMPAK was also being run in the 'ldm' account, and someone set
up a process to delete shared memory segments as a cleanup measure for
GEMPAK.
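For illustration only, a purely hypothetical cron entry like the
following would remove the LDM's segment right along with GEMPAK's:

  # hypothetical cleanup job: removes ALL of the 'ldm' user's shared
  # memory segments every hour, including the LDM's upstream database
  0 * * * * ipcs -m | awk '$3 == "ldm" {print $2}' | xargs -r -n 1 ipcrm -m

If anything resembling this is in a crontab on SBN1, it would explain
the symptom.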

re:
> [ldmcp@sbn1 ~/logs]$
> 
> LDM on both SBN1 and SBN2 is running, and confirmed via "ldmadmin
> isrunning" and looking at output of $status, as well as a list of
> processes (see farther below).

I don't believe that 'ldmadmin isrunning' checks to see if the
shared memory segment exists.  Steve can comment on this with more
authority.

re:
> All of these LDM servers sit on the inside of a firewall and there is no
> network shaping that I'm aware of.  I was reading some of the support
> emails and noticed sometimes downstream LDMs didn't receive products
> because there was too big of a time delay and other cases there was
> network traffic shaping going on.

Correct.  One of the settings in the LDM registry is the maximum
latency for products.  When that latency is exceeded, the products are
received but never put into the receiving LDM's queue.  One way the
maximum latency can be exceeded is for the clocks on the downstream and
upstream machines to be off by as much as, or more than, the max
latency parameter set in the downstream machine's LDM registry.
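Both pieces are quick to check on the downstream.  'ldmadmin config'
prints the maximum latency, and, if I am remembering the registry path
correctly, 'regutil' can print it directly:

  # print the max-latency setting in seconds (registry path from memory)
  regutil /server/max-latency

  # compare UTC clocks: run on rhesrv22 and on sbn1/sbn2 and eyeball
  # the difference
  date -u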

I see that you included the output of 'ldmadmin config' run on both
SBN1 and SBN2.  I did not see (or perhaps missed) the same output from
the downstream, rhesrv22.

re:
> Do you have any suggestions for me to look at to troubleshoot why
> the downstream LDM server is NOT getting all of the products from the
> upstream LDM server (i.e. in particular the nexrad data, but all the
> data is needed)?

Please run 'ldmadmin config' on rhesrv22 and send us the results.
Also, please restart the LDM on SBN1 and report back the results
of 'uldbutil' run on it after rhesrv22 has re-issued its REQUEST
for the WMO feed.
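Roughly:

  # on rhesrv22
  ldmadmin config

  # on SBN1
  ldmadmin restart
  # ...then, after rhesrv22 has reconnected:
  uldbutil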

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: TCW-702523
Department: Support LDM
Priority: Normal
Status: Open
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.