
[LDM #UAK-912261]: data flow problems



Karen,

Is Dontpanic's LDM requesting data from the IP address associated with
Pluto's NIC that's on the internal network?

Try a notifyme(1) to each of the explicit IP addresses associated with
Pluto (e.g., "notifyme -vl- -h 172.16.20.32").  Does it work correctly
with one of the addresses but not the other?
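
If notifyme(1) fails against an address, a plain TCP connect to the LDM
port can help separate a network-level problem from an LDM-level one.
Here is a minimal Python sketch, assuming the LDM is listening on its
default port, 388.  The name can_connect is made up for illustration.  A
successful connect only shows that something accepted the connection; it
does not show that the LDM is answering RPC requests.

```python
import socket

LDM_PORT = 388  # assumption: the LDM is listening on its default port


def can_connect(host, port=LDM_PORT, timeout=5.0):
    """Return True if a TCP connection to (host, port) opens within timeout."""
    try:
        # create_connection resolves the host and opens the socket
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts, and name-resolution failures
        return False
```

For example, can_connect("172.16.20.32") could be run from Dontpanic,
and then again with Pluto's other address.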

The log messages you sent show overlapping notifyme(1) processes.
Was that intentional (not that it should make any difference)?
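
To make the overlap easier to see, here is a rough Python sketch that
pairs each "Starting Up" entry with the matching "Exiting" entry by
process ID.  The LOG text is the start and exit entries from your
message with the wrapped lines rejoined; the function names are
illustrative only, and the year is taken from the 20080528 timestamps
in the log.

```python
import re
from datetime import datetime

# Start/exit entries copied from the quoted log, with wrapped lines rejoined.
LOG = """\
May 28 20:30:40 pluto localhost.localdomain(noti)[12428] NOTE: Exiting
May 28 20:31:01 pluto localhost.localdomain(noti)[12736] NOTE: Starting Up(6.6.5/5): 20080528202009.680 TS_ENDT {{ANY,  ".*"}}
May 28 20:31:22 pluto localhost.localdomain(noti)[12807] NOTE: Starting Up(6.6.5/5): 20080528203122.890 TS_ENDT {{ANY,  ".*"}}
May 28 20:36:06 pluto localhost.localdomain(noti)[12736] NOTE: Exiting
May 28 20:36:36 pluto localhost.localdomain(noti)[12807] NOTE: Exiting
May 28 20:36:48 pluto localhost.localdomain(noti)[13220] NOTE: Starting Up(6.6.5/5): 20080528203122.890 TS_ENDT {{ANY,  ".*"}}
May 28 20:42:03 pluto localhost.localdomain(noti)[13220] NOTE: Exiting
"""

ENTRY = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ \S+\[(\d+)\] NOTE: (Starting Up|Exiting)"
)


def lifetimes(log_text, year=2008):
    """Map PID -> (start, end) for processes that have both entries."""
    starts, spans = {}, {}
    for line in log_text.splitlines():
        m = ENTRY.match(line)
        if not m:
            continue
        when = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        pid = int(m.group(2))
        if m.group(3) == "Starting Up":
            starts[pid] = when
        elif pid in starts:  # ignore exits whose start is not in the log
            spans[pid] = (starts[pid], when)
    return spans


def overlapping(spans):
    """Return PID pairs whose lifetimes overlap, in ascending PID order."""
    pids = sorted(spans)
    return [
        (a, b)
        for i, a in enumerate(pids)
        for b in pids[i + 1:]
        if spans[a][0] < spans[b][1] and spans[b][0] < spans[a][1]
    ]


spans = lifetimes(LOG)
for pid, (start, end) in sorted(spans.items()):
    print(pid, int((end - start).total_seconds()), "seconds")
print("overlapping:", overlapping(spans))
```

On these entries it reports lifetimes of 305, 314, and 315 seconds for
processes 12736, 12807, and 13220, and a single overlapping pair,
12736 and 12807.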

> Okay, I've installed ldm-6.6.5.  I stopped/rebuilt queue/started and I'm
> still seeing the same kind of problems.  I did see a strange error when
> I was trying to rsync some data off the machine (that hasn't been sent
> downstream yet).  I replaced the NIC for the downstream systems.
> 
> BTW:  This machine has 2 NICs.  A private network to our upstream data
> feeds, and one on our internal network here at NSSL.  The connection to
> the upstream systems has been fine, but the downstream is the one that
> seem(ed) flaky.  Putting in a new network card doesn't seem to have
> mattered.  I still see the data fine in an ldmadmin watch, but notifyme
> still keeps getting errors and reconnecting.
> 
> Eventually downstream machines seem to connect up properly and be happy
> for a while, but it can take 20-30 minutes.  Some downstream machines get
> connected, but others seem to be unable to get data -- or take a very
> long time to connect, and even then they might only see 1 radar... which
> makes no sense as they are the same feedtype -- the only difference is
> the radar ID in the filename and I'm *not* using it as part of the
> pattern in either the request or the allow.
> 
> No, the logging is usually going to ldmd.conf, but sometimes it appears
> to get stuck.  At that time it still logs to /var/log/messages.  Maybe
> it's just not flushing?
> Yes, I ran the make install_setuids as root, and the ownership/sticky
> bits are set properly.
> 
> dontpanic and towel are both running 6.4.5.  They connect to a number of
> other machines -- upstream and downstream -- and I've been running them
> for quite a while.
> 
> The allow entries are super simple on pluto.  They are:
> 
> allow   ANY ^((localhost|loopback)|(127\.0\.0\.1\.?$))
> allow   EXP     dontpanic.protect.nssl
> allow   EXP     isis.protect.nssl
> allow   EXP     towel.protect.nssl
> 
> And as I indicated at the beginning, towel wasn't getting data, but
> dontpanic is.  Actually towel started getting data at 20:37 UTC,
> pluto's ldm was restarted at 20:19 UTC.  And yes, all of my systems are
> running ntpd and the clocks are current.
> 
> Also the notifyme on pluto (to localhost) is still getting errors after
> 25 minutes:
> 
> May 28 20:30:40 pluto localhost.localdomain(noti)[12428] ERROR:
> nullproc5(localhost.localdomain): RPC: Unable to receive
> May 28 20:30:40 pluto localhost.localdomain(noti)[12428] NOTE: Exiting
> May 28 20:31:01 pluto localhost.localdomain(noti)[12736] NOTE: Starting
> Up(6.6.5/5): 20080528202009.680 TS_ENDT {{ANY,  ".*"}}
> May 28 20:31:01 pluto localhost.localdomain(noti)[12736] NOTE: topo:
> localhost.localdomain ANY
> May 28 20:31:22 pluto localhost.localdomain(noti)[12807] NOTE: Starting
> Up(6.6.5/5): 20080528203122.890 TS_ENDT {{ANY,  ".*"}}
> May 28 20:31:22 pluto localhost.localdomain(noti)[12807] NOTE: topo:
> localhost.localdomain ANY
> May 28 20:36:06 pluto localhost.localdomain(noti)[12736] ERROR:
> nullproc5(localhost.localdomain): RPC: Unable to receive
> May 28 20:36:06 pluto localhost.localdomain(noti)[12736] NOTE: Exiting
> May 28 20:36:36 pluto localhost.localdomain(noti)[12807] ERROR:
> nullproc5(localhost.localdomain): RPC: Unable to receive
> May 28 20:36:36 pluto localhost.localdomain(noti)[12807] NOTE: Exiting
> May 28 20:36:48 pluto localhost.localdomain(noti)[13220] NOTE: Starting
> Up(6.6.5/5): 20080528203122.890 TS_ENDT {{ANY,  ".*"}}
> May 28 20:36:48 pluto localhost.localdomain(noti)[13220] NOTE: topo:
> localhost.localdomain ANY
> May 28 20:42:03 pluto localhost.localdomain(noti)[13220] ERROR:
> nullproc5(localhost.localdomain): RPC: Unable to receive
> May 28 20:42:03 pluto localhost.localdomain(noti)[13220] NOTE: Exiting
> 
> I did notice some errors about resolution problems on one of my startups
> today, so I put dontpanic and towel in the /etc/hosts on the machine so
> it would not have to use DNS, but it hasn't made any difference.
> 
> > The original REQUEST entry in Dontpanic's LDM configuration-file is
> > invalid because it doesn't have a pattern.  The proper format is
> > "REQUEST <feedset> <pattern> <host>[:<port>] [<OK_re> [<NOT_re>]]",
> > where the square brackets denote optional fields.
> 
> Doh!  I put that in from memory, and my memory has holes. :)
> 
> request   EXP (.*) 172.16.20.32
> 
> and last week it was working just fine.
> 
> Hmmmm.... any more ideas???
> 
> --
> -------------------------------------------
> 
> There are 2 kinds of people in the world:
> 
> 1) Those who can extrapolate from incomplete data.
> 
> -------------------------------------------
> address@hidden
> 
> Phone:  405-325-6982
> Cell: 405-834-8559
> SAIC/Systems Analyst
> National Severe Storms Laboratory
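
The REQUEST format quoted above can be checked mechanically.  Here is a
minimal Python sketch, assuming fields are separated by whitespace and
contain none themselves.  The name check_request is made up for
illustration, and Python's regular-expression syntax is not identical
to the one the LDM uses, so the pattern check is only approximate.

```python
import re


def check_request(line):
    """Return a list of problems found in one REQUEST entry (empty if none)."""
    fields = line.split()
    if not fields or fields[0].upper() != "REQUEST":
        return ["not a REQUEST entry"]
    # REQUEST <feedset> <pattern> <host>[:<port>] [<OK_re> [<NOT_re>]]
    if len(fields) < 4:
        return ["need <feedset> <pattern> <host>; got %d field(s)"
                % (len(fields) - 1)]
    if len(fields) > 6:
        return ["too many fields: %d" % (len(fields) - 1)]
    problems = []
    try:
        re.compile(fields[2])  # approximate: Python syntax, not the LDM's
    except re.error as err:
        problems.append("pattern does not compile: %s" % err)
    return problems
```

With this, check_request("request   EXP (.*) 172.16.20.32") returns an
empty list, while a hypothetical entry without a pattern, such as
"request EXP 172.16.20.32", is reported as having too few fields.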


Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: UAK-912261
Department: Support LDM
Priority: Normal
Status: On Hold