
[LDM #UAK-912261]: data flow problems



Karen,

> ldmadmin watch on pluto shows data from both sources (KTLX and KFDR) :
> 
> May 28 15:32:25 pqutil:  3263157 20080528151525.039     EXP 000
> /data/2008/05/28/KTLX_RVP.20080528.153106.vcp11.3.lipc.gz
> May 28 15:32:41 pqutil:  3257255 20080528153136.597     EXP 000
> /data/2008/05/28/KFDR_RVP.20080528.153135.vcp32.3.lipc.gz
> May 28 15:33:19 pqutil:  3247219 20080528151605.957     EXP 000
> /data/2008/05/28/KTLX_RVP.20080528.153147.vcp11.5.lipc.gz
> May 28 15:34:13 pqutil:  3238735 20080528151626.914     EXP 000
> /data/2008/05/28/KTLX_RVP.20080528.153208.vcp11.6.lipc.gz
> May 28 15:35:35 pqutil:  3243160 20080528153436.565     EXP 000
> /data/2008/05/28/KFDR_RVP.20080528.153435.vcp32.5.lipc.gz
> May 28 15:36:38 pqutil:  3283551 20080528151939.864     EXP 000
> /data/2008/05/28/KTLX_RVP.20080528.153521.vcp11.1.lipc.gz
> 
> But the notifyme shows only KFDR:
> 
> May 28 15:31:30 notifyme[12565]:   133035 20080528153129.350     EXP
> 000  wdssii/KFDR_RVP.20080528.153017.vcp32.2.N360.nc.gz
> May 28 15:32:41 notifyme[12565]:  3257255 20080528153136.597     EXP
> 000  /data/2008/05/28/KFDR_RVP.20080528.153135.vcp32.3.lipc.gz
> May 28 15:35:35 notifyme[12565]:  3243160 20080528153436.565     EXP
> 000  /data/2008/05/28/KFDR_RVP.20080528.153435.vcp32.5.lipc.gz
> May 28 15:37:29 notifyme[12565]:  3234282 20080528153612.344     EXP
> 000  /data/2008/05/28/KFDR_RVP.20080528.153610.vcp32.6.lipc.gz

That's very bizarre. 

The LDM on Pluto is version 6.0.14, which is very old.  There have
been many bug fixes since then.  Is there any chance of upgrading it?

If you're willing, I can install the latest version for you in about
3 minutes.

> The ldmd.log doesn't appear to be getting info right now.  It doesn't
> have anything since 14:33 this morning.   This is what I see in
> /var/log/messages about the notifyme:

If the LDM isn't logging in the right place, then there's likely
a problem with the LDM installation.  Check /etc/syslog.conf: it
should have a "local0.none" entry for /var/log/messages and a
"local0.*" or "local0.debug" entry for the LDM log file.  Also
check that bin/hupsyslog is owned by root and setuid.
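
As a rough way to check the two syslog.conf entries described above,
here is a minimal Python sketch.  It is not part of the LDM.  The
function name, the sample text, and the LDM log path
(/usr/local/ldm/logs/ldmd.log) are assumptions for illustration, and
the parsing is simplified: it only understands "selector action" lines
with ";"-separated selectors.

```python
def check_syslog_conf(text, ldm_log="ldmd.log", system_log="/var/log/messages"):
    """Return (messages_excludes_local0, ldm_log_receives_local0).

    Simplified parser: one "selectors<whitespace>action" entry per line,
    selectors separated by ';'.  Comma-separated facility lists are not
    handled.
    """
    messages_ok = False
    ldm_ok = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()    # drop comments and blanks
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        selectors = parts[0].split(";")
        action = parts[1].strip().lstrip("-")  # "-" prefix means "don't sync"
        if action == system_log and "local0.none" in selectors:
            messages_ok = True
        if action.endswith(ldm_log) and (
            "local0.*" in selectors or "local0.debug" in selectors
        ):
            ldm_ok = True
    return messages_ok, ldm_ok


# Hypothetical sample; the real file is /etc/syslog.conf on Pluto.
SAMPLE = (
    "*.info;mail.none;local0.none\t/var/log/messages\n"
    "local0.debug\t/usr/local/ldm/logs/ldmd.log\n"
)

if __name__ == "__main__":
    print(check_syslog_conf(SAMPLE))
```

To check the real file, pass the contents of /etc/syslog.conf as the
"text" argument instead of the sample.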

While you're at it, check that bin/rpc.ldmd is owned by root and
setuid as well.
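
The ownership and setuid checks on bin/hupsyslog and bin/rpc.ldmd can
be done with "ls -l", or with a small script along these lines.  This
is only a sketch: the function name is made up, and the paths are
whatever you pass on the command line.

```python
import os
import stat
import sys


def setuid_root_status(path):
    """Return (owned_by_root, has_setuid_bit) for the given file."""
    st = os.stat(path)
    owned_by_root = st.st_uid == 0
    has_setuid = bool(st.st_mode & stat.S_ISUID)
    return owned_by_root, has_setuid


if __name__ == "__main__":
    # Usage (paths are examples): python check_setuid.py bin/hupsyslog bin/rpc.ldmd
    for path in sys.argv[1:]:
        try:
            owned, suid = setuid_root_status(path)
        except OSError as err:
            print(f"{path}: cannot stat ({err})")
            continue
        verdict = "OK" if owned and suid else "PROBLEM"
        print(f"{path}: root-owned={owned} setuid={suid} -> {verdict}")
```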

> May 28 15:41:49 pluto localhost(noti)[12566]:
> /data/2008/05/28/KFDR_RVP.20080528.154034.vcp32.2.lipc.gz: RPC: Unable
> to receive
> May 28 15:41:49 pluto localhost(noti)[12566]: pq_sequence failed:
> Input/output error (errno = 5)
> May 28 15:41:55 pluto rpc.ldmd[6619]: child 12566 exited with status 1
> 
> Yesterday I only had one machine -- dontpanic -- connected to pluto. It
> was not getting the KTLX data, and after a restart it got nothing for
> quite a while.  This morning it is again only getting KFDR.
> 
> I am seeing a lot of this kind of stuff about dontpanic in pluto's logs:
> 
> May 28 14:21:20 pluto dontpanic[7327]: ldm6_server.c:140: Restricting
> request: 20080528141716.251 TS_ENDT {{EXP,  "KTLX_RVP|KFDR_RVP"},{NONE,
> "SIG=2e2e9f9f258d88dffc085511401859ad"}} -> 20080528141716.251 TS_ENDT
> {{EXP,  "KTLX_RVP|KFDR_RVP"}}
> May 28 14:21:20 pluto dontpanic(feed)[7327]: up6.c:331: Starting
> Up(6.0.14/6): 20080528141716.251 TS_ENDT {{EXP,  "KTLX_RVP|KFDR_RVP"}}
> May 28 14:21:20 pluto dontpanic(feed)[7327]: topo:
> dontpanic.protect.nssl EXP
> May 28 14:28:55 pluto dontpanic(feed)[7327]: up6.c:288: nullproc_6()
> failure to dontpanic.protect.nssl: RPC: Unable to receive; errno =
> Connection reset by peer
> May 28 14:28:55 pluto dontpanic[7813]: ldm6_server.c:140: Restricting
> request: 20080528142755.210 TS_ENDT {{EXP,  "KTLX_RVP|KFDR_RVP"},{NONE,
> "SIG=1f138c21499af8668b193a19c884030a"}} -> 20080528142755.210 TS_ENDT
> {{EXP,  "KTLX_RVP|KFDR_RVP"}}
> May 28 14:28:55 pluto dontpanic(feed)[7813]: up6.c:331: Starting
> Up(6.0.14/6): 20080528142755.210 TS_ENDT {{EXP,  "KTLX_RVP|KFDR_RVP"}}
> May 28 14:28:55 pluto dontpanic(feed)[7813]: topo:
> dontpanic.protect.nssl EXP
> May 28 14:29:01 pluto rpc.ldmd[6619]: child 7327 exited with status 5
> May 28 14:33:09 pluto dontpanic(feed)[7813]: up6.c:288: nullproc_6()
> failure to dontpanic.protect.nssl: RPC: Unable to receive; errno =
> Connection reset by peer
> May 28 14:33:09 pluto rpc.ldmd[6619]: child 7813 exited with status 5

Pluto's LDM log messages regarding Dontpanic indicate that Dontpanic
is resetting (i.e., closing) the TCP connection.  What version of the
LDM is it running?

> However  I added another machine downstream to see if I could get the
> data there.  It is currently getting all of the data from pluto:
> 
> May 28 15:32:26 pqutil INFO:  3263157 20080528151525.039     EXP 000
> /data/2008/05/28/KTLX_RVP.20080528.153106.vcp11.3.lipc.gz
> May 28 15:32:42 pqutil INFO:  3257255 20080528153136.597     EXP 000
> /data/2008/05/28/KFDR_RVP.20080528.153135.vcp32.3.lipc.gz
> May 28 15:33:19 pqutil INFO:  3247219 20080528151605.957     EXP 000
> /data/2008/05/28/KTLX_RVP.20080528.153147.vcp11.5.lipc.gz
> May 28 15:34:13 pqutil INFO:  3238735 20080528151626.914     EXP 000
> /data/2008/05/28/KTLX_RVP.20080528.153208.vcp11.6.lipc.gz
> May 28 15:35:36 pqutil INFO:  3243160 20080528153436.565     EXP 000
> /data/2008/05/28/KFDR_RVP.20080528.153435.vcp32.5.lipc.gz
> May 28 15:36:38 pqutil INFO:  3283551 20080528151939.864     EXP 000
> /data/2008/05/28/KTLX_RVP.20080528.153521.vcp11.1.lipc.gz

The fact that another system can get all the data from Pluto indicates
that the problem lies either with Dontpanic or with the LDM configuration
on Pluto regarding Dontpanic.  What are the ALLOW entries for Dontpanic
in Pluto's LDM configuration-file?

> Basically I'm stumped.  DNS is working properly and the machines can
> ping/ldmping each other, and some of the data is getting through.  The
> request line on dontpanic was very simple.
> 
> request EXP 172.16.20.32

The original REQUEST entry in Dontpanic's LDM configuration-file is 
invalid because it doesn't have a pattern.  The proper format is
"REQUEST <feedset> <pattern> <host>[:<port>] [<OK_re> [<NOT_re>]]",
where the square brackets denote optional fields.

> I modified it to
> 
> request EXP "KTLX_RVP|KFDR_RVP" 172.16.20.32

The modified REQUEST entry is valid.
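
To illustrate the difference between the two entries, here is a small
Python sketch that checks a line against the REQUEST format given
above and tries the pattern on two product names from your logs.  It
is not the LDM's own parser: the function names are hypothetical,
shell-style quoting is assumed, and Python's "re" module stands in for
the extended regular expressions the LDM uses, which should behave the
same for a simple alternation like this one.

```python
import re
import shlex


def parse_request(line):
    """Parse 'REQUEST <feedset> <pattern> <host>[:<port>] [<OK_re> [<NOT_re>]]'.

    Raises ValueError if the keyword is wrong or a required field is missing.
    """
    tokens = shlex.split(line)
    if not tokens or tokens[0].upper() != "REQUEST":
        raise ValueError("not a REQUEST entry")
    if len(tokens) < 4:
        raise ValueError("need <feedset> <pattern> <host>; a field is missing")
    if len(tokens) > 6:
        raise ValueError("too many fields")
    feedset, pattern, host = tokens[1:4]
    re.compile(pattern)                      # stand-in syntax check
    hostname, _, port = host.partition(":")
    return {
        "feedset": feedset,
        "pattern": pattern,
        "host": hostname,
        "port": int(port) if port else None,
        "ok_re": tokens[4] if len(tokens) > 4 else None,
        "not_re": tokens[5] if len(tokens) > 5 else None,
    }


def pattern_matches(pattern, product_id):
    """True if the pattern matches anywhere in the product identifier."""
    return re.search(pattern, product_id) is not None


# Product identifiers copied from the pqutil output above.
PRODUCTS = [
    "/data/2008/05/28/KTLX_RVP.20080528.153106.vcp11.3.lipc.gz",
    "/data/2008/05/28/KFDR_RVP.20080528.153135.vcp32.3.lipc.gz",
]

if __name__ == "__main__":
    entry = parse_request('request EXP "KTLX_RVP|KFDR_RVP" 172.16.20.32')
    for product in PRODUCTS:
        print(product, pattern_matches(entry["pattern"], product))
    try:
        parse_request("request EXP 172.16.20.32")
    except ValueError as err:
        print("original entry rejected:", err)
```

In this sketch the pattern "KTLX_RVP|KFDR_RVP" matches both the KTLX
and the KFDR product names, so the pattern itself is unlikely to be
the reason the KTLX data is missing.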

> to see if it made a difference, but it hasn't seemed to do any good.  In
> fact, I'm not even sure it's a good pattern...  but since pluto can't
> even get notifyme's of the KTLX data from itself.. I'm lost.
> 
> --
> -------------------------------------------
> 
> There are 2 kinds of people in the world:
> 
> 1) Those who can extrapolate from incomplete data.
> 
> -------------------------------------------
> address@hidden
> 
> Phone:  405-325-6982
> Cell: 405-834-8559
> SAIC/Systems Analyst
> National Severe Storms Laboratory

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: UAK-912261
Department: Support LDM
Priority: Normal
Status: On Hold