
Re: 20011204: LDM Failover Issues



Hi Patrick, 

We've been on blizzard for a while now....

It looks as if things are going OK now..?

Is this correct? Did you stop and re-start the ldm around 17Z?

We are thinking that ldmfail may fire up the ldm without allowing the ldm
to shut down gracefully..

Knowing what, if anything, you did after you sent this message would be helpful..
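
On your cron question below: here is a rough sketch of the idea, not a
tested fix. It is a small Bourne shell wrapper that sets the path and then
runs ldmfail, so cron calls the wrapper instead of ldmfail itself. The
script name run_ldmfail.sh, its location, and the crontab schedule are all
made up for illustration; the LDM home is the /usr/local/ldm you mentioned.

```shell
#!/bin/sh
# Hypothetical wrapper for cron to call in place of ldmfail itself.
# cron starts jobs under the Bourne shell with a minimal PATH, so the LDM
# bin and decoders directories are added here explicitly.

# LDM home as given in your message; adjust if your tree differs.
LDMHOME="${LDMHOME:-/usr/local/ldm}"
PATH="$LDMHOME/bin:$LDMHOME/decoders:/bin:/usr/bin:/usr/local/bin:$PATH"
export LDMHOME PATH

# Hand any arguments straight through to ldmfail, if it can be found.
if command -v ldmfail >/dev/null 2>&1; then
    ldmfail "$@"
else
    echo "ldmfail not found in PATH: $PATH" >&2
fi

# Example crontab entry (schedule and script location are assumptions):
# 0,15,30,45 * * * * /usr/local/ldm/bin/run_ldmfail.sh <your usual ldmfail arguments>
```

Because the path is exported before ldmfail starts, anything ldmfail
launches should inherit it as well.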

Also FYI..your editor leaves a ^M (carriage return) at the end of each line
in pqact.conf...this is not good, since pqact may treat the ^M as part of
the line.
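
One way to clean out the ^M characters, as a minimal sketch: the function
name strip_cr is made up here, and the pqact.conf location in the usage
line is an assumption based on your LDM home. It keeps a .bak copy and only
rewrites the file if a carriage return is actually present.

```shell
#!/bin/sh
# strip_cr FILE: remove every carriage return (^M) from FILE, keeping the
# original as FILE.bak. Tabs and all other characters are left untouched.
strip_cr() {
    file="$1"
    cr=$(printf '\r')
    # Do nothing unless the file really contains a carriage return.
    if grep "$cr" "$file" >/dev/null 2>&1; then
        cp "$file" "$file.bak" &&
        tr -d '\r' < "$file.bak" > "$file"
    fi
}

# Usage (path is an assumption):
#   strip_cr /usr/local/ldm/etc/pqact.conf
```

pqact would need to reread pqact.conf afterward for the change to take
effect.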

Keep us posted..

Thanks,

-Jeff
____________________________                  _____________________
Jeff Weber                                    address@hidden
Unidata Support                               PH:303-497-8676 
NWS-COMET Case Study Library                  FX:303-497-8690
University Corp for Atmospheric Research      3300 Mitchell Ln
http://www.unidata.ucar.edu/staff/jweber      Boulder,Co 80307-3000
________________________________________      ______________________

On Wed, 2 Jan 2002, Patrick O'Reilly wrote:

> Hi Jeff -
> 
> Hope your holidays were good (if you're not still on 'em).  If so, I hope
> they are still good.  Anyway...
> 
> Stokes went kaput today, and the change I made to ldmfail didn't seem to
> erase the problem that data stopped being decoded.  Still came in great, but
> none showing up in the data directories.  You had me add my ldm paths right
> to the path variable in ldmfail.  My ldm home directory is /usr/local/ldm.
> Here's the section I changed in ldmfail:
> 
> 
> ###############################################################################
> # END OF CONFIGURATION SECTION
> ###############################################################################
> # identify ourselves and set up some extra stuff we will need
> $PROGNAME = "ldmfail" ;
> $lock_file = "/tmp/.ldmadmin.lck";
> 
> $primary = "missing" ;
> $failover = "missing" ;
> 
> # Dependencies:
> $ENV{ 'PATH' } =
> "/bin:/sbin:/usr/local/bin:/usr/ucb:/usr/bsd:/usr/bin:/usr/local/ldm/bin:/usr/local/ldm/decoders:/usr/etc:/usr/ccs/bin:$ENV{ 'PATH' }" ;
> 
> 
> And here's a snippet from my ldmd.log when things were awry:
> 
> 
> Jan 02 17:44:09 blizzard pqact[13027]: pipe_prodput: trying again
> Jan 02 17:44:09 blizzard pqact[13027]: pbuf_flush (4) write: Broken pipe
> Jan 02 17:44:09 blizzard pqact[13027]: pipe_dbufput: -closedecoders/dcgrib2-ddata/gempak/logs/dcgrib_radar.log-eGEMTBL=/export/home/gem
> Jan 02 17:44:09 blizzard pqact[13027]: child 13380 terminated by signal 9
> Jan 02 17:44:09 blizzard pqact[13027]: child 13379 terminated by signal 9
> Jan 02 17:44:21 blizzard pqact[13027]: pbuf_flush (4) write: Broken pipe
> Jan 02 17:44:21 blizzard pqact[13027]: pipe_dbufput: -closedecoders/dcgrib2-ddata/gempak/logs/dcgrib_radar.log-eGEMTBL=/export/home/gem
> Jan 02 17:44:21 blizzard pqact[13027]: pipe_prodput: trying again
> Jan 02 17:44:21 blizzard pqact[13027]: pbuf_flush (4) write: Broken pipe
> Jan 02 17:44:21 blizzard pqact[13027]: pipe_dbufput: -closedecoders/dcgrib2-ddata/gempak/logs/dcgrib_radar.log-eGEMTBL=/export/home/gem
> Jan 02 17:44:21 blizzard pqact[13027]: child 13382 terminated by signal 9
> Jan 02 17:44:21 blizzard pqact[13027]: child 13381 terminated by signal 9
> Jan 02 17:44:23 blizzard pqact[13027]: pbuf_flush (4) write: Broken pipe
> Jan 02 17:44:23 blizzard pqact[13027]: pipe_dbufput: -closedecoders/dcgrib2-ddata/gempak/logs/dcgrib_radar.log-eGEMTBL=/export/home/gem
> Jan 02 17:44:23 blizzard pqact[13027]: pipe_prodput: trying again
> Jan 02 17:44:23 blizzard pqact[13027]: pbuf_flush (4) write: Broken pipe
> Jan 02 17:44:23 blizzard pqact[13027]: pipe_dbufput: -closedecoders/dcgrib2-ddata/gempak/logs/dcgrib_radar.log-eGEMTB
> 
> 
> Any other suggestions would be great.  I think you mentioned putting the
> path to the decoders in the cron, but didn't give a specific example.  If
> you think this would be better, or have other fixes, let me know.  Once
> again, thank you from Icy Cornville (Iowa).
> 
> Patrick
> 
> ----- Original Message -----
> From: "Jeff Weber" <address@hidden>
> To: "Patrick O'Reilly" <address@hidden>
> Cc: "ldm-support" <address@hidden>
> Sent: Tuesday, December 04, 2001 1:49 PM
> Subject: Re: 20011204: LDM Failover Issues
> 
> 
> > Hello Patrick,
> >
> > The issue here, I believe, is an environment issue.
> >
> > ldmfail is a Perl script that will get executed via a Bourne shell.
> >
> > I suspect you are running in a c-shell (by the sea-shore).
> >
> > The Bourne shell will not pick up the settings (paths) that are in your
> > c-shell environment.
> >
> > So, we can either place the path for the decoders in the cron entry (set
> > the path, then run ldmfail), or you can "hack" your ldmfail program to
> > include the paths to your decoders.
> >
> > Check the "Dependencies" section,
> >
> > e.g., from motherlode:
> >
> >
> > ###############################################################################
> > # END OF CONFIGURATION SECTION
> > ###############################################################################
> > # identify ourselves and set up some extra stuff we will need
> > $PROGNAME = "ldmfail" ;
> > $lock_file = "/tmp/.ldmadmin.lck";
> >
> > $primary = "missing" ;
> > $failover = "missing" ;
> >
> > # Dependencies:
> > $ENV{ 'PATH' } =
> > ".:/usr/ccs/bin:/opt/SUNWspro/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/opt/gnu/bin:/usr/openwin/bin:/opt/ldm/bin:/opt/ldm/util:/opt/ldm/decoders" ;
> >
> >
> > and if your install is the same as motherlode this should work.
> >
> > If your ldm dir tree is different, then the appropriate changes would need
> > to be made.
> >
> >
> > on lenny:
> >
> > ###############################################################################
> > # END OF CONFIGURATION SECTION
> > ###############################################################################
> > # identify ourselves and set up some extra stuff we will need
> > $PROGNAME = "ldmfail" ;
> > $lock_file = "/tmp/.ldmadmin.lck";
> >
> > $primary = "missing" ;
> > $failover = "missing" ;
> >
> > # Dependencies:
> > $ENV{ 'PATH' } =
> > ".:/bin:/usr/bin:/opt/SUNWspro/bin:/usr/ccs/bin:/usr/local/ldm/bin:/usr/local/ldm/decoders:/usr/local/bin:/usr/etc:/usr/ucb:/usr/local/gnu/bin" ;
> >
> >
> > notice on lenny:/usr/local/ldm/decoders
> >
> > and on motherlode:/opt/ldm/decoders
> >
> >
> > We are working on a more graceful ldmfail program, but that will take
> > months.
> >
> >
> > Hope this sheds some light on the subject.
> >
> > FYI...I did not get your attachment.
> >
> > Thank you,
> >
> > -Jeff
> > ____________________________                  _____________________
> > Jeff Weber                                    address@hidden
> > Unidata Support                               PH:303-497-8676
> > NWS-COMET Case Study Library                  FX:303-497-8690
> > University Corp for Atmospheric Research      3300 Mitchell Ln
> > http://www.unidata.ucar.edu/staff/jweber      Boulder,Co 80307-3000
> > ________________________________________      ______________________
> >
> > On Tue, 4 Dec 2001, Unidata Support wrote:
> >
> > >
> > > ------- Forwarded Message
> > >
> > > >To: Unidata Support <address@hidden>
> > > >From: "Patrick O'Reilly" <address@hidden>
> > > >Subject: LDM Failover Issues
> > > >Organization: UCAR/Unidata
> > > >Keywords: 200112041640.fB4GeeN16636
> > >
> > > Hi there again!
> > >
> > > I have found that when the LDM fails over, whether it is to the failover
> > > host or back to the primary host, my hard drive fills up with errors, as
> > > data is no longer being decoded due to broken pipes, write errors, etc.
> > > I have attached a clip from a 13MB ldmd.log file to illustrate these
> > > messages.  I have found a support email that mentions this problem
> > > without telling how to fix it
> > > (http://www.unidata.ucar.edu/glimpse/ldm/3301).  The fix actually
> > > mentioned in the support email, I guess, is to comment out ldmfail in
> > > cron, if the primary host is reliable.  Have there been other reports of
> > > this with ldmfail and are there fixes?  Thanks!
> > >
> > > Patrick
> > >
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > Patrick O'Reilly             Support Scientist
> > > The STORM Project            address@hidden
> > > 208 Latham Hall              ph: 319-273-3789
> > > University of Northern Iowa
> > > Cedar Falls, IA 50614
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > >
> > > ------- End of Forwarded Message
> > >
> > >
> >
> 
>