
Re: 20010416: LDM failure (Connection reset by peer)



Hi Tom, 

Sorry to hear of your LDM woes.

I think it is wise for you to upgrade to 5.1.2.

Hopefully that will address the failure issue.

Did the failures begin when you began requesting the full suite of data
from typhoon?

While it may not be disk space per se, the queue may have been corrupted
during your up/down time.

I guess we can watch it after you upgrade to 5.1.2 and diagnose it better
then.

Cheers,

-Jeff
____________________________                  _____________________
Jeff Weber                                    address@hidden
Unidata Support                               PH:303-497-8676 
NWS-COMET Case Study Library                  FX:303-497-8690
University Corp for Atmospheric Research      3300 Mitchell Ln
http://www.unidata.ucar.edu/staff/jweber      Boulder,Co 80307-3000
________________________________________      ______________________

On Mon, 16 Apr 2001, Tom Heinrichs wrote:

> Hi Steve, 
> 
> I don't think it's disk space--I've got about 45GB available on that RAID
> partition.
> 
> I'm actively beating on the end user issues with the programmer who is
> working with the data. It looks like we'll have it worked out in the next
> few days. At that point we can change to 5.1.2, recreate the queue and see
> how it runs. 
> 
> Because I've had two (anomalous) failures recently, I wanted to run this
> past you all in support. If the failures persist after the upgrade to
> 5.1.2, I'll contact you again for additional troubleshooting advice.
> 
> Thanks for your help,
> Tom
> 
> 
> On Mon, 16 Apr 2001, Unidata Support wrote:
> 
> > 
> > 
> > Tom,
> > 
> > One thing to check is the amount of disk space available on the machine
> > where you created your product queue.
> > 
> > Prior to LDM 5.1.2, creating a queue did not physically zero out the
> > entire memory-mapped file (which is a slow process). As a result, if you
> > try to create a 300MB queue and only have 200MB available, you will not
> > get an error: "ls -l" will appear to show the entire file size, but the
> > file system has not actually allocated the space yet. If the LDM is
> > running and the queue grows into space that is not really there, your
> > LDM will die. Since you are running LDM 5.0.8, this may be a problem.
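> > 
> > To make that concrete, here is a minimal, hypothetical C sketch
> > (illustration only, not LDM source code): it sizes a file with
> > ftruncate() the way a memory-mapped queue might be sized, so the
> > apparent size "ls -l" reports is the full amount even though almost
> > no blocks have actually been allocated.
> > 
> >     /* Hypothetical sketch -- not LDM code.  Shows how a sparse,
> >      * memory-mapped file reports its full size while the filesystem
> >      * has allocated almost none of it, and why a full disk only
> >      * surfaces later, when pages are first written. */
> >     #include <fcntl.h>
> >     #include <stdio.h>
> >     #include <sys/mman.h>
> >     #include <sys/stat.h>
> >     #include <unistd.h>
> > 
> >     int main(void)
> >     {
> >         const off_t size = 300L * 1024 * 1024;   /* pretend 300MB queue */
> >         int fd = open("demo.pq", O_RDWR | O_CREAT | O_TRUNC, 0664);
> >         if (fd < 0) { perror("open"); return 1; }
> > 
> >         /* Size the file without writing data: the result is sparse. */
> >         if (ftruncate(fd, size) != 0) { perror("ftruncate"); return 1; }
> > 
> >         struct stat sb;
> >         if (fstat(fd, &sb) == 0) {
> >             printf("apparent size (what ls -l shows): %lld bytes\n",
> >                    (long long)sb.st_size);
> >             printf("blocks actually allocated:        %lld bytes\n",
> >                    (long long)sb.st_blocks * 512);
> >         }
> > 
> >         /* The mapping succeeds even if the disk could never hold the
> >          * whole file; a page is only allocated when first written, so
> >          * a full filesystem shows up later as a crash of the writer. */
> >         char *p = mmap(NULL, (size_t)size, PROT_READ | PROT_WRITE,
> >                        MAP_SHARED, fd, 0);
> >         if (p == MAP_FAILED) { perror("mmap"); close(fd); return 1; }
> >         p[0] = 1;            /* first write: one page is now allocated */
> > 
> >         munmap(p, (size_t)size);
> >         close(fd);
> >         return 0;
> >     }
> > 
> > Comparing "ls -l demo.pq" with "du -k demo.pq" after running this shows
> > the mismatch described above; zeroing the whole file at creation time
> > (as 5.1.2 does) forces the allocation up front, so a shortage of disk
> > space should show up when the queue is created rather than while the
> > LDM is running.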
> > 
> > Steve Chiswell
> > Unidata User Support
> > 
> > 
> > 
> > >From: Tom Heinrichs <address@hidden>
> > >Organization: UCAR/Unidata
> > >Keywords: 200104161906.f3GJ62L10350
> > 
> > >Hello all,
> > >
> > >I had the following problem over the weekend and lost my LDM feed:
> > >
> > >tail of ldmd.log:
> > >
> > >Apr 13 22:36:33 inisas02 pqexpire[26056]: > Recycled  31840.309 kb/hr (6573.200 prods per hour)
> > >Apr 13 22:41:34 inisas02 pqexpire[26056]: > Recycled  31876.624 kb/hr (6578.975 prods per hour)
> > >Apr 13 22:43:58 inisas02 typhoon[26059]: Connection reset by peer
> > >Apr 13 22:44:28 inisas02 typhoon[26059]: run_requester: 20010413224343.063 TS_ENDT {{UNIDATA,  ".*"}}
> > >Apr 13 22:44:34 inisas02 rpc.ldmd[26055]: child 26059 terminated by signal 11
> > >Apr 13 22:44:34 inisas02 pqact[26058]: Interrupt
> > >Apr 13 22:44:34 inisas02 pqbinstats[26057]: Interrupt
> > >Apr 13 22:44:34 inisas02 pqexpire[26056]: Interrupt
> > >Apr 13 22:44:34 inisas02 rpc.ldmd[26055]: Interrupt
> > >Apr 13 22:45:45 inisas02 rpc.ldmd[26055]: Terminating process group
> > >
> > >I'm running 5.0.8 at the moment (although I'll be moving back to 5.1.2
> > >later this week once another issue with end users is worked out).
> > >
> > >There is also a core dump in ~ldm dated April 13 22:44Z
> > >
> > >As part of my upgrade, I changed ldmd.conf to
> > >
> > >request UNIDATA
> > >        ".*"
> > >                typhoon.atmos.ucla.edu
> > >from:
> > >
> > >request HDS
> > >        ".*"
> > >                typhoon.atmos.ucla.edu
> > >
> > >LDM has run virtually flawlessly for the past year since I installed
> > >it. I've had this failure twice in a week now.
> > >
> > >Any ideas?
> > >
> > >Thanks,
> > >Tom
> > >
> > >
> > 
> > 
> 
>