
Re: 20030519: NLDN inject machine problems after upgrading to LDM-6?



Hi Tom,
     Yes, there have been some problems on striker since
upgrading it to ldm-6.0.10. They appear to be related
to the ldm change... I'm tempted to go back to using
5.0.8 for now, but would really rather be using the
current generation ldm.
I just got back into town, so Kevin Tyle has been looking
into this, and I'll let him finish up with it. I was
away for 12 days, and my understanding is that the ldm
on striker had to be restarted twice during this time.
Likely Kevin will send you a more detailed response.
(Kevin, please do)
Thanks for your offer to help troubleshoot this problem.
Since you offered, we will likely take you up on this sooner
rather than later.
David
> 
> Hi David,
> 
> This morning, I was made aware of intermittent (?) problems on the NLDN
> IDD injection machine, striker.atmos.albany.edu, by Tom McDermott of
> SUNY Brockport:
> 
>   >On Mon, 19 May 2003, Unidata Support wrote:
>   >
>   >> We have had no reports from SUNY Albany about problems running the
>   >> LDM-6.0.1[01] on striker, so your report of a regular crash on it is
>   >> news to us.
>   >
>   >Well you can see evidence of it by examining the latencies for NLDN at
>   >Steve's rtstats page for pretty much any host.  To take one at random,
>   >'sundog.atmos.ucla.edu', they go off the chart starting around 20Z
>   >Saturday until around 1153Z today.  Albany is aware of the problem.  Here
>   >is what I received from David Knight regarding an earlier episode:
>   >
>   >-----------------------------------------------------------------------------
>   >From: David Knight <address@hidden>
>   >Date: Tue, 22 Apr 2003 12:28:19 +0000 (GMT)
>   >To: Tom McDermott <address@hidden>
>   >Cc: address@hidden
>   >Subject: Re: Unable to Connect to Striker
>   >
>   >Tom,
>   >     OK thanks. Looks like there is a problem with striker.
>   >I've rebooted it, and, you should be able to connect again.
>   >
>   >Kevin,
>   >Apr 22 11:45:50 striker rpc.ldmd[5138]: accept: Too many open files
>   >Looks like striker hit the limit on the number of files it can
>   >have open... Not sure exactly why yet...
>   >
>   >David
>   >
>   >> Hi,
>   >>
>   >> vortex.esc.brockport.edu has been unable to connect to striker since
>   >> 1724Z yesterday.
>   >>
>   >> Tom
> 
> Review of real time statistics pages from sites receiving NLDN data
> from striker shows that there was a data outage from around 19Z on the
> 17th until around 12Z today.
> 
> Did you see the same problem of "Too many open files" on striker
> today?  If so, has the soft limit for number of open files been upped
> from its default (e.g., on Solaris the soft limit is 128; the hard
> limit is 1024)?  If this has been increased, significantly (like to the
> max), is it possible that some other process(es) are opening files and
> not closing them properly (e.g., the process that creates NLDN products
> and injects them into the LDM queue)?
> 
> Is there anything we can do to help you troubleshoot this problem?  If
> yes, please let us know.
> 
> Tom Yoksas
> **************************************************************************** <
> Unidata User Support                                    UCAR Unidata Program <
> (303)497-8643                                                  P.O. Box 3000 <
> address@hidden                                   Boulder, CO 80307 <
> ---------------------------------------------------------------------------- <
> Unidata WWW Service              http://my.unidata.ucar.edu/content/support  <
> **************************************************************************** <
>
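
For reference, the soft and hard open-file limits that Tom asks about in
the quoted message can be inspected from a process with a few lines of
Python. This is only a minimal sketch, not part of the LDM: the function
names nofile_limits and raise_soft_limit are hypothetical, and the code
assumes a POSIX system where Python's standard resource module is
available. It reports the limits for the Python process itself; for the
LDM, the limits in effect are those of the shell that starts it (where
`ulimit -n` shows the soft limit).

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit(target):
    """Raise the soft open-file limit to `target`, capped at the hard limit.

    Hypothetical helper: it never lowers the limit and never touches the
    hard limit. Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited, nothing to raise
    if hard == resource.RLIM_INFINITY:
        new_soft = target
    else:
        new_soft = min(target, hard)
    if new_soft > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print("soft limit:", soft, "hard limit:", hard)
```

If the soft limit turns out to be at its default, raising it only buys
time when some process is leaking descriptors, so it is still worth
checking which process is holding the files open.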