
Re: errors on squall?



David Wojtowicz wrote:
> 
> Anne,
> 
>   Below are the last two days' worth of "ldmadmin check" messages that
> I have sent to me each day.  The "breaking connection" messages
> appear to be due to timed ldmpings being done from remote hosts.
> The "non-zero status" messages seem to be due to pnga2area
> complaining about not being able to find the SATBAND and SATANNOT
> files, even though I supply the path with command-line options and it
> does the decoding anyway.
> 
> I do see similar "pq_sequence failed/ I/O error 5" messages for other
> hosts that we feed.
> 
> We do seem to be getting all of our data.
> 
> Could squall be overloaded?  It currently has 19 downstream
> connections and 6 upstream ones.
> 
> Currently squall.atmos.uiuc.edu is running 88 percent idle
> load average: 1.07, 0.69, 0.56
> Running version number 5.1.3.
> LDM was restarted 2 time(s)
>         Last LDM restart at Nov 02 16:45:02
> 
> Critical LDM problems that need immediate attention:
> 
> Potential LDM Problems:
> Non-zero Status message occurred 1097 time(s).
>         Last one at:  Nov 03 00:11:01
> 'Breaking connection' message occurred 641 time(s).
>         Last one at:  Nov 02 22:06:49
>         For 128.117.13.119 it happened 165 time(s).
>         For 192.52.106.21 it happened 35 time(s).
>         For flood it happened 137 time(s).
>         For ldm it happened 105 time(s).
>         For motherlode it happened 62 time(s).
>         For thelma it happened 37 time(s).
>         For unidata it happened 100 time(s).
> 'RPC: Timed out' message occurred 9 time(s).
>         Last one at:  Nov 02 21:19:46
>         For aeolus(feed) it happened 5 time(s).
>         For climate(feed) it happened 2 time(s).
>         For mammatus(feed) it happened 1 time(s).
>         For zelgadis(feed) it happened 1 time(s).
> 'NULLPROC error' message occurred 45 time(s).
>         Last one at:  Nov 02 22:18:22
>         For aeolus.valpo.edu it happened 6 time(s).
>         For climate.geog.udel.edu it happened 9 time(s).
>         For mammatus.plymouth.edu it happened 8 time(s).
>         For zelgadis.geol.iastate.edu it happened 22 time(s).
> 
> Decoder LDM Problems:
> 
> LDM status report from the logs for the last 25 hours.
> 
> Currently squall.atmos.uiuc.edu is running 80 percent idle
> load average: 0.42, 0.59, 0.49
> Running version number 5.1.3.
> LDM was not restarted in the last 25 hours.
> 
> Critical LDM problems that need immediate attention:
> 
> Potential LDM Problems:
> Non-zero Status message occurred 462 time(s).
>         Last one at:  Nov 04 00:13:08
> 'Breaking connection' message occurred 296 time(s).
>         Last one at:  Nov 03 16:59:45
>         For flood it happened 55 time(s).
>         For ldm it happened 20 time(s).
>         For ldmarchive it happened 135 time(s).
>         For motherlode it happened 2 time(s).
>         For unidata it happened 84 time(s).
> 'RPC: Timed out' message occurred 3 time(s).
>         Last one at:  Nov 03 08:26:07
>         For climate(feed) it happened 3 time(s).
> 'NULLPROC error' message occurred 6 time(s).
>         Last one at:  Nov 04 00:09:15
>         For climate.geog.udel.edu it happened 3 time(s).
>         For mammatus.plymouth.edu it happened 1 time(s).
>         For zelgadis.geol.iastate.edu it happened 2 time(s).
> 
> Decoder LDM Problems:
> 

Hi David,

Thanks for the info.  Nothing jumps out at me.  

On motherlode, on 11/2 between 3:18Z and 5:11Z there were 194
disconnects and reconnects from squall.  They were like this one:

motherlode.ucar.edu% grep 29274 ldmd.log.4
Nov 02 03:47:47 motherlode.ucar.edu squall[29274]: Connection from squall.atmos.uiuc.edu
Nov 02 03:47:47 motherlode.ucar.edu squall(feed)[29274]: Starting Up: 20011102033610.534 TS_ENDT {{HDS,  ".*"}}
Nov 02 03:47:47 motherlode.ucar.edu squall(feed)[29274]: topo:  squall.atmos.uiuc.edu HDS
Nov 02 03:47:48 motherlode.ucar.edu squall(feed)[29274]: JUSA42 KWNO 020300: RPC: Remote system error (12)
Nov 02 03:47:48 motherlode.ucar.edu squall(feed)[29274]: pq_sequence failed: I/O error (errno = 5)
Nov 02 03:47:48 motherlode.ucar.edu squall(feed)[29274]: Exiting
Nov 02 03:47:54 motherlode.ucar.edu rpc.ldmd[1696]: child 29274 exited with status 1

72% of them were on connections for HDS; the rest were on the
connections for everything else.
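
A per-feedtype tally like the one above could be produced along the
following lines.  This is only a rough Python sketch, not necessarily how
the figure above was obtained.  It assumes each log entry sits on a single
line in the form "Mon DD HH:MM:SS host name[pid]: message", that the
feedtype is the last word of the "topo:" line, and that a PID is not
reused within the log being scanned.  The function names are made up for
illustration.

```python
import re
from collections import Counter, defaultdict

# Assumed entry layout: "Mon DD HH:MM:SS host name[pid]: message"
ENTRY = re.compile(r"^\w{3} +\d+ [\d:]{8} \S+ [^\[\s]+\[(\d+)\]: (.*)$")


def failures_by_feedtype(lines):
    """Count connections that logged 'pq_sequence failed', keyed by feedtype."""
    # Collect every message logged by each child process (keyed by PID).
    messages = defaultdict(list)
    for line in lines:
        match = ENTRY.match(line.rstrip("\n"))
        if match:
            messages[match.group(1)].append(match.group(2))

    counts = Counter()
    for msgs in messages.values():
        if not any(msg.startswith("pq_sequence failed") for msg in msgs):
            continue
        # The feedtype is taken from the last word of the "topo:" message.
        topo = next((msg for msg in msgs if msg.startswith("topo:")), None)
        counts[topo.split()[-1] if topo else "unknown"] += 1
    return counts


def share(counts, feedtype):
    """Fraction of all failed connections that belong to one feedtype."""
    total = sum(counts.values())
    return counts[feedtype] / total if total else 0.0
```

Something like share(failures_by_feedtype(open("ldmd.log.4")), "HDS")
would then give the HDS fraction for that log file.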

Do you still have the portion of your log from that time period?  If so,
may I see it?  If not, I'll just wait and see if it happens again.

Anne
-- 
***************************************************
Anne Wilson                     UCAR Unidata Program            
address@hidden                 P.O. Box 3000
                                  Boulder, CO  80307
----------------------------------------------------
Unidata WWW server       http://www.unidata.ucar.edu/
****************************************************