
[IDD #JRM-232824]: Loss of link to Unidata server



Hi James,

re:
> I left the office on a errand after typing out the service request, but
> data appears to have resumed just after I left.

Super!

re:
> Our Department sysadmin
> says there may be something wrong with one of the network cards on
> Indra (main weather server for us). He's going to check into that later
> on, and he'll correct the DNS reverse lookup problem on Horus (back up
> weather server). Thanks for pointing out the problem on Horus.

No worries.

re:
> Looking at the ldmd.conf file, I'm seeing  this for accessing 
> aeolus.ucsd.edu(failover site):
> Aug 11 13:58:20 indra aeolus.ucsd.edu[29917] NOTE: LDM-6 desired 
> product-class: 20140811205455.833 TS_ENDT {{GEM|FNMOC,  ".*"}}
> Aug 11 13:58:20 indra aeolus.ucsd.edu[29917] ERROR: Disconnecting due to LDM 
> failure; Upstream LDM didn't reply to FEEDME request; RPC:
> Authentication error; why = (authentication error 5)

Hmm...

re:
> This occurs even though on a few lines above, this appears:
> Aug 11 13:57:57 indra aeolus.ucsd.edu[29921] NOTE: Upstream LDM-6 on
> aeolus.ucsd.edu is willing to be an alternate feeder

This sounds like there could be a problem on aeolus.ucsd.edu.

re:
> It's not an issue at the moment (idd.unidata.ucar.edu is sending data
> without problems), but could this disconnection be related to Indra's
> possible, network card issue?

Possibly.  I will have to check the log files on aeolus.ucsd.edu to
see what it is reporting.

re:
> Additionally, is Unidata still handling general maintenance of
> Aeolus.ucsd.edu?

LDM maintenance yes; hardware/network/OS maintenance, no.

re:
> Several years ago, when I needed to contact UCSD about
> their weather server,  Jeff Weber(Unidata) was handling things remotely.

Yes, the folks at UCSD were happy to let us look after the LDM.  We
offered to do this to keep a west coast relay working smoothly.

I just logged onto 'aeolus' and found LOTS of the following warnings
in its ~ldm/log/ldmd.log file:

Aug 11 15:08:20 aeolus ldmd[7187] WARN: Couldn't resolve "128.97.77.43" to a hostname in 16.0987 seconds
Aug 11 15:08:20 aeolus ldmd[7185] NOTE: Denying connection from "128.97.77.43" because not allowed
Aug 11 15:08:20 aeolus ldmd[7187] NOTE: Denying connection from "128.97.77.43" because not allowed
Aug 11 15:08:20 aeolus ldmd[7186] WARN: Couldn't resolve "128.97.77.43" to a hostname in 16.1071 seconds

128.97.77.43 should be horus.atmos.ucla.edu, but the DNS fix, if it has been
made, has apparently not propagated to ucsd.edu yet.  As soon as the DNS
problem is cleared, feed REQUESTs should again be ALLOWed.
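
If it helps to watch for the reverse DNS fix without waiting on the LDM logs,
something like the following could be run on a machine that uses the same
resolvers as 'aeolus'.  This is only a rough sketch: the function name and the
'resolver' parameter are mine (not part of the LDM), and it simply asks the
system resolver for the PTR record the same way any other program would.

```python
import socket


def reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return the hostname that 'ip' reverse-resolves to, or None.

    'resolver' is a hypothetical hook (defaulting to the standard
    socket.gethostbyaddr) so the function can be exercised without
    touching the network.
    """
    try:
        hostname, _aliases, _addresses = resolver(ip)
        return hostname
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError;
        # either one means the address could not be resolved to a name.
        return None


# Example (needs working DNS, so the result depends on where it is run):
#   reverse_lookup("128.97.77.43")
```

A return value of None would match the "Couldn't resolve" warnings above; a
hostname coming back means the lookup now succeeds from that machine.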

As far as REQUESTs from 'indra' go, I am seeing lots of the
following:

Aug 11 15:12:12 aeolus indra.atmos.ucla.edu(feed)[8118] NOTE: topo:  indra.atmos.ucla.edu {{NGRID|NIMAGE, (.*)}}
Aug 11 15:12:32 aeolus indra.atmos.ucla.edu(feed)[8089] ERROR: Couldn't flush connection; nullproc_6() failure to indra.atmos.ucla.edu: RPC: Unable to receive; errno = Connection reset by peer

This indicates a network connection problem somewhere on the path between
aeolus.ucsd.edu and indra.atmos.ucla.edu, the two endpoints included.  A
bad network interface on 'indra' could cause this.
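
For a quick check from 'indra' of whether a TCP connection to the LDM port on
'aeolus' can be opened at all, a sketch like the one below may be useful.  The
function name and the timeout value are my own choices; port 388 is the
standard LDM port.  Note that a successful connect does not rule out an
intermittent fault (a flaky interface can still reset connections later), so
this only narrows things down.

```python
import socket

LDM_PORT = 388  # standard LDM port


def can_connect(host, port=LDM_PORT, timeout=5.0):
    """Return True if a TCP connection to (host, port) can be opened.

    Only the TCP handshake is tested; no LDM/RPC traffic is exchanged.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, unreachable networks
        # and name-resolution failures.
        return False


# Example (result depends on the network at the time it is run):
#   can_connect("aeolus.ucsd.edu")
```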

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: JRM-232824
Department: Support IDD
Priority: Normal
Status: Closed