Re: LDM 5.1.4 - stops transferring from 1 site, while talking to others?



Anne Wilson wrote:
Joe Van Andel wrote:

With LDM 5.1.4 on Red Hat 7.2, we've seen instances where data stops
flowing from one site, while still flowing from other sites.

For example, yesterday, syrah.atd.ucar.edu stopped getting our NEXRD2
feed from iita.rap.ucar.edu, even though the network was up and
iita.rap.ucar.edu was still up.  At the same time, syrah.atd.ucar.edu
was still receiving S-Pol gridded radar data via linus.atd.ucar.edu.

The data feed resumed once we restarted ldm on syrah.

Clearly, we don't want to have to monitor whether our data is flowing
and manually restart ldm if the data isn't arriving.

Has anyone else seen network error conditions that can only be resolved
by restarting ldm?

--
Joe VanAndel
National Center for Atmospheric Research
http://www.atd.ucar.edu/~vanandel/
Internet: address@hidden


Hi Joe,

Was there anything in your log or iita's log when this occurred?  It
would show if there was an actual disconnect.  I'm wondering if this
might be related to the work the UCAR network people are doing these
days in upgrading routers, although I don't know of anything that
happened specifically yesterday...

I don't have access to iita's log - and didn't see anything in my log.


Has this happened more than once?

Yes.


Regarding RH 7.2 vs. 7.3, perhaps Gilbert is right.  We've not
experienced this problem on our own 7.2 machine, but we haven't pushed
it much either.  We did, however, have another site whose trouble
with 7.2 was fixed by upgrading to 7.3.  Are you considering
upgrading the OS?

Sure, but since we're in the middle of the project, and syrah.atd.ucar.edu is a key machine, nobody will be very enthusiastic about taking the machine down to upgrade it.


Also, when this occurs you could run 'netstat | grep iita' on syrah to
see what state syrah thinks the connection is in.
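
A rough sketch of how that check could be scripted, in case it helps to
log the connection state automatically the next time data stops.  It
assumes Linux-style 'netstat -t' output (Proto, Recv-Q, Send-Q, local
address, foreign address, state); the function names are made up for
illustration and are not part of LDM.

```python
import subprocess


def connection_states(netstat_output, host):
    """Return (local, foreign, state) for each TCP row whose foreign
    address contains `host`.

    Assumes Linux-style rows:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    matches = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip banner and header lines, and anything that is not a TCP row.
        if len(fields) >= 6 and fields[0].startswith("tcp") and host in fields[4]:
            matches.append((fields[3], fields[4], fields[5]))
    return matches


def check_live(host):
    """Run netstat on this machine and report connections to `host`."""
    result = subprocess.run(
        ["netstat", "-t"], capture_output=True, text=True, check=True
    )
    return connection_states(result.stdout, host)


if __name__ == "__main__":
    # "iita" is the upstream host from this thread; substitute your own.
    for local, foreign, state in check_live("iita"):
        print(local, foreign, state)
```

A connection that shows ESTABLISHED on syrah while no data arrives would
point toward a stalled or half-open connection rather than a clean
disconnect, which is worth comparing against the logs.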

Good idea, we'll try this if it happens again.


I'm sorry that I can't offer much help on this one.  A band-aid would be
a script that tracked input files (if you're actually filing the data)
that sends an email if there's a gap.  Let me know if you'd like
something like that, although that merely shifts the problem from
monitoring a data stream to monitoring email.
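
For what it's worth, a minimal sketch of such a script might look like
the following.  It assumes the data are filed as regular files in a
single directory and that a mail server is listening on localhost; the
directory, addresses, threshold, and function names are all placeholders
for illustration, not anything LDM provides.

```python
import os
import smtplib
import time
from email.message import EmailMessage


def newest_mtime(directory):
    """Return the latest modification time among the regular files in
    `directory`, or None if there are no files."""
    times = [e.stat().st_mtime for e in os.scandir(directory) if e.is_file()]
    return max(times) if times else None


def is_stale(directory, max_gap_seconds, now=None):
    """True if no file in `directory` was modified within the last
    `max_gap_seconds`.  An empty directory counts as stale."""
    now = time.time() if now is None else now
    latest = newest_mtime(directory)
    return latest is None or (now - latest) > max_gap_seconds


def send_alert(directory, recipient, sender="ldm@localhost", smtp_host="localhost"):
    """Send a short warning by email (assumes an SMTP server on smtp_host)."""
    msg = EmailMessage()
    msg["Subject"] = f"No new data in {directory}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"No files have arrived in {directory} recently.")
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)


def check(directory, max_gap_seconds, recipient):
    """Send an alert if the directory is stale; return True if one was sent."""
    if is_stale(directory, max_gap_seconds):
        send_alert(directory, recipient)
        return True
    return False
```

Run from cron every few minutes with something like
check("/path/to/filed/data", 1800, "someone@example.org"), where the
path, the 30-minute threshold, and the address are placeholders.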

If this happens again and I'm around, I would be happy to log in and
look around if you would like me to.

Thanks, I appreciate the offer.


Anne



--
Joe VanAndel    
National Center for Atmospheric Research
http://www.atd.ucar.edu/~vanandel/
Internet: address@hidden