
[LDM #ZOC-298338]: Missing data problems



Mark,

> Steve, here are the last few days of logs from syslog, there is a lot of
> extraneous info in here, but all LDM events are logged.  LDM1 feeds the
> Eastern and Southern Regions to our customers and LDM2 feeds CR and WR.

The log files indicate that connections are being broken fairly
routinely.  You might not be as well connected to the Internet
as you might wish.

> Here's the story so far.  About 2.5 to 3 months ago we began seeing
> files from the feeds being dropped at some point.  Alan Hall at NOAA
> pointed this out as he has a backup feed out west and correlates that
> feed with ours.  In effect he's getting only partial volume scans from
> us, parts of the entire Volume Scan are not there (some .bz2 files are
> missing).
> 
> Up to that point, nothing had changed on either server, they had been
> running flawlessly for almost a year.  The LDM version at the time we
> began seeing the missing files problem was 6.4.1 on both servers running
> Fedora Core 4.  We haven't been updating the systems since it was
> working fine and they are firewalled with iptables.
> 
> As of now, they are still running FC4, but I've upgraded LDM to the
> latest (v6.6.3) and rebooted the servers for good measure.  We are still
> seeing the missing files problem and Alan tells me the problem is worse
> around midnight UTC (4am here in NC).
> 
> Alan reports that the number of missing files hasn't changed since the
> upgrade to 6.6.3, so I can't rule out some weird hardware issue as the
> cause.  However, I would like to eliminate any issues with LDM as far
> as I can.
> 
> The servers are not used for anything but LDM, there's plenty of storage
> for the files, the CPU is not under load and there is more than 2GB of
> RAM free on each of the servers.  The network is not overburdened and
> the NICs are channel bonded at 100Mbps/full duplex, so I don't see that
> as a bottleneck.
> 
> The reason why I'm summarizing is to get Lou up to speed on this, since I
> will be on vacation next week and he will be the one debugging this with
> you.  Thanks for all the help and hopefully we can find an answer to
> this soon.

The log files also contain a lot of messages like the following:

NOTE: Data-product with signature a01758212afea174f7960cf021ca068d wasn't found in product-queue

This indicates that a reconnecting downstream LDM was unable to
resume where it left off, because the last data-product it had
received was no longer in your product-queue.  This could be because
the minimum residency time of a data-product in the product-queue is
too short.
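
One rough way to see whether these notes cluster around a particular
time of day (for example, around midnight UTC) is to count them per
hour.  The following is only a sketch: it assumes the traditional
syslog line layout ("Mon DD HH:MM:SS host program: message"), and the
function name is made up for illustration.  The hours it reports are
in whatever timezone the logging host uses.

```python
import re
from collections import Counter

# Assumption: each entry is one line in the traditional syslog layout,
# "Mon DD HH:MM:SS host program: message".  Adjust the pattern if your
# syslog daemon is configured differently.
_NOTE = re.compile(
    r"^\w{3}\s+\d+\s+(\d{2}):\d{2}:\d{2}\s.*"
    r"Data-product with signature ([0-9a-f]{32}) wasn't found"
)

def missing_signature_notes_by_hour(lines):
    """Count "wasn't found in product-queue" notes per hour of day.

    `lines` is any iterable of log lines (an open file works).
    Returns a Counter mapping hour (0-23, host-local time) to count.
    """
    counts = Counter()
    for line in lines:
        match = _NOTE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

To use it, open the syslog file and pass the file object to
missing_signature_notes_by_hour(), then print the counts sorted by
hour.  A strong peak at one hour would suggest looking at what else
is happening on the host or the network at that time.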

It might take a while to diagnose the problem.  The diagnosis would
go much faster if I could log onto the LDM host in question as the
LDM user.  May I?


Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: ZOC-298338
Department: Support LDM
Priority: High
Status: On Hold