
[LDM #IRP-426248]: More problems with dropping files.



Mark,

> I'm still trying to find the problem with files being 'dropped' or not
> being sent, or whatever it is this thing is doing.  Only one server is
> having the problem, but the symptoms are all different.  We've been
> seeing a problem with NCDC missing files at around 4:30 UTC each day,
> usually 80 to 100.  So, I've set up several test boxes in my network to
> see if I can narrow down the problem.  I have 3 servers involved in this
> testing.  One (LDM1) is the server feeding the ER and SR and is the
> system having the problem.  Another one (GreatSmokey) is a Fedora 7 box
> running behind our Juniper firewall (and not in the same subnet as LDM1)
> and is pulling only ER and SR data from LDM1.  The third is LDMTest
> (running Fedora Core 6), which is on our network but in a completely
> different subnet from either LDM1 or GS.  I've set up pqact to pull the
> files to
> ~/nexradII/tmp directories on each server (and stored in there by site
> name, e.g. nexradII/tmp/KABX/) and have a script that runs each night
> around 12:30 Eastern (4:30 UTC) that finds all files from the previous
> 24 hours, lists them and then scans the file names for missing files (by
> number).
> 
> Here's where it gets interesting:  LDM1 and GS have been running this
> setup for 5 days or so now and in each case I'm getting about 40-50
> missing files.  However, nearly all the files missing are not in the
> 4:30 UTC range as NCDC's problem has been, but are missing all over the
> 24 hour period being searched.  Also, GS and LDM1 are not missing
> the same number of files (or even the same files) in the scan.
> 
> I've not mentioned LDMTest because it is even stranger.  I've only been
> able to run it one day so far, but in that one day (yesterday) I missed
> 0 files from LDM1.  The pqact.conf file is identical in each case, the
> script used is identical, I've made them as close to identical as I can.
> I'm continuing the run on LDMTest over the next few days, but I cannot
> find any sort of pattern to be able to make any attempts at fixing it.
> LDM1 has ssh access for you already and I can do the same on the 2 test
> boxes quickly if necessary.
> 
> Do you guys have any more ideas on what might be causing this strange
> behaviour?

What's LDM1's fully-qualified hostname?

What's the LDM user's username and password?

You mentioned that GreatSmokey and LDM1 don't miss the same files.
If GreatSmokey gets its data from LDM1, then it should miss every
file that LDM1 misses.  Does it?
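
One way to check this from the nightly scans is sketched below.  It is
only an illustration, not the script you are running: the function names
are made up, and it assumes the file numbers for a single site have
already been pulled out of the file names as integers and do not wrap
around within the 24 hour window.

```python
def find_gaps(expected, seen):
    """Return the numbers in `expected` that do not appear in `seen`."""
    return set(expected) - set(seen)


def compare_misses(upstream_seen, downstream_seen):
    """Compare gaps on an upstream host with gaps on the host it feeds.

    Both arguments are iterables of integer file numbers for one site.
    The expected range runs from the lowest to the highest number seen
    on either host (an assumption; the real numbering may differ).
    """
    upstream_seen = set(upstream_seen)
    downstream_seen = set(downstream_seen)
    everything = upstream_seen | downstream_seen
    if not everything:
        return {
            "missed_by_both": set(),
            "missed_upstream_only": set(),
            "missed_downstream_only": set(),
        }

    expected = range(min(everything), max(everything) + 1)
    up_missing = find_gaps(expected, upstream_seen)
    down_missing = find_gaps(expected, downstream_seen)

    return {
        "missed_by_both": up_missing & down_missing,
        "missed_upstream_only": up_missing - down_missing,
        "missed_downstream_only": down_missing - up_missing,
    }


if __name__ == "__main__":
    # Made-up numbers, purely to show the shape of the result.
    ldm1 = [1, 2, 4, 5, 7]
    greatsmokey = [1, 2, 3, 5, 7]
    for name, numbers in compare_misses(ldm1, greatsmokey).items():
        print(name, sorted(numbers))
```

Anything that shows up under missed_upstream_only would be a file that
GreatSmokey has but LDM1 itself did not store, which is the case the
question above is asking about.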

> Mark Haney
> 
> Sr. Systems Administrator
> 
> ERC Broadband

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: IRP-426248
Department: Support LDM
Priority: High
Status: On Hold