
[IDD #BKI-822215]: LDM feed sites

Hi Adam,

> My queue is set @ 2GB.

OK.  This may or may not be sufficient.  It depends on how you are
trying to process the data out of the LDM queue.

> The machine is a Dual Quad 2.66GHz Xeon w/4Gig ram.  It has 2 15K SAS
> RAID1(hardware) for the OS/SWAP/Product queue and 6 10K 750Gig
> RAID0(hardware) Near-lineSAS (SATA drives with SAS bus) for data storage.  I
> would think this would support the product load....

Yes, _if_ the processing is being done efficiently.  For instance, if
you have a single, monolithic pattern-action file that a single
instance of 'pqact' uses to process all data, then I would venture
that even a machine as muscular as yours could miss processing some
of the data products.  The reason for this is that every entry in
a pattern-action file is compared against each product being processed
by the 'pqact' instance using that pattern-action file before the next
product is processed.  The listing I sent in the previous email shows
that your machine should be trying to process 187700 products per hour
on average.  Our experience is that it is fairly easy to miss
processing some products if no tuning is done.
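
One common tuning step, sketched below with purely illustrative
feedtypes and file names (none of these are from your setup), is to
split the monolithic pattern-action file by feedtype and run a
separate 'pqact' for each piece from ldmd.conf:

```
# Illustrative ldmd.conf entries (file names are hypothetical).  Each
# 'pqact' instance only scans the patterns for its own feedtype, so no
# single instance has to match every entry against every product.
EXEC    "pqact -f CONDUIT etc/pqact.conduit"
EXEC    "pqact -f NNEXRAD etc/pqact.nexrad"
EXEC    "pqact -f ANY-CONDUIT-NNEXRAD etc/pqact.rest"
```

The feedtype expression in the last entry ("everything except the
feeds handled above") keeps the catch-all instance from re-matching
products the dedicated instances already handle.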

> I'm not getting any errors in the ldmd.log file that would show a queue
> backup...

Very good.  This is an important, but not definitive, piece of information.
If the system were overloaded, you would see warning messages showing
that it is taking several (or many) seconds to process products.  If you
are not seeing that kind of message, it is likely that your machine is
performing nicely.

You can see the residency time of the oldest product in your LDM queue
using the LDM utility 'pqmon':

<as 'ldm'>
pqmon -vl-

The last value in the output line is the age, in seconds, of the oldest
product in your LDM queue (where the age is the difference between the
current clock time and the time when that product was first added to
your queue).

The residency time of the oldest product gives a good measure of exactly
how long your 'pqact' invocation(s) have to process data out of
the queue and onto disk.
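
As a rough sketch of how one might watch that value, the snippet below
pulls the last whitespace-separated field out of a log line with awk.
The sample line is hypothetical (it is not real 'pqmon' output); the
only convention relied on is the one described above, that the age is
the final value on the line:

```shell
# Hypothetical log line standing in for one line of 'pqmon -l-' output;
# the real fields differ, but per the description above the age of the
# oldest product is the final value on the line.
sample='Jun 23 12:00:00 pqmon[1234] INFO: 250000 512 100 1990000000 412'

# Print the last whitespace-separated field (age in seconds).
age=$(printf '%s\n' "$sample" | awk '{print $NF}')
echo "oldest product age: ${age} seconds"
```

Against a live queue, something like
'pqmon -l- 2>&1 | tail -n 1 | awk '{print $NF}'' (run as 'ldm') would
extract the same field; the '2>&1' is there because the logging
destination depends on the '-l' argument.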

> However...I switched the CONDUIT feed to idd.cise-nsf.gov...and I am still
> seeing latencies on all the feeds.

I see that your CONDUIT latencies are MUCH lower than they were before.
This would seem to indicate that some hop in the connection from tornado
to idd.unidata.ucar.edu is introducing the latencies for those feeds
that are being requested from idd.unidata.ucar.edu.

> I have also been getting this message since I split the feeds:
> Jun 22 20:07:44 tornado idd.unidata.ucar.edu[28434] ERROR: readtcp(): EOF on 
> socket 4
> Jun 22 20:07:44 tornado idd.unidata.ucar.edu[28434] ERROR: one_svc_run(): RPC 
> layer closed connection
> Jun 22 20:07:44 tornado idd.unidata.ucar.edu[28434] ERROR: Disconnecting due 
> to LDM failure; Connection to upstream LDM closed
> Jun 22 20:07:44 tornado idd.unidata.ucar.edu[28434] NOTE: LDM-6 desired 
> product-class: 20090622190744.255 TS_ENDT {{UNIDATA,  ".*"},{NONE, 
> "SIG=c8e3a810e31760cfa7c611dfe27ffd90"}}
> Jun 22 20:07:44 tornado idd.unidata.ucar.edu[28434] NOTE: Upstream LDM-6 on 
> idd.unidata.ucar.edu is willing to be a primary feeder
> But only from the UNIDATA feed....

This is certainly weird.  I am CCing Steve Emmerson (our LDM developer/guru)
to see what he has to say about this situation.


A couple of additional suggestions:

- it might be interesting to take the feed(s) that are currently
  showing the highest latency and change them to be fed from
  idd.cise-nsf.gov or some other IDD relay node (e.g., pavan.srcc.lsu.edu)
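
  Concretely, that change is made in the REQUEST line(s) of ldmd.conf;
  the feedtype below is only an example, not taken from your
  configuration:

```
# Example only: redirect one high-latency feed to an alternate relay,
# then restart the LDM (e.g., 'ldmadmin restart' as the 'ldm' user).
# Before:
#   REQUEST NGRID ".*" idd.unidata.ucar.edu
# After:
REQUEST NGRID ".*" idd.cise-nsf.gov
```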


- did you ever verify that you are allowed to feed from LSU:

<as 'ldm'>
notifyme -vl- -f ANY -h pavan.srcc.lsu.edu


Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
Unidata HomePage                       http://www.unidata.ucar.edu

Ticket Details
Ticket ID: BKI-822215
Department: Support IDD
Priority: Normal
Status: Closed