
[CONDUIT #GXA-551280]: Update on UW-Madison AOS Conduit/0.25 GFS



Hi Pete,

re:
> Here's the output of pqmon from a few times just now, in the middle of
> the 12 UTC GFS run coming in:
> 
> [ldm@idd ~]$ pqmon
> Aug 04 15:59:10 pqmon NOTE: Starting Up (8657)
> Aug 04 15:59:10 pqmon NOTE: nprods nfree  nempty      nbytes maxprods maxfree 
>  minempty    maxext  age
> Aug 04 15:59:10 pqmon NOTE: 241606   834  228148 23323422632   470587    7344 
>         0   3057664 1897
> Aug 04 15:59:10 pqmon NOTE: Exiting
> 
> [ldm@idd ~]$ pqmon
> Aug 04 16:06:05 pqmon NOTE: Starting Up (9239)
> Aug 04 16:06:05 pqmon NOTE: nprods nfree  nempty      nbytes maxprods maxfree 
>  minempty    maxext  age
> Aug 04 16:06:05 pqmon NOTE: 235049    54  235485 22884777296   470587    7344 
>         0  36648384 1573
> Aug 04 16:06:05 pqmon NOTE: Exiting
> 
> [ldm@idd ~]$ pqmon
> Aug 04 16:08:05 pqmon NOTE: Starting Up (9429)
> Aug 04 16:08:05 pqmon NOTE: nprods nfree  nempty      nbytes maxprods maxfree 
>  minempty    maxext  age
> Aug 04 16:08:05 pqmon NOTE: 244621    12  225955 23999634152   470587    7344 
>         0    286536 1637
> Aug 04 16:08:05 pqmon NOTE: Exiting

Very good.  Thanks for the spot check.
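
If you want to keep watching it for the rest of the run, a simple loop
around pqmon will do; just a sketch (the 60 second interval is arbitrary,
and it assumes pqmon finds your queue at its default location):

  # repeat the spot check every 60 seconds; drop the startup/exit lines
  while true; do
      pqmon 2>&1 | grep -v -e 'Starting Up' -e 'Exiting'
      sleep 60
  done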

re:
> I've just set up ldm metrics, so we can take a look at that in the next
> day or two.

Sounds good.
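
In case it's useful: the setup we're used to seeing for LDM metrics is the
ldmadmin-based one, where a cron entry appends a sample each minute and the
accumulated file is graphed with 'ldmadmin plotmetrics' (this assumes the
stock ldmadmin metrics support and gnuplot; adjust the path for your
install):

  # sample entry in the 'ldm' user's crontab -- one metrics sample/minute
  * * * * * bin/ldmadmin addmetrics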

re:
> I'll see about trying to get some bandwidth plots, I think I
> can do that with our interface to the switch it is connected to.

Our monitoring of the outbound bandwidth for the real server backend nodes
of our relay cluster, idd.unidata.ucar.edu, is what alerted us that our
ability to service the existing set of downstream connections was being
maxed out - the volumes hit a ceiling on a couple of nodes.  The net effect
of this is the same as is seen when "packet shaping" (artificial bandwidth
limiting) is in effect, and this, in turn, meant that some downstreams were
not getting all of the data that they were REQUESTing.  We found the same
sort of maxing out on the connection from the accumulator frontends of our
cluster to the real server backends.  This occurred when we spun up our
backup relay cluster, idd2.unidata.ucar.edu, since doing so doubled the
volume being sent through the accumulator's Gbps Ethernet port.  Of course,
this would not have been a problem if our cluster nodes had 10 Gbps
Ethernet interfaces.  We considered purchasing 10 Gbps Ethernet cards for
our existing machines, but decided that this would be a waste of money
since the problem will go away when we refresh the cluster hardware.
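
In case it helps while you chase down the switch-side plots, you can get a
quick number from the host itself by differencing the kernel's interface
byte counters; just a sketch (assumes a Linux box, and 'eth0' and the 10
second window are placeholders):

  # rough outbound rate over 10 seconds from the TX byte counter
  b1=$(cat /sys/class/net/eth0/statistics/tx_bytes); sleep 10
  b2=$(cat /sys/class/net/eth0/statistics/tx_bytes)
  echo "$(( (b2 - b1) * 8 / 10 / 1000000 )) Mbit/s outbound"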

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: GXA-551280
Department: Support CONDUIT
Priority: Normal
Status: Closed