
20040611: mysterious emo reboot and CRAFT volumes (please read!)



>From: Steve Chiswell <address@hidden>
>Organization: UCAR/Unidata
>Keywords: 200406111604.i5BG4Rp9025520 IDD CRAFT LDM pqbinstats rtstats

Chiz, Steve, Mike, Jeff, and John:

Chiz noted:
>I found the LDM not running on EMO this morning since about 11:30 last night.
>The machine was rebooted,

-- output from /var/log/messages.3.gz --

 ...

Jun 10 16:59:26 emo portmap[36910]: connect from 128.117.140.27 to getport(ldmd): request from unauthorized host
Jun 10 16:59:26 emo portmap[36911]: connect from 128.117.140.27 to getport(ldmd): request from unauthorized host
Jun 10 23:34:14 emo /kernel: Copyright (c) 1992-2002 The FreeBSD Project.
Jun 10 23:34:14 emo /kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994

 ...

>and I see someone logging on from imogene after reboot

-- output from last --

 ...

chiz             ttyp0    laraine          Fri Jun 11 08:16 - 08:20  (00:03)
ldm              ttyp0    imogene          Thu Jun 10 23:42 - 23:50  (00:08)
reboot           ~                         Thu Jun 10 23:35
ldm              ttyp2    flip             Thu Jun 10 16:59 - 17:02  (00:02)
ldm              ttyp3    flip             Thu Jun 10 15:54 - 16:59  (01:04)

 ...

>but the LDM was not up this morning.

I noticed the ldmping failure when I was reading email this
morning.  I logged onto emo and saw that the LDM had been started
shortly before I got on, so I knew you were taking care of the
restart.

>So, I remade the queue and restarted.  Looks like the NFS is hosed, however.

John, can you look at this?

re: change to stats module called by pqbinstats and rtstats
>Atm is up and fine, so I don't think the restart of emo at 5 last night for
>the rtstats and pqbinstats changes was involved.

To fill folks in, Chiz found a problem in code used by both rtstats and
pqbinstats yesterday afternoon.  The problem was causing the stats
programs to undercount the volume of data in feeds when the number of
pieces in the feed exceeded 96.  This apparently only happens for CRAFT
data (now that there are more than 96 stations reporting), and is
especially bad when a machine is set up for redundant CRAFT feeds (like
emo and atm).  This "bug" (design flaw) has been causing CRAFT volume
plots to underestimate the volume in the feed ever since the 97th radar
was added to the feed.  Take a look at CRAFT volume on atm to see the
dramatic jump in reported volume between 23Z and 5Z, right after Chiz
"fixed" the bug (the mods may not be complete yet):

http://my.unidata.ucar.edu/cgi-bin/rtstats/iddstats_vol_nc?CRAFT+atm.geo.nsf.gov

Chiz installed the fix on both emo (redundantly feeding from Purdue
and thelma) and atm (redundantly feeding from Max and emo).

NB: there was no problem in getting the data.  The only problem was in
counting the amount of data ingested.

One last comment:  The CRAFT feed is delivering over 1 GB per hour
during peak times.  Given that the CRAFT peaks last for more
consecutive hours than CONDUIT, I will assert that CRAFT may well have
taken over first place as the datastream with the most volume!
And it is slated to grow a BUNCH: Yowza!!

Cheers,

Tom
--
+-----------------------------------------------------------------------------+
* Tom Yoksas                                             UCAR Unidata Program *
* (303) 497-8642 (last resort)                                  P.O. Box 3000 *
* address@hidden                                   Boulder, CO 80307 *
* Unidata WWW Service                             http://www.unidata.ucar.edu/*
+-----------------------------------------------------------------------------+