
20030522: 20030522: LDM tune-up request UCLA



James,

Steve Emmerson and I looked at your LDM system (sundog) on Friday.
We found that even though data is arriving at your system in a
timely manner, the pqact process that handles the data is falling far
behind. When we looked at the process, pqact was running over 1 hour
behind, and data was being scoured out of your queue before pqact could
act on it. I believe this is the cause of the missing data in your
decoded files. You have probably noticed, when plotting a surface map
for example, that the current hour's map is always sparse and that you
have to look at the previous hour's data to see most observations.
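
As a rough illustration of why a lagging pqact loses data, here is a
minimal sketch of the arithmetic. The 400 Mbyte queue size is the ldm.pq
figure you quoted; the ingest rate is a hypothetical number picked for
the example, not something we measured on sundog.

```python
# Rough estimate of how long a product survives in a full LDM queue
# before it is scoured to make room for newer arrivals.

def queue_residence_minutes(queue_mbytes: float,
                            ingest_mbytes_per_hour: float) -> float:
    """Approximate minutes a product stays in a full queue."""
    return 60.0 * queue_mbytes / ingest_mbytes_per_hour


def pqact_loses_data(pqact_lag_minutes: float,
                     residence_minutes: float) -> bool:
    """True if pqact is further behind than products remain in the queue."""
    return pqact_lag_minutes > residence_minutes


QUEUE_MBYTES = 400.0            # ldm.pq size quoted in this thread
INGEST_MBYTES_PER_HOUR = 600.0  # hypothetical ingest rate (assumption)

residence = queue_residence_minutes(QUEUE_MBYTES, INGEST_MBYTES_PER_HOUR)
print(f"products stay in the queue for about {residence:.0f} minutes")
print("pqact 60 minutes behind loses data:", pqact_loses_data(60, residence))
```

Under these assumed numbers, anything pqact has not processed within the
residence time is gone before it can be decoded.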

We believe that the problem is related to your disk IO. In particular, you
are receiving every NNEXRAD product and FILE'ing all of the data. Your Sun
workstation may be able to handle this type of IO, but your
Linux box is waiting on its own disks.

We made the following modifications to your pqact.conf to test this
bottleneck theory:

1) Restricted the filing of all NNEXRAD products to just the 4 radars in
southern California.

2) For all other radars, stored only the base reflectivity product, N0R.
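
The selection rule in the two steps above can be sketched as a small
predicate. This is only an illustration of the logic, not the pqact.conf
entries themselves. The station list is a hypothetical choice (this
message does not name the 4 radars), and the product ID layout, ending in
"/p" followed by a 3-character product code and a 3-character station,
is assumed.

```python
import re

# Hypothetical set of local radars; the actual 4 stations are not named here.
LOCAL_RADARS = {"NKX", "SOX", "VTX", "VBX"}

# Assumed NNEXRAD product ID ending: "/p" + product code + station ID.
PRODUCT_ID = re.compile(r"/p(?P<product>...)(?P<station>...)$")


def should_file(product_id: str) -> bool:
    """Return True if the product would be written to disk."""
    match = PRODUCT_ID.search(product_id)
    if match is None:
        return False                      # not a recognizable NNEXRAD ID
    if match.group("station") in LOCAL_RADARS:
        return True                       # local radar: keep every product
    return match.group("product") == "N0R"  # elsewhere: base reflectivity only


for pid in ("SDUS56 KSGX 231402 /pN0VNKX",
            "SDUS53 KLOT 231402 /pN0RLOT",
            "SDUS53 KLOT 231402 /pN0VLOT"):
    print(pid, "->", "FILE" if should_file(pid) else "skip")
```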

On Friday afternoon, we observed that as soon as the amount of NNEXRAD
data being written to disk was decreased, your pqact process caught up
to the current data in a very short time. We will look at your system
again next week after the holiday.

On our PC that handles local decoding for testing the GEMPAK, McIDAS, and
NetCDF decoders with lots of IO, we found that a relatively inexpensive
RAID card, which allows the data to be striped across several disk
volumes, improved our IO wait. That is, the slowest part of the computer
is the mechanical movement of the disk heads, and RAID striping allows
data files to be interleaved across several disks.
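
To make the idea of striping concrete, here is a minimal sketch of how a
simple RAID 0 style layout maps a file offset to a disk. The chunk size
and disk count are hypothetical values, and an actual RAID card may use a
different layout.

```python
# Minimal sketch of RAID 0 style striping: consecutive chunks of data are
# placed on consecutive disks, so one large write is spread over all of them.

def locate_chunk(offset_bytes: int, chunk_bytes: int, n_disks: int):
    """Return (disk index, stripe row) holding the byte at offset_bytes."""
    chunk_index = offset_bytes // chunk_bytes
    disk = chunk_index % n_disks    # chunks rotate across the disks
    row = chunk_index // n_disks    # position of the chunk on that disk
    return disk, row


CHUNK_BYTES = 64 * 1024  # hypothetical stripe chunk size
N_DISKS = 4              # hypothetical number of disks in the stripe set

# Four consecutive 64 KB chunks of one file land on four different disks.
for offset in range(0, 4 * CHUNK_BYTES, CHUNK_BYTES):
    print(offset, locate_chunk(offset, CHUNK_BYTES, N_DISKS))
```

Because neighboring chunks sit on different disks, no single set of disk
heads has to service the whole write.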

If you need all NNEXRAD data to be filed, you may want to look at the
above option, especially if you are using IDE disks instead of SCSI.

We will check back on Tuesday to see how the system is performing in
steady state (without us having made recent changes to the pqact.conf
file).

Steve Chiswell


>From: Steve Emmerson <address@hidden>
>Organization: UCAR/Unidata

>James,
>
>>Date: Thu, 22 May 2003 14:17:46 -0700 (PDT)
>>From: James Murakami <address@hidden>
>>Organization: University of California at Los Angeles
>>To: address@hidden
>>Subject: Re: 20030522: LDM tune-up request UCLA 
>
>The above message contained the following:
>
>> Yes, sundog was ingesting CONDUIT data before I shut it off
>> recently. However, the back-up server, typhoon (an old Ultra 1 Sun
>> workstation) handles CONDUIT and Noaaport together fine (same ldm.pq
>> size of 400 Mbytes).
>
>That surprises us.
>
>> Yet, gempak decoded files from the HRS feed are often incomplete on
>> sundog.
>
>I haven't seen anything obviously wrong.  There are some messages in the
>LDM logfile that indicate that pqact(1) had problems writing to a GEMPAK
>GRIB decoder.  I'll pass this on to our GEMPAK developer.
>
>Give us another day to look at your setup.
>
>Regards,
>Steve Emmerson
>