
20050511: 20050510: LDM data feed question



Celia,

You may have seen some reconnections on thelma yesterday as the host was 
restarted.

Your LDM topology is receiving data from multiple upstream sources, and
your queue is generally too small to hold an hour's worth of data. As a
result, you are receiving data once in real time, and again from the
other host you are specifying in your request line after the data has
been scoured out of your queue.
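
As a rough illustration of that effect, here is a toy Python model of a
product queue. It is only a sketch under simplifying assumptions: a duplicate
is rejected only while the first copy is still in the queue, and the oldest
products are scoured to make room for new ones. The function name, sizes and
delay are made up for the example and are not taken from the LDM itself.

```python
from collections import OrderedDict

def simulate(arrivals, capacity_bytes):
    """Return the (time, product_id) pairs accepted into a toy queue.

    arrivals: list of (time_s, product_id, size_bytes) in arrival order.
    A product is rejected only if an identical one is still in the queue.
    """
    queue = OrderedDict()  # product_id -> size, oldest first
    used = 0
    accepted = []
    for t, pid, size in arrivals:
        if pid in queue:
            continue  # duplicate still resident: rejected
        # Scour the oldest products until the new one fits.
        while queue and used + size > capacity_bytes:
            _, old_size = queue.popitem(last=False)
            used -= old_size
        queue[pid] = size
        used += size
        accepted.append((t, pid))
    return accepted

# Ten products arrive in real time, then again an hour later from a second host.
primary = [(t, f"p{t}", 1) for t in range(10)]
secondary = [(t + 3600, pid, size) for t, pid, size in primary]
arrivals = primary + secondary

small = simulate(arrivals, capacity_bytes=4)   # queue too small
large = simulate(arrivals, capacity_bytes=10)  # queue holds everything
print(len(small), len(large))
```

With the small queue every product is accepted twice, which is the doubled
volume described here; with the larger queue the late copies are all rejected.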

This is an ongoing problem, as noted on Jan 6:
http://my.unidata.ucar.edu/cgi-bin/getfile?file=/content/support/help/MailArchives/idd/msg03680.html

To illustrate this, you can see the hourly data volume of the IDS|DDPLUS feed on
the idd.unidata.ucar.edu cluster:
http://my.unidata.ucar.edu/cgi-bin/rtstats/iddstats_vol_nc1?IDS|DDPLUS+uni4.unidata.ucar.edu

This graph shows that the hourly volume of IDS|DDPLUS is under 15MB most of the
time, with a few excursions up to 20MB.

Contrast this graph to your hosts chisel and level:
http://my.unidata.ucar.edu/cgi-bin/rtstats/iddstats_vol_nc1?IDS|DDPLUS+chisel.rap.ucar.edu
http://my.unidata.ucar.edu/cgi-bin/rtstats/iddstats_vol_nc1?IDS|DDPLUS+level.rap.ucar.edu

You will notice that Chisel and Level are receiving double the data volume for 
large time periods.

When you see jumps back and forth in latency such as rasp exhibits in:
http://my.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?IDS|DDPLUS+rasp.rap.ucar.edu

you are seeing the data come across the first time in real time, and the second
time some 3000 to 6000 seconds later, after the data has been scoured out of the
local queue. You will notice that rasp is feeding from both chisel and level.
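
To put a number on that delay, here is a small Python sketch that takes one
of the pqutil lines from your May 5 message quoted below and subtracts the
product timestamp from the log timestamp. It assumes that the second
timestamp on the line is the product's creation time, that both clocks are in
the same time zone, and that both fall in the same year. The helper name
latency_seconds is hypothetical.

```python
from datetime import datetime

def latency_seconds(line):
    """Seconds between the log time and the product timestamp on one line."""
    fields = line.split()
    # Assumed layout: Mon DD HH:MM:SS pqutil: size timestamp feedtype seqno ident
    created = datetime.strptime(fields[5], "%Y%m%d%H%M%S.%f")
    # The log prefix carries no year, so borrow it from the product timestamp.
    logged = datetime.strptime(
        f"{created.year} {fields[0]} {fields[1]} {fields[2]}",
        "%Y %b %d %H:%M:%S",
    )
    return (logged - created).total_seconds()

line = ("May 05 23:07:18 pqutil:    23416 20050505220555.112 NEXRAD2 172007  "
        "L2-BZIP2/KGJX/20050505220505/172/7")
print(latency_seconds(line))
```

For that line the difference comes to a little over an hour, which falls in
the 3000 to 6000 second range mentioned above.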

Looking at Chisel shows it is feeding from both thelma and atm.geo.nsf.gov.
The latency of products from atm.geo.nsf.gov is that of data being received
a second time, possibly due to using an ALTERNATE request entry or having
a volume limitation on that connection:
http://my.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?IDS|DDPLUS+chisel.rap.ucar.edu

The LDM on Level has requests to Chisel (and is therefore getting the second-trip
products from atm.geo.nsf.gov) and thelma:
http://my.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?IDS|DDPLUS+level.rap.ucar.edu

From the above, the best course of action would be to remove your redundant data
requests to atm.geo.nsf.gov unless your LDM queue is large enough to hold 1 hour's
worth of data for the combined data volume you are requesting. For chisel, you are
now receiving over 4GB per hour at most times:
http://my.unidata.ucar.edu/cgi-bin/rtstats/rtstats_summary_volume1?chisel.rap.ucar.edu+GRAPH

Level is very consistent at receiving 4GB per hour in total feeds:
http://my.unidata.ucar.edu/cgi-bin/rtstats/rtstats_summary_volume1?level.rap.ucar.edu+GRAPH

Note that for a 4GB queue, the best case is to have enough memory to map the
entire queue. If you don't have enough memory to hold the entire queue, your
machine will work very hard at swapping/paging when you have a reconnection such
as you describe in your message below.
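
The sizing arithmetic can be sketched in a few lines of Python. This is only
a back-of-the-envelope check: the 4GB per hour figure is from the volume
graphs above, while the RAM sizes and the function names are hypothetical
values chosen for illustration.

```python
GB = 1024 ** 3

def required_queue_bytes(hourly_volume_bytes, hold_seconds=3600):
    """Bytes needed to keep hold_seconds worth of data in the queue."""
    return hourly_volume_bytes * hold_seconds // 3600

def fits_in_memory(queue_bytes, ram_bytes):
    """True if the whole queue could be mapped into physical memory."""
    return queue_bytes <= ram_bytes

needed = required_queue_bytes(4 * GB)      # one hour at 4GB per hour
print(needed // GB, "GB queue needed")
print(fits_in_memory(needed, 2 * GB))      # hypothetical 2GB host
print(fits_in_memory(needed, 8 * GB))      # hypothetical 8GB host
```

If the check comes back False for a host, that is the case where a
reconnection is likely to push the machine into heavy swapping/paging.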

If you want to preserve some level of redundancy, then I'd suggest having
level.rap feed from thelma.ucar.edu, and chisel.rap feed from idd.unidata.ucar.edu.
Your host rasp can then redundantly feed from chisel and level...but again, it
should have a large enough queue.

Steve Chiswell
Unidata User Support



>From: Celia Chen <address@hidden>
>Organization: UCAR/Unidata
>Keywords: 200505112043.j4BKhGP3019420

>Dear support,
>
>Could you tell me what happened around 12:30pm (local time) today in the LDM world?
>There were no nexrad level-2 data coming in to RAL's 3 ldm hosts for about 10-15
>minutes.  When the data started to flow in again the data was about 20 minutes late.
>Again, I turned off the feed to the "new" machine. But it didn't have any noticeable
>effect this time.  BTW, the "new" machine is really chisel with NO other processing
>going on.
>
>Also, an update on what's going on here on level-2 data feed. I have been watching
>the feed very closely.  Last night I turned on the feed to all 4 hosts. They all
>looked fine when I checked this morning. I then turned the feed off on the 3rd
>host (clamp) because I don't need it to feed anybody. I left the other three hosts
>requesting the feed. Until about 12:45pm when I turned off chisel to see if it will
>again correct the late feed problem.  But it didn't.  It took rasp and level another
>30-40 minutes to recover.
>
>However, the statistics plots provided by you folks show some latency problem during the
>night on all hosts. Looks like the gap/latency occurred around 00, 06, 12, or 18Z most
>of the time.  Another interesting thing that I noticed about chisel is that the "log(latency)"
>shows a different color plot as if chisel is receiving data from a group of different sites since
>yesterday afternoon. (I turned off the feed for most of the afternoon yesterday where the big
>gap is).  Please check the following plot and let me know if that is the case:
>
>http://my.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?NEXRAD2+chisel.rap.ucar.edu+LOG
>
>NEXRAD2 is a very critical data set for many of my users within RAP. Lately I have been
>puzzled by this on and off data latency problem. I hope you can help me understand what is
>going on and resolve the problem.
>
>Many thanks.
>
>Celia 
>
> 
>On Tue, May 10, 2005 at 03:58:15PM -0600, Celia Chen wrote:
>> 
>> There is no pqact processing on this "new" machine. The "pqact"
>> entry in ldmd.conf has been turned off before this happened. 
>> 
>> 
>> I have also turned off the pqact processing on level-2 data on rasp 
>> for some time now. 
>> 
>> I will test this again when the data feed is not so critical ...
>> 
>> Thanks.
>> 
>> Celia
>> 
>> 
>> On Tue, May 10, 2005 at 03:29:19PM -0600, Unidata Support wrote:
>> > 
>> > Celia,
>> > 
>> > Are you doing any pqact processing on this new machine?
>> > We found previously that your pqact.conf processing on rasp.rap.ucar.edu
>> > was filing every single 100 radial chunk to a separate file, and that 
>> > machine was IO bound and unable to keep up. We made some suggestions, but you
>> > did say you backed out of some of them due to your user needs, so that
>> > problem we identified may still exist on your end.
>> > 
>> > If you turned on the feed on your new machine and are processing similar
>> > pqact.conf actions, you are going down the same path. I suggest that you comment out
>> > your pqact entry in ldmd.conf for your relay test if it is critical that you
>> > relay from this machine for a demo at this moment.
>> > 
>> > If the new machine is not doing any processing and still falls behind in 
>> > relaying data, that would be strange.
>> > 
>> > Steve Chiswell
>> > Unidata User Support
>> > 
>> > 
>> > 
>> > >From: Celia Chen <address@hidden>
>> > >Organization: UCAR/Unidata
>> > >Keywords: 200505102049.j4AKnoP3014554
>> > 
>> > >Dear Support,
>> > >
>> > >I think I just found something interesting in LDM data feed and hope you can help me
>> > >explain this. 
>> > >
>> > >I have been getting NEXRAD2 data feed from thelma to 3 of my 4 ldm hosts. I have been
>> > >having problems with two of the feeds some times in the last couple months.  Just
>> > >like the situations described below, one feed would be on time but the others would
>> > >have latency problems during some period of time. 
>> > >
>> > >Today I decided to experiment with my 4th ldm host to see if it gets the
>> > >NEXRAD2 data feed directly from thelma. Because I may need to use that machine as the
>> > >primary LDM relay machine in the near future. I turned on the feed before lunch and
>> > >it started to receive level-2 data.  However, a user was in my office when I came back
>> > >from lunch. There is a 30 minutes latency on the level-2 data feed to her downstream
>> > >ldm host and she has a realtime demo in about 30 minutes. What is wrong with the feed?
>> > >
>> > >I right away turned off the level-2 data feed on my 4th ldm host and watched the feed
>> > >getting back on time in the next few minutes in front of our eyes. 
>> > >
>> > >
>> > >My question is: why the latency problem showed up after I turned on the 4th feed. Is this
>> > >a local network problem or something else?
>> > >
>> > >Thanks in advance.
>> > >
>> > >Celia
>> > >
>> > >
>> > >
>> > >On Thu, May 05, 2005 at 06:06:33PM -0600, Celia Chen wrote:
>> > >> Dear Support,
>> > >> 
>> > >> I have a question on the time stamps that we see when we do ldmadmin watch:
>> > >> 
>> > >> May 05 23:07:18 pqutil:    23416 20050505220555.112 NEXRAD2 172007  L2-BZIP2/KGJX/20050505220505/172/7
>> > >> May 05 23:07:18 pqutil:     6001 20050505220555.561 NEXRAD2 986005  L2-BZIP2/KDDC/20050505220444/986/5
>> > >> May 05 23:07:18 pqutil:    13022 20050505220555.212 NEXRAD2 66012  L2-BZIP2/KAMX/20050505220506/66/12
>> > >> May 05 23:07:18 pqutil:    29997 20050505220555.128 NEXRAD2 916024  L2-BZIP2/KGLD/20050505220419/916/24
>> > >> 
>> > >> I assume that the time on the left is the data arrival time, the time in the middle is when the file
>> > >> was sent by the upstream host, and the time on the right is the time when the file was
>> > >> generated.
>> > >> 
>> > >> Both our LDM hosts rasp and level are getting NEXRAD2 data from thelma.  They both request the
>> > >> NEXRAD2 data directly from thelma.  For some reason, there was a huge
>> > >> (~ 1 hour) latency on rasp this afternoon, as you can see from the above list. But somehow
>> > >> the same feed on level was very much on time.  Unfortunately I don't have the watch on for the
>> > >> same time period on level. 
>> > >> 
>> > >> Is my assumption correct about the time stamps? If so, why do we see the big difference on the
>> > >> time stamps (from thelma) between rasp and level, therefore the data latency on rasp but not
>> > >> on level?
>> > >> 
>> > >> Thanks in advance.
>> > >> 
>> > >> Celia.
>> > >
>> > --
>> > **************************************************************************
>> > Unidata User Support                                    UCAR Unidata Progr
>> > (303)497-8643                                                  P.O. Box 30
>> > address@hidden                                   Boulder, CO 803
>> > --------------------------------------------------------------------------
>> > Unidata WWW Service              http://my.unidata.ucar.edu/content/suppor
>> > --------------------------------------------------------------------------
>> > NOTE: All email exchanges with Unidata User Support are recorded in the
>> > Unidata inquiry tracking system and then made publicly available
>> > through the web.  If you do not want to have your interactions made
>> > available in this way, you must let us know in each email you send to us.
>