
[LDM #AFT-567406]: GOES-17 data filing question



Hi Mike,

re:
> OK, I'm back in the hunt here....
> 
> First thought, it's amazing Vulcan was doing so well with just a 500 MB
> queue; it was getting all the FullDisk images.

Yes, indeed!

re:
> And this also leads me to
> believe that we have plenty of bandwidth and horsepower if everything is
> working right.

I agree.  But we would still like to learn more about the capabilities
of vulcan and charney.

re:
> But, I did change queue to 12GB on both Vulcan and
> Charney.  Vulcan and Charney both have 64 GB of RAM.

I recommend making the LDM queue even larger, say 20GB or even 32GB.
Exactly how large the queue should be is something that has to be
arrived at iteratively, so starting with 12GB may be (marginally) OK,
but I would still go up to at least 20GB.
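
For reference, the resize sequence would look something like the
following (a sketch; this assumes the stock '/queue/size' registry
path, and you should substitute whatever size you settle on):

ldmadmin stop
regutil -s 20G /queue/size    # update the registry's queue-size setting
ldmadmin delqueue             # delete the old queue file
ldmadmin mkqueue              # remake it at the new size
ldmadmin start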

re:
> These changes seem to have done nothing to improve performance though, and
> in fact things got worse today after being a bit better over the weekend.

Hmm... This is very hard to understand, since running with a large
enough queue should allow large products to be inserted without
clearing out all of the products that were already in the queue when
the large one(s) come in.  I think we will need to wait a bit to
understand the full impact of increasing the queue size.
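
To put rough numbers on this: a single 445 MB FullDisk Channel 02 image
would have displaced essentially the entire old 500 MB queue when it
arrived, while in a 20 GB queue it would occupy only about 2% of the
space.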

re:
> I started recording local metrics.  And gnuplot is installed.  Do you all
> have any spare gnuplot scripts I could tailor to my system?  I could post
> the info on our web for review.

We don't have special scripts.  Instead, we simply run:

ldmadmin plotmetrics [-b ccyymmdd] [-e ccyymmdd]

where the '-b' value is the date on which to start the plot and the
'-e' value is the date on which to stop it.  We almost never use '-e',
since we are typically looking at how a machine has done/is doing from
some starting point to "now".  If neither the '-b' nor the '-e' flag is
given, plots covering all of the time that metrics have been collected
will be generated.
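
For example, to generate plots from a (hypothetical) start date of
October 1, 2019 through "now":

ldmadmin plotmetrics -b 20191001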

re:
> As a general bandwidth test on Vulcan I get about 18 MB/s, so over 100
> mbps, which tells me our pipe is good enough and not the limitation:
> ------------
> [ldm@vulcan ~]$ curl -O http://ipv4.download.thinkbroadband.com/512MB.zip
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100  512M  100  512M    0     0  17.8M      0  0:00:28  0:00:28 --:--:-- 19.1M
> -------------
> I will continue to work with our network folks to look for possible
> sticking points.
> Other thoughts?

I think we need to wait a bit to see what the full impact of the
queue size change has been.
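
As a rough cross-check on bandwidth: the volume summary quoted below
shows vulcan averaging about 9862 MB/hour (~2.7 MB/s, or ~22 Mbps),
with hourly peaks around 23575 MB/hour (~6.5 MB/s, or ~52 Mbps); both
are comfortably below the ~144 Mbps (18 MB/s) that your curl test
measured.  That supports your read that the pipe itself is not the
limitation.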

Musing:

You did remember to stop the LDM, delete and then remake the queue, and
then start it after changing the LDM registry value for queue size, of
course!?  If you didn't, you are still using the old, 500 MB queue.

Quick sanity test:

ldmadmin config          # the 'queue size:' line should show the new value

ls -alt ~ldm/var/queues  # ldm.pq should show a recent timestamp and a size
                         # that matches the new queue size

> thanks,
> -Mike
> 
> 
> address@hidden> wrote:
> 
> > Hi Mike,
> >
> > re:
> > > Thanks for continuing to ponder this.
> >
> > No worries.  These weird things really nag me!
> >
> > re:
> > > Some feedback below, but first I
> > > want to boil this down to what I think are the most relevant
> > > characteristics of my issue.  On our machine Charney I stopped all the
> > > feeds Friday, then on Sunday morning I started feeding just Conus and
> > > Full disk imagery.  And on Vulcan, I have a bunch of stuff coming in still,
> > > including only Conus and FullDisk for the ABI/L1 data.
> >
> > OK.
> >
> > re:
> > > Charney is exhibiting the exact same behavior and latency pattern:
> > >
> > > 1) All Conus is coming in since yesterday AM.  Only a few FullDisk images
> > >    are making it through, even when latency is near zero.
> > > 2) Latency shows same rough shape.
> > >   (we'll see what happens over the next few hours with campus traffic
> > >   ramping up)
> > >
> > >
> > > http://rtstats.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?SATELLITE+vulcan.science.sjsu.edu
> > >
> > > http://rtstats.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?SATELLITE+charney.met.sjsu.edu
> > >
> > >
> > > To me this strongly suggests something general on the network, somewhere
> > > between Unidata and my servers and most likely at SJSU.
> > >
> > > Agree?
> >
> > Ordinarily, I would agree, but something you include below makes me want to
> > reserve judgment.
> >
> > re:
> > > Along one of the (many) other lines of investigation:
> >
> > > Both Vulcan and Charney configs included here:
> > >
> > > [ldm@vulcan ~]$ ldmadmin config
> > >
> > > hostname:              vulcan.science.sjsu.edu
> > > os:                    Linux
> > > release:               3.10.0-1062.1.2.el7.x86_64
> > > ldmhome:               /usr/local/ldm
> > > LDM version:           6.13.11
> > > PATH:
> > >
> > > /usr/local/ldm/ldm-6.13.11/bin:/usr/local/ldm/decoders:/usr/local/ldm/util:/usr/local/ldm/bin:/usr/lib64/mpich/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/home/gempak/GEMPAK7/os/linux64/bin:/home/gempak/GEMPAK7/bin
> > > LDM conf file:         /usr/local/ldm/etc/ldmd.conf
> > > pqact(1) conf file:    /usr/local/ldm/etc/pqact.conf
> > > scour(1) conf file:    /usr/local/ldm/etc/scour.conf
> > > product queue:         /usr/local/ldm/var/queues/ldm.pq
> > > queue size:            500M bytes
> > > queue slots:           default
> > > reconciliation mode:   do nothing
> > > pqsurf(1) path:        /usr/local/ldm/var/queues/pqsurf.pq
> > > pqsurf(1) size:        2M
> > > IP address:            0.0.0.0
> > > port:                  388
> > > PID file:              /usr/local/ldm/ldmd.pid
> > > Lock file:             /usr/local/ldm/.ldmadmin.lck
> > > maximum clients:       256
> > > maximum latency:       3600
> > > time offset:           3600
> > > log file:              /usr/local/ldm/var/logs/ldmd.log
> > > numlogs:               7
> > > log_rotate:            1
> > > netstat:               /bin/netstat -A inet -t -n
> > > top:                   /usr/bin/top -b -n 1
> > > metrics file:          /usr/local/ldm/var/logs/metrics.txt
> > > metrics files:         /usr/local/ldm/var/logs/metrics.txt*
> > > num_metrics:           4
> > > check time:            1
> > > delete info files:     0
> > > ntpdate(1):            /usr/sbin/ntpdate
> > > ntpdate(1) timeout:    5
> > > time servers:          ntp.ucsd.edu ntp1.cs.wisc.edu ntppub.tamu.edu
> > > otc1.psu.edu timeserver.unidata.ucar.edu
> > > time-offset limit:     10
> >
> > Quick comment:
> >
> > The LDM queue on vulcan is MASSIVELY undersized.  The current queue size
> > is 500MB, and the FullDisk Channel 02 images can be as large as 445MB!
> >
> > The very first thing that needs to happen is to make the LDM queue on
> > vulcan MUCH, MUCH larger.  Given the current amount of data being
> > REQUESTed:
> >
> > Data Volume Summary for vulcan.science.sjsu.edu
> >
> > Maximum hourly volume  23575.039 M bytes/hour
> > Average hourly volume   9862.106 M bytes/hour
> >
> > Average products per hour     138814 prods/hour
> >
> > Feed                Average  [% of total]     Maximum      Products
> >               (M byte/hour)              (M byte/hour)   number/hour
> > CONDUIT                5813.895    [ 58.952%]    19895.421    52265.535
> > SATELLITE              2446.310    [ 24.805%]     3717.159      369.884
> > HDS                    1239.508    [ 12.568%]     1785.246    38765.767
> > FNEXRAD                 108.688    [  1.102%]      137.162      105.233
> > UNIWISC                  94.174    [  0.955%]      145.057       50.628
> > NIMAGE                   84.159    [  0.853%]      130.801      120.488
> > IDS|DDPLUS               75.348    [  0.764%]       86.674    47076.907
> > LIGHTNING                 0.023    [  0.000%]        0.064       59.814
> >
> > The LDM queue should be at least 32 GB in size.
> >
> > Question:
> >
> > - does vulcan have enough RAM to make a large LDM queue without impacting
> >   other things running on the machine?
> >
> >   Another way of asking this is:  how much RAM does vulcan have?
> >
> > re:
> > > [ldm@charney current]$ ldmadmin config
> > >
> > > hostname:              charney.met.sjsu.edu
> > > os:                    Linux
> > > release:               3.10.0-1062.1.2.el7.x86_64
> > > ldmhome:               /usr/local/ldm
> > > LDM version:           6.13.11
> > > PATH:
> > >
> > > /usr/local/ldm/ldm-6.13.11/bin:/usr/local/ldm/decoders:/usr/local/ldm/util:/usr/local/ldm/bin:/usr/lib64/mpich/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/home/gempak/GEMPAK7/os/linux64/bin:/home/gempak/GEMPAK7/bin
> > > LDM conf file:         /usr/local/ldm/etc/ldmd.conf
> > > pqact(1) conf file:    /usr/local/ldm/etc/pqact.conf
> > > scour(1) conf file:    /usr/local/ldm/etc/scour.conf
> > > product queue:         /usr/local/ldm/var/queues/ldm.pq
> > > queue size:            500M bytes
> > > queue slots:           default
> > > reconciliation mode:   do nothing
> > > pqsurf(1) path:        /usr/local/ldm/var/queues/pqsurf.pq
> > > pqsurf(1) size:        2M
> > > IP address:            0.0.0.0
> > > port:                  388
> > > PID file:              /usr/local/ldm/ldmd.pid
> > > Lock file:             /usr/local/ldm/.ldmadmin.lck
> > > maximum clients:       256
> > > maximum latency:       3600
> > > time offset:           3600
> > > log file:              /usr/local/ldm/var/logs/ldmd.log
> > > numlogs:               7
> > > log_rotate:            1
> > > netstat:               /bin/netstat -A inet -t -n
> > > top:                   /usr/bin/top -b -n 1
> > > metrics file:          /usr/local/ldm/var/logs/metrics.txt
> > > metrics files:         /usr/local/ldm/var/logs/metrics.txt*
> > > num_metrics:           4
> > > check time:            1
> > > delete info files:     0
> > > ntpdate(1):            /usr/sbin/ntpdate
> > > ntpdate(1) timeout:    5
> > > time servers:          ntp.ucsd.edu ntp1.cs.wisc.edu ntppub.tamu.edu
> > > otc1.psu.edu timeserver.unidata.ucar.edu
> > > time-offset limit:     10
> >
> > The exact same comment goes for charney: its LDM queue is severely
> > undersized given the data being REQUESTed.
> >
> > Question:
> >
> > - the same question goes for charney as the one I posed above for
> >   vulcan:
> >
> >   How much RAM is installed on charney?
> >
> > re: Are you gathering system metrics using 'ldmadmin addmetrics'?
> >
> > > No, but just started.  I'll look into gnuplot in a bit here.
> > > Unfortunately I have a bunch of meetings this morning, so I'll be
> > > watching at a distance.  My hunch is the latency will spike again around
> > > 0900 local because of campus Internet traffic, but we'll see.
> >
> > We can say without hesitation that until the LDM queues on both vulcan
> > and charney are made much larger, all bets are off wrt proper processing
> > of received products.
> >
> > Cheers,
> >
> > Tom

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: AFT-567406
Department: Support LDM
Priority: Normal
Status: Open
===================
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.