[ldm-users] 20200423: Re: 20200423: Re: Efficiency of splitting pqacts

Hi Daryl,

On 4/23/20 1:36 PM, Herzmann, Daryl E [AGRON] wrote:
I am sure Unidata will correct my ignorance / incorrect details, but
my understanding is that an individual pqact process can only do 32
"things" at one time; that is, there are 32 slots available for work.

A _long_ time ago, the LDM kept a maximum of only 32 file
descriptors open.  Somewhat later, but still a long time ago, Steve
changed that to use the system limit on the number of open file
descriptors.  Recently, we came to the conclusion that having LOTS of
open file descriptors was a major reason why it took so long to stop
the LDM, at least on our publicly facing servers
(lead.unidata.ucar.edu and atm.ucar.edu).  Files opened by actions
that append to an open file were simply never being closed, because
current OSes allow for LOTS of open file descriptors.  Steve's
solution was to add code to the LDM that closes a file descriptor
after a certain amount of time during which no writes to it have
occurred.  The best examples of the kind of actions that I am
referring to are ones for model output that write all model fields
for a single model time step into a single file.  In these kinds of
actions (FILE without the -close flag), there is no way to know when
all of the products to be written into the output file have been
received, so the file descriptor stays open, and, as I noted, current
OSes allow for a LOT of open file descriptors.
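For readers who want to picture the idle-close behavior described above, here is a minimal Python sketch of the idea: keep append-mode output files open between writes, close any that have gone unwritten for longer than a timeout, and reopen on the next write. It is an illustration only, not the LDM's implementation; the class name, the 60-second timeout, and the file names are hypothetical.

```python
import os
import tempfile
import time


class IdleFileCache:
    """Keep append-mode output files open between writes and close any file
    that has gone unwritten for longer than idle_timeout seconds.
    (Hypothetical illustration, not LDM code.)"""

    def __init__(self, idle_timeout, clock=time.monotonic):
        self.idle_timeout = idle_timeout
        self.clock = clock
        self._files = {}  # path -> (open file object, time of last write)

    def write(self, path, data):
        entry = self._files.get(path)
        # Reopen in append mode if the file was never opened or was closed as idle.
        f = entry[0] if entry else open(path, "ab")
        f.write(data)
        self._files[path] = (f, self.clock())

    def close_idle(self):
        """Close every file idle longer than the timeout; return their paths."""
        now = self.clock()
        stale = [p for p, (_, last) in self._files.items()
                 if now - last > self.idle_timeout]
        for path in stale:
            self._files.pop(path)[0].close()
        return stale

    def open_count(self):
        return len(self._files)

    def close_all(self):
        for f, _ in self._files.values():
            f.close()
        self._files.clear()


if __name__ == "__main__":
    fake_now = [0.0]  # controllable clock so the example is deterministic
    cache = IdleFileCache(idle_timeout=60, clock=lambda: fake_now[0])
    with tempfile.TemporaryDirectory() as d:
        f000 = os.path.join(d, "model_f000.grib2")  # hypothetical file names
        f003 = os.path.join(d, "model_f003.grib2")
        cache.write(f000, b"field A")
        fake_now[0] = 30.0
        cache.write(f003, b"field A")
        fake_now[0] = 75.0
        closed = cache.close_idle()
        print([os.path.basename(p) for p in closed])  # ['model_f000.grib2']
        print(cache.open_count())                     # 1
        cache.write(f000, b"field B")  # a late product reopens the file and appends
        cache.close_all()
        with open(f000, "rb") as fh:
            print(fh.read())                          # b'field Afield B'
```

Reopening in append mode means a late-arriving field still lands in the same file; the trade-off is an extra open() for files that go quiet and then resume.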

re:
Now, the above depends on the action.  If you run `PIPE -close`,
the slot can be used for another product even while the PIPEd process
is still running...  This type of action can lead the LDM to DoS the
server it is on, as it will fire off as many PIPEd processes as it can.

I'm not sure that this is the case, but Steve can certainly say yea/nay
on this.

re:
You old timers, like me, will recall the lock file fun Chiz wrote into
the GIF generation script of NIDS data for this reason.
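The lock-file idea mentioned above can be sketched in Python with a non-blocking flock(): a PIPEd script that cannot get the lock returns immediately instead of running alongside (or piling up behind) a copy that is already busy. This is not Chiz's script; the lock path, the function names, and the skip-this-product policy are assumptions made for illustration.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Try to take an exclusive, non-blocking flock() on path.

    Returns the open file descriptor (keep it open to hold the lock), or
    None if another process or open file already holds the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def run_exclusive(lock_path, work):
    """Run work() only if nobody else holds the lock; return True if it ran."""
    fd = try_lock(lock_path)
    if fd is None:
        return False  # another copy is busy; skip this product
    try:
        work()
        return True
    finally:
        os.close(fd)  # closing the descriptor releases the lock


if __name__ == "__main__":
    lock_path = os.path.join(tempfile.gettempdir(), "nids_gif.lock")  # hypothetical
    # The lambda stands in for the real image-generation step.
    print("ran" if run_exclusive(lock_path, lambda: None) else "skipped")
```

Skipping the product when the lock is held is one policy; blocking on the lock (dropping LOCK_NB) would instead leave a queue of waiting processes, which is the pile-up the lock is meant to prevent.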

If you are doing just FILE actions without a `-close`, there is some
benefit to spreading the pqact.conf file out into multiple files so
that each pqact touches roughly 32 files.  For example, with Level II
data, dividing the radars into chunks like so:

exec    "pqact -p BZIP2/K[A-D] -f CRAFT /local/ldm/etc/pqact-craft.conf"
exec    "pqact -p BZIP2/K[E-H] -f CRAFT /local/ldm/etc/pqact-craft2.conf"
exec    "pqact -p BZIP2/K[I-K] -f CRAFT /local/ldm/etc/pqact-craft3.conf"
exec    "pqact -p BZIP2/K[L-O] -f CRAFT /local/ldm/etc/pqact-craft4.conf"
exec    "pqact -p BZIP2/K[P-R] -f CRAFT /local/ldm/etc/pqact-craft5.conf"
exec    "pqact -p BZIP2/K[S-Z] -f CRAFT /local/ldm/etc/pqact-craft6.conf"
exec    "pqact -p BZIP2/[A-J] -f CRAFT /local/ldm/etc/pqact-craft7.conf"
exec    "pqact -p BZIP2/[L-Z] -f CRAFT /local/ldm/etc/pqact-craft8.conf"
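Since each product should normally be handled by exactly one of these pqact instances, it can be worth checking that a split like this has no gaps or overlaps. A small Python sketch, assuming the -p patterns behave like an unanchored regular-expression search of the product identifier (as POSIX regexec does for an unanchored pattern); the product-ID layout built by product_id() is a hypothetical illustration, not the exact format of the feed.

```python
import re
import string

# The -p patterns from the eight exec lines above, in order (instance 0..7).
PATTERNS = [
    r"BZIP2/K[A-D]",
    r"BZIP2/K[E-H]",
    r"BZIP2/K[I-K]",
    r"BZIP2/K[L-O]",
    r"BZIP2/K[P-R]",
    r"BZIP2/K[S-Z]",
    r"BZIP2/[A-J]",
    r"BZIP2/[L-Z]",
]


def product_id(stid):
    # Hypothetical Level II product ID; only the "BZIP2/<station>" part matters.
    return f"L2-BZIP2/{stid}/20200423183000/1/1/I/V06/0"


def matching_instances(pid):
    """Indices of every pqact instance whose -p pattern matches pid (unanchored)."""
    return [i for i, p in enumerate(PATTERNS) if re.search(p, pid)]


def check_partition(stations):
    """Return {station: instances} for stations not handled by exactly one instance."""
    problems = {}
    for stid in stations:
        hits = matching_instances(product_id(stid))
        if len(hits) != 1:
            problems[stid] = hits
    return problems


if __name__ == "__main__":
    for stid in ["KFTG", "KDMX", "TJUA", "PGUA"]:
        print(stid, matching_instances(product_id(stid)))  # [1], [0], [7], [7]
    # Every possible first letter, with and without a leading K.
    stations = ["K" + c + "XX" for c in string.ascii_uppercase]
    stations += [c + "XXX" for c in string.ascii_uppercase]
    print(check_partition(stations))  # {} : no gaps and no overlaps
```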

Behold, another caveat here.  In the above, each pqact process has its
own uniquely named pattern-action file, but these can all be the same
file on the filesystem, managed with symlinks.  The names need to be
unique per pqact process so that each pqact writes its `.state` file
to a unique location.
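The symlink arrangement described above can be set up by hand with, for example, `ln -s pqact-craft.conf pqact-craft2.conf` in the etc directory. A minimal Python sketch that does the same for all seven extra names, assuming the naming scheme of the exec lines above (pqact-craft.conf plus pqact-craft2.conf through pqact-craft8.conf); the function name is hypothetical.

```python
import os
import tempfile


def link_pqact_configs(etc_dir, master_name, count):
    """Create <etc_dir>/pqact-craft2.conf .. pqact-craft<count>.conf as
    symlinks to master_name, replacing existing links of those names."""
    links = []
    for n in range(2, count + 1):
        link = os.path.join(etc_dir, f"pqact-craft{n}.conf")
        if os.path.islink(link):
            os.remove(link)  # replace an old link, but never delete a real file
        # A relative target is resolved from etc_dir, so the links survive a move.
        # os.symlink raises FileExistsError if a regular file already has this name.
        os.symlink(master_name, link)
        links.append(link)
    return links


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as etc:
        master = os.path.join(etc, "pqact-craft.conf")
        with open(master, "w") as fh:
            fh.write("# shared pattern-action entries go here\n")
        for link in link_pqact_configs(etc, "pqact-craft.conf", 8):
            # Distinct names for each pqact, one file's contents on disk.
            print(os.path.basename(link), os.path.samefile(link, master))
```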

The question that Mike Z was asking was about the number of actions in
the pattern-action file.  If one uses the exact same pattern-action file
for each 'pqact' instance, and that pattern-action file has a lot of
actions, it will take longer for 'pqact' to work its way through the
actions.  This is true even if some/most of the actions are not executed
because their extended regular expression doesn't match the Product ID
for the product being acted upon.  Of course, actions that don't match
tend to be dealt with much faster than ones that do match.

re:
You should consider the processes being run, how long they live,
and your server's capacity.  If you have a bunch of long-running GEMPAK
decoders that total fewer than 32, then just keep them in one file,
but perhaps dedicate that pqact process to just those tasks.

I agree with the sentiment expressed here, but I would caution that the
old 32 open file descriptor limit does not apply.

re:
So hold tight until Unidata corrects my above as FUD :)

Just having fun on a stay at home day :-)

Cheers,

Tom

_______________________________
From: ldm-users <ldm-users-bounces@xxxxxxxxxxxxxxxx> on behalf of Tom Yoksas <yoksas@xxxxxxxx>
Sent: Thursday, April 23, 2020 2:15 PM
To: ldm-users@xxxxxxxxxxxxxxxx
Subject: [ldm-users] 20200423: Re:  Efficiency of splitting pqacts

Hi Mike,

On 4/23/20 12:39 PM, Mike Zuranski wrote:
I'm wondering if there is a difference in speed/efficiency of the LDM,
or in system resource allocation, between grouping all my pqact
statements in one file vs. splitting them up into different pqact
files.

Since all actions in an LDM pattern-action file are processed
sequentially, there is a benefit to distributing actions in multiple
pattern-action files that are each processed by a separate 'pqact'
instance.

re:
Does LDM do anything differently or is it a wash either way?

No, each 'pqact' instance works through the list of actions in the
pattern-action file it is processing, in sequence.  So, if one has
a monolithic pattern-action file with, say, 10K actions, it will take
significantly longer than having 10 'pqact' instances operating
on pattern-action files that each have 1K actions.
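A rough Python model of the sequential walk: one pqact instance tests a product's identifier against every action's pattern in file order, so the number of tests per product grows with the number of actions. The action patterns and product ID are hypothetical, and counting regex tests ignores the real cost of executing matched actions.

```python
import re


def process(product_id, actions):
    """Check product_id against every action's pattern in file order, the way
    one pqact instance walks its pattern-action file. Returns the number of
    regex tests made and the indices of the actions that matched."""
    matched = [i for i, rx in enumerate(actions) if rx.search(product_id)]
    return len(actions), matched


if __name__ == "__main__":
    # 10,000 hypothetical actions, each keyed on a made-up product ID.
    actions = [re.compile(rf"^HYPO{i:05d}$") for i in range(10_000)]
    pid = "HYPO04321"

    # One monolithic file: a single instance makes every test.
    print(process(pid, actions))  # (10000, [4321])

    # Ten files of 1,000 actions, one pqact instance each (running in parallel).
    chunks = [actions[k:k + 1_000] for k in range(0, 10_000, 1_000)]
    for k, chunk in enumerate(chunks):
        tests, hits = process(pid, chunk)
        if hits:
            print(k, tests, hits)  # 4 1000 [321]
```

Splitting shortens each instance's walk tenfold, so a product is acted on sooner; but if every instance sees every product, the ten instances together still make 10,000 tests per product, on top of the overhead of running ten processes, which is the resource cost noted below.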

re:
I vaguely remember this coming up at one point but I couldn't find any
documentation or old email threads about it.  I'm mostly just asking out
of curiosity, I don't have a specific problem that I'm trying to solve
or anything.  But if I were to redo my pqact organization I'm wondering
if there is a preferred methodology.

The best rule of thumb is to have multiple 'pqact' instances operating
on multiple pattern-action files when the list of actions to be
performed is large, or when some of the actions are slow.  There is no
"best practice" for, say, having only N actions in a pattern-action
file, since the speed at which the actions are performed is a function
of how fast or slow each action is.  Sites will invariably need to do
their own tuning to find the right balance of speed and use of
resources (more 'pqact' instances will, of course, use more resources
like CPU, RAM, etc.).

Cheers,

Tom
--
+----------------------------------------------------------------------+
* Tom Yoksas                                      UCAR Unidata Program *
* (303) 497-8642 (last resort)                           P.O. Box 3000 *
* yoksas@xxxxxxxx                                    Boulder, CO 80307 *
* Unidata WWW Service                     http://www.unidata.ucar.edu/ *
+----------------------------------------------------------------------+

_______________________________________________
NOTE: All exchanges posted to Unidata maintained email lists are
recorded in the Unidata inquiry tracking system and made publicly
available through the web.  Users who post to any of the lists we
maintain are reminded to remove any personal information that they
do not want to be made public.


ldm-users mailing list
ldm-users@xxxxxxxxxxxxxxxx
For list information or to unsubscribe,  visit: 
https://www.unidata.ucar.edu/mailing_lists/




