Re: [ldm-users] 20200423: Re: 20200423: Re: Efficiency of splitting pqacts

  • To: "ldm-users@xxxxxxxxxxxxxxxx" <ldm-users@xxxxxxxxxxxxxxxx>
  • Subject: Re: [ldm-users] 20200423: Re: 20200423: Re: Efficiency of splitting pqacts
  • From: "Herzmann, Daryl E [AGRON]" <akrherz@xxxxxxxxxxx>
  • Date: Fri, 24 Apr 2020 01:39:20 +0000
Hi Tom,

Thanks for the response.  I'm an academic, so let us do academic things.  I'm 
putzing around here on my RHEL8 64-bit laptop with LDM 6.13.11.

Testing `PIPE -close`
=============

Let's create a fancy pants bash script, since Python is 3x too slow.

$ cat 10xengineer.sh
# save data to the bit bucket
cat > /dev/null
# Wait a second, no 3600 seconds
sleep 3600

Let's create a fancy pants pqact entry:

$ cat etc/fun.conf

EXP     ^(.*)$
        PIPE    -close  sh 10xengineer.sh \1

So, in desperation, I wrote a Python script to insert products with unique 
names into the LDM, so that each product fires off the script from pqact.

$ cat bah.py

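import datetime
import subprocess

# Tiny payload; every product carries the same two bytes.
with open('bah.txt', 'w') as fp:
    fp.write('hi')

# Insert 1000 products, each named after the current timestamp so that the
# product names are unique.  Note that the timestamp contains a space.
for _ in range(1000):
    cmd = f"pqinsert -i -p '{datetime.datetime.now()}' bah.txt"
    subprocess.call(cmd, shell=True)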

And so we fire up LDM and observe ye olde first pqact action already fired off, 
with its process running :)

sh 10xengineer.sh _BEGIN_

So our numbers below will have 1001 processes involved.  Let's get going here 
and insert those 1000 products.

$ python bah.py
$

And so how many 10xengineer.sh processes do we have now?

$ ps auxw | grep 10x  | wc -l
1001

Ah fun, I have DOS'd my LDM with bash scripts, hehe.  Let's kill all those and 
try something else.

Testing with just `PIPE`
==============

So now we adjust the pqact entry to drop the `-close`, so pqact should hold 
each file descriptor open until the inactivity timeout that Steve mentioned 
closes it.

$ cat etc/fun.conf

EXP     ^(.*)$
        PIPE    sh 10xengineer.sh \1

and now inject those 1000 files into LDM and behold....

$ ps auxw | grep 10x | wc -l
1001

Whoa, that's interesting.  Looking at the open file descriptors of the pqact 
process (pid 5092):

$ lsof -p 5092 | grep pipe | wc -l
1001
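
For anyone without lsof handy, the same count can be sketched in Python.  This 
is only an illustration, not anything LDM ships: it assumes a Linux /proc 
filesystem and enough permission to read the target process's fd directory, 
and `count_open_fds` is a made-up helper name.

```python
import os


def count_open_fds(pid):
    """Return (total, pipes) for the open file descriptors of `pid`.

    Linux-only: reads /proc/<pid>/fd, where each entry is a symlink to
    whatever the descriptor points at (pipes look like 'pipe:[12345]').
    """
    fd_dir = f"/proc/{pid}/fd"
    total = 0
    pipes = 0
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        total += 1
        if target.startswith("pipe:"):
            pipes += 1
    return total, pipes


if __name__ == "__main__":
    # Inspect this very process as a stand-in for a pqact pid.
    print(count_open_fds(os.getpid()))
```

Pass the pqact pid (5092 in the run above) instead of `os.getpid()` to look at 
a real pqact process.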

So I was wrong there: there's no limit of 32 anymore.  We had better test FILE 
too, in order to satisfy reviewer #3.

Testing `FILE -close`
============

Our pqact.conf file looks like so now:

$ cat etc/fun.conf

EXP     ^(.*)$
        FILE    -close  /tmp/daryl/\1

and start our LDM up and see our _BEGIN_ fun again.

$ ls /tmp/daryl
_BEGIN_

And so we inject 1000 products and observe nothing, because there is a space 
in the LDM product name, hehe.  So we adjust our Python script to remove the 
spaces, and we find:

$ ls /tmp/daryl/ | wc -l
1001
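
The adjusted script isn't shown above, so here is a minimal sketch of one way 
to build space-free, unique product names.  The timestamp format, the counter 
suffix, and the helper names `make_product_id` and `build_pqinsert_cmd` are 
all assumptions for illustration; only the `pqinsert -i -p <name> bah.txt` 
invocation comes from the original script.

```python
import datetime
import itertools

# Running counter so two calls within the same microsecond still differ.
_counter = itertools.count()


def make_product_id(now=None):
    """Build a unique product name that contains no whitespace."""
    if now is None:
        now = datetime.datetime.now()
    # Looks like 20200424T013920.123456_0
    return f"{now.strftime('%Y%m%dT%H%M%S.%f')}_{next(_counter)}"


def build_pqinsert_cmd(product_id, path="bah.txt"):
    """Return the pqinsert command as an argument list (no shell quoting needed)."""
    return ["pqinsert", "-i", "-p", product_id, path]


if __name__ == "__main__":
    # Print the commands instead of running them, so this works without LDM.
    for _ in range(3):
        print(" ".join(build_pqinsert_cmd(make_product_id())))
```

Handing the list to `subprocess.call(...)` without `shell=True` would also 
sidestep the quoting around the product name.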

and the pqact process has no open file descriptors.  So let's test without the 
`-close`.

Testing `FILE`
========

Our pqact now looks like:

$ cat etc/fun.conf

EXP     ^(.*)$
        FILE    /tmp/daryl/\1
$ rm -rf /tmp/daryl/

and we inject 1000 files again and observe they all got written

$ ls /tmp/daryl/ | wc -l
1001

and observe how many open file descriptors the pqact process has.

$ lsof -p 14633 | grep daryl | wc -l
1001

Again, so much for my naive "32 slots" life.  Well, I learned something today!

daryl



--
/**
 * daryl herzmann
 * Systems Analyst III -- Iowa Environmental Mesonet
 * https://mesonet.agron.iastate.edu
 */

________________________________________
From: ldm-users <ldm-users-bounces@xxxxxxxxxxxxxxxx> on behalf of Tom Yoksas 
<yoksas@xxxxxxxx>
Sent: Thursday, April 23, 2020 3:03 PM
To: ldm-users@xxxxxxxxxxxxxxxx
Subject: [ldm-users] 20200423: Re: 20200423: Re: Efficiency of splitting pqacts

Hi Daryl,

On 4/23/20 1:36 PM, Herzmann, Daryl E [AGRON] wrote:
> I am sure Unidata will correct my ignorance / incorrect details, but
> my understanding is that an individual pqact process can only do 32
> "things" at one time, or there's 32 slots available for work.

A _long_ time ago, the LDM used to only keep open a maximum of 32 file
descriptors.  A less, but still long time ago, Steve changed that to
use the system value for the number of open file descriptors.
Recently, we came to the conclusion that there being LOTS of open
file descriptors was a major cause for the length of time it took
to stop the LDM, at least, on our publicly facing servers
(lead.unidata.ucar.edu and atm.ucar.edu).  Actions like those that
append to an open file were simply not being closed because current
OSes allow for LOTS of open file descriptors.  Steve's solution was
to add code to the LDM that would close file descriptors after a
certain amount of time during which the writes were inactive.  The
best example of the kind of actions that I am referring to are ones
for model output that write all model fields for a single model time
step into a single file.  In these kinds of actions (FILE with no
-close flag), there is no way to know when all of the products to
be written into the output file have been received, so the file
descriptor stays open, and as I noted current OSes allow for a LOT
of open file descriptors.
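
To make the idea concrete, here is a toy sketch of closing descriptors after a 
period of write inactivity.  It is not LDM's actual code: the class name 
`IdleClosingFiles`, the `max_idle` parameter, and the explicit `now` arguments 
(used so the example doesn't have to sleep) are all assumptions made for 
illustration.

```python
import os
import tempfile
import time


class IdleClosingFiles:
    """Keep append-mode files open; close those idle longer than max_idle seconds."""

    def __init__(self, max_idle):
        self.max_idle = max_idle
        self._open = {}  # path -> (file object, time of last write)

    def append(self, path, data, now=None):
        """Append bytes to path, reusing the open file if there is one."""
        now = time.monotonic() if now is None else now
        entry = self._open.get(path)
        fobj = entry[0] if entry else open(path, "ab")
        fobj.write(data)
        fobj.flush()
        self._open[path] = (fobj, now)

    def close_idle(self, now=None):
        """Close every file whose last write is older than max_idle; return how many."""
        now = time.monotonic() if now is None else now
        stale = [path for path, (_, last) in self._open.items()
                 if now - last > self.max_idle]
        for path in stale:
            self._open.pop(path)[0].close()
        return len(stale)

    def __len__(self):
        return len(self._open)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    cache = IdleClosingFiles(max_idle=10)
    cache.append(os.path.join(workdir, "model_a"), b"field1", now=0)
    cache.append(os.path.join(workdir, "model_b"), b"field1", now=8)
    print(cache.close_idle(now=12), "closed;", len(cache), "still open")
```

A file that receives another product after being closed is simply reopened in 
append mode, so nothing is lost by closing early.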

re:
> Now, the above depends on the action.  If you run `PIPE -close`,
> the slot can be used for another product even with the PIPEd process
> still running...  This type of action can lead LDM to DOSing the server
> it is on as it will fire off as many PIPE'd processes that it can.

I'm not sure that this is the case, but Steve can certainly say yea/nay
on this.

re:
> You old timers, like me, will recall the lock file fun Chiz wrote into
> the GIF generation script of NIDS data for this reason.
>
> If you are doing just FILE actions without a `-close`, there is some
> benefit to spreading out the pqact.conf file into multiple files to
> keep each pqact roughly touching 32 files each.  For example with
> level2 data, dividing the radars into chunks like so:
>
> exec    "pqact -p BZIP2/K[A-D] -f CRAFT /local/ldm/etc/pqact-craft.conf"
> exec    "pqact -p BZIP2/K[E-H] -f CRAFT /local/ldm/etc/pqact-craft2.conf"
> exec    "pqact -p BZIP2/K[I-K] -f CRAFT /local/ldm/etc/pqact-craft3.conf"
> exec    "pqact -p BZIP2/K[L-O] -f CRAFT /local/ldm/etc/pqact-craft4.conf"
> exec    "pqact -p BZIP2/K[P-R] -f CRAFT /local/ldm/etc/pqact-craft5.conf"
> exec    "pqact -p BZIP2/K[S-Z] -f CRAFT /local/ldm/etc/pqact-craft6.conf"
> exec    "pqact -p BZIP2/[A-J] -f CRAFT /local/ldm/etc/pqact-craft7.conf"
> exec    "pqact -p BZIP2/[L-Z] -f CRAFT /local/ldm/etc/pqact-craft8.conf"
>
> Behold, another caveat here.  While with the above, each pqact process has
> its own uniquely named file, this file can be the same file on the filesystem
> and managed with sym links.  They need to be unique to the pqact process so
> that pqact can write its `.state` file to a unique location.

The question that Mike Z was asking was about the number of actions in
the pattern-action file.  If one uses the exact same pattern-action file
for each 'pqact' instance, and that pattern-action file has a lot of
actions, it will take longer for 'pqact' to work its way through the
actions.  This is true even if some/most of the actions are not executed
because their extended regular expression doesn't match the Product ID
for the product being acted upon.  Of course, actions that don't match
tend to be dealt with much faster than ones that do match.
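
A toy model of that sequential scan, for illustration only.  It counts regex 
checks rather than time, ignores any filtering done with `pqact -p`, and 
treats a match and a non-match as costing the same, which the paragraph above 
says is not quite true.  The product names and `count_regex_checks` are made 
up.

```python
import re


def count_regex_checks(patterns, product_ids):
    """Test every product ID against every pattern in order; return (checks, matches)."""
    compiled = [re.compile(p) for p in patterns]
    checks = 0
    matches = 0
    for product_id in product_ids:
        for regex in compiled:
            checks += 1
            if regex.search(product_id):
                matches += 1
    return checks, matches


if __name__ == "__main__":
    # 1000 hypothetical entries, each matching exactly one product name.
    patterns = [f"^PROD{i:04d}$" for i in range(1000)]
    # 50 incoming products.
    products = [f"PROD{i:04d}" for i in range(0, 1000, 20)]

    # One pqact working through the whole file.
    print("monolithic:", count_regex_checks(patterns, products))
    # One of ten pqact instances, each holding a tenth of the entries.
    print("one of ten:", count_regex_checks(patterns[:100], products))
```

Each of the ten instances does a tenth of the checks per product, and the 
instances run side by side, which is where the speedup comes from.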

re:
> You should consider the processes being run, how long their lifetime is,
> and your server's capacity.   If you have a bunch of long running GEMPAK
> decoders that totals something less than 32 total, then just keep them
> in one file but perhaps isolate that pqact process to just those tasks.

I agree with the sentiment expressed here, but I would caution that the
old 32 open file descriptor limit does not apply.

re:
> So hold tight until Unidata corrects my above as FUD :)

Just having fun on a stay at home day :-)

Cheers,

Tom

_______________________________
> From: ldm-users <ldm-users-bounces@xxxxxxxxxxxxxxxx> on behalf of Tom Yoksas 
> <yoksas@xxxxxxxx>
> Sent: Thursday, April 23, 2020 2:15 PM
> To: ldm-users@xxxxxxxxxxxxxxxx
> Subject: [ldm-users] 20200423: Re:  Efficiency of splitting pqacts
>
> Hi Mike,
>
> On 4/23/20 12:39 PM, Mike Zuranski wrote:
>> I'm wondering if there is a difference in speed/efficiency of the LDM,
>> or in system resource allocation, between grouping all my pqact
>> statements in one file vs. splitting them up into different pqact
>> files.
>
> Since all actions in an LDM pattern-action file are processed
> sequentially, there is a benefit to distributing actions in multiple
> pattern-action files that are each processed by a separate 'pqact'
> instance.
>
> re:
>> Does LDM do anything differently or is it a wash either way?
>
> No, each 'pqact' instance will work through the list of actions in
> the pattern-action file that it works in sequence.  So, if one has
> a monolithic pattern-action file with, say 10K actions, it will take
> significantly longer than having 10 'pqact' instances operating
> on pattern-action files that each have 1K actions.
>
> re:
>> I vaguely remember this coming up at one point but I couldn't find any
>> documentation or old email threads about it.  I'm mostly just asking out
>> of curiosity, I don't have a specific problem that I'm trying to solve
>> or anything.  But if I were to redo my pqact organization I'm wondering
>> if there is a preferred methodology.
>
> The best rule of thumb is to have multiple 'pqact' instances operating
> on multiple pattern-action files when the list of actions to be
> performed is large, or when some of the actions are slow.  There is no
> "best practice" for, say, having only N actions in a pattern-action
> file since the speed that the actions will be performed is a function
> of how fast/slow each action is.  Sites invariably will need to do
> their own tuning to find the right balance of speed and use of
> resources (more 'pqact' instances will, of course, use more resources
> like CPU, RAM, etc.).
>
> Cheers,
>
> Tom
> --
> +----------------------------------------------------------------------+
> * Tom Yoksas                                      UCAR Unidata Program *
> * (303) 497-8642 (last resort)                           P.O. Box 3000 *
> * yoksas@xxxxxxxx                                    Boulder, CO 80307 *
> * Unidata WWW Service                     http://www.unidata.ucar.edu/ *
> +----------------------------------------------------------------------+
>
> _______________________________________________
> NOTE: All exchanges posted to Unidata maintained email lists are
> recorded in the Unidata inquiry tracking system and made publicly
> available through the web.  Users who post to any of the lists we
> maintain are reminded to remove any personal information that they
> do not want to be made public.
>
>
> ldm-users mailing list
> ldm-users@xxxxxxxxxxxxxxxx
> For list information or to unsubscribe,  visit: 
> https://www.unidata.ucar.edu/mailing_lists/
>

--
+----------------------------------------------------------------------+
* Tom Yoksas                                      UCAR Unidata Program *
* (303) 497-8642 (last resort)                           P.O. Box 3000 *
* yoksas@xxxxxxxx                                    Boulder, CO 80307 *
* Unidata WWW Service                     http://www.unidata.ucar.edu/ *
+----------------------------------------------------------------------+
