pqact not starting

Subject: pqact not starting
Date: Fri, 30 Nov 2001 11:46:34 -0700
Hi Brian,

Well, the one problem I thought I had remaining seems to have fixed
itself. That was that 'ldmadmin start' would start the ldm, but would
never return.  But now it seems to be behaving properly.

Here's what I did.  First, I modified .cshrc - if you look in there
you'll see the change I made to the PATH variable.  In running things
from the command line, it didn't seem that executables were being found
properly.  In looking at the environment, PATH was a little screwy. 
That could affect the running of the ldm.  

The other change I made was to the size of the product queue.  Look in
~/bin/ldmadmin and search for 'pq_size'.  You were still using the
default queue size of 100 MB, which is too small for the data you are
requesting.  I've never seen that problem produce your particular
symptoms before, i.e., pqact being totally stymied.  But increasing the
queue size to 500 MB seems to have helped.  In order to do this, I
stopped the ldm, deleted the queue ('ldmadmin delqueue'), remade the
queue ('ldmadmin mkqueue') and then restarted the ldm.  When you upgrade
your ldm, you'll have to remember to migrate that change into your
latest ldmadmin script, i.e., change the default queue size that comes
with the distribution to the size that is appropriate for your site.
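
The order of those steps matters, so here is a minimal sketch of the
sequence in Python.  It assumes 'ldmadmin' is on the PATH of user ldm
and that pq_size has already been edited by hand in ~/bin/ldmadmin
before the queue is remade.  The function names are made up for
illustration, and by default it only prints the commands rather than
running them.

```python
import subprocess

# ldmadmin subcommands in the order described above: stop the ldm,
# delete the old queue, make the new (larger) queue, start the ldm.
RESIZE_STEPS = ["stop", "delqueue", "mkqueue", "start"]


def resize_commands(ldmadmin="ldmadmin"):
    """Return the command lines for the resize sequence, in order."""
    return [[ldmadmin, step] for step in RESIZE_STEPS]


def run_resize(dry_run=True, ldmadmin="ldmadmin"):
    """Print the commands (dry_run=True) or actually run them.

    Assumption: pq_size was edited in the ldmadmin script beforehand;
    this sketch does not touch the script itself.
    """
    for cmd in resize_commands(ldmadmin):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the sequence if any step fails, so the
            # ldm is never started on top of a failed mkqueue.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_resize(dry_run=True)
```

Run it as is to see the four commands, and only switch dry_run to
False once the queue size in the script is what you want.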

I also put 'exec pqact' back in ldmd.conf.  pqact now appears to be
starting and running properly.

Every time I start the ldm, I immediately follow that with an 'ldmadmin
tail'.  At the top of the new log that is created when you restart, you
should always see something like this, although not necessarily in this
exact order.  (I have annotated the log a bit.)

Nov 30 18:28:21 ldm rpc.ldmd[2560]: Starting Up (built: Aug 31 2000 11:48:33)
Nov 30 18:28:21 ldm pqact[2562]: Starting Up
                    ^^^^^  pqact started here
Nov 30 18:28:21 ldm pqbinstats[2561]: Starting Up (2560)
Nov 30 18:28:21 ldm amelia[2563]: run_requester: Starting Up: amelia.geol.iastate.edu
Nov 30 18:28:21 ldm amelia[2563]: run_requester: 20011130182513.192 TS_ENDT {{ANY,  ".*"}}
Nov 30 18:28:21 ldm stokes[2565]: run_requester: Starting Up: stokes.metr.ou.edu
Nov 30 18:28:21 ldm striker[2564]: run_requester: Starting Up: striker.atmos.albany.edu
Nov 30 18:28:21 ldm striker[2564]: run_requester: 20011130182437.852 TS_ENDT {{NLDN,  ".*"}}
Nov 30 18:28:21 ldm stokes[2565]: run_requester: 20011130174744.645 TS_ENDT {{NNEXRAD,  "/p...(DVN)"}}
                    ^^^^ all the above are spawning processes on remote
                         hosts and requesting certain data sets
Nov 30 18:28:21 ldm amelia[2563]: FEEDME(amelia.geol.iastate.edu): reclass: 20011130182513.192 TS_ENDT {{FSL2|UNIDATA,  ".*"}}
                    ^^^^^ amelia is only willing to send you FSL2 and
                          UNIDATA, not ANY
Nov 30 18:28:21 ldm amelia[2563]: FEEDME(amelia.geol.iastate.edu): OK
Nov 30 18:28:22 ldm striker[2564]: FEEDME(striker.atmos.albany.edu): OK
Nov 30 18:28:22 ldm stokes[2565]: FEEDME(stokes.metr.ou.edu): OK
                    ^^^^^^^^^^^ all the above say your requests are
                                acceptable to the remote hosts
Nov 30 18:28:23 ldm ldm[2572]: Connection from ldm.iihr.uiowa.edu
Nov 30 18:28:23 ldm ldm[2572]: Connection reset by peer
Nov 30 18:28:23 ldm ldm[2572]: Exiting
                    ^^^^^^^^^^ this connection from your localhost must
                               always be successful
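
If you would rather not eyeball the log each time, the check can be
roughly automated.  The sketch below is only an illustration, not part
of the ldm: it assumes log lines shaped like the excerpt above, and the
function name and the two regular expressions are my own.  It reports
whether pqact logged "Starting Up" and which upstream hosts answered
FEEDME with OK.

```python
import re

# "<process>[<pid>]: Starting Up", e.g. "pqact[2562]: Starting Up".
STARTUP = re.compile(r"\b(?P<proc>[\w.]+)\[\d+\]: Starting Up")
# "FEEDME(<host>): OK", e.g. "FEEDME(stokes.metr.ou.edu): OK".
FEEDME_OK = re.compile(r"FEEDME\((?P<host>[^)]+)\): OK")


def summarize_startup(log_text):
    """Scan startup log text for pqact and for accepted feed requests."""
    started = set()
    accepted = set()
    for line in log_text.splitlines():
        match = STARTUP.search(line)
        if match:
            started.add(match.group("proc"))
        match = FEEDME_OK.search(line)
        if match:
            accepted.add(match.group("host"))
    return {
        "pqact_started": "pqact" in started,
        "feeds_accepted": sorted(accepted),
    }


if __name__ == "__main__":
    sample = "\n".join([
        "Nov 30 18:28:21 ldm pqact[2562]: Starting Up",
        "Nov 30 18:28:21 ldm amelia[2563]: FEEDME(amelia.geol.iastate.edu): OK",
        "Nov 30 18:28:22 ldm stokes[2565]: FEEDME(stokes.metr.ou.edu): OK",
    ])
    print(summarize_startup(sample))
```

You could feed it the first screenful of the new log after a restart.
If pqact_started comes back False, that is the symptom you were seeing
before.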


I can now see products as they come in via 'ldmadmin watch'.  

I also checked out your current queue performance with pqmon:

[ldm@ldm ~]$ pqmon -i5
Nov 30 18:29:07 pqmon: Starting Up (2578)
Nov 30 18:29:07 pqmon: nprods nfree  nempty      nbytes  maxprods  maxfree  minempty    maxext  age
Nov 30 18:29:07 pqmon:  44460     1   65402   240851720     44460        1     65402 209151224 5398
Nov 30 18:29:12 pqmon:  44504     1   65358   240881168     44504        1     65358 209121776 5403
Nov 30 18:29:17 pqmon:  44523     1   65339   240893496     44523        1     65339 209109448 5408

This is with the new queue, which has been in use for over an hour so
the stats are fairly representative of a stable state.  Probably the
last field is the most useful for you.  It shows the age of the oldest
product in seconds.   Here we can see that you have about 1.5 hours
worth of data in your queue.  We recommend that you keep at least one
hour's worth, so you're in good shape now.
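
To make the arithmetic explicit, here is a small sketch that turns one
pqmon line into named fields and converts the age to hours.  The field
names come from the pqmon header above.  The function names are
hypothetical, and the sketch assumes each pqmon report sits on a single
unwrapped line.

```python
# Column names as printed in the pqmon header line.
PQMON_FIELDS = [
    "nprods", "nfree", "nempty", "nbytes",
    "maxprods", "maxfree", "minempty", "maxext", "age",
]


def parse_pqmon(line):
    """Parse one pqmon data line into a dict of integers."""
    values = line.split("pqmon:", 1)[1].split()
    if len(values) != len(PQMON_FIELDS):
        raise ValueError("expected %d fields, got %d"
                         % (len(PQMON_FIELDS), len(values)))
    return dict(zip(PQMON_FIELDS, (int(v) for v in values)))


def queue_age_hours(stats):
    """Age of the oldest product in the queue, in hours."""
    return stats["age"] / 3600.0


def holds_an_hour(stats):
    """True if the queue holds at least the recommended one hour."""
    return stats["age"] >= 3600


if __name__ == "__main__":
    sample = ("Nov 30 18:29:07 pqmon:  44460     1   65402   240851720"
              "     44460        1     65402 209151224 5398")
    stats = parse_pqmon(sample)
    print(round(queue_age_hours(stats), 2))  # 5398 / 3600, prints 1.5
    print(holds_an_hour(stats))              # prints True
```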

You should check the email to user ldm regularly.  As of now you have
3150 messages.   Most are unimportant.  But if something goes wrong in a
process invoked via cron, that's a likely place for a message to
appear.  It does appear that you are failing over now and then.  I did
not investigate this further, but you should take a look at those
messages and ensure that's working properly.

I'm still working on the NEXRAD feed from stokes.  It was on for a
while, but doesn't seem to be happening now.  I hope to know something
for sure by the end of the day.

Please let me know if you have further questions.

Anne
-- 
***************************************************
Anne Wilson                     UCAR Unidata Program            
address@hidden                 P.O. Box 3000
                                  Boulder, CO  80307
----------------------------------------------------
Unidata WWW server       http://www.unidata.ucar.edu/
****************************************************