
20030516: ldmadmin stop differences in LDM-6 (cont.)



>From: Chris Novy <address@hidden>
>Organization: SIU
>Keywords: 200305160023.h4G0NBLd007554 LDM-6 ldmadmin stop

Hi Chris,

re: how long it takes the LDM-6 to exit

>Tonight's stop at 1PM took 6:21.  This afternoon it only :30.

The length of time it takes the LDM to stop depends on the speed and
responsiveness of the link to an upstream feeder or to a downstream
machine being fed.  The shutdown process is designed not to outright
kill processes that have the queue open for reading or writing, but,
rather, to send them a notification (the SIGTERM signal) that they
recognize as a request to shut down.  They will finish whatever they
are doing -- including waiting on an up/downstream connection --
before exiting.

>OS is SunOS 5.8.  Not sure what the compiler is. How do I tell?

During the 'configure' step of the build, a compiler is selected
from those accessible in the PATH of the user building the LDM.  On
Solaris, an attempt is made to use the 'c89' compiler if it exists
(i.e., if you have installed Sun's compilers).  The typical options we
see on Solaris systems are c89 and gcc, and c89 will be chosen over
gcc unless one sets the CC environment variable to gcc before running
'configure'.

Finding the version of the compiler is as simple as:

% c89 -V

-- or --

% cc -V

-- or --

% gcc --version

>It's interesting that reverting back to previous version still results in 
>slow shutdown problem.  Ordinarily shutdown would only take :15 to complete 
>(based on ps -ef verification).

Again, the speed of the shutdown is a function of the feed connections,
not of anything wrong in the code.  If we simply killed each rpc.ldmd,
the shutdown would be fast, but the user might then be forced to
remake the queue, which is bad for two reasons:

- making a new queue is slow (the queue and its structures are zeroed
  as they are created)

- data requests after remaking a queue ask for the previous hour's worth
  of all feeds, which can load a user's network unnecessarily

>Would knowing which tasks are slow to quit help you any?  I could do a PS 
>when the system is up and then one while it's trying to shut down and we 
>could compare processes.

No, we already know that the process will be a child rpc.ldmd.  Since
the child is still alive, the parent rpc.ldmd will also be there since
the parent waits for all children to exit.  The most typical situation
is to have one child and the parent rpc.ldmd.  If, however, all feeds
are slow, you can end up with several child rpc.ldmds and the one
parent.

>I assume the order of process startup is dictated 
>by what's in ldmd.conf?

Yes, this is correct.  The order of shutdown, however, is governed by
the individual processes themselves, since each one determines for
itself when it is OK to exit.

I really don't think you are having a problem, given the 30 seconds
that it took to shut down in one attempt.

Oh, I forgot to mention one other thing that ldmadmin does in LDM-6:
it flushes the product queue from memory back to disk.  This is the
first thing that happens, and it does take a bit of time, but it should
not be excessive unless you are running with a really large queue (over
2 GB) or your machine is heavily loaded by non-LDM-related
processing.

I am curious about how often you stop and restart your LDM.  The
typical way the LDM is used is that folks set up what data they want
to receive and which machines they are willing to feed, and then they
turn the LDM on and don't look at it for days/weeks/months.  I am not
kidding about the "months" comment; the NOAAPORT ingesters we are
running typically stay up for several months unless we decide to update
the LDM.

Cheers,

Tom