
20020716: LDM disk full (but it's not) error (cont.)



>From:  Unidata Support <address@hidden>
>Organization:  SJSU
>Keywords: 200207151841.g6FIfxa29688 LDM McIDAS-XCD disk fill

Mike,

If you are running McIDAS-XCD, then I think I know the cause of your
problem and its solution.  If you are not running McIDAS-XCD, then
more investigation is necessary.

If you are running McIDAS-XCD, then the McIDAS script run by the LDM,
xcd_run, is probably at fault.  When this script is run at LDM startup
(from an 'exec "xcd_run MONITOR"' line in ~ldm/etc/ldmd.conf), it sets up
logging to the file ~mcidas/workdata/XCD_START.LOG.  The idea behind
this file is to let the user see how often XCD data monitors are
restarted by the XCD supervisory routine startxcd.k.

It turns out that I had a logic error in this script that I only became
aware of very recently.

Check the copy of xcd_run on your system and see if it doesn't have
code that looks like:

 ...

# Send all messages to the log file and start the process

exec 2>> $MCLOG 1>&2
echo "Starting $@ at $time"

case $1 in
    DDS)     exec ingetext.k DDS;;
    IDS)     exec ingetext.k DDS;;
    PPS)     exec ingetext.k DDS;;
    DDPLUS)  exec ingetext.k DDS;;
    HRS)     exec ingebin.k  HRS;;
    HDS)     exec ingebin.k  HRS;;
    GRID)    exec ingebin.k  HRS;;
    MONITOR) rm -f $MCLOG
             exec startxcd.k
             ;;
    *)       echo "xcd_run action $1 incorrectly specified, failing..."
             ;;
esac

 ...


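The error here is the deletion of the file referred to by MCLOG upon
startup of startxcd.k.  This is an error because all output from things
run by xcd_run has already been redirected to the MCLOG file, so they
will continue to write to that open file descriptor even after the file
has been deleted.  The deleted file no longer shows up in a directory
listing or in du output, but the disk space it uses is not released
until every process writing to it exits, which is why the disk looks
full when it is not.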
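
If you want to see this behavior in isolation, the following is a
minimal sketch, not part of xcd_run.  The function name
demo_unlinked_write, the scratch file name, and the use of file
descriptor 3 are all made up for the illustration; it assumes a
POSIX-style sh and a writable temporary directory.

```shell
#!/bin/sh
# Hypothetical demo (not part of xcd_run): open a log for append,
# delete it, then keep writing through the still-open descriptor.
demo_unlinked_write() {
    log=${TMPDIR:-/tmp}/xcd_unlink_demo.$$   # scratch file, name is an assumption

    exec 3>> "$log"      # open (and create) the file on descriptor 3
    rm -f "$log"         # remove the name; the open descriptor keeps the data alive

    # The write still succeeds even though the name is gone.
    if echo "Starting MONITOR" >&3; then
        status="write succeeded"
    else
        status="write failed"
    fi

    exec 3>&-            # closing the descriptor is what finally frees the space

    if [ -e "$log" ]; then
        echo "$status; file still listed"
    else
        echo "$status; file no longer listed"
    fi
}

demo_unlinked_write
```

In the old xcd_run the descriptor is never closed while startxcd.k and
its children keep running, so the hidden file just keeps growing until
the LDM is stopped.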

The fix for this situation is to edit the copy of xcd_run you are using
with your LDM.  Change the above section to:

 ...

# Send all messages to the log file and start the process

if [ "$1" = "MONITOR" ] ; then
  rm -f $MCLOG
  touch $MCLOG
fi

exec 2>> $MCLOG 1>&2
echo "Starting $@ at $time"

case $1 in
    DDS)     exec ingetext.k DDS;;
    IDS)     exec ingetext.k DDS;;
    PPS)     exec ingetext.k DDS;;
    DDPLUS)  exec ingetext.k DDS;;
    HRS)     exec ingebin.k  HRS;;
    HDS)     exec ingebin.k  HRS;;
    GRID)    exec ingebin.k  HRS;;
    MONITOR) exec startxcd.k;;
    *)       echo "xcd_run action $1 incorrectly specified, failing...";;
esac

 ...


After making this change, you will need to stop and restart your LDM:

ldmadmin stop
<wait until all LDM processes have exited>
ldmadmin start

Now, the old MCLOG file will be removed and a new one created before
any output is redirected to it, so nothing will be left writing to a
deleted file and things will work correctly.
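
For comparison, here is a small sketch of the corrected ordering in
isolation: remove, recreate, and only then open the file for append.
As before, the function name demo_fixed_order, the scratch file name,
and descriptor 3 are assumptions made for the illustration and do not
appear in xcd_run.

```shell
#!/bin/sh
# Hypothetical demo (not part of xcd_run): same steps as the fix,
# with the delete done BEFORE the file is opened for output.
demo_fixed_order() {
    log=${TMPDIR:-/tmp}/xcd_fixed_demo.$$    # scratch file, name is an assumption

    rm -f "$log"         # start with a fresh log
    touch "$log"         # make sure it exists before redirecting
    exec 3>> "$log"      # now open it for append on descriptor 3

    echo "Starting MONITOR" >&3
    exec 3>&-

    # The file is still visible by name and holds what was written.
    if [ -s "$log" ]; then
        echo "log is listed and has content"
    else
        echo "log is missing or empty"
    fi

    rm -f "$log"         # clean up the scratch file
}

demo_fixed_order
```

On your system the equivalent check after restarting the LDM is simply
that ~mcidas/workdata/XCD_START.LOG exists and contains the "Starting"
lines.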

Sorry for the problems...

Tom

>I believe that your problem does sound like a McIDAS file was still
>open. Even if the file is deleted, the OS won't get the space back 
>if the program still has the file handle open and is writing to it.
>
>Stopping LDM likely caused the xcd log file to close, so maybe that was 
>the culprit.
>
>Tom is out of the office currently, but can probably recreate your solution 
>and future avoidance when he returns.
>
>Steve Chiswell
>Unidata User Support
>
>
>>From: Mike Voss <address@hidden>
>>Organization: UCAR/Unidata
>>Keywords: 200207151841.g6FIfxa29688
>
>>Hello,
>>I'm running LDM 5.1.2 on Solaris 2.7. Recently I started getting a strange
>>problem. My /export/home disk will fill up...or claim to be filled up. I will
>>do the normal troubleshooting thinking the disk is really full, i.e. du -sk *
>>to see who is hogging all the space, but the disk is not really full. Then I
>>got a hunch it was associated with the LDM, and sure enough after I stopped 
>>and restarted the LDM the disk usage reported was back to normal. This has
>>happened three or four times now in the past few weeks. I perused the archives
>>and found http://www.unidata.ucar.edu/glimpse/ldm/3758 and 
>>http://www.unidata.ucar.edu/glimpse/ldm/3762
>>
>>This user seemed to have a similar problem, but the solution is not clear to 
>>me. It does seem like maybe a file getting written to gets deleted and is
>>really still there somehow. But why /export/home? My pqact.conf does not
>>write anything to /export/home...but MCIDAS does have the workdata directory
>>there...
>>
>> hmmm. Before giving you more symptoms, I'll stop at this point and see if
>>you have any ideas or hunches. Thank You.
>>
>>Mike
>>--------------------------
>>Mike Voss                                
>>Department of Meteorology               
>>San Jose State University                        
>>One Washington Square                      
>>San Jose, CA 95192-0104   
>>             
>>408.924.5204 voice
>>408.924.5191 fax   


>From address@hidden Mon Jul 29 12:21:20 2002
>Subject: Re: 20020716: LDM disk full (but it's not) error (cont.)

Tom,

I've been away on vacation :). I edited "xcd_run" as you suggested and
restarted the LDM. Thanks for the fix!

Cheers,
Mike