20020821: Disk Full - XCD_START.LOG not closing



>From: Mike Voss <address@hidden>
>Organization: SJSU
>Keywords: 200208211540.g7LFerK22645 McIDAS xcd_run

Mike,

>I had this problem before and made the changes you recommended in
>http://www.unidata.ucar.edu/glimpse/ldm/5197.

OK.  This should take care of the problem where XCD processes attempt to
write their output to a file that has been deleted.

>My disk has filled up a few times recently, and again last night because the
>XCD_START.LOG grew to be huge.

If XCD_START.LOG was huge AND it was being properly written to, it means
that there is a problem with one or more of the decoders or XCD setup
in the LDM.  Did you happen to take a look at the contents of XCD_START.LOG?
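To catch the log before it fills the disk again, a size check along these lines may help. This is only a sketch: the function name log_too_big and the byte limit are made up for illustration, the log path is whatever $MCLOG points to on your system, and it assumes a POSIX shell.

```shell
# Hypothetical helper (not part of McIDAS or the LDM): succeeds when the
# given log file is larger than the given limit in bytes.
log_too_big() {
    log=$1
    limit_bytes=$2
    # wc -c reports the size in bytes; $(( )) strips any padding in its output
    size_bytes=$(( $(wc -c < "$log") ))
    [ "$size_bytes" -gt "$limit_bytes" ]
}

# Example use (path and 10 MB limit are placeholders):
#   log_too_big /path/to/XCD_START.LOG 10485760 && tail -n 20 /path/to/XCD_START.LOG
```

Printing the last few lines when the limit is exceeded shows what is being written repeatedly.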

>I check the size of this file once in awhile and it seems to be of
>reasonable size.....and then all of a sudden it will be huge and fill
>up the disk. So, I'm looking for some more help in figuring out 
>what's happening...maybe I edited the xcd_run file incorrectly...?

If the logging is working correctly, then the contents of XCD_START.LOG
will tell us how many times one or more of the XCD data monitors is
being restarted.  It may also give us a clue as to why this is happening.

>Here is the section which you had me edit:
>-------
>
>if [ $1 = "MONITOR" ] ; then
>  rm -f $MCLOG
>  touch $MCLOG
>fi
>
>
>exec 2>> $MCLOG 1>&2
>echo "Starting $@ at $time"
>
>
>case $1 in
>    DDS)     exec ingetext.k DDS;;
>    IDS)     exec ingetext.k DDS;;
>    PPS)     exec ingetext.k DDS;;
>    DDPLUS)  exec ingetext.k DDS;;
>    HRS)     exec ingebin.k  HRS;;
>    HDS)     exec ingebin.k  HRS;;
>    GRID)    exec ingebin.k  HRS;;
>    MONITOR) exec startxcd.k;;
>    *)       echo "xcd_run action $1 incorrectly specified, failing...";;
>esac
>----------------------------------------

This is correct.  I assume that the copy of xcd_run that you modified was
the one that is being used by the LDM.  As a first step in the investigation,
I would:

o log in as 'ldm'
o run 'which xcd_run' and make sure that the copy that is indicated is
  the one that you modified
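The check in the steps above could be scripted roughly as follows. The function name check_xcd_run is hypothetical, and the argument is the path to the copy you edited. It uses 'command -v' in place of 'which' and assumes a POSIX shell.

```shell
# Hypothetical helper: compare the xcd_run found on the PATH of the current
# user (run this as 'ldm') with the copy that was edited.
check_xcd_run() {
    edited=$1
    found=$(command -v xcd_run) || { echo "xcd_run not found on PATH"; return 2; }
    # cmp -s is silent and succeeds only when the two files are identical
    if cmp -s "$found" "$edited"; then
        echo "OK: $found matches $edited"
    else
        echo "MISMATCH: PATH resolves to $found, which differs from $edited"
        return 1
    fi
}

# Example use (the path is a placeholder):
#   check_xcd_run /path/to/edited/xcd_run
```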

>You can ssh into rossby.met.sjsu.edu if need be. You need to go through
>metsun1.met.sjsu.edu first. One thing to keep in mind is that I'm not
>really using MCIDAS currently.....so don't feel obligated to straighten
>it all out if it's a mess. :-)

I logged onto rossby and see that the XCD_START.LOG is filled with
indications of segmentation violations:

DMSFC        Starting: Surface Hourly (SAO) Metar Decoder
Program terminated, segmentation violation
startxcd.k: m0monexe - restart of :DMSFC                                    2531

This is where the problem lies and needs to be fixed.
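To see how often each decoder is being restarted, the restart messages can be tallied. This is a rough sketch that assumes every restart line has the same layout as the excerpt above, with the decoder name prefixed by a colon. The function name count_restarts is made up for illustration.

```shell
# Hypothetical helper: count "restart of" messages per decoder in an XCD log.
# Assumes lines of the form:
#   startxcd.k: m0monexe - restart of :DMSFC   2531
count_restarts() {
    awk '/m0monexe - restart of/ {
             # the decoder name is the field that begins with a colon
             for (i = 1; i <= NF; i++)
                 if ($i ~ /^:/) { name = substr($i, 2); n[name]++ }
         }
         END { for (d in n) print d, n[d] }' "$1" | sort
}

# Example use (the path is a placeholder):
#   count_restarts /path/to/XCD_START.LOG
```

Each output line is a decoder name followed by the number of restarts logged for it.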

I tried rebuilding dmsfc.k, but that copy was core dumping also.  BTW,
I seem to remember this happening on your system once before.  Am I
remembering this correctly?

For grins (and since it takes almost no effort on my part), I brought
over the latest bugfixes for McIDAS v7.8 to rebuild McIDAS on rossby.
We'll see if that fixes the segmentation violation problem or not
after the rebuild finishes.  Again, I realize that you are not actively
using McIDAS, so the simplest thing would have been to turn off surface
decoding after discovering that the surface decoder is core dumping.
This would leave the problem in place, however, if you decided that
you wanted the data at some point.

>Thanks for any help or suggestions, cheers,

I will bounce back after the build finishes with any additional comments.

Tom