
20021107: disk filled with McIDAS-XCD decoded GRID files (again)



>From:  "Jennie L. Moody" <address@hidden>
>Organization:  UVa
>Keywords:  200211072217.gA7MHiX08669 LDM McIDAS-XCD cron

Jennie,

>I had to shut down the ldm this afternoon, Tony called
>me (I have been continuing to work from home given the
>gruesome office situation) and reported that we had a full
>disk error again.  I stopped the ldm, and deleted the
>queue, but I first listed the size of the directories
>in /p4/data....the xcd directory was huge.  The queue
>was also bigger than it used to be 400MB versus 300MB.

The queue is not the problem.  There is no need to delete it if
you run out of disk.  Also, it is bigger because I increased its
size when I loaded in 5.2.1.

>Maybe I need to trim our data request down significantly?

I jumped onto windfall as soon as I saw your email.  The problem is that
the scouring is not being run for some reason.

Hmm...  I was in the process of changing the crontab entry in the 'ldm'
account to run mcscour.sh in /usr/local/ldm/util (I copied the version
from /home/mcidas/bin to /usr/local/ldm/util to tidy up), when I saw
the following upon exiting the change for crontab:

"/tmp/crontab6oaq_w" 25 lines, 1438 characters 
cron may not be running - call your system administrator

This is the problem.  cron is not running on windfall.

windfall: /usr/local/ldm/util $ ps -eaf | grep cron
     ldm 11777 11705  0 17:48:43 pts/4    0:00 grep cron

The 'ps' listing should have had a line that looks like:

(laraine.unidata.ucar.edu) 1250 % ps -eaf | grep cron
    root   208     1  0   Oct 23 ?        1:05 /usr/sbin/cron
 support 28595 10557  0 15:48:31 pts/2    0:00 grep cron

Given this, I tried restarting cron (as 'root'):

su -
<pass>
/usr/sbin/cron

and got:

windfall: / # /usr/sbin/cron
windfall: / # ! cannot start cron; FIFO exists Thu Nov  7 17:50:21 2002
! ******* CRON ABORTED ******** Thu Nov  7 17:50:21 2002

I read the man page for cron and saw:

 ...

     Since cron never exits, it should  be  executed  only  once.
     This  is done routinely through /etc/rc2.d/S75cron at system
     boot time.  The file /etc/cron.d/FIFO is used  (among  other
     things) as a lock file to prevent the execution of more than
     one instance of cron.

 ...

I deleted /etc/cron.d/FIFO and then was able to start cron:

windfall: / # rm /etc/cron.d/FIFO
rm: remove /etc/cron.d/FIFO (yes/no)? y
windfall: / # /usr/sbin/cron
windfall: / # ps -eaf | grep cron
    root 11807     1  0 17:53:55 ?        0:00 /usr/sbin/cron
    root 11818 11781  0 17:54:43 pts/4    0:00 grep cron
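
The 'ps | grep cron' check used above can be wrapped in a small shell
function so that the 'grep cron' line itself is never mistaken for a
running cron.  This is only a sketch: the function name has_cron_process
is made up for illustration, it assumes a grep that understands -E
(POSIX grep does; very old /usr/bin/grep may not), and it only looks for
a process named 'cron' as on windfall (not 'crond').

```shell
#!/bin/sh
# has_cron_process (hypothetical helper, not part of LDM or McIDAS):
# reads a 'ps -eaf' style listing on stdin and returns 0 if it shows a
# cron daemon, 1 otherwise.
has_cron_process() {
    # Drop lines for grep itself first, so "grep cron" does not count,
    # then look for "cron" as a whole command name, with or without a
    # leading path such as /usr/sbin/.
    grep -v 'grep' | grep -E '(^|[[:space:]/])cron([[:space:]]|$)' >/dev/null
}
```

It could be used as:

  ps -eaf | has_cron_process || echo "cron is not running"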

Now, your cron entries, including the one for mcscour.sh, should work again.
We need to verify that they are, however, since your disk will fill if
they are not.  The easiest way I know of doing this is by checking
the timestamp on the file /usr/local/ldm/logs/mcscour.log (the time
stamp on it as I write this is Nov  7 17:57 mcscour.log, but that
is because I ran mcscour.sh by hand; see below).
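
The timestamp check described above can also be scripted.  The
following is a rough sketch, not something installed on windfall: the
function name log_is_fresh is made up, and it assumes a find(1) that
supports -mmin (GNU and BSD find do; it is not required by POSIX, so it
may be missing on older systems).

```shell
#!/bin/sh
# log_is_fresh FILE MINUTES (hypothetical helper):
# returns 0 if FILE exists and was modified less than MINUTES minutes
# ago, 1 otherwise.
log_is_fresh() {
    file=$1
    minutes=$2
    # A missing log file counts as "not fresh".
    [ -f "$file" ] || return 1
    # find prints the file name only if its modification time is
    # within the window; empty output means the file is too old.
    [ -n "$(find "$file" -mmin "-$minutes" 2>/dev/null)" ]
}
```

For example, if one assumes that mcscour.sh runs from cron about once a
day (the actual schedule is whatever the crontab entry says), a window
of 1500 minutes gives a little slack:

  log_is_fresh /usr/local/ldm/logs/mcscour.log 1500 || echo "mcscour.log is stale"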

 ... (stuff deleted)

>This last stuff was just a list of the ldm processes
>running before I shut it down.

OK.

>Guess I am looking for suggestions here.

I think that the entire problem was cron not running.  Why it was not 
running, I can't say.

>Won't be back on til much later this evening,
>so I guess this is something to consider for
>tomorrow?

I ran mcscour.sh by hand from the 'ldm' account:

windfall: /usr/local/ldm/util $ ./mcscour.sh
windfall: /usr/local/ldm/util $ 

and now see needed disk space in /p4:

windfall: /usr/local/ldm/util $ df -k
Filesystem            kbytes    used   avail capacity  Mounted on
 ...
/dev/dsk/c0t0d0s7    8790689 4143865 4558918    48%    /p4
 ...
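
To catch a filling disk before decoding stops, the capacity column of a
listing like the one above can be pulled out with awk.  This is a
sketch under assumptions: the function name capacity_of is made up, and
it expects 'df -k' output whose last two columns are the capacity and
the mount point, as in the listing above.

```shell
#!/bin/sh
# capacity_of MOUNT (hypothetical helper):
# reads 'df -k' output on stdin and prints the capacity percentage
# (without the % sign) for the file system mounted on MOUNT.
capacity_of() {
    awk -v mnt="$1" '
        # The mount point is the last field, the capacity the one before it.
        $NF == mnt { sub(/%/, "", $(NF-1)); print $(NF-1) }
    '
}
```

A possible use, where the 90 percent threshold is an arbitrary choice:

  [ "$(df -k /p4 | capacity_of /p4)" -ge 90 ] && echo "/p4 is nearly full"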

Now that there is disk available for decoding, I restarted the LDM:

ldmadmin delqueue          <- since the queue was gone, this was not needed
ldmadmin mkqueue           <- create a new queue
ldmadmin start

So, given the problems with inetd yesterday and cron not running, I
think that it would be a good thing to reboot windfall.  Also, I would
advise being on hand for the reboot _just in case_.

Tom
--
+-----------------------------------------------------------------------------+
* Tom Yoksas                                             UCAR Unidata Program *
* (303) 497-8642 (last resort)                                  P.O. Box 3000 *
* address@hidden                                   Boulder, CO 80307 *
* Unidata WWW Service                             http://www.unidata.ucar.edu/*
+-----------------------------------------------------------------------------+