20021103: File System Full (ldm data directory)



>From: "Jennie L. Moody" <address@hidden>
>Organization: UVa
>Keywords: 200211032250.gA3MoYX18513 LDM McIDAS scour

Hi Jennie,

Tom here.

>Hello!  Guess it's been a while since I have had a question
>here, mostly just because I have been ignoring the prospect
>of tackling the job of upgrading things on my system (still
>need a newer version of McIDAS, think maybe I need to
>upgrade the ldm).  Maybe things breaking will be a motivation,
>who knows.

This would be a good time to move ahead with upgrading your LDM and
McIDAS installations.  Right now, you have two different versions of
McIDAS running: one for the remote ADDE server, and the other with
your and Tony's mods for your ozone research.

>For the past few days I have had problems with the /p4 directory
>(where I put the ldm data queue, the mcidasd directory and the
>xcd directory).  I am trying to figure out why this is happening.
>It looked like things are still getting removed (there are only
>three days of data, which was standard as I recall), so I am
>wondering if I spaced out and missed an announcement that we
>needed to move to a larger queue size or something?

A full file system would not be the result of needing a bigger
queue; the LDM queue is a fixed-size file and cannot grow.  What is
most likely happening is that the decoded data files are getting
larger as more and more data becomes available in the IDD.

>(Well,
>I know I have been spaced out, but the question still remains,
>did I miss a communique?).

No, you have not missed any important directives as far as the
LDM is concerned.  You have missed several LDM upgrades, but
this is not likely to have caused you any problems.

>I am just going to stop the ldm
>for now and delete the queue.  But, I guess I could use some
>help in figuring out what might have gone wrong?

What we need to figure out first is where the space is going.  Since
the queue does not grow, this is not where the problem lies.
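If you want to verify this yourself, a quick check (assuming the
queue lives at /p4/data/ldm.pq -- adjust the path to wherever your
LDM puts it) is to watch the queue file's size:

%ls -l /p4/data/ldm.pq

The size reported should match what was given to pqcreate when the
queue was made, and it should not change from one day to the next.
If it is constant, the growth is elsewhere on /p4.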

>Thanks a bunch.

No worries.

>I know I have been out of touch, hope all goes
>well for everyone out there.  Congratulations on the new boss
>(Mohan), I hope that works out really well for everyone.       

Mohan starts in January.

>Thanks in advance to whomever jumps in here to offer some
>guidance.

I think the first thing to do is to log on to windfall and see where
the space on /p4 is being used.  The next step is to find out whether
the file system is filling because scouring is failing or simply
because there is more data.  After that, we can fix any problems
found and agree on an upgrade plan.

>(Sorry, I just sent this from the ldm account, and didn't
>have my signature file...so this is a repeat just to clarify
>where it was coming from!)

I deleted that copy this morning when I saw this one.

OK, so I logged in to windfall as 'ldm' (first as you, then as root,
then as 'ldm' -- it would be good to get the login passwords for the
ldm, mcidas, and mcadde accounts from you).

The first thing I see right off is that /p4 is full because /p4/data/xcd
is using virtually all of the space:

%df -k 
Filesystem            kbytes    used   avail capacity  Mounted on
 ...
/dev/dsk/c0t0d0s7    8790689 8670619   32164   100%    /p4
 ...

%cd /p4
%du -k .
 ...
8082869 ./data/xcd
 ...
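If 'du -k' on the directories does not make the culprit obvious, a
quick way to rank the largest files and directories in one shot
(the path here is just this system's data directory) is:

%du -ak /p4/data | sort -rn | head -20

'du -ak' lists every file and directory in kilobytes, 'sort -rn'
puts the largest first, and 'head -20' keeps the top twenty entries.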

Now, the big disk users in /p4/data/xcd are the GRID files and the
.XCD files.  The GRID file sizes are understandable given how large
McIDAS decoded GRID files are.  The .XCD files are, of course, big
(they contain all of the textual data in the IDD for an entire day),
but there really should only be one of them on disk at a time.

So, the first thing to figure out is why the scouring of the .XCD and
related McIDAS index files is not working.
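A good first check when scouring stops working is whether the cron
entry that drives it is still in place (this assumes the scour runs
from the ldm account's crontab, as it does here):

%crontab -l | grep mcscour

If that shows nothing, or shows a stale path, the .XCD and index
files will simply accumulate until the file system fills.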

Wow!  I ran the mcscour.sh entry in your ldm's crontab, and 8 GB of space
in /p4 became available:

%/home/mcidas/bin/mcscour.sh
%df -k
Filesystem            kbytes    used   avail capacity  Mounted on
 ...
/dev/dsk/c0t0d0s7    8790689 1682252 7020531    20%    /p4
 ...

Is it possible that you and/or Tony are logged on and working on the
same problem?

I decided to take a harder look at the scouring you have set up for
the files created by McIDAS-XCD.  Here is the relevant portion of the
mcscour.sh script that is being used (/home/mcidas/bin/mcscour.sh):

MCPATH=$MCPATH PATH=$PATH LD_LIBRARY_PATH=$LD_LIBRARY_PATH mcenv << 'EOF'

# qrtmdg.k scours McIDAS data files by number range:
# arguments are file type, first file, last file, days to keep
qrtmdg.k GRID 1 40 3
qrtmdg.k GRID 101 110 2
qrtmdg.k GRID 5001 5010 1
qrtmdg.k GRID 5051 5090 2
qrtmdg.k GRID 5101 5200 2
qrtmdg.k GRID 5201 5400 1
qrtmdg.k GRID 5401 5480 2
qrtmdg.k GRID 5501 5620 2
qrtmdg.k MD 1 70 2
# doqtl.k scours additional MD file ranges, same day-count convention
doqtl.k 71 80 2
doqtl.k 81 90 2
doqtl.k 91 100 2
# delete old decoded weather text
delwxt.k 1 10
# delete GRID file 132 outright
igu.k DEL 132
# delete scratch and log LW files
lwu.k DEL VIRT9001
lwu.k DEL VIRT9002
lwu.k DEL ROUTEPP.LOG
# lwu.k DEL ASUS 1100 1200 4 - commented out by Anne
exit

EOF
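For reference, mcscour.sh is normally driven from the ldm account's
crontab.  A typical entry looks like the following (the run time here
is illustrative, not necessarily what is on windfall):

# scour McIDAS data files once a day (time of day is an example)
30 21 * * * /home/mcidas/bin/mcscour.sh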

It is entirely possible that the new grids in the datastream have
pushed your disk use up to the critical point.  For instance, with
the XCD decoding setup that is in place on windfall, the UK Met
global grids will be put into GRID5001 - GRID5010.  The grids being
issued by the river forecast offices will also be put in these GRID
files.  This can take quite a bit of disk space -- the estimate is
that _if_ one is decoding all of the grib messages that come across
NOAAPORT, then you will need on the order of 6.5 GB of disk for every
day that you want to keep on line.  Your scouring setup says to keep
a couple of days of various GRID files, so even two full days at
6.5 GB/day would be about 13 GB, well over the roughly 8.8 GB
capacity of /p4.  It is quite possible that you are the victim of the
Weather Service's push to send out more and more gridded data.  This
situation will only get worse: the Weather Service has plans to
consolidate the two satellite imagery channels of NOAAPORT into one
to free up a channel for even more gridded data!

Another possibility is that the absence of a needed modification to
the XCD startup routine 'xcd_run' was allowing a hidden file to grow
and consume your disk space.  I don't think that this is the cause of
your problem, since the hidden file _should_ disappear when the LDM
is stopped.  To make sure that this cannot become the problem, I went
ahead and made the needed change to the copy of xcd_run being used on
your system, /usr/local/ldm/util/xcd_run.
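If you ever want to check for this kind of growth yourself, remember
that a bare 'ls' hides dot files; use the -a flag so hidden files and
their sizes show up:

%ls -la /p4/data/xcd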

So what to do next?  I suggest the following:

1) we should upgrade the LDM to the latest release.  This won't fix
   anything as far as disk space used by the LDM goes, but it will
   bring you up to rev.

2) review the LDM log files to see if your 300 MB queue is large
   enough (see the sketch after this list).

3) we need to take a hard look at your McIDAS setup so that we can:

   a) save all of your locally developed code

   b) upgrade you to a current release of McIDAS-X, -XCD

   c) remove the separate version of McIDAS-X that is being used by
      your McIDAS remote ADDE server (this is the work I alluded
      to in an earlier email)
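For step 2), if the LDM logs show products being deleted from the
queue before the downstream decoders are done with them, the usual
remedy is to rebuild the queue at a larger size.  A sketch of the
sequence (in LDM releases of this vintage the queue size is a
setting inside the ldmadmin script itself -- check yours before
assuming):

%ldmadmin stop        # stop the LDM so the queue is not in use
%ldmadmin delqueue    # remove the old product queue
%ldmadmin mkqueue     # recreate it at the newly configured size
%ldmadmin start       # bring the LDM back up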

Steps 1) and 2) are very simple; I can do them in a matter of
minutes.  I would then propose to remove the previous LDM code to
free up disk space.

Step 3) is quite a bit more complex.  It will require that you and/or
Tony identify ALL of the local modifications you have made to McIDAS
(this includes locally developed code AND modifications to
configuration files like those used for XCD GRID decoding).  We then
need to save these off into a separate development directory that
will be used later as a local repository for your work.  You will
probably say that _I_ am the one who recommended the current way that
things are set up, and you would be correct.  Luckily, I keep
learning about how things can be set up more efficiently for sites
that are doing local McIDAS development.

I am willing to dive in to help, but I need your go-ahead and your
help before tackling the McIDAS-related modifications.

As I was bringing this note to a close, I noticed that somebody
restarted the LDM on windfall.  Since this is what I was going to do
anyway, I will sign off for now.

Tom

>From address@hidden Mon Nov  4 09:32:07 2002
>Subject: NEVER MIND

Hey folks,

I guess I don't have a problem.  I found a bunch of stuff
cluttering up the /p4/data directory on windfall, and got
rid of it (old wmo notices, log files).  This seemed to
clear enough space, and I restarted the ldm, so I'll
keep my fingers crossed that this was the only problem.

sorry to bug ya for nothing.

Jennie