19990310: Route Post Process Failure (cont.)



>From: "Jennie L. Moody" <address@hidden>
>Organization: UVa
>Keywords: 199903091926.MAA12825 McIDAS ROUTE SYSIMAGE.SAV

Jennie,

>Whew, what a day, haven't had a chance to respond and
>say thanks for fixing things last night.  Everything has
>been stable today.  Much obliged.

No problem.  I meant to send a quick note this morning.  Apparently,
mail being sent to you and others at UVa was bouncing.  The message
on the bounce had to do with not being able to write to a directory;
sound familiar?  I think that whatever was the cause of your problem
(the orphaned shared memory segment) with the LDM and PP BATCH
invocations might also be the mail problem.

re: use ADDE to look at imagery elsewhere
>Yeah, but I wanted undergrads to look at material on the
>webpage....doesn't really matter, but I get irritated with
>myself that I cannot keep this all running, consistently.

This problem probably had something to do with the operating system
and not the LDM or McIDAS.

re:The line that is unsettling is "unsetting MCPATH environment variable".
>I noticed this was still in the log file after you restarted the
>system, but only for the first few minutes?

Hmm...  Very interesting!!

re: GRIB data
>Okay, well, I did get a response from COMET and now have the
>GRIB files, now I need to build a spool and make the decoders
>read it and write out my mcidas grids right?  More about
>that later.

OK, but it will have to be on Friday.

re: getting questions
>Right, I always forget that, your job is to some extent 
>dependent on the ignorance of users like me...what job security.

Right.  Job security :-)

re: dial up access at UVa
>Right, the University maintains a number of dialin access lines,
>two baud rates (14.4 and 28.8) and 3 time limits (one hour, 30 minutes
>and 15 minutes).  I can *never* get a one hour line (unless it's
>after 1am), and it's harder to get the 28.8 lines, so I often settle
>for the 30 minute 14.4 lines.  It still often takes multiple redials
>(10-40?) to get a free line.  

Sounds like you need to connect to the Internet through an ISP and
bypass UVa's modem pool.  It would be good for your kids as well.

>This didn't used to be as much of a problem, and in fact for a long
>time the University didn't time their lines (this started about
>10-18 months ago? I've been working from home for years).  But now, 
>to give more folks a chance to get in briefly to download email or 
>upload/download files they put time limits into effect.  The time 
>limit is hard, no matter what you are doing, you gotta get off, or 
>get thrown off in the middle of it. Truly sucks.    There are definitely
>a *lot* more people trying to log in remotely, of course it's always
>worse on bad weather days too, when people are stuck at home.

Time to attach to the Internet alright.

re: trying to create a file in ~mcidas/workdata
>Hmm, didn't think of using touch, but did look at permissions
>so didn't think this was a problem??  I _had_ checked this.

This problem was not an easy one, so don't let it get to you.
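
For anyone reading this in the archive: the point of the touch test is that
permission bits can look fine while an actual write still fails.  A minimal
sketch of the same check in Python follows.  The function name
can_create_file is made up for illustration, and the directory is the one
mentioned in this thread; adjust it for your installation.

```python
import os
import tempfile


def can_create_file(directory):
    """Return (True, None) if a file can really be created in `directory`,
    otherwise (False, the OSError that was raised)."""
    try:
        # Creating (and automatically removing) a real file exercises the
        # same code path an application write would, unlike reading the
        # mode bits with ls -l.
        with tempfile.NamedTemporaryFile(dir=directory, prefix="writetest_"):
            pass
    except OSError as err:
        return False, err
    return True, None


if __name__ == "__main__":
    # Directory taken from this thread; an assumption for any other site.
    target = os.path.expanduser("~mcidas/workdata")
    ok, err = can_create_file(target)
    print(target, "is writable" if ok else f"is NOT writable: {err}")
```

Run it as the same user that the failing process runs as, since the result
depends on who is doing the writing.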

re: file permissions
>Weird, because permissions hadn't changed (it's not like they
>were 777 before when stuff was running??)

Right.  The problem was at the OS level.

re: rebooting
>Do you still recommend rebooting?  This doesn't seem like
>such a long time to me, 2 months?

Yes, I do recommend rebooting.  I am also one who doesn't like to restart
machines, but the fact of the matter is that there is a lot of software out
there that has memory leaks.

re: copies of ROUTE.SYS and SYSKEY.TAB in strange directories
>Don't know. But this is around the time we got the new ldm running
>(I think), which I let Kevin set up.  I'll delete them.

OK.

re: running ipcs
>I had run ipcs (though did it while running the ldm,
>and didn't realize anything was wrong).

I didn't at first either.  It wasn't until I stopped the LDM that this
struck me as being odd.

re: files in .mctmp
>I forgot to look for .mctmp files

I just wanted to make sure that the LDM-initiated McIDAS processes had
exited cleanly.  They had not!

re: .mctmp directories are old
>Actually, it seems these other .mctmp directories from Feb 10 must
>be old too?  

Yes, very old.
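
A rough way to spot old leftovers like these is to list entries by
modification time.  The sketch below only reports names and deletes nothing.
The function name stale_entries and the 7-day threshold are illustrative
choices, and the location of the .mctmp directory (assumed here to be under
the home directory of the user who ran the session) should be confirmed on
your own system.

```python
import os
import time


def stale_entries(parent, max_age_days, now=None):
    """Return sorted names of entries directly under `parent` whose
    modification time is more than `max_age_days` days in the past."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    stale = []
    with os.scandir(parent) as entries:
        for entry in entries:
            # Look at the entry itself rather than a symlink target.
            if entry.stat(follow_symlinks=False).st_mtime < cutoff:
                stale.append(entry.name)
    return sorted(stale)


if __name__ == "__main__":
    # Assumed location; adjust for your site.
    parent = os.path.expanduser("~/.mctmp")
    if os.path.isdir(parent):
        for name in stale_entries(parent, max_age_days=7):
            print("older than 7 days:", os.path.join(parent, name))
```

An old timestamp alone does not prove that an entry is unused, so check for
running sessions before removing anything.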

re: doing a ps looking for mcenv processes
>You looked at mcenv to see what current mcidas jobs might be
>running that would/could have .mctmp files?

Right.
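
The ps check can be scripted along these lines.  This is a sketch, not the
exact command used here: it assumes a ps that accepts the POSIX options
"-e -o pid= -o comm=", and the helper names pids_named and running_pids are
made up for illustration.

```python
import os
import subprocess


def pids_named(ps_output, name):
    """Return PIDs from 'PID COMMAND' lines whose command basename is `name`."""
    pids = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        pid, command = parts
        if os.path.basename(command.strip()) == name:
            pids.append(int(pid))
    return pids


def running_pids(name):
    """Ask ps for every process and return the PIDs whose command is `name`."""
    result = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return pids_named(result.stdout, name)


if __name__ == "__main__":
    print("mcenv PIDs:", running_pids("mcenv"))
```

Any PID it reports belongs to a session that may still own temporary files,
so those should be left alone.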

>This one above
>was my session, which actually is still running (I had loaded
>in some images from a case and wanted to save them as gifs
>(to make an image loop for some NOAA guys to look at), I
>wasn't done, but got interrupted and left the session running.

I found that out.

>Is there any problem with this, leaving a session running?

No, not usually.

>It's my account, and no real-time stuff is done as me...it's not
>an issue, right.  (Similarly, stopping and starting the ldm
>shouldn't have any impact on a running user session, right?)

Right.

re: removing .mctmp directories
>Okay, so you killed them all. right.   This was the 
>part I had forgotten, of what we went through last time
>we had Postprocess batch failure, that we had old
>stuff hanging around....

This does not usually have to be done, so I wouldn't worry about forgetting
to do it.

re: the procedure
>I get what you did.  I will remember to look for processes (which don't
>exist any longer) that might have allocated shared memory.  
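
One way to script that check is sketched below.  It rests on several
assumptions: the parser expects the "shmid owner cpid lpid" column layout
printed by "ipcs -m -p" on Linux (other systems format ipcs output
differently), the function names are made up for illustration, and the
os.kill(pid, 0) probe is for Unix-like systems only.

```python
import os


def pid_alive(pid):
    """Unix only: return True if a process with this PID currently exists."""
    if pid <= 0:
        return False  # 0 and negatives address process groups, not one PID
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without signalling
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def parse_ipcs_pids(text):
    """Parse rows of the assumed layout: shmid owner cpid lpid."""
    rows = []
    for line in text.splitlines():
        f = line.split()
        if len(f) == 4 and f[0].isdigit() and f[2].isdigit() and f[3].isdigit():
            rows.append((int(f[0]), f[1], int(f[2]), int(f[3])))
    return rows


def orphaned_segments(rows, alive=pid_alive):
    """Return shmids whose creator and last-operating processes are both gone."""
    return [shmid for shmid, _owner, cpid, lpid in rows
            if not alive(cpid) and not alive(lpid)]
```

A segment flagged this way is only a candidate.  Confirm that nothing is
still attached to it, and that the LDM and McIDAS are stopped, before
removing it.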

Right.  I try to write down everything I do for two reasons: one, to tell
you (or whichever user) the process I went through, and two, to have it in the
tracking system so that I can refer to my notes in the future.

re: how things got hosed up
>Yeah I don't get it...seems like this shared memory segment would have
>been a problem already, it had a time stamp of Feb 2??

It shouldn't have, but it obviously did.

>(Recall, we changed 
>a lot of things since then, including the location of batch.k which
>we moved to the /home/ldma/util directory.)  But for some reason
>the shared memory allocation didn't become a problem until yesterday...
>saying I don't get it would be an understatement.  

Changing the location of batch.k shouldn't have done anything.  This was
just one of those things.

>Glad you're there to help.  Thanks.

No problem.

Tom