Re: [gembud] Generating NEXRAD radar imagery with gpmap_gf

To: Greg Stossmeister <gstoss@xxxxxxxx>
Subject: Re: [gembud] Generating NEXRAD radar imagery with gpmap_gf
From: pmanousos@xxxxxxxxxxxxxxxxxxx
Date: Mon, 30 Apr 2012 15:56:17 -0400
All - seeing the issue with history files, we thought we should alert you 
to a known "bug" with the history command and history file in RHEL that 
caused a major slowdown and eventual lockup of our system.  It is a known 
problem in RHEL 2.1, 3, 4, 5 and 6.  We only ran into it on RHEL 6.2, and 
the remedy was to add "set history=0" to our .cshrc. 

We also added an alias h50 "set history=50" so that when you open a 
terminal window and type h50 it will start remembering your commands. 
Klunky yes, but at least our workstations don't lock up.   I had to log on 
to the RHEL customer portal and search for "42739" to get this printout, 
which I cut/pasted into this email.  I hope it translates well in your 
email viewer. 

See below - although not sure it will address your original issue.  Pete
-------------------------------------------

A system slows down due to a  .history file
Article ID: 42739 - Created on: Oct 5, 2010 10:54 PM - Last Modified:  Jan 
23, 2011 9:09 PM 
Issue
The data in the file ".history" becomes malformed and the file grows 
larger and larger. 
Each command should be recorded in the file with a timestamp line and 
a command entry line, which should end with EOL (End Of Line). 
Example of normal .history file:
      $ cat -n .history
           1  #+1289787344
           2  set
 
In a malformed file, timestamps and command entries no longer alternate. 
A timestamp is recorded where a command entry should be (see line 4 
below), and some entries are merged together (see lines 6 and 7 
below). 
Example of malformed .history file:
$ cat -n .history
           1  #+1289787344
           2  test
           3  #+1289787367
           4  1289787366
           5  #+1289787367
           6  128978#+12897#+1289#+12897test
           7  #+1289#+12897testls##+1289787401l12#+1289787402
           8  st
A system slows down because csh uses a lot of memory to read such a large 
.history file. 
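
The two examples above suggest a simple check for a damaged file. The 
following is a minimal Python sketch, not part of the Red Hat article: 
the function name find_malformed and its heuristics are assumptions made 
for illustration. It expects timestamp lines ("#+" followed by digits) 
and command lines to alternate, and reports the line numbers that break 
that pattern or that look like merged entries.

```python
import re

# A well-formed timestamp line looks like "#+1289787344".
TIMESTAMP = re.compile(r"#\+\d+")


def find_malformed(lines):
    """Return 1-based numbers of lines that break the timestamp/command pattern.

    Heuristic only: a command that is purely numeric or that legitimately
    contains "#+" will be reported as suspicious.
    """
    bad = []
    expect_timestamp = True
    for number, line in enumerate(lines, start=1):
        is_timestamp = TIMESTAMP.fullmatch(line) is not None
        if expect_timestamp:
            if is_timestamp:
                expect_timestamp = False
            else:
                bad.append(number)  # keep waiting for the next timestamp
        elif is_timestamp:
            bad.append(number)  # two timestamps in a row; still need a command
        else:
            # Command slot: flag stray timestamps and merged fragments.
            if "#+" in line or line.isdigit():
                bad.append(number)
            expect_timestamp = True
    return bad
```

To check a real file, read it with open(path, errors="replace") and pass 
the result of read().splitlines() to find_malformed. An empty list means 
every line fits the expected pattern.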
Environment
Red Hat Enterprise Linux 2.1, 3, 4, 5, and 6
Resolution
This issue will be addressed in "Bug 648592 - .history file gets corrupted 
if several scripts run at once" 
There is a temporary workaround: disable the history of csh. 
Set one of the following lines in ~/.cshrc or on the command line:    
     unset savehist
          OR
      set savehist=

Root Cause
The main issue to be fixed is that tcsh does not access the ~/.history 
file exclusively. The "merge" option makes unexpected behaviour somewhat 
more likely, as noted in the man page warning for the "-S" option of the 
history built-in command:
...
       history [-hTr] [n]
       history -S|-L|-M [filename] (+)
       history -c (+)
...
               With -S, the second form saves the history list to filename.
               If the first word of the savehist shell variable is set to a
               number, at most that many lines are saved.  If the second
               word of savehist is set to "merge", the history list is
               merged with the existing history file instead of replacing
               it (if there is one) and sorted by time stamp.  (+)
               ____Merging is intended for an environment like the X Window
               System with several shells in simultaneous use.  Currently
               it succeeds only when the shells quit nicely one after
               another.____


Additionally, savehist is set to both a value greater than 0 and the 
"merge" option in csh, and this is the default setting in Red Hat 
Enterprise Linux 5.4 or later:
        $ set | grep savehist
        savehist        (1024 merge)
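
To illustrate what handling a shared file "exclusively" means, here is a 
minimal Python sketch. It is not tcsh's code and not the fix tracked in 
Bug 648592; the function name append_history is hypothetical. It takes 
an advisory lock with fcntl.flock before appending a timestamp/command 
pair, so that cooperating writers cannot interleave their output.

```python
import fcntl
import os
import time


def append_history(path, command, timestamp=None):
    """Append one "#+<timestamp>" / command pair while holding an exclusive lock."""
    if timestamp is None:
        timestamp = int(time.time())
    entry = f"#+{timestamp}\n{command}\n"
    with open(path, "a", encoding="utf-8") as handle:
        # Block until no other cooperating writer holds the lock.
        fcntl.flock(handle, fcntl.LOCK_EX)
        try:
            handle.write(entry)
            handle.flush()
            os.fsync(handle.fileno())  # push the entry to disk before unlocking
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)
```

The lock is advisory: it only helps if every process that writes the 
file takes the same lock, which is why it would not protect a file that 
is also being written by unmodified shells.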



From:   Greg Stossmeister <gstoss@xxxxxxxx>
To:     gembud@xxxxxxxxxxxxxxxx
Date:   04/30/2012 03:48 PM
Subject:        Re: [gembud] Generating NEXRAD radar imagery with gpmap_gf
Sent by:        gembud-bounces@xxxxxxxxxxxxxxxx



Daryl,
    I noticed .history was bizarre last week - it had a ton of stuff in it 
that didn't look like the normal history stuff I was used to seeing. We 
deleted it this morning and I set my .cshrc to only save the last 40 
commands.

Greg

On Apr 30, 2012, at 1:40 PM, daryl herzmann wrote:

> 
> Offline... How large is your ~/.history file?
> 
> daryl
> 
> On Mon, 30 Apr 2012, Greg Stossmeister wrote:
> 
>> Daryl,
>>    No, each process creates a temporary working subdirectory to run in,
>> based on the process id.
>> 
>> Greg
>> 
>> On Apr 30, 2012, at 1:23 PM, daryl herzmann wrote:
>> 
>>> Greg,
>>> 
>>> Are all these processes running out of the same CWD (directory)?  Try
>>> creating temp directories for each process and run the code from those
>>> directories.
>>> 
>>> daryl
>>> 
>>> On Mon, 30 Apr 2012, Greg Stossmeister wrote:
>>> 
>>>> Daryl,
>>>>  I have several shell scripts that I'm running out of cron every 5
>>>> minutes. Each shell script runs 10 gpmap_gf processes in sequence. I've
>>>> tried running 1 - 6 scripts at a time. This typically works fine during
>>>> the day, with one of these scripts completing in about 2 minutes. As
>>>> evening comes on they take longer and longer to run and it seems like
>>>> they take more and more memory. From the "top" command the scripts
>>>> often use 500-800 MB of memory but in the evening this seems to
>>>> mushroom to > 3GB per script. The load on the machine at night from
>>>> these scripts alone jumps to >30 and by morning the machine usually
>>>> dies with out of memory errors even though I'm automatically killing
>>>> the scripts when they run longer than 2 minutes.
>>>> 
>>>> Looking at /var/log/debug.log I'm seeing segfault errors:
>>>> 
>>>> Apr 26 17:07:25 sferic kernel: gpmap_gf[2164]: segfault at 0 ip 000000392692ff7f sp 00007fff8d981128 error 4 in libc-2.12.so[3926800000+186000]
>>>> Apr 26 17:07:25 sferic abrt[2179]: saved core dump of pid 2164 (/export/ldm/home/gempak/GEMPAK6.4.0/os/linux64/bin/gpmap_gf) to /var/spool/abrt/ccpp-2012-04-26-17:07:25-2164.new/coredump (827392 bytes)
>>>> Apr 26 17:07:25 sferic abrtd: Directory 'ccpp-2012-04-26-17:07:25-2164' creation detected
>>>> Apr 26 17:07:25 sferic kernel: gpmap_gf[2239]: segfault at 0 ip 000000392692ff7f sp 00007ffffd173658 error 4 in libc-2.12.so[3926800000+186000]
>>>> Apr 26 17:07:25 sferic kernel: gpmap_gf[2242]: segfault at 0 ip 000000392692ff7f sp 00007fff8f4df6f8 error 4 in libc-2.12.so[3926800000+186000]
>>>> Apr 26 17:07:25 sferic kernel: gpmap_gf[2247]: segfault at 0 ip 000000392692ff7f sp 00007fff73574d18 error 4 in libc-2.12.so[3926800000+186000]
>>>> Apr 26 17:07:25 sferic kernel: gpmap_gf[2261]: segfault at 0 ip 000000392692ff7f sp 00007fff8bda1358 error 4 in libc-2.12.so[3926800000+186000]
>>>> Apr 26 17:07:25 sferic kernel: gpmap_gf[2245]: segfault at 0 ip 000000392692ff7f sp 00007fff71495a28 error 4 in libc-2.12.so[3926800000+186000]
>>>> Apr 26 17:07:25 sferic kernel: Pid 2245(gpmap_gf) over core_pipe_limit
>>>> Apr 26 17:07:25 sferic kernel: Skipping core dump
>>>> Apr 26 17:07:25 sferic abrt[2260]: not dumping repeating crash in '/export/ldm/home/gempak/GEMPAK6.4.0/os/linux64/bin/gpmap_gf'
>>>> Apr 26 17:07:25 sferic abrt[2279]: not dumping repeating crash in '/export/ldm/home/gempak/GEMPAK6.4.0/os/linux64/bin/gpmap_gf'
>>>> Apr 26 17:07:25 sferic kernel: gpmap_gf[2289]: segfault at 0 ip 000000392692ff7f sp 00007fffca7118a8 error 4 in libc-2.12.so[3926800000+186000]
>>>> Apr 26 17:07:25 sferic kernel: gpmap_gf[2286]: segfault at 0 ip 000000392692ff7f sp 00007fffef00ac98 error 4 in libc-2.12.so[3926800000+186000]
>>>> Apr 26 17:07:25 sferic kernel: gpmap_gf[2303]: segfault at 0 ip 000000392692ff7f sp 00007fff92019618 error 4 in libc-2.12.so[3926800000+186000]
>>>> Apr 26 17:07:25 sferic kernel: Pid 2303(gpmap_gf) over core_pipe_limit
>>>> 
>>>> Greg
>>>> 
>>>> On Apr 30, 2012, at 12:10 PM, daryl herzmann wrote:
>>>> 
>>>>> On Mon, 30 Apr 2012, Greg Stossmeister wrote:
>>>>> 
>>>>>> Does anyone generate a lot of individual NEXRAD level III products
>>>>>> with gpmap_gf? I'm trying to generate real-time plots of NOQ
>>>>>> Reflectivity and NOU Velocity from 30 radars in the midwest and it's
>>>>>> crashing my server after a few hours, even when I only run 3 plots at
>>>>>> a time. I'm running GEMPAK6.4.0 on a RHEL 6 machine with 64 GB of
>>>>>> memory. I'm wondering what I'm doing wrong and if someone has a
>>>>>> better way of doing this.
>>>>> 
>>>>> crashing your server, how?  Exhausting memory?  kernel panic?  Are
>>>>> the processes not going away once they have run?  How are you running
>>>>> them?
>>>>> 
>>>>> daryl
>>>>> 
>>>>> --
>>>>> /**
>>>>> * Daryl Herzmann
>>>>> * Assistant Scientist -- Iowa Environmental Mesonet
>>>>> * http://mesonet.agron.iastate.edu
>>>>> */
>>>> 






_______________________________________________
gembud mailing list
gembud@xxxxxxxxxxxxxxxx
For list information or to unsubscribe,  visit: 
http://www.unidata.ucar.edu/mailing_lists/ 




-----------------------------------------
The information contained in this message is intended only for the
personal and confidential use of the recipient(s) named above. If
the reader of this message is not the intended recipient or an
agent responsible for delivering it to the intended recipient, you
are hereby notified that you have received this document in error
and that any review, dissemination, distribution, or copying of
this message is strictly prohibited. If you have received this
communication in error, please notify us immediately, and delete
the original message.