>From: Gilbert Sebenste <address@hidden>
>Organization: NIU
>Keywords: 200210050242.g952g0127088 McIDAS-XCD DMRAOB DMSYN

Gilbert,

>Thanks for all the hard work! What is the diagnosis?

My opinion is that there is something wrong with either GCC 3.2, the C
libraries that are on weather2, something else in RedHat 8, or something
in your system configuration on weather2.

I had 'top' running on your system and a Solaris x86 system here at the
UPC as the clock ticked past 0Z so I could monitor the size of XCD data
monitors. As soon as surface METAR and synoptic/ship/buoy data for a new
day started to arrive, the XCD data monitors DMSFC and DMSYN on weather2
both more than doubled in size. This is the situation that would cause
DMSYN to go into an infinite loop. At the same time and while receiving
the same data, our Solaris x86 system showed no change in size of either
of these data monitors. The same version of McIDAS-X, -XCD was built
using gcc/g77 on both your and our systems, but the GCC version we are
using on our x86 box is 2.95.3. GCC on weather2 is 3.2.

>Also, I notice that
>pqact is taking up 21 MB of RAM, so it still makes me wonder if
>something fishy isn't happening there.

You have to be careful when interpreting the sizes of LDM processes. The
reason is that their size will reflect the memory mapped LDM queue. For
instance, the size for pqact as indicated by 'top' on our x86 system is:

  26725 ldm 1 58 0 1940M 460M sleep 11:13 0.56% pqact

This reflects the fact that the queue on our machine is 2 GB.

>Also, please note that I am running the non-bugfixed version of McIDAS on
>weather.admin...and it doesn't hang there. However, I have 1 GB of memory
>on that machine, vs. 500 MB on weather2. Maybe that could help provide a
>clue?

Ah so... This is probably telling us something very important. I talked
with our system admin about what was happening on weather2, and he mused
that what we are seeing may be something that is isolated to weather2
alone.
Your comment that unpatched -XCD on a different RH 8 system at NIU does
not show hangs strongly suggests that there is something fundamentally
wrong with the OS installation on weather2. Exactly what that may be I
can't say.

It is "funny" (not ha ha) that a routine trying to malloc a small
(~ 82 KB) amount of memory on weather2 sends the data monitor into an
infinite loop even though there is LOTS of swap space available
( > 0.5 GB). This sort of implies that there is something amiss with the
swapping. Again, what it may be I can't say.

Each time dmsyn.k would hang on weather2, an examination of the core
file that is caused by sending the process a 'kill -ABRT' signal showed
that the routine that was in a tight loop was one that organizes memory
on behalf of malloc. That routine can be found in /lib/libc.so.6. It was
weird that this routine would hang especially when there was ample swap
space on disk that could have been used to swap things out of memory.

I was suspicious of some sort of memory starvation on weather2 quite
some time ago. If you will remember our phone conversation, I asked if
it would be possible to put more memory in weather2 to see if that
wouldn't solve your problem.

For the record: all of the memory leaks I found in McIDAS C routines
could not have amounted to the increase in executable sizes that I was
seeing when the data monitors went off to create an MD file for a new
day's data. In fact, the amount of memory that was used for this task is
only about one tenth the amount that the executable would grow to. This
remains a mystery to me.

Tom

>From address@hidden Tue Nov 26 11:50:59 2002
>Subject: Re: 20021126: dmraob.k and dmsyn.k hanging on weather2; LDM memory
>leak (cont.)

re: opinion is there is something wrong with either GCC 3.2, the C
libraries that are on weather2, something else in RedHat 8, or something
in your system configuration on weather2.

>I suspect GCC.
>My Weather2 RedHat installation was done from scratch,
>unlike the others which have been updated since 8.0 (not a "clean
>install"), as they say. Weather3 is getting old and is starting to fail,
>but weather is running just fine.

re: same version of data monitors on other machines don't grow; you are
using GCC 3.2

>Right.

re: interpreting size of LDM programs

OK.

re: XCD running on weather with no meltdown may be problem with weather2

>Or that the memory leaks were never big enough to overwhelm weather, but
>they did on half the memory size on weather2?

re: memory leaks weren't big enough to cause size increases seen for
data monitors

>Well, we'll keep monitoring. In any case, I'll upgrade GCC again when the
>next patch comes out. Thanks for the hard work and the trouble...I assume
>these patches will be in place for all future versions.
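
As an aside on the monitoring discussed in this thread, here is a minimal
Python sketch of reading process sizes out of 'top' output and flagging
processes that more than doubled between two snapshots, which is the
symptom described for DMSFC and DMSYN. The column layout (SIZE in the
sixth field, RES in the seventh, command in the eleventh) is an assumption
taken from the single pqact line quoted earlier in the thread. The
function names are hypothetical and are not part of McIDAS or the LDM.

```python
import re

# Multipliers for the size suffixes that 'top' prints (assumed binary units).
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a top-style size such as '1940M' to a byte count."""
    match = re.fullmatch(r"(\d+)([KMG])", text)
    if match is None:
        raise ValueError("unrecognized size: %r" % text)
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


def parse_top_line(line):
    """Parse one process line from 'top'.

    Assumed layout (hypothetical, inferred from the pqact line in the
    thread): PID USER LWP PRI NICE SIZE RES STATE TIME CPU COMMAND
    """
    fields = line.split()
    if len(fields) < 11:
        raise ValueError("expected at least 11 fields: %r" % line)
    return {
        "pid": int(fields[0]),
        "user": fields[1],
        "size": parse_size(fields[5]),
        "res": parse_size(fields[6]),
        "command": fields[10],
    }


def grew_more_than(before, after, factor=2.0):
    """Return sorted names whose size grew by more than `factor`.

    `before` and `after` map a process name to its size in bytes.
    Names missing from `before` are skipped.
    """
    return sorted(
        name
        for name, size in after.items()
        if name in before and size > factor * before[name]
    )


if __name__ == "__main__":
    entry = parse_top_line(
        "26725 ldm 1 58 0 1940M 460M sleep 11:13 0.56% pqact"
    )
    print(entry["command"], entry["size"] // _UNITS["M"], "MB total")
```

Keep in mind the caveat from the thread: SIZE for an LDM process such as
pqact includes the memory mapped queue, so a check like this is only
meaningful for processes that do not map the queue, such as the XCD data
monitors.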
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.