Thanks. I will look into that. The boxes are getting a bit long in the tooth, so maybe things are getting backed up.

-----Original Message-----
Sent: Wednesday, May 26, 2004 2:06 PM
Cc: 'Arthur A. Person'; ldm-users@xxxxxxxxxxxxxxxx; GEMPAK support

Robert,

As I mentioned to Art back when he raised the problem with corrupt data files: one possibility is that your surface file is getting corrupted by pqact firing up more than one instance of the dcmetr decoder. This would happen if your file I/O backed up to the point where pqact failed to write into the open pipe, or if the decoder had slowed to the point that pqact could not push any more data down the open PIPE.

The way to see if this is the case is to look in the dcmetr.log files for overlapping process IDs for the same filename (date/time). Since the same stream is apparently working on a second system, the one issue to consider is whether the system loading is the same on both machines.

Also, you can issue a "kill -USR2" twice to the pqact process that is responsible for running decoders such as dcmetr. This will put the pqact process into debug mode and output to your LDM log file information about how long it is taking the pqact process to process each product once it arrives in the local queue. These lines will be found with a pq_sequence "Delay" message. If your pqact process is overloaded, you would see increasing values on the order of hundreds to thousands of seconds. A well-running LDM typically has values less than 1 second. (A third "kill -USR2" will take you out of debug mode, which is good since debug logging takes a lot of log file space.) If your pqact is falling behind, one symptom would be that data doesn't show up on disk for some time even though you are receiving it in a timely fashion.
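The "Delay" check described above could be automated roughly as follows. This is only a sketch: the message does not give the exact wording of the debug line, so the regular expression, the sample lines, and the 100-second threshold (taken from "hundreds to thousands of seconds") are all assumptions to adjust against your own LDM log.

```python
import re
from typing import Iterable, List

# Assumed pattern: the word "Delay" followed by a number of seconds.
# The real layout of the pqact debug line is not shown in the message,
# so change this regex to match what your log actually contains.
DELAY_RE = re.compile(r"Delay:?\s+([0-9]+(?:\.[0-9]+)?)")


def extract_delays(lines: Iterable[str]) -> List[float]:
    """Return the delay values (in seconds) found in the given log lines."""
    delays = []
    for line in lines:
        match = DELAY_RE.search(line)
        if match:
            delays.append(float(match.group(1)))
    return delays


def is_falling_behind(delays: List[float], threshold: float = 100.0) -> bool:
    """Heuristic: the latest delay is at or above the threshold and is
    no smaller than the first delay seen."""
    if len(delays) < 2:
        return False
    return delays[-1] >= threshold and delays[-1] >= delays[0]


# Hypothetical sample lines, made up for illustration only.
sample = [
    "May 26 18:00:01 pqact[1234]: Delay: 0.4 sec",
    "May 26 18:00:02 pqact[1234]: Delay: 250.0 sec",
    "May 26 18:00:03 pqact[1234]: Delay: 1200.5 sec",
]
delays = extract_delays(sample)
print(max(delays), is_falling_behind(delays))
```

In practice the lines would come from the LDM log file while pqact is in debug mode; remember to send the third "kill -USR2" afterwards.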
In the $NAWIPS/ldm/etc/gen_pqact.csh script, I provide an option for creating separate configuration files to run multiple instances of pqact, which will distribute the processing load of pqact (helpful in particular if you are filing all NEXRAD or CRAFT data).

As for rtstats, our LDM server that receives these moved its network recently, which might have resulted in the process not exiting on shutdown. I will continue to monitor these issues.

Steve Chiswell

On Wed, 2004-05-26 at 11:48, Robert Mullenax wrote:
> I still think it's the same issue as we gave as we get file corruption
> problems of the *sao.gem files even when dcmetr isn't using all of the
> CPU. I know they are corrupted when users report GARP crashes when trying
> to load them.
>
> I know this isn't OS specific as you are running Linux and we are running
> Solaris x86. I upgraded to GEMPAK5.7.2p2 and had the same problem with
> dcmetr. I am running ldm-6.0.14
>
> -----Original Message-----
> From: Arthur A. Person [mailto:person@xxxxxxxxxxxxx]
> Sent: Wednesday, May 26, 2004 12:43 PM
> To: Robert Mullenax
> Cc: ldm-users@xxxxxxxxxxxxxxxx
> Subject: RE: Strange ldm/gempak behaviour
>
> Hi...
>
> It looks like the LDM stopping issue is due to rtstats hanging due to a
> problem Unidata is working on right now.
>
> As for dcmetr, I actually have it running on a second system because I've
> been having file corruption problems on my first system. In today's case,
> dcmetr seemed hosed on the first system but okay on the second system. I
> say this just to indicate that I too have seen dcmetr file corruption
> problems but I'm still trying to figure out what the root cause might be.
> I don't think I see it using a lot of cpu, however, nor do I ever have to
> pkill it.
>
> Art.
>
> On Wed, 26 May 2004, Robert Mullenax wrote:
>
> > Yes, I have been having very similar problems with dcmetr (it would hang
> > and use all the CPU and produce corrupted .gem files).
> > I reported it a week or so ago, but never got any responses from anyone
> > having issues.
> >
> > I have to do a pkill -9 dcmetr
> >
> > -----Original Message-----
> > From: owner-ldm-users@xxxxxxxxxxxxxxxx
> > [mailto:owner-ldm-users@xxxxxxxxxxxxxxxx]On Behalf Of Arthur A. Person
> > Sent: Wednesday, May 26, 2004 11:22 AM
> > To: ldm-users@xxxxxxxxxxxxxxxx
> > Subject: Strange ldm/gempak behaviour
> >
> > Hi...
> >
> > Thought I'd throw this out for comments... I just fixed (I think) a
> > strange LDM/Gempak problem: dcmetr was core dumping many times/minute,
> > yesterday's *sao.gem file was at the 4GB limit (actually, larger
> > 4488229376 bytes...???) but today's was ~4.5M thus far. I figured I would
> > stop/restart the ldm, but when I tried to stop it, one rpc and rtstats
> > wouldn't go down, so I had to kill them, remake the queues, and then
> > restart. Oddly enough, I have a second ldm running on another system that
> > also decodes metars (whose files seemed okay size-wise) that, when I tried
> > to stop its ldm, it also hung similarly and I had to kill/rebuild/restart
> > it as well.
> >
> > Anyone have any similar experience or could suggest a cause? I don't
> > recall that I've ever seen anything quite like this before.
> >
> > Thanks.
> >
> > Art.
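The dcmetr.log check suggested in the reply at the top of this thread (looking for overlapping process IDs on the same output file) might be sketched like this. The thread does not show the dcmetr.log format, so this sketch assumes you have already pulled out, for each decoder process, its PID, the file it wrote, and the first and last times it logged; the `DecoderRun` record, its field names, and the sample file names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple


class DecoderRun(NamedTuple):
    """One decoder instance as reconstructed from a log (hypothetical record)."""
    pid: int        # process ID of the decoder instance
    filename: str   # output file that instance was writing
    start: float    # first time the PID appears in the log (seconds)
    end: float      # last time the PID appears in the log (seconds)


def overlapping_runs(runs: List[DecoderRun]) -> Dict[str, Set[Tuple[int, int]]]:
    """Map each output file to the pairs of PIDs whose lifetimes overlapped."""
    by_file = defaultdict(list)
    for run in runs:
        by_file[run.filename].append(run)

    overlaps = defaultdict(set)
    for filename, group in by_file.items():
        group.sort(key=lambda r: r.start)
        for i, first in enumerate(group):
            for second in group[i + 1:]:
                if second.start >= first.end:
                    break  # sorted by start, so no later run overlaps `first`
                if first.pid != second.pid:
                    overlaps[filename].add((first.pid, second.pid))
    return dict(overlaps)


# Made-up example: PIDs 101 and 102 wrote the same file at the same time.
runs = [
    DecoderRun(101, "20040526_sao.gem", 0.0, 100.0),
    DecoderRun(102, "20040526_sao.gem", 50.0, 150.0),
    DecoderRun(103, "20040526_sao.gem", 150.0, 200.0),
    DecoderRun(104, "20040525_sao.gem", 60.0, 90.0),
]
print(overlapping_runs(runs))
```

Any file that shows up in the result had two decoder instances alive at once, which is the situation described as a possible cause of the corrupted *sao.gem files.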