[LDM #UKF-836086]: LDM 6.9.4 issue---queue size getting monstrous



Gilbert,

> Forwarded conversation
> Subject: "ldmadmin check" problem LDM may not be running
> ------------------------
> 
> From: **<address@hidden>
> Date: Mon, Jan 17, 2011 at 5:31 PM
> To: address@hidden, address@hidden
> 
> 
> Checking for a running LDM system...
> Checking the system clock...
> Checking the most-recent insertion into the queue...
> Vetting the size of the queue and the maximum acceptable latency...
> vetQueueSize(): The maximum acceptable latency (registry parameter
> "/server/max-latency": 3600 seconds) is greater than the observed minimum
> virtual residence time of data-products in the queue (2029 seconds).
> This will hinder detection of duplicate data-products.
> The value of the "/reconciliation-mode" registry-parameter is "increase
> queue"
> Increasing the capacity of the queue...
> Creating new queue of 2249086265 bytes and 106734 slots...
> Illegal size "2249086265"
> Usage: pqcreate [options] <initialsz>[k|m|g] <pqfname>
> pqcreate [options] -s <initialsz>[k|m|g] [-q <pqfname>]
> Options:
> -v
> -c
> -f
> -l logfname
> -S nproducts
> (default pqfname is "/home/ldm/var/queues/ldm.pq")
> vetQueueSize(): Couldn't create new queue: /home/ldm/var/queues/ldm.pq.new

It appears that 1) the "ldmadmin check" command noticed that the product-queue 
isn't big enough for the "max latency" parameter (3600 seconds); 2) the 
reconciliation-mode parameter is set to "increase queue"; and 3) the queue size 
needed to guarantee duplicate data-product detection is larger than the system 
can handle: pqcreate rejected the requested 2249086265 bytes as an illegal 
size. I suspect that the system in question is a 32-bit one and that it 
doesn't support large files (files larger than about 2 GB).
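
The arithmetic behind that suspicion can be checked directly. The following is a minimal sketch in Python. It assumes the limit being hit is the largest signed 32-bit value (2^31 - 1 bytes), which is an assumption and not something confirmed from the pqcreate source. The names are made up for illustration.

```python
# Assumed limit: the largest value of a signed 32-bit integer. This is a
# guess at why pqcreate printed "Illegal size"; it is not taken from the
# LDM source code.
SIGNED_32BIT_MAX = 2**31 - 1  # 2147483647 bytes, just under 2 GiB

# Queue sizes that "ldmadmin check" requested in the three forwarded messages.
requested_sizes = [2249086265, 2322946772, 2359502899]


def exceeds_32bit_limit(size_in_bytes):
    """Return True if the size cannot be held in a signed 32-bit integer."""
    return size_in_bytes > SIGNED_32BIT_MAX


for size in requested_sizes:
    # Print the size, how far it is above the assumed limit, and the verdict.
    print(size, size - SIGNED_32BIT_MAX, exceeds_32bit_limit(size))
```

By the same test, the 2141605888-byte ldm.pq.new file shown in the quoted directory listing is just under that assumed limit.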

Your options include 1) setting the reconciliation-mode parameter to "do 
nothing", in which case "ldmadmin check" will still complain but won't change 
anything, and the LDM won't be able to guarantee duplicate data-product 
detection; 2) setting the reconciliation-mode parameter to "decrease max 
latency", which will cause the maximum-latency parameter to be adjusted 
downwards in order to guarantee duplicate product detection; 3) migrating to a 
system that supports larger files in order to keep the default 3600 second 
maximum latency; and 4) trying to rebuild the LDM on the current system with 
support for large files.
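
Before choosing between options 3 and 4, it may help to confirm whether the system really is a 32-bit one. The following is a rough sketch in Python. It assumes a Python interpreter built for the machine's native word size is available (a 32-bit interpreter on a 64-bit operating system would still report 32). The helper name pointer_bits is made up for illustration.

```python
import struct


def pointer_bits():
    """Return the pointer width, in bits, of the running interpreter."""
    # struct format "P" is a native pointer (void *); its size is in bytes.
    return struct.calcsize("P") * 8


bits = pointer_bits()
print(f"{bits}-bit process")
if bits == 32:
    # A 32-bit process has at most 4 GiB of address space in total, so a
    # memory-mapped product-queue larger than 2 GB is unlikely to fit.
    print("Queue sizes above about 2 GB are likely to be a problem here.")
```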

I figured that there would be some "growing pains" with the addition of this 
new feature. Let me know what you decide or if you have any questions.

> ----------
> From: **<address@hidden>
> Date: Mon, Jan 17, 2011 at 5:46 PM
> To: address@hidden, address@hidden
> 
> 
> virtual residence time of data-products in the queue (1961 seconds).
> Creating new queue of 2322946772 bytes and 109217 slots...
> Illegal size "2322946772"
> 
> ----------
> From: **<address@hidden>
> Date: Mon, Jan 17, 2011 at 6:01 PM
> To: address@hidden, address@hidden
> 
> 
> virtual residence time of data-products in the queue (1929 seconds).
> Creating new queue of 2359502899 bytes and 111645 slots...
> Illegal size "2359502899"
> 
> ----------
> From: *Gilbert Sebenste* <address@hidden>
> Date: Mon, Jan 17, 2011 at 7:10 PM
> To: address@hidden
> Cc: address@hidden
> 
> 
> Hello Steve,
> 
> Gilbert here. Now you get to see my second job in action. :-)
> We have a problem here at AllisonHouse. One of our machines,
> feed03.allisonhouse.com, has a huge ldm.pq.new file and since
> we run those from shared memory, it is filling it up and
> crashing our LDM. I think Tom Yoksas had this problem.
> This is only occurring on one of two servers which are
> essentially identical, getting the same feeds.
> Anyway, ldmadmin check complains that:
> 
> 
> In the ldmd.log file, I see:
> 
> Jan 17 23:16:32 feed03 10.1.1.12[25694] NOTE: LDM-6 desired product-class:
> 20110117221632.957 TS_ENDT {{NEXRAD2,  "K[L-R]"},{NONE,
> "SIG=081abfd86fff9ce0e906d4df32984bbc"}}
> Jan 17 23:16:33 feed03 10.1.1.12[25695] NOTE: LDM-6 desired product-class:
> 20110117221633.068 TS_ENDT {{NEXRAD2,  "K[S-Z]"},{NONE,
> "SIG=753faf5aef690a29022c8ad31e661de0"}}
> Jan 17 23:16:33 feed03 10.1.1.12[25696] NOTE: LDM-6 desired product-class:
> 20110117221633.194 TS_ENDT {{NEXRAD2,  "P[A-Z]"},{NONE,
> "SIG=9d08e28a11b3cd7ecf0f9e31fcc7f64c"}}
> Jan 17 23:16:33 feed03 10.1.1.12[25698] NOTE: LDM-6 desired product-class:
> 20110117221633.536 TS_ENDT {{NEXRAD3,  ".*"},{NONE,
> "SIG=1c9a5dea7c65b2c63b9fbaf0c08c8eca"}}
> Jan 17 23:16:33 feed03 10.1.1.12[25699] NOTE: LDM-6 desired product-class:
> 20110117221633.662 TS_ENDT {{IDS|DDPLUS,  "^(W.....) (....)"},{NONE,
> "SIG=bdc9fbb9c60b57219c478675bc0303cb"}}
> Jan 17 23:16:33 feed03 10.1.1.12[25700] NOTE: LDM-6 desired product-class:
> 20110117221633.787 TS_ENDT {{IDS|DDPLUS,  "^(ASUS01) (KWBC)"},{NONE,
> "SIG=70ed6821f05064e0a4c8166b5e36ee80"}}
> Jan 17 23:16:33 feed03 10.1.1.12[25701] NOTE: LDM-6 desired product-class:
> 20110117221633.930 TS_ENDT {{IDS|DDPLUS,  "^(FSUS02) (KWBC)"},{NONE,
> "SIG=b29855e5fefa81a4bc9a3ff915255f40"}}
> Jan 17 23:16:34 feed03 pqact[25689] NOTE: Starting from insertion-time
> 2011-01-17 23:16:05.458968 UTC
> Jan 17 23:16:05 feed03 pqact[24062] NOTE: Behind by 0.129631 s
> Jan 17 23:16:05 feed03 pqact[24061] NOTE: Behind by 0.142458 s
> Jan 17 23:16:20 feed03 pqcopy[25659] NOTE: Starting Up (25542)
> Jan 17 23:16:20 feed03 pqcopy[25659] ERROR: mmap: (nil) 0 2141605888: Cannot
> allocate memory
> Jan 17 23:16:20 feed03 pqcopy[25659] ERROR: pq_open failed:
> /home/ldm/var/queues/ldm.pq.new: Cannot allocate memory
> Jan 17 23:16:20 feed03 pqcopy[25659] NOTE: Exiting
> Jan 17 23:16:20 feed03 pqcopy[25659] NOTE: Number of products copied: 0
> Jan 17 23:16:32 feed03 pqcheck[25665] NOTE: Starting Up (25542)
> Why is it creating this new ldm.pq.new? Weird. Anyway, it's causing the LDM
> to crash. When I do an ldmadmin delqueue after stopping it, it doesn't
> delete the .new file, which just takes up a bunch of unnecessary memory.
> After stopping the LDM and doing an ldmadmin delqueue, I then did a
> rm ldm.pq.new, and did this:
> 
> /home/ldm% ll
> ls: unparsable value for LS_COLORS environment variable
> total 2093460
> drwxrwxrwt  2 root root         60 Jan 18 00:17 ./
> drwxr-xr-x 11 root root       3720 Oct  8 03:12 ../
> -rw-rw-r--  1 ldm  ldm  2141605888 Jan 17 23:16 ldm.pq.new
> /home/ldm% rm ldm.pq.new
> rm: remove regular file `ldm.pq.new'? y
> /home/ldm% rehash
> /home/ldm% ldmadmin clean
> ldmadmin mkqueue
> /home/ldm% ldmadmin mkqueue
> /home/ldm% rehash
> /home/ldm% ldmadmin clean
> ldmadmin newlog
> /home/ldm% ldmadmin newlog
> /home/ldm% cd /dev/shm
> /home/ldm% ls
> ls: unparsable value for LS_COLORS environment variable
> ldm.pq
> /home/ldm% ll
> ls: unparsable value for LS_COLORS environment variable
> total 1191740
> drwxrwxrwt  2 root root         60 Jan 18 00:18 ./
> drwxr-xr-x 11 root root       3720 Oct  8 03:12 ../
> -rw-rw-r--  1 ldm  ldm  1219145728 Jan 18 00:18 ldm.pq
> /home/ldm% df -k
> Filesystem           1K-blocks      Used Available Use% Mounted on
> /dev/mapper/SysVolGroup-LogVolRoot
> 234316432  13830396 208391368   7% /
> /dev/sda1               124427     20223     97780  18% /boot
> tmpfs                  4155468   1191740   2963728  29% /dev/shm
> /dev/sdb1            240292420  61002716 167083520  27% /home/ldm/data
> /home/ldm% ldmadmin newlog
> /home/ldm% ldmadmin start
> The product-queue is OK.
> Checking pqact(1) configuration-file(s)...
> /home/ldm/etc/pqact.conf: syntactically correct
> /home/ldm/etc/pqact.conf.emwin: syntactically correct
> Checking LDM configuration-file (/home/ldm/etc/ldmd.conf)...
> Starting the LDM server...
> /home/ldm% pwd
> ---
> And it now seems to be fine. I think Tom Yoksas had a similar issue, but
> since I didn't have it, I just blew it off. Anyway...
> 
> I realize that I am coming from a .com address, and therefore I completely
> understand you have no obligation to support me in this whatsoever.
> But, I do think you should obviously know about it, in case this
> is a serious or significant issue.
> 
> Thanks!
> 
> Gilbert
> 
> ----
> 
> Gilbert Sebenste
> Chief Meteorologist
> Allisonhouse, LLC


Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: UKF-836086
Department: Support LDM
Priority: Normal
Status: Closed


NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.