
[LDM #UKF-836086]: LDM 6.9.4 issue---queue size getting monstrous



Gilbert,

> Forwarded conversation
> Subject: "ldmadmin check" problem LDM may not be running
> ------------------------
> 
> From: **<address@hidden>
> Date: Mon, Jan 17, 2011 at 5:31 PM
> To: address@hidden, address@hidden
> 
> 
> Checking for a running LDM system...
> Checking the system clock...
> Checking the most-recent insertion into the queue...
> Vetting the size of the queue and the maximum acceptable latency...
> vetQueueSize(): The maximum acceptable latency (registry parameter
> "/server/max-latency": 3600 seconds) is greater than the observed minimum
> virtual residence time of data-products in the queue (2029 seconds).
> This will hinder detection of duplicate data-products.
> The value of the "/reconciliation-mode" registry-parameter is "increase
> queue"
> Increasing the capacity of the queue...
> Creating new queue of 2249086265 bytes and 106734 slots...
> Illegal size "2249086265"
> Usage: pqcreate [options] <initialsz>[k|m|g] <pqfname>
> pqcreate [options] -s <initialsz>[k|m|g] [-q <pqfname>]
> Options:
> -v
> -c
> -f
> -l logfname
> -S nproducts
> (default pqfname is "/home/ldm/var/queues/ldm.pq")
> vetQueueSize(): Couldn't create new queue: /home/ldm/var/queues/ldm.pq.new

It appears that 1) the "ldmadmin check" command noticed that the product-queue 
is too small for the "max latency" parameter; 2) the reconciliation-mode 
parameter was set to "increase queue"; and 3) the queue size needed to 
guarantee duplicate data-product detection is larger than the operating 
system can handle. I suspect that the system in question is a 32-bit one 
that doesn't support large files (files larger than about 2 GB).
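
As a quick sanity check on that suspicion, the sketch below compares the 
three queue sizes that pqcreate rejected against the largest signed 32-bit 
value. That this is the exact limit pqcreate enforces is an assumption on my 
part; it is simply the usual ceiling on builds without large-file support. 
The function name is made up for illustration.

```python
# Largest signed 32-bit integer. ASSUMPTION: a build without large-file
# support cannot handle file sizes beyond this value.
INT32_MAX = 2**31 - 1  # 2147483647

# Queue sizes (in bytes) rejected by pqcreate in the quoted
# "ldmadmin check" runs.
requested_sizes = [2249086265, 2322946772, 2359502899]


def exceeds_32bit_limit(size_bytes: int) -> bool:
    """Return True if the size does not fit in a signed 32-bit integer."""
    return size_bytes > INT32_MAX


for size in requested_sizes:
    # Print each size, whether it is over the limit, and by how much.
    print(size, exceeds_32bit_limit(size), size - INT32_MAX)
```

All three requested sizes are over that limit, while the 1219145728-byte 
queue that was later created successfully is well under it.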

Your options include 1) setting the reconciliation-mode parameter to "do 
nothing", which makes the "ldmadmin check" command complain but otherwise 
take no action, leaving the LDM unable to guarantee duplicate data-product 
detection; 2) setting the reconciliation-mode parameter to "decrease max 
latency", which causes the maximum-latency parameter to be adjusted 
downwards far enough to guarantee duplicate data-product detection; 3) 
migrating to a system that supports larger files in order to keep the 
default 3600 second maximum latency; and 4) trying to rebuild the LDM on the 
current system with support for large files.
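
For a rough sense of what options 2 and 3 imply, here is a small sketch that 
uses the numbers from the three quoted "ldmadmin check" runs. It assumes that 
residence time scales linearly with queue size, which is a simplification of 
mine and not necessarily how ldmadmin computes the new size. Both function 
names are hypothetical.

```python
MAX_LATENCY_S = 3600  # "/server/max-latency" from the quoted output

# Observed minimum virtual residence times (seconds) from the three runs.
observed_residence_s = [2029, 1961, 1929]


def queue_growth_factor(max_latency_s: float, min_residence_s: float) -> float:
    """Factor by which the queue would have to grow to hold max_latency_s
    worth of data. ASSUMPTION: residence time is proportional to queue size."""
    return max_latency_s / min_residence_s


def latency_cap(max_latency_s: float, min_residence_s: float) -> float:
    """Largest maximum latency that does not exceed the observed minimum
    residence time (the direction in which option 2 adjusts the parameter)."""
    return min(max_latency_s, min_residence_s)


for residence in observed_residence_s:
    print(
        residence,
        round(queue_growth_factor(MAX_LATENCY_S, residence), 2),
        latency_cap(MAX_LATENCY_S, residence),
    )
```

Under that assumption, keeping the 3600 second latency needs a queue a bit 
less than twice its current size, whereas option 2 would bring the maximum 
latency down to roughly the observed residence time.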

I figured that there would be some "growing pains" with the addition of this 
new feature. Let me know what you decide or if you have any questions.

> ----------
> From: **<address@hidden>
> Date: Mon, Jan 17, 2011 at 5:46 PM
> To: address@hidden, address@hidden
> 
> 
> virtual residence time of data-products in the queue (1961 seconds).
> Creating new queue of 2322946772 bytes and 109217 slots...
> Illegal size "2322946772"
> 
> ----------
> From: **<address@hidden>
> Date: Mon, Jan 17, 2011 at 6:01 PM
> To: address@hidden, address@hidden
> 
> 
> virtual residence time of data-products in the queue (1929 seconds).
> Creating new queue of 2359502899 bytes and 111645 slots...
> Illegal size "2359502899"
> 
> ----------
> From: *Gilbert Sebenste* <address@hidden>
> Date: Mon, Jan 17, 2011 at 7:10 PM
> To: address@hidden
> Cc: address@hidden
> 
> 
> Hello Steve,
> 
> Gilbert here. Now you get to see my second job in action. :-)
> We have a problem here at AllisonHouse. One of our machines,
> feed03.allisonhouse.com, has a huge ldm.pq.new file and since
> we run those from shared memory, it is filling it up and
> crashing our LDM. I think Tom Yoksas had this problem.
> This is only occurring on one of two servers which are
> essentially identical, getting the same feeds.
> Anyway, ldmadmin check complains that:
> 
> 
> In the ldmd.log file, I see:
> 
> Jan 17 23:16:32 feed03 10.1.1.12[25694] NOTE: LDM-6 desired product-class:
> 20110117221632.957 TS_ENDT {{NEXRAD2,  "K[L-R]"},{NONE,
> "SIG=081abfd86fff9ce0e906d4df32984bbc"}}
> Jan 17 23:16:33 feed03 10.1.1.12[25695] NOTE: LDM-6 desired product-class:
> 20110117221633.068 TS_ENDT {{NEXRAD2,  "K[S-Z]"},{NONE,
> "SIG=753faf5aef690a29022c8ad31e661de0"}}
> Jan 17 23:16:33 feed03 10.1.1.12[25696] NOTE: LDM-6 desired product-class:
> 20110117221633.194 TS_ENDT {{NEXRAD2,  "P[A-Z]"},{NONE,
> "SIG=9d08e28a11b3cd7ecf0f9e31fcc7f64c"}}
> Jan 17 23:16:33 feed03 10.1.1.12[25698] NOTE: LDM-6 desired product-class:
> 20110117221633.536 TS_ENDT {{NEXRAD3,  ".*"},{NONE,
> "SIG=1c9a5dea7c65b2c63b9fbaf0c08c8eca"}}
> Jan 17 23:16:33 feed03 10.1.1.12[25699] NOTE: LDM-6 desired product-class:
> 20110117221633.662 TS_ENDT {{IDS|DDPLUS,  "^(W.....) (....)"},{NONE,
> "SIG=bdc9fbb9c60b57219c478675bc0303cb"}}
> Jan 17 23:16:33 feed03 10.1.1.12[25700] NOTE: LDM-6 desired product-class:
> 20110117221633.787 TS_ENDT {{IDS|DDPLUS,  "^(ASUS01) (KWBC)"},{NONE,
> "SIG=70ed6821f05064e0a4c8166b5e36ee80"}}
> Jan 17 23:16:33 feed03 10.1.1.12[25701] NOTE: LDM-6 desired product-class:
> 20110117221633.930 TS_ENDT {{IDS|DDPLUS,  "^(FSUS02) (KWBC)"},{NONE,
> "SIG=b29855e5fefa81a4bc9a3ff915255f40"}}
> Jan 17 23:16:34 feed03 pqact[25689] NOTE: Starting from insertion-time
> 2011-01-17 23:16:05.458968 UTC
> Jan 17 23:16:05 feed03 pqact[24062] NOTE: Behind by 0.129631 s
> Jan 17 23:16:05 feed03 pqact[24061] NOTE: Behind by 0.142458 s
> Jan 17 23:16:20 feed03 pqcopy[25659] NOTE: Starting Up (25542)
> Jan 17 23:16:20 feed03 pqcopy[25659] ERROR: mmap: (nil) 0 2141605888: Cannot
> allocate memory
> Jan 17 23:16:20 feed03 pqcopy[25659] ERROR: pq_open failed:
> /home/ldm/var/queues/ldm.pq.new: Cannot allocate memory
> Jan 17 23:16:20 feed03 pqcopy[25659] NOTE: Exiting
> Jan 17 23:16:20 feed03 pqcopy[25659] NOTE: Number of products copied: 0
> Jan 17 23:16:32 feed03 pqcheck[25665] NOTE: Starting Up (25542)
> Why is it creating this new ldm.pq.new? Weird. Anyway, it's causing the LDM
> to crash. When I do an
> ldmadmin delqueue after stopping it, it doesn't delete the .new file, and
> just takes up a bunch
> of unnecessary memory. After stopping the LDM and doing an ldmadmin delqueue,
> I then did a rm ldm.pq.new, and did this:
> 
> /home/ldm% ll
> ls: unparsable value for LS_COLORS environment variable
> total 2093460
> drwxrwxrwt  2 root root         60 Jan 18 00:17 ./
> drwxr-xr-x 11 root root       3720 Oct  8 03:12 ../
> -rw-rw-r--  1 ldm  ldm  2141605888 Jan 17 23:16 ldm.pq.new
> /home/ldm% rm ldm.pq.new
> rm: remove regular file `ldm.pq.new'? y
> /home/ldm% rehash
> /home/ldm% ldmadmin clean
> ldmadmin mkqueue
> /home/ldm% ldmadmin mkqueue
> /home/ldm% rehash
> /home/ldm% ldmadmin clean
> ldmadmin newlog
> /home/ldm% ldmadmin newlog
> /home/ldm% cd /dev/shm
> /home/ldm% ls
> ls: unparsable value for LS_COLORS environment variable
> ldm.pq
> /home/ldm% ll
> ls: unparsable value for LS_COLORS environment variable
> total 1191740
> drwxrwxrwt  2 root root         60 Jan 18 00:18 ./
> drwxr-xr-x 11 root root       3720 Oct  8 03:12 ../
> -rw-rw-r--  1 ldm  ldm  1219145728 Jan 18 00:18 ldm.pq
> /home/ldm% df -k
> Filesystem           1K-blocks      Used Available Use% Mounted on
> /dev/mapper/SysVolGroup-LogVolRoot
> 234316432  13830396 208391368   7% /
> /dev/sda1               124427     20223     97780  18% /boot
> tmpfs                  4155468   1191740   2963728  29% /dev/shm
> /dev/sdb1            240292420  61002716 167083520  27% /home/ldm/data
> /home/ldm% ldmadmin newlog
> /home/ldm% ldmadmin start
> The product-queue is OK.
> Checking pqact(1) configuration-file(s)...
> /home/ldm/etc/pqact.conf: syntactically correct
> /home/ldm/etc/pqact.conf.emwin: syntactically correct
> Checking LDM configuration-file (/home/ldm/etc/ldmd.conf)...
> Starting the LDM server...
> /home/ldm% pwd
> ---
> And it now seems to be fine. I think Tom Yoksas had a similar issue, but
> since I didn't
> have it, I just blew it off. Anyway...
> 
> I realize that I am coming from a .com address, and therefore I completely
> understand you have no obligation to support me in this whatsoever.
> But, I do think you should obviously know about it, in case this
> is a serious or significant issue.
> 
> Thanks!
> 
> Gilbert
> 
> ----
> 
> Gilbert Sebenste
> Chief Meteorologist
> Allisonhouse, LLC


Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: UKF-836086
Department: Support LDM
Priority: Normal
Status: Closed