Hi Stacey,

re:
> We are ingesting Level-II
> radar data and seem to be running into some type of LDM issue. We think it
> might be a resource issue, but we cannot seem to find what it could be. We
> are ingesting all the Level-II radar data on one machine. That machine
> starts a separate process for each radar site. That process reads data
> from STDIN and once enough data has been received, it process the data into
> images and places those images on the LDM queue. It then returns to
> reading from STDIN. With this setup, we are getting a lot of these types
> of warnings:
>
> pqact WARN: write(13,,4096) to decoder took 9 s:
> /wxhub/decoders/l2decode/l2decodev2KDLH
> pqact WARN: write(7,,4096) to decoder took 8 s:
> /wxhub/decoders/l2decode/l2decodev2KEOX
> pqact WARN: write(18,,4096) to decoder took 9 s:
> /wxhub/decoders/l2decode/l2decodev2KLCH

These are informational warnings that a write to a decoder process is taking
longer than expected to finish. The messages are not necessarily an indication
of a problem that needs addressing, especially since the times shown are
modest. If the times were large, then it would be an indication that the
situation should be investigated further.

re:
> We have our LDM queue set to 2GB and that queue is located in RAM. We
> restart the LDM and a few minutes later we start receiving the messages
> above. After about 30 minutes of processing data, we start getting these
> messages:
>
> ulog DEBUG: Deleting oldest to make space 97024 bytes
> ulog DEBUG: Deleting oldest to make space 27616 bytes
> ulog DEBUG: Deleting oldest to make space 167104 bytes

These debug messages are informing you that the LDM queue routines are doing
what they are designed to do, which is to delete the oldest products in the
queue to make space for new ones that are being received. If the products
being deleted have already been processed, then there is no problem. If the
products being deleted have not been processed, then there is a problem.

re:
> So it seems we are losing data, but we cannot find out why.

Neither of the things you listed above indicates that products are being lost.
Have you checked to see if you are, in fact, actually losing products? This
could be done, for instance, by making an inventory of the products that were
received and processed and comparing it to an inventory of the products that
your upstream feed site received and sent you. The LDM utility 'notifyme' can
be of great help in this kind of investigation.

re:
> The machine we
> are running this on is extremely fast and all it does is process the
> Level-II data. It has two 16 core CPUs clocked at 2.6Ghz. So we have 32
> cores in this thing and 32 Gigs of memory. We are not writing anything out
> to disk, we are building images from the data and placing those images back
> on the LDM queue.

OK. Presumably those images are then being sent to other machines?

Question:

- how do you have the actions structured in your LDM pattern-action file(s)?

  I ask because if all of the processing actions are in a single
  pattern-action file, then you may have a processing bottleneck. Each
  action in a pattern-action file is checked against every product
  regardless of whether a previous action in the file matched and was
  executed. If your pattern-action file has a LOT of actions, it may take a
  "long" time to work through all of the actions for a product before the
  next product can be acted on.

re:
> Below is a top command showing the computer's state at
> the time we are receiving the WARN and DEBUG messages.
>
> top - 13:31:37 up 1 day, 17:30, 4 users, load average: 2.74, 2.44, 2.40
> Tasks: 426 total, 3 running, 423 sleeping, 0 stopped, 0 zombie
> Cpu(s): 6.7% us, 0.1% sy, 0.0% ni, 93.2% id, 0.0% wa, 0.0% hi, 0.0% si
> Mem: 33250348k total, 2718984k used, 30531364k free, 159560k buffers
> Swap: 116177060k total, 0k used, 116177060k free, 2043820k cached

Nothing looks out of line here.
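
Returning to the question of whether products are actually being lost: the
inventory comparison suggested above could be sketched roughly as follows in
Python. This is only an illustration under assumptions: it supposes you have
already saved two plain-text inventories with one product identifier per
line (that layout, and the function names load_inventory and
missing_products, are hypothetical and are not something notifyme or the LDM
produces directly).

```python
def load_inventory(lines):
    """Return the set of non-empty product identifiers, whitespace stripped.

    Assumption: one product identifier per line (hypothetical layout).
    """
    return {line.strip() for line in lines if line.strip()}


def missing_products(upstream_lines, local_lines):
    """Identifiers present in the upstream inventory but absent locally."""
    upstream = load_inventory(upstream_lines)
    local = load_inventory(local_lines)
    # Set difference: what upstream says it sent that we never processed.
    return sorted(upstream - local)
```

With two inventory files opened as upstream_file and local_file, a call such
as missing_products(upstream_file, local_file) would return the identifiers
to investigate. An empty list would suggest that nothing was lost over the
period the inventories cover.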

re:
> Here are some specs on what we are running:
>
> LDM version: 6.8.1
> CPU: 32 cores @ 2.6Ghz
> RAM: 32 Gigs
> OS: Custom version of Debian Linux

This looks like a very capable machine. I cannot comment on whether or not
a custom version of Debian Linux would cause problems, but I doubt that it
would.

re:
> We were wondering if maybe there is some buffer limit for STDIN in Linux
> that is getting reached. Each process is reading data from STDIN.

Yes, there are buffer limits for *nix pipes. If your decoder process is fast
enough, however, the buffer limit should not be a problem.

If you are convinced that you are, in fact, losing data, you may want to
change your decoding strategy to writing the products to disk and running
your decoding processes on the disk images directly. This would eliminate
any bottleneck that may be encountered in the way you are currently handling
the data.

re:
> Then it
> goes off and build some images, which could take 5-10 seconds to complete.
> It then comes back and beings reading from STDIN again. Would this cause
> STDIN to get backed up while the decoder is building the images?

Yes, most certainly. While the decoder is busy building images, the pipe
fills and pqact's write to it blocks until the decoder resumes reading,
which is consistent with the pqact warnings you are seeing.

re:
> Could this be our bottle neck?

It could be, yes.

re:
> Below are our ldmd.conf and pqact.conf lines for this data:
>
> ldmd.conf:
> request NEXRAD2 ".*" 18.104.22.168
>
> pqact.conf:
> CRAFT
> ^L2-([^/]*)/(.*)/([0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9][0-2][0-9][0-5][0-9][0-9][0-9])/[0-9]+/[0-9]+/[IES]/V0/0$
> PIPE /wxhub/decoders/l2decode/l2decodev2 \2

Thanks for including these; they help to understand how you are processing
the data.

re:
> Any suggestions you could give would be greatly appreciated.

One easy thing to do would be to create multiple pattern-action files, each
of which processes a mutually exclusive subset of the data being received.
This would be as simple as:

- copying the pattern-action file to, say, 4 other pattern-action files
  (named differently, of course) and then changing your single ldmd.conf
  EXEC line into 5 EXEC lines, each of which processes 20% of the products

  This 5-way splitting of the processing would lessen the time spent waiting
  before the products for the next NEXRAD could be processed.

- FILEing the NEXRAD products to disk and changing your decoding actions to
  read from the disk files directly

  This would ensure that the products received would be available for
  processing.

The tricky part of this and all Level II decoding is that there is no
reliable indicator for when the last piece of a volume scan has been
received. There is a product that indicates that it is the last part of a
volume scan, but there is no guarantee that it is received last.

Our approach to processing Level II data is to write the pieces to disk in
their own subdirectory and then kick off a process that reassembles the
pieces into a full volume scan. That process is responsible for determining
if all pieces have been received; it will sleep for a bit and look again if
it "thinks" that there are pieces from the original volume scan that have
not been received yet. This approach works nicely, but it does delay the
availability of the data a bit (but not that long).

re:
> Thank you,

No worries.

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                             Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************

Ticket Details
===================
Ticket ID: WEM-615262
Department: Support LDM
Priority: Normal
Status: Closed
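
As a rough illustration of the "sleep for a bit and look again" reassembly
approach described in the message above, here is a minimal Python sketch. It
is not the actual Unidata implementation. It assumes, purely for
illustration, that each piece of a volume scan has been FILEd into its own
subdirectory under a name of the form <chunk number>-<type>, with type S, I,
or E (for example 1-S, 2-I, 3-E), that chunk numbering starts at 1, and that
the E piece carries the highest chunk number. The file naming and all
function names are hypothetical.

```python
import os
import time


def chunks_present(directory):
    """Map chunk number -> chunk type for files named '<number>-<type>'.

    Assumption: the naming scheme is hypothetical, chosen for this sketch.
    """
    found = {}
    for name in os.listdir(directory):
        number, _, kind = name.partition("-")
        if number.isdigit() and kind in ("S", "I", "E"):
            found[int(number)] = kind
    return found


def volume_is_complete(found):
    """True once an end piece exists and every piece up to it is present."""
    end_numbers = [n for n, kind in found.items() if kind == "E"]
    if not end_numbers:
        return False  # the end piece has not arrived yet
    last = max(end_numbers)
    # The end piece may arrive before earlier pieces, so check for gaps.
    return all(n in found for n in range(1, last + 1))


def wait_for_volume(directory, poll_seconds=2.0, max_wait=60.0,
                    sleep=time.sleep):
    """Poll the directory until the volume looks complete or time runs out.

    Returns True if complete, False if max_wait elapsed first.
    """
    waited = 0.0
    while True:
        if volume_is_complete(chunks_present(directory)):
            return True
        if waited >= max_wait:
            return False
        sleep(poll_seconds)
        waited += poll_seconds
```

The upper bound on waiting (max_wait) is a design choice for the sketch: a
piece that never arrives should not hold up the decoder indefinitely, at the
cost of occasionally processing an incomplete volume scan.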