Hi,

This is a follow-up to a phone conversation that Alain and I had on the
morning of Monday, July 8:

re:
> As per our telephone conversation, I would like to request the links for
> real time statistics and any pertinent information to help us
> troubleshoot this problem.

When encountering problems receiving data via the Unidata LDM/IDD, the
best course of action is:

1) check to see if real-time statistics have been reported by your
   machine(s) and are available for display on the Unidata website:

   Unidata HomePage
   http://www.unidata.ucar.edu

     Projects -> Internet Data Distribution
     http://www.unidata.ucar.edu/projects/index.html#idd

       IDD Current Operational Status
       http://www.unidata.ucar.edu/software/idd/rtstats/

         Statistics by Host
         http://www.unidata.ucar.edu/cgi-bin/rtstats/siteindex

2) in the left-hand column of the siteindex page you will find the
   classification of machines reporting real-time statistics by their
   domain name

   The machine(s) reporting real-time statistics for your domain will be
   listed in the right-hand column of the siteindex page.
   Each machine name entry is a link to a set of information for that
   machine.

   Example:

   Domain         Hosts
   ca.gc.ec.cmc   ldm-data.cmc.ec.gc.ca [6.10.1]
                  ldm-wxo.cmc.ec.gc.ca [6.8.1]
                  noaaport3.cmc.ec.gc.ca [6.6.4]
                  noaaport4.cmc.ec.gc.ca [6.6.4]
                  tigge-ldm.cmc.ec.gc.ca [6.6.4]

3) the page that will be shown when one clicks on the name of the machine
   of interest will contain a set of links for each datastream that is
   being REQUESTed by that machine

   For instance:

   https://www.unidata.ucar.edu/cgi-bin/rtstats/siteindex?ldm-data.cmc.ec.gc.ca

   Real-time Statistics for ldm-data.cmc.ec.gc.ca [ LDM 6.10.1 ]

   FEED NAME
   HDS         latency  log(latency)  histogram  volume  products  topology
   IDS|DDPLUS  latency  log(latency)  histogram  volume  products  topology
   NEXRAD2     latency  log(latency)  histogram  volume  products  topology

   Cumulative volume summary
   Cumulative volume summary Graph

4) the things to look at when assessing whether the problem being
   investigated is a local problem or an upstream one are:

   latency  - the amount of time between the creation of a product
              (i.e., when a product is first added to the original LDM
              queue from which it is distributed) and its receipt (i.e.,
              the time that the product was received at the local
              machine)

   volume   - time series of data volume received for the particular
              feed

   products - time series of the number of products received for the
              particular feed

   topology - the route that a product takes from its creation to its
              receipt

5) things that can be gleaned from the items listed in 4):

   latency - the time history of the latency shows:

     - if products are being received in a timely manner

       If the latencies are small, the products are being received
       with little delay.

     - if there is anything wrong with the system clock on the
       receiving machine

       A trend in the lowest latencies typically shows that the clock
       on the receiving machine is drifting.
       A latency plot where the lowest latency is consistently non-zero
       shows that the clock on the receiving machine is either slow or
       fast.

     NB:

     - problems with the local clock should be fixed as soon as
       possible.  If they are not, then whenever the LDM is restarted
       for any reason one may either miss products or receive no data
       for some period of time.

     - latencies that approach 3600 seconds (one hour) are a warning
       that there is some problem receiving the datastream being
       REQUESTed.  When the latencies exceed 3600 seconds for an LDM
       installation that is configured in the "standard" manner, data
       _will_ be lost/not received/thrown away upon receipt.

       The reason for this is that the LDM was designed for real-time
       delivery of data, and one of the working assumptions is that
       data that is an hour old is too old to be considered real-time.

   volume - this time series shows how much data was received per hour
            for the feed in question

     NB:

     - LDM REQUESTs for feeds that have high data volumes (e.g.,
       CONDUIT, NEXRAD2, FNMOC, HRRR) may need to be split into
       mutually exclusive subsets.  The feed that has typically been
       split into five subsets is CONDUIT.  With the move to dual
       polarization full volume scan radar data, the NEXRAD2 feed has
       become a candidate for feed REQUEST splitting.

   latency of low-volume feed(s) is acceptably low while the latency for
   high-volume feed(s) is unacceptably high

     - very low latencies for a feed like IDS|DDPLUS coupled with very
       high latencies for a high-volume feed like CONDUIT or NEXRAD2
       are a classic indication of artificial bandwidth limiting in one
       or more legs of the network path being taken during data
       delivery.  We refer generically to this situation as "packet
       shaping".

       It is our experience that packet shaping is typically done
       "close" to the downstream node (i.e., the machine receiving
       data).
       The network connection at/near UCAR/NCAR is never intentionally
       bandwidth limited, so if there is a bottleneck somewhere it is
       most likely not here.

     - when an instance of what looks to be packet shaping is
       discovered, it is the responsibility of the downstream site to
       initiate investigations into where the bottleneck may be.  We
       (Unidata/UCAR) are willing to help in the investigations and
       help with resolution of problems, but we typically have no
       influence when the problem resides in the downstream's
       institution.

6) things to try when latencies for one or more feeds are unacceptably
   high:

   - determine if there are any network problems at one's institution

   - make sure that the LDM installation(s) on one's machine(s) are
     functioning correctly and are reasonably up-to-date

   - check real-time statistics being reported to us (links above) to
     make sure that you really are not receiving the data

     This may sound funny, but it is our experience that a number of
     sites assume that they are not receiving data when they actually
     are, and their problem is in processing the data received.

   - if a packet shaping signature is seen, try splitting the
     high-volume feed(s) that are experiencing unacceptably high
     latencies

   - if still having problems after undertaking local investigations,
     send an email to:

     Unidata IDD Support <address@hidden>

     Please do _NOT_ phone individuals in Unidata for help or send email
     to Unidata staff members' private email addresses.  The reason for
     this is that the Unidata staff member may be out of the office and
     not able to respond to personal email or voicemail.  Email sent to
     the address above is reviewed by several Unidata staff throughout
     the weekday and routinely on weekends and even holidays, so it is
     most likely that help will be provided faster.
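
As a rough illustration of the checks described in 4) and 5) above, the
Python sketch below classifies a latency time series for one feed and
compares a low-volume feed against a high-volume one. It is not part of
the LDM or of the rtstats pages. The function names, the (time, latency)
sample format, and every threshold except the 3600 second limit are
assumptions chosen only for illustration.

```python
from statistics import median

# From the text above: in a "standard" LDM setup, data older than one hour is lost.
MAX_LATENCY_S = 3600.0


def window_minima(samples, window_s=3600.0):
    """Lowest latency in each time window; samples are (time_s, latency_s) pairs."""
    lowest = {}
    for t, lat in samples:
        key = int(t // window_s)
        lowest[key] = min(lat, lowest.get(key, lat))
    return [(key * window_s, lowest[key]) for key in sorted(lowest)]


def slope(points):
    """Least-squares slope of y against x for a list of (x, y) points."""
    n = len(points)
    if n < 2:
        return 0.0
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    den = sum((x - mean_x) ** 2 for x, _ in points)
    if den == 0:
        return 0.0
    return sum((x - mean_x) * (y - mean_y) for x, y in points) / den


def assess_feed(samples, drift_tol=1e-4, offset_tol_s=5.0, warn_fraction=0.8):
    """Return a list of findings; all tolerances here are hypothetical defaults."""
    findings = []
    floor = window_minima(samples)
    if abs(slope(floor)) > drift_tol:
        # A trend in the lowest latencies suggests the local clock is drifting.
        findings.append("clock drift suspected")
    elif abs(median(lat for _, lat in floor)) > offset_tol_s:
        # A steady non-zero floor suggests the local clock is slow or fast.
        findings.append("clock offset suspected")
    worst = max(lat for _, lat in samples)
    if worst >= MAX_LATENCY_S:
        findings.append("data loss likely")
    elif worst >= warn_fraction * MAX_LATENCY_S:
        findings.append("latency near one hour")
    return findings


def looks_like_shaping(low_volume, high_volume, low_ok_s=60.0, ratio=10.0):
    """Low latency on a small feed plus much higher latency on a large feed."""
    low = median(lat for _, lat in low_volume)
    high = median(lat for _, lat in high_volume)
    return low <= low_ok_s and high >= ratio * max(low, 1.0)
```

For example, assess_feed() applied to samples whose hourly minimum grows
by two seconds every hour reports "clock drift suspected", and
looks_like_shaping() applied to an IDS|DDPLUS series sitting near one
second and a NEXRAD2 series sitting near 900 seconds returns True. Note
that sustained congestion also raises the latency floor, so a result
from this sketch is a prompt to look at the plots and not a diagnosis.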

As we talked about during our phone conversation this morning, it is my
opinion that:

- the clock on one of your machines, ldm-data.cmc.ec.gc.ca, is not being
  properly maintained

  I can say this easily after looking at the latency plot for the
  IDS|DDPLUS feed: the linearly increasing trend in latency indicates
  that ldm-data's clock is drifting.

- the disparity in the latency for IDS|DDPLUS and NEXRAD2 on ldm-data
  indicates that there is some limit to how much data (volume) a single
  network connection can carry.

  This situation might be mitigated by either:

  - finding the source of the bottleneck and getting it fixed

  - or, splitting the high-volume NEXRAD2 feed into several (e.g., 5)
    mutually exclusive subsets

    In order to make recommendations on how to split the NEXRAD2 feed,
    we would need to see the LDM configuration file
    (~ldm/etc/ldmd.conf) in use on ldm-data.

As a final comment, I would like to add that we hold training workshops
each year for the software packages we support; the next training
workshop for the LDM will be held on August 1-2 at our facility here in
Boulder, CO.  There is still at least one slot open for the LDM training
session, but it may fill in the next day or so.  Information on our
training workshops can be found in:

Unidata HomePage
http://www.unidata.ucar.edu

  Events -> 2013 Training Workshop

Please let me know if there was anything in the above that was unclear
or needs further explanation.

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                             Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                                 http://www.unidata.ucar.edu
****************************************************************************

Ticket Details
===================
Ticket ID: PPJ-526440
Department: Support IDD
Priority: Normal
Status: Closed
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.