>From: Gerry Creager N5JXS <address@hidden>
>Organization: Texas A&M University -- AATLT
>Keywords: 200503251531.j2PFViv2028115 LDM processing relay cluster

Hi Gerry,

>OK: I'm getting and storing all the data on bigbird. I'm starting to
>process, as it comes in, a lot of the MADIS data to databases, and make
>gifs of most/all of the Level II dBZ 0.5deg stuff, as well as some other
>elevations, and products. And I'm also starting to do some on-demand
>gifs of the radar for another little project.

I just jumped on bigbird to take a quick look at load averages and noticed that the age of the oldest product in your LDM queue is very small:

...
+ V 20050326.1453  7.07 6.34 6.49  31 14 45 465   6M 6M 0 scourBY(number|day)
    20050326.1454  8.86 7.24 6.80  31 14 45 502  37M 6M 0 scourBY(number|day)
    20050326.1455  8.37 7.36 6.87  31 14 45 512  27M 6M 0 scourBY(number|day)

Because of this, I decided to check your queue size, and saw that it is only 400 MB:

address@hidden ldm]$ ls -alt data/ldm.pq
-rw-rw-r--    1 ldm      ldm      407732224 Mar 15 16:34 data/ldm.pq

Given the volume of data you are ingesting, it would be better if your queue size were substantially larger than 400 MB. I believe that you were running a 2 GB or 4 GB queue in the past.

Since you upgraded your LDM from 6.1 to 6.2.1 and then to 6.3.0 relatively recently (Feb 18 for 6.2.1 and March 15 for 6.3.0), I am assuming that the need to set the queue size in the new ~ldm/etc/ldmadmin-pl.conf configuration file was missed. In LDM 6.2.1, configuration settings like $hostname, $pq_size, etc. were moved from ~ldm/bin/ldmadmin into a persistent file, ~ldm/etc/ldmadmin-pl.conf. The configuration entries in the new file are almost all the same as those in ~ldm/bin/ldmadmin; the only exceptions are new entries that allow one to better tune the LDM queue (e.g., $pq_slots = ).
I see that $pq_size was left at the default 400 MB in ldmadmin-pl.conf:

$pq_size = "400M";

It may be wise to change this to something like 2 GB:

$pq_size = "2G";

and then remake the queue at the larger size:

<as 'ldm'>
-- edit ~ldm/etc/ldmadmin-pl.conf
ldmadmin stop
ldmadmin delqueue
ldmadmin mkqueue -f
ldmadmin start

The process can be sped up considerably by stringing all of the ldmadmin invocations together on a single command line:

ldmadmin stop && ldmadmin delqueue && ldmadmin mkqueue -f && ldmadmin start

>I'm running outta horsepower in the several machines I've been working
>with. One thing I'd been doing is getting the little subset of stuff I
>needed to run on each machine, as a feed from bigbird, then processing it
>locally.

OK. The LDM overhead should not be much, so it must be the case that your available processing power is being consumed by your processing.

>Where I'm heading is to nfs mount the data from bigbird, and then
>process it into image directories either on the local machine and
>cross-mount those to the webserver, or write them onto another
>nfs-mounted directory.
>
>What are your thoughts on efficiencies and potential problems?

The one problem with NFS mounts is the dependencies they create on the order in which machines come up after shutdowns. This is especially the case when using the automounter.

Are you thinking that the NFS mounts will save CPU on the machines doing the processing? I am not sure whether this will be the case; it really depends on the NFS implementation.

While we are talking about OSes, I want to let you know that we have upgraded to Fedora Core 3 on our 32-bit AND 64-bit platforms. FC3 appears to be quite a bit more stable AND faster than either FC2 or FC1.
While experimenting with a cluster approach to IDD data relay (more information is included at the end of this email), we had a shootout between Sun Solaris x86 5.10, FreeBSD 5.3, and Fedora Core 3 64-bit Linux on three identically equipped Sun SunFire V20Z boxes (dual Opteron, 4 GB RAM, 2x36 GB 10000 RPM SCSI, in 1U rackmount cases). All three are 64-bit OSes, so the comparison was as fair as we could make it. The _clear_ winner for IDD relay was FC3 64-bit; FreeBSD 5.3 came in second (not bad, but not nearly as good as FC3); and Solaris x86 5.10 was a _distant_ third (its performance was dismal in our testing). Because of our testing, we replaced FreeBSD and Solaris x86 with FC3 on all of our boxes.

I mention our testing since I see that bigbird is running FC2. Do you know if your 3Ware RAID card is supported under FC3? I have a hunch that it is, since I have been led to believe that RedHat Enterprise WS 4 is the RH-supported version of FC3.

>I've
>started running LDM as a trigger for the aforementioned architecture:
>When the Bird flings a dataset across, rather than filing or decoding,
>etc., it triggers the processing script telling the code to look at the
>NFS-mounted data. Seems to be working pretty well so far.

OK. You are using up bandwidth by sending the products across the wire, but this may not be so bad depending on your resources. Another approach would be to create an EXP product on bigbird that contains only the metadata from the IDD product and use that as a trigger on the processing machines.

>I just want to make sure this thing will scale somewhat, or I'll be
>doing this exercise over and over again!

It seems to me that the bottlenecks you will encounter in the future are:

- servicing of more NFS clients from bigbird
- sending full-sized products across the wire and throwing them away

Below I include an email that I sent to another user who is strongly considering becoming a toplevel IDD relay node.
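Before getting to that email: as a concrete sketch of the EXP metadata-trigger idea, a pqact.conf entry on each processing machine could EXEC a script when a lightweight trigger product arrives. The product-ID pattern and script name below are hypothetical, and the whitespace before EXEC must be actual tab characters:

```
# pqact.conf sketch -- hypothetical product-ID pattern and script name.
# \1 is the part of the product ID matched by (.*); the script would
# then read the real data from the NFS-mounted directories.
EXP	^trigger/(.*)
	EXEC	/usr/local/ldm/bin/process.sh \1
```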
As you will see, the note describes a cluster approach that we have been pursuing here at the UPC. As you read the info, please remember that we are still learning about the cluster and _will_ be making changes to the setup in the coming days/weeks/months/etc. I offer the following in order to ** hopefully ** provoke a revisit to an effort you were involved in some time back: establishment of a toplevel IDD relay node at the Houston GigaPop.

Please let me know what you think...

From: Unidata Support <address@hidden>
Date: Tue, 15 Mar 2005 18:41:01 -0700
Subject: 20050315: IDD top level relay atm.geo.nsf.gov PSU (cont.)

re:
>How should we proceed from here?

Perhaps it would be useful if I described the setup we have been moving towards for our toplevel IDD relay nodes, idd.unidata.ucar.edu and thelma.ucar.edu. Let me warn you that I am not the expert in what I am about to say, but I think I can relate the essence of what we have been working on. The real brains behind what I describe below are:

John Stokes    - cluster design and implementation
Steve Emmerson - LDM development
Mike Schmidt   - system administration and cluster design
Steve Chiswell - IDD design and monitoring

I am sure that these guys will chime in when they see something I have mis-stated :-)

As you know, in addition to atm.geo.nsf.gov we operate the toplevel IDD relay nodes idd.unidata.ucar.edu and thelma.ucar.edu. Instead of idd.unidata and thelma.ucar being simple machines, they are part of a cluster that is composed of 'directors' (machines that direct IDD feed requests to other machines) and 'data servers' (machines that are fed requests by the director(s) and service those requests). We are using the IP Virtual Server (IPVS) available in current versions of Linux to forward feed requests from 'directors' to 'data servers'.
In our cluster, we are using Fedora Core 3 64-bit Linux run on a set of identically configured Sun SunFire V20Z 1U rackmount servers: dual Opterons; 4 GB RAM; 2x36 GB 10K RPM SCSI; dual GB Ethernet interfaces. We got in on a Sun educational discount program and bought our 5 V20Zs for about $3000 each. These machines are stellar performers for IDD work when running Fedora Core 3 64-bit Linux. We tested three operating systems side-by-side before settling on FC3; the others were Sun Solaris x86 10 and FreeBSD 5.3, both of which are 64-bit. FC3 was the _clear_ winner; FreeBSD was second; and Solaris x86 10 was a _distant_ third. As I understand it, RedHat Enterprise WS 4 is FC3 with full RH support.

Here is a "picture" of what idd.unidata.ucar.edu and thelma.ucar.edu currently look like (best viewed with fixed-width fonts; each director forwards to all three data servers):

     |<----------- directors ------------>|

         +-------+             +-------+
         |   ^                 |   ^
         V   |                 V   |
     +---------------+     +---------------+
     | LDM  |  IPVS  |     | LDM  |  IPVS  |
     +---------------+     +---------------+
       idd.unidata           thelma.ucar
            |                     |
            +----------+----------+
            |          |          |
            V          V          V
   +--------------+ +--------------+ +--------------+
   | 'uni2' LDM   | | 'uni3' LDM   | | 'uni4' LDM   |
   +--------------+ +--------------+ +--------------+

   |<---------------- data servers ---------------->|

The top level shows the two 'director' machines: idd.unidata.ucar.edu and thelma.ucar.edu (thelma used to be a SunFire 480R SPARC III box). Both of these machines run IPVS and LDM 6.3.0 configured on a second interface (IP). The IPVS 'director' software forwards port 388 requests received on an interface configured as idd.unidata.ucar.edu on one machine and as thelma.ucar.edu on the other. The set of 'data server' backends is the same for both directors (at present).
When an IDD feed request is received by idd.unidata.ucar.edu or thelma.ucar.edu, it is relayed by the IPVS software to one of the data servers. Those machines are configured to also be known internally as idd.unidata.ucar.edu or thelma.ucar.edu, but they do not ARP, so they are not seen by the outside world/routers. The IPVS software keeps track of how many connections are on each of the data servers and forwards ("load levels") based on connection counts (we will be changing this metric as we learn more about the setup). The data servers are all configured identically: same RAM, same LDM queue size (8 GB currently), same ldmd.conf contents, etc.

All connections from a downstream machine will be sent to the same data server as long as that machine's last connection died no more than one minute ago. This allows downstream LDMs to send an "are you alive" query to a server that they have not received data from in a while. Once there have been no IDD request connections from a downstream host for one minute, a new request will be forwarded to the data server that is least loaded.

This design allows us to take down any of the data servers for whatever maintenance is needed (hardware, software, etc.) whenever we feel like it. When a machine goes down, the IPVS server is informed that the server is no longer available, and all downstream feed requests are sent to the data servers that remain up. On top of that, thelma.ucar.edu and idd.unidata.ucar.edu are on different LANs and may soon be located in different parts of the UCAR campus.

LDM 6.3.0 was developed to allow running the LDM on a particular interface (IP). We are using this feature to run an LDM on the same box that is running the IPVS 'director': the IPVS listens on one interface (IP) and the LDM runs on another. The alternate interface does not necessarily have to represent a different Ethernet device; it can be a virtual interface configured in software.
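For reference, the behavior described above (port 388 forwarding, least-loaded selection, one minute of per-host persistence, and backends that hold the service IP without ARPing) corresponds roughly to an ipvsadm configuration like the following. This is a sketch with made-up addresses, not our actual setup:

```
# On a director: define the virtual service (hypothetical VIP 192.0.2.10).
#   -s lc : least-connection ("least loaded") scheduling
#   -p 60 : keep a downstream host on the same data server for 60 seconds
ipvsadm -A -t 192.0.2.10:388 -s lc -p 60

# Add the data-server backends with -g (direct routing): each backend
# also configures 192.0.2.10 on a non-ARPing interface.
ipvsadm -a -t 192.0.2.10:388 -r 10.0.0.2:388 -g
ipvsadm -a -t 192.0.2.10:388 -r 10.0.0.3:388 -g
ipvsadm -a -t 192.0.2.10:388 -r 10.0.0.4:388 -g
```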
The ability to run LDMs on specific interfaces (IPs) allows us to run LDMs as either 'data collectors' or as additional data servers on the same box that is running the 'director'. By 'data collector', I mean that the LDMs on the 'director' machines have multiple ldmd.conf requests that bring data to the cluster (e.g., CONDUIT from atm and/or UIUC, NEXRAD2 from Purdue, HDS from here, IDS|DDPLUS from there, etc.). The data server LDMs request data redundantly from the 'director' LDMs. We currently do not have redundancy for the directors, but we will be adding that in the future.

We are just getting our feet wet with this cluster setup and will be modifying configurations as we learn more about how well the system works. In stress tests run here at the UPC, we were able to demonstrate that one V20Z could handle 50% more downstream connections than the 480R thelma.ucar.edu without introducing latency. With three data servers, we believe that we could now field literally every IDD feed request in the world if we had to (the ultimate failover site). If the load on the data servers ever becomes too high, all we need to do is add one or more additional boxes to the mix.

The ultimate limiting factor in this setup will be the routers and network bandwidth here in UCAR. Luckily, we have excellent networking! The cluster as currently configured relays an average of 120 Mbps (~1.2 TB/day) to downstream connections. Peak rates can, however, exceed 250 Mbps.

Please let me know what you think about the above!

Cheers,

Tom

From address@hidden Sun Mar 27 08:35:45 2005

Hi, Tom!

re: bigbird's LDM queue is 400 MB

>Stupid user error. I *thought* I recalled that queues were now
>automagically max'd, so I didn't check that.
>I'll do so... If I correctly recall, I can now use '2G' or '4G'
>instead of all the zeros. If not, well, I'll know in a minute as it'll
>fail and I'll re-edit.
>Thanks for catching this...
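The recollection is right: '2G' and '4G' are accepted shorthand, and the suffixes are just multipliers. Whether ldmadmin reads M/G as decimal (10^6/10^9) or binary (2^20/2^30) I have not verified, so both expansions are shown here with ordinary shell arithmetic (this is an illustration, not an LDM command):

```shell
# Expand the size shorthands by hand, under both readings of M and G.
echo $(( 400 * 1000 * 1000 ))       # decimal reading of "400M": 400000000
echo $(( 400 * 1024 * 1024 ))       # binary  reading of "400M": 419430400
echo $(( 2 * 1000 * 1000 * 1000 ))  # decimal reading of "2G":   2000000000
echo $(( 2 * 1024 * 1024 * 1024 ))  # binary  reading of "2G":   2147483648
```

For what it's worth, the 407732224-byte ldm.pq file above (a bit over 400,000,000) suggests the decimal reading plus some per-queue overhead, but I have not confirmed that.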
:-(

re: what is using processing capabilities

>Pretty much true, so I've distributed the gempak processing over a total
>of 3 machines for Level II and one additional for the CONUS Level III
>mosaic. Another does my MADIS and IDS|DDPLUS ingest, as well as the
>rest of the WMO feeds, HDS, etc. So far, it's working pretty well, but
>I suspect NFS isn't the best implementation, and I'm looking at
>distributing processing via ssh/scp to the (well, I hesitate to refer to
>it this way, but...) cluster of processing systems.

re: NFS mounts

>We initiate the mounts at boot time and leave 'em nailed. However,
>sometimes during higher loads, NFS loses lock for a few seconds to
>minutes...

re: Are you thinking that the NFS mounts will save CPU

>Yeah. And the Linux implementation isn't at the absolute top of the
>heap. In fact, knowing a little bit about NFS is the reason I'm
>thinking about the ssh/scp route. It should allow the data to be
>snagged, processed, and the results returned pretty efficiently.

re: UPC experience with dual Opteron machines running FC3

>I may revamp the bird to go toward a dual Opteron implementation.
>Interesting.

re: bigbird running FC2

>Yeah. It is. Don't know if you recall, or if I told you. When we
>started having the real nightmares with FC2 and the 2.6 kernels, I
>talked to one of our vendors. He didn't like the sound of the problems
>and called an engineering contact at 3Ware. They sent us a replacement
>controller (we had to replace the parallel ATA drives with S-ATA
>ourselves, but we got a *good* price on 300 GB drives), and told us the
>problems with the parallel ATA controller in 2.6 were real, were their
>fault, and possibly not fixable. Thus the new (and well-supported)
>hardware.

re: send product metadata to trigger actions on downstream machines

>Hadn't thought of THAT. Interesting idea.

re: bottlenecks to be faced - servicing of more NFS clients from bigbird

>Yeah. Solution: private network to handle NFS.
re: - sending full-sized products across the wire and throwing them away

>Not a network issue if we enable the private network (which we have,
>overall).

re: UPC direction on clusters

>I think it's certainly do-able... looks like equipment-grant time for
>next year, unless I can snag more money from other sources... that's not
>impossible now, as some of the work I'm doing with NWS on GIS-based
>dissemination, including the Polygon Warning tests
>(http://mesonet.tamu.edu/PolygonTest/ but the site's still a little
>quirky and we're trying to fix the nagging little things) may lead to
>some money. If it does, the first bit goes to a grad student to dedicate
>to that process; the rest goes toward better hardware for the LDM/IDD
>relays.

>All that said, I can do a mirror almost literally over the next couple
>of days, using mesodata3 as the 2nd feed source, and ramp up
>availability. We could use round-robin DNS for the time being to get
>the connections flowing.

>Concerning connectivity, the possibility remains of placing the hardware
>back in Houston, but we're revamping the Texas network infrastructure.
>At this time, I'm exactly 1 router away from the LEARN POP for TAMU, and
>I'm helping drive some of the requirements. In fact, the LDM and Level
>II work are being used as drivers here for current and future work. The
>downside of that is that at some point I'll have to come up with some
>funding to support our network bandwidth requirements. I'm considering
>that in all new funding requests.

>The LEARN connection will provide a 10 Gb/sec link throughout Texas, and
>initially a pair of OC12 (622 Mb/sec) interfaces to Internet2. Our
>commodity Internet capability will also ramp up to at least 1 Gb/sec over
>the next 4 months or so. As I start consuming more bandwidth, I'll have
>to pay (as indicated above), but that's not tomorrow, and I won't be cut
>off or throttled. So far, I don't think I'm bandwidth-limited by my
>current location.

>I've CC'd James Esslinger on this.
>I snagged him to work for me as an
>admin and facilities manager in our lab (we need to contrive a meeting
>here for you to come visit). I'll discuss this with James over the next
>week and we'll see what we can do to start (or restart) the process of
>adding TAMU as a top-level redistro site.

>In the meantime, feel free to add feed requests via Internet2 (that'd
>be NOAA and universities, for the most part) pointed at bigbird, and
>let's see where we taper off.

>I'll talk with James about an upgrade program for the various boxes to
>FC3. For what it's worth, I've been very happy with it, and the main
>reason we haven't migrated everything there was a desire to not break
>systems that were already working "OK" on the various other systems. We
>actually still have a RH 8.0 system running... despite my comments in
>the past that "friends don't let friends run RH 8.0".

>We'll be back in touch. Thanks for the thoughts!

>Happy Easter!

>gerry

--
Gerry Creager -- address@hidden
Texas Mesonet -- AATLT, Texas A&M University
Cell: 979.229.5301    Office: 979.458.4020    FAX: 979.847.8578
Page: 979.228.0173
Office: 903A Eller Bldg, TAMU, College Station, TX 77843
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.