An Upgraded Unidata 64-Bit IDD Relay Service at Pennsylvania State University
Charles Pavloski and Art Person
The Pennsylvania State University
University Park, Pennsylvania
The Unidata Local Data Manager (LDM) and Unidata Internet Data Distribution (IDD) systems provide a stable mechanism for the dissemination of meteorological data from various sources to the greater Unidata community. The Department of Meteorology at The Pennsylvania State University (Penn State) has participated as an IDD level-two relay data distribution site since 1998. Recently, the equipment providing this IDD relay service, an Intel Pentium III 866 MHz server with 832 MB RAM, was becoming unable to handle the growing IDD data stream. With funds from the Unidata Equipment Grants program and supplemental funds from Penn State, and in collaboration with Unidata IDD staff, we are now configuring a trio of 64-bit machines running the LDM and Linux Virtual Server (LVS) software for IDD relay service originating from Penn State.
Unidata’s Role at Penn State
Unidata’s IDD feed and associated software distributions such as GEMPAK/N-AWIPS, IDV and McIDAS-X are considered vital tools for research, instruction and outreach. For example, the GEMPAK/N-AWIPS suite of software allows our students to explore current and past weather scenarios as part of upper-level undergraduate meteorology courses. The GEMPAK/N-AWIPS software is also used for the generation of graphics for the popular Penn State electronic map wall (e-WALL) available on the World Wide Web at the following Uniform Resource Locator (URL): http://www.meteo.psu.edu/ewall
Graduate students, instructors and faculty use our real-time and archived IDD data for a significant number of research initiatives. As the scope of products and the volume of data delivered via the IDD increase, so does the importance of Unidata products to our educational, outreach and research programs.
System Configuration and Rationale
The original intent of our upgrade was to replace our aging Compaq ML330 866 MHz IDD relay server with an updated system that could adequately handle both current and foreseeable loads for the next several years. During the past year, however, it was brought to our attention that Unidata was developing a new LDM cluster configuration (http://www.unidata.ucar.edu/newsletter/2005june/clusterpiece.htm). In this configuration, a head “director” node fields data requests from downstream sites and services them through an array of two or more “real servers” that actually fulfill the requests. This approach implements Linux Virtual Server (LVS) technology and seemed appropriate for our servers. However, it requires a minimum of three systems: one “director” and two “real” data servers. After we priced servers from standard vendors, it became apparent that buying three was not possible without going considerably over budget. Therefore, we chose the “build-your-own” system approach. This involved exploring the hardware necessary to build three systems and weighing the costs and inherent risks. The primary risks were reliability and warranty issues, along with compatibility issues between the various hardware and software components. These risks were addressed by choosing only name-brand components with a good reputation in the server market. The platform we chose is as follows:
“Director” system: SuperMicro SC833T-550 chassis
Intel SE7520AF2 motherboard with RAID enabled
2 Intel 3.0 GHz Xeon EM64T processors
2 GB Kingston DDR2 RAM
2 ST336706LW 36 GB 10K rpm SCSI hard drives
2 WD740GD 74 GB 10K rpm SATA hard drives
Floppy and CD-ROM drives
Red Hat EL AS 4 U3
Unidata LDM 6.4.5
“Real” data servers (2): SuperMicro SC833T-550 chassis
Intel SE7520AF2 motherboard
2 Intel 3.0 GHz Xeon EM64T processors
4 GB Kingston DDR2 RAM
1 WD740GD 74 GB 10K rpm SATA hard drive
Floppy and CD-ROM drives
Red Hat EL AS 4 U3
Unidata LDM 6.4.5
One weakness in the above configuration is the lack of a redundant power supply in the “director.” However, since all three chassis are identical and power supply failures are infrequent, we decided that the supply from a “real” server could be swapped into the “director” if necessary to resume operation of the relay cluster with minimal downtime. Another potential weakness is the relatively small memory configurations of 2 and 4 GB. This was intentional: server memory prices were high at the time of purchase, and memory can be added as needs demand. Both configurations support up to 16 GB of memory.
Operationally, our LVS configuration can be considered in two parts: IDD data routing and network packet routing. In our configuration, the IDD data are gathered by the “director” system from various sources. The “director” then feeds these data to the “real” data servers, which in turn feed downstream sites. From the network packet routing point of view, downstream LDM sites make data requests to our virtual LDM server address, which is handled by the “director” system. The “director” then passes each request on to one of the “real” servers, which delivers the data to the downstream site. The “director”, in this case, is simply acting as a specialized network router. Since the configuration is “virtual”, the downstream sites are unaware that there is more than one machine behind the virtual address.
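The packet-routing half of this arrangement can be illustrated with a toy model. The sketch below is not LVS itself and does not reflect our actual scheduler settings: it assumes round-robin selection (one of several schedulers LVS offers), and the address and server names are hypothetical placeholders.

```python
from itertools import cycle


class Director:
    """Toy model of the "director": one virtual address in front of
    several "real" servers. Round-robin selection is an assumption."""

    def __init__(self, virtual_address, real_servers):
        self.virtual_address = virtual_address
        self._next_server = cycle(real_servers)

    def route(self, downstream_site):
        # The downstream site names only the virtual address; which
        # real server answers is decided here and is invisible to it.
        return {
            "client": downstream_site,
            "requested": self.virtual_address,
            "served_by": next(self._next_server),
        }


# Hypothetical names, for illustration only.
director = Director("ldm-virtual.example.edu", ["real1", "real2"])
assignments = [director.route(f"site{i}")["served_by"] for i in range(4)]
print(assignments)  # ['real1', 'real2', 'real1', 'real2']
```

Every request is addressed to the same virtual name, while the work alternates between the two real servers.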
FIGURE 1 – The trio of machines comprising PSU Meteorology’s new director and twin server IDD system.
System Performance - RAID Versus Single Hard Drive
In order to provide fault tolerance, most server-class systems today incorporate RAID disk arrays to provide storage for the most critical parts of a system. RAID can be implemented in several ways, but the most common are RAID levels 1 and 5. RAID 1 simply provides disk mirroring, meaning all data are stored on two (or more) physical disks at the same time. RAID 5 stripes data along with parity information across an array of disks. In both cases, the underlying hardware/software maintains the array such that the user is unaware that an array is in use. Should one drive fail during operation, the failed disk can be replaced without interruption of service, although I/O may slow down for a while.
The desirability of such a RAID configuration for an LDM relay system may be obvious. However, a number of users have reported that when LDM product queues were placed on hardware RAID disk arrays on Linux systems, the LDM performed poorly, to the point of congestion under a fully loaded IDD data stream. Systems using single, non-RAID disks were not affected in this way. Since an inability to use RAID for the mission-critical parts of our LDM relay would jeopardize its reliability, we decided to test our chosen configuration to see whether this would be an issue.
The selected configuration for our director system uses a RAID 1 mirror running on an embedded Intel IOP332 RAID processor chip on an Intel SE7520AF2 motherboard. The mirror is 36 GB (consisting of two Seagate ST336706LW 10K rpm 36 GB SCSI disks) and contains both the operating system (Red Hat Enterprise Linux AS 4 update 3 64-bit) and the product queue. The system is also configured with two SATA-attached Western Digital WD740GD 10K rpm 74 GB hard drives used for comparison testing.
As a basic test, the system was simply run as a local downstream site by configuring one of the other “real” data server systems as a relay. This relay received the complete IDD data stream from idd.unidata.ucar.edu and then transferred that data across our local gigabit LAN to this test system. After running this configuration for several days, no unusual delays were observed.
Next, file creation tests were performed, since the “ldmadmin mkqueue” command was observed to take an excessive amount of processor time while running. To test this more clearly, a program was created that wrote a 4 GB file in 32 KB chunks (similar to what the LDM mkqueue command does). Because the 4 GB file was larger than the system's 2 GB of physical memory, caching effects were reduced. The results showed the following:
Device | User Time | System Time | Real Time | CPU Utilization
RAID | 0.05 seconds | 262 seconds | 274 seconds | 99%
SATA | 0.05 seconds | 18 seconds | 116 seconds | 16%
Writing to the RAID array took an unreasonable amount of processor time compared to writing to the direct-connected SATA drive. When similar tests were performed on a different system with a Red Hat 4 32-bit operating system writing to a 3Ware RAID controller, the results were acceptable for both the RAID array and the direct-connected SATA drive, at around 17% utilization. Thus, there is an apparent write-performance problem when using the RAID array with the 64-bit version of the operating system.
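The original test program is not reproduced here. A minimal sketch of a file creation test of the kind described above might look like the following; the function name `timed_write`, the zero-filled payload, and the `fsync` call are assumptions for illustration. The demonstration writes only 1 MB, whereas the test described above wrote 4 GB.

```python
import os
import tempfile
import time

CHUNK_SIZE = 32 * 1024  # 32 KB chunks, as in the test described above


def timed_write(path, total_bytes, chunk_size=CHUNK_SIZE):
    """Write total_bytes to path in fixed-size chunks and report timings."""
    chunk = b"\0" * chunk_size
    cpu_start = os.times()
    wall_start = time.perf_counter()
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_size, total_bytes - written)
            f.write(chunk[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())  # assumption: count the time to reach the disk
    real = time.perf_counter() - wall_start
    cpu_end = os.times()
    user = cpu_end.user - cpu_start.user
    system = cpu_end.system - cpu_start.system
    return {
        "bytes": written,
        "user": user,
        "system": system,
        "real": real,
        # CPU utilization as (user + system) / real, in percent.
        "cpu_percent": 100.0 * (user + system) / real if real > 0 else 0.0,
    }


# Small demonstration run in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    demo = timed_write(os.path.join(tmp, "queue_test.bin"), 1024 * 1024)
    print(demo)
```

To compare devices, the same function would be pointed at a path on each drive, with a file size larger than physical memory so that caching does not mask the disk behavior.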
A second series of tests involved copying a 1 GB file from the RAID array to the SATA drive, and then from the SATA drive back to the RAID array. The intention was to compare CPU usage in each input/output direction. Writing to the SATA drive consumed 5.6 seconds, while writing to the RAID array consumed 26.9 seconds. These observations again support the conclusion that Red Hat 4 64-bit Linux appears to have a performance issue when writing to the RAID 1 array on the Intel SE7520AF2.
We next addressed the question of whether this write problem extended to LDM operation when writing data to the product queue. To test this, a program was written that mimicked the operation of the LDM by memory-mapping 4 GB files, reading each from beginning to end while writing it from end to beginning in 32 KB chunks. Each 32 KB chunk was locked and unlocked when it was written. The results, averaged over three runs, are as follows:
Device | User Time | System Time | Real Time | CPU Utilization
RAID | 8.3 seconds | 6.1 seconds | 503 seconds | 3%
SATA | 8.3 seconds | 9.1 seconds | 430 seconds | 4%
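The memory-mapping test that produced these figures is not reproduced here. A minimal sketch of such a test, assuming POSIX record locks via `fcntl.lockf` (the original locking mechanism is not stated) and a hypothetical function name `mmap_read_write`, might look like this. The demonstration maps 1 MB rather than 4 GB.

```python
import fcntl  # POSIX-only record locking (an assumption about the original)
import mmap
import os
import tempfile
import time

CHUNK_SIZE = 32 * 1024  # 32 KB chunks, as described in the text


def mmap_read_write(path, total_bytes, chunk_size=CHUNK_SIZE):
    """Memory-map a file, read it front to back while writing it back to
    front, locking each chunk only while it is being written."""
    if total_bytes <= 0 or total_bytes % chunk_size != 0:
        raise ValueError("total_bytes must be a positive multiple of chunk_size")
    payload = b"\xff" * chunk_size
    n_chunks = total_bytes // chunk_size
    bytes_read = 0
    cpu_start = os.times()
    wall_start = time.perf_counter()
    with open(path, "w+b") as f:
        f.truncate(total_bytes)  # create the file at its full size
        fd = f.fileno()
        with mmap.mmap(fd, total_bytes) as region:
            for i in range(n_chunks):
                # Read the next chunk, moving forward from the start.
                read_off = i * chunk_size
                bytes_read += len(region[read_off:read_off + chunk_size])
                # Write the next chunk, moving backward from the end.
                write_off = (n_chunks - 1 - i) * chunk_size
                fcntl.lockf(fd, fcntl.LOCK_EX, chunk_size, write_off, os.SEEK_SET)
                try:
                    region[write_off:write_off + chunk_size] = payload
                finally:
                    fcntl.lockf(fd, fcntl.LOCK_UN, chunk_size, write_off, os.SEEK_SET)
            region.flush()  # push modified pages back to the file
    real = time.perf_counter() - wall_start
    cpu_end = os.times()
    return {
        "bytes_read": bytes_read,
        "bytes_written": n_chunks * chunk_size,
        "user": cpu_end.user - cpu_start.user,
        "system": cpu_end.system - cpu_start.system,
        "real": real,
    }


# Small demonstration run in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    print(mmap_read_write(os.path.join(tmp, "mmap_test.bin"), 1024 * 1024))
```

Because writes go through the mapping rather than through `write()` calls, the kernel decides when modified pages reach the disk, which is the access pattern a pure relay node exercises.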
These results indicate that memory mapping is apparently NOT affected by the file writing problems exhibited above. The SATA drive performed about 15% better in wall-clock time, while in processor time, the RAID device performed about 17% better. These tests were performed without write-back cache enabled on the RAID controller. When that was enabled, the SATA drive still out-performed the RAID device in wall-clock time, but only by 7%.
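The percentages quoted above can be rechecked from the averaged timings. The short calculation below assumes that each difference is expressed relative to the slower device, which is the convention that reproduces the quoted figures; processor time is taken as user plus system time.

```python
# Averaged timings from the memory-mapping test, in seconds.
results = {
    "RAID": {"user": 8.3, "system": 6.1, "real": 503.0},
    "SATA": {"user": 8.3, "system": 9.1, "real": 430.0},
}


def cpu_seconds(timing):
    """Processor time, taken here as user + system."""
    return timing["user"] + timing["system"]


def percent_better(faster, slower):
    """Improvement of the faster value, relative to the slower one."""
    return 100.0 * (slower - faster) / slower


wall_gain = percent_better(results["SATA"]["real"], results["RAID"]["real"])
cpu_gain = percent_better(cpu_seconds(results["RAID"]), cpu_seconds(results["SATA"]))
utilization = {
    name: 100.0 * cpu_seconds(t) / t["real"] for name, t in results.items()
}

print(f"SATA wall-clock advantage: {wall_gain:.1f}%")  # about 15%
print(f"RAID processor-time advantage: {cpu_gain:.1f}%")  # about 17%
print({name: round(value) for name, value in utilization.items()})
```

The same user-plus-system over real-time ratio also reproduces the 3% and 4% CPU utilization figures in the table.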
In conclusion, the file writing problems with Red Hat 4 on this platform would probably create significant performance issues if the system were used for filing or decoding data. However, as a pure relay node (using only memory mapping), the Red Hat 4 operating system should run the LDM quite nicely.
High Availability
The intent behind the virtual server approach to IDD data delivery is to minimize downtime related to hardware failures or upgrade requirements. This higher availability is achieved in our LDM virtual server by incorporating two “real” data servers behind one virtual server using LVS technology. However, there are still weak points. As noted above, the “director” system does not have a redundant power supply. Worse than that, the “director” system itself has no redundant partner like the “real” data servers do. If a “real” data server goes down, the second one will take over. However, if the “director” goes down, the virtual LDM goes down. The solution for this shortcoming is to implement a second backup “director” utilizing technologies available to handle “director” failover. We are currently investigating implementation of this approach in our configuration to make the Penn State relay a true high-availability virtual LDM server.
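For a surviving “real” server to take over, something must notice that its partner has stopped answering. The sketch below shows one simple way such a check could work, by attempting a TCP connection to the LDM port (388) on each server. It is an illustration only, not the monitoring used in our cluster; the function names and server names are hypothetical.

```python
import socket

LDM_PORT = 388  # TCP port registered for the Unidata LDM


def is_reachable(host, port=LDM_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def healthy_servers(servers, check=is_reachable):
    """Keep only the servers that pass the health check."""
    return [server for server in servers if check(server)]


# Demonstration with a stand-in check so that no network access is needed.
status = {"real1": True, "real2": False}  # hypothetical server states
print(healthy_servers(["real1", "real2"], check=lambda host: status[host]))
# ['real1']
```

A successful connection shows only that something is listening on the port; a more thorough check would confirm that the LDM is actually delivering products.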
Future Expansion and the National Lambda Rail
Penn State and Unidata are both participants in the National Lambda Rail (NLR) (http://www.nlr.net/) project. The continually growing need for timely distribution of meteorological observations, satellite imagery and model data stresses the current commodity Internet and Internet2 networks. Both institutions are making efforts to use this third-generation high-speed network, and the NLR should provide ample bandwidth and speed for the next several years. Penn State intends to continue to provide and expand timely IDD relay service to downstream sites using our new high-availability systems and the more capable networking of the NLR.