Re: [ldm-users] the grid

patrick wrote:
an interesting article about 'the grid,' which
we will likely be using in the future.

Or maybe not. Neither grid nor cloud computing is nearly so easy to enable as our friends at CERN would have you believe. The concept of ubiquitous data, apparently local to you, is something we should investigate, and there are several decent (grid-related!) projects of note. However, the CERN researchers fail to mention that they've worked with their less fortunate partners (do you know how much CERN and the EU spend on basic research compared to the US, especially in physics? Or, for that matter, weather prediction?) to standardize software across all the "grid" nodes, wherever they may be. To the point that, because we'd gone with CentOS rather than Scientific Linux (both are recompiles of RedHat Enterprise), researchers on our campus flatly stated they'd not use any resources my organization stood up, because they didn't meet their software requirements. All they had to remember was "Scientific Linux" to keep CERN compliance happy...

Our experiences with "grid" enablement, so far, have been both good and bad. On one hand, we've been working with a group that feels they can make any app work on any system as long as the Globus Toolkit is installed. Let's just say that their progress to date on the grid resources we've got in that group is, well, slow. Another group we work with has adopted LDM to get weather model data and share other data between sites. They employ a standard minimum software "stack" (which does include Globus... but not a random version; they specify which release is required) and publish guidelines for porting code to different resources. Their results are a bit better.
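For what it's worth, the LDM piece of such a "stack" is only a few lines of ldmd.conf. The feedtype, pattern, and hostnames below are placeholders for illustration, not any particular site's configuration:

```
# ldmd.conf (sketch): request model data from an upstream relay,
# and allow named downstream sites to feed from us.
# CONDUIT, the ".*" pattern, and the hostnames are illustrative only.
REQUEST CONDUIT ".*" upstream.example.edu
ALLOW   ANY     ^node[0-9]+\.example\.edu$
```

The point of pinning versions in the stack is exactly so that lines like these mean the same thing at every site.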

Really, the high-energy physics guys have it right, though: standardize ALL the software, require that it be kept reasonably up to date, then distribute the applications they expect second- and third-tier researchers to use. In other words, they control the OS, utilities, compilers, and applications. It's a homogeneous computing environment, which, for grid enablement, is a good thing.

We've looked at similar approaches for the atmospheric sciences. What about a common-hardware, common-software, 100 TF distributed environment (that's 0.1 petaflops, a significant chunk of computing horsepower)? What if it were set up to handle on-demand, near-real-time forecasting, e.g., LEAD-ish event-driven models, triggered when SPC issues a mesoscale discussion or a watch, and updated in a RUC-ish manner? What if that were available for competitive scheduling for YOUR weather model run? Oh, you don't run a model? Select from extant models and request a graphics run. You don't have a project that needs that much computing time, but you're developing a proposal? Ask for a development allocation.
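The event-driven trigger logic is simple enough to sketch. Everything below is my own illustration, not LEAD's actual design; the product codes and the Product/plan_runs names are hypothetical:

```python
# Toy sketch of event-driven model triggering: when SPC issues a
# mesoscale discussion ("MD") or watch ("WW") for a region, that region
# gets an on-demand model cycle. Names and product codes are illustrative.
from dataclasses import dataclass

@dataclass
class Product:
    kind: str    # hypothetical product code, e.g. "MD" or "WW"
    region: str  # region the product covers

def should_trigger(product: Product) -> bool:
    """Fire a model run only for product types that warrant one."""
    return product.kind in ("MD", "WW")

def plan_runs(products):
    """Return the distinct regions that need a triggered model cycle."""
    return sorted({p.region for p in products if should_trigger(p)})

if __name__ == "__main__":
    feed = [Product("MD", "TX"), Product("AFD", "TX"), Product("WW", "OK")]
    print(plan_runs(feed))
```

The hard part, of course, isn't the trigger; it's getting the cycles to run on time, which is the scheduling problem below.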

Using a homogeneous approach, grid enablement becomes much more manageable. Standardizing the hardware and the interconnect between nodes makes the software interactions easier to manage. But the investment is still non-trivial, and who's gonna do it is still a problem.

One other thing unmentioned in the article is scheduling. In the scenario I described above, preemptive, prioritized, and reservation-enabled scheduling is implied... and pretty well mandatory. Today's implementation of "the grid", and the implementations I anticipate for the next 5 years or more, are well suited to batch jobs with no element of urgency. You enqueue your job, and when you get the result, you analyze it. You might enqueue several jobs and post-process the results all together. You care about the job(s) running to completion and being programmatically sound, in that, if you run the job several times with the same input, you get the same output. You don't need that job to get in right now, run fast on enough CPUs to complete in minutes, and then get out. In batch mode, you're not in a hurry. That model works for retrospective analysis in our field, but not really well for forecasting. Forecasting has a deadline by which the job must be done, post-processing completed, and a human review of the results finished. Batch processing offers no such guarantees (today).
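To make the batch-versus-deadline contrast concrete, here's a toy earliest-deadline-first sketch on a single resource. This is my own illustration of the scheduling style forecasting needs, not how any current metascheduler actually works:

```python
# Toy earliest-deadline-first (EDF) scheduler: a forecast job with a near
# deadline runs before a retrospective job queued earlier. Illustrative only.
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Job:
    deadline: float                     # results must be delivered by this time
    name: str = field(compare=False)
    runtime: float = field(compare=False)

def schedule_edf(jobs, now=0.0):
    """Run jobs earliest-deadline-first; return (name, start, met_deadline)
    tuples, where met_deadline says whether the job finished in time."""
    heap = list(jobs)
    heapq.heapify(heap)                 # ordered by deadline
    t, out = now, []
    while heap:
        j = heapq.heappop(heap)
        out.append((j.name, t, t + j.runtime <= j.deadline))
        t += j.runtime
    return out
```

A pure batch queue would run the retrospective job first because it arrived first; EDF runs the forecast first because it is urgent. Real urgent computing also needs preemption and reservations, which this sketch deliberately omits.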

There are metaschedulers (SPRUCE, for one) that purport to do things like this for you, but they honestly need a lot more work to be "right".

So: will we all be using "the grid" in the future? Maybe; almost certainly, yes. (As to why: I suspect that at some point NSF will say "Enough!" to every project buying a sub-teraflop cluster to do project-specific computation, only to have it aged out at the end of the project... say, in three years... rather than sustained.) I doubt it will be TeraGrid facilities in the near term, although some will try to support this sort of usage, and NSF will likely foster it. But they are used to, and understand, batch processing, and lack our sense of real-time urgency. A dedicated infrastructure for the atmospheric sciences (or ocean/atmosphere)? Makes sense to me, but NSF has denied this at least once.

Short form is, I think we have a long way to go to make "the grid" our standard computing environment.

Gerry Creager -- gerry.creager@xxxxxxxx
Texas Mesonet -- AATLT, Texas A&M University        
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843
