an interesting article about 'the grid,' which
we will likely be using in the future.
Or maybe not. Neither grid, nor cloud, computing is nearly so easy to
enable/enact as our friends at CERN would have you believe. The concept
of ubiquitous data apparently local to you is something we should
investigate, and there are several decent (grid-related!) projects of
note. However, CERN researchers fail to mention that they've worked with
their less fortunate partners (do you know how much CERN and the EU
spend on basic research, compared to the US, especially in physics? Or,
for that matter, weather prediction?) to standardize software across all
the "grid" nodes, wherever they may be. That standardization goes so far
that researchers on our campus, not knowing that Scientific Linux
(itself built by recompiling RedHat Enterprise) had decided to go with
CentOS, flatly stated they would not use any resources my organization
stood up, because those resources didn't meet their software
requirements. All they had to remember was "Scientific Linux" to keep
CERN compliance happy...
Our experiences with "grid" enablement, so far, have been both good and
bad. On one hand, we've been working with a group that feels they can
make any app work on any system as long as the Globus Toolkit is
installed. Let's just say that their progress to date on the grid
resources we've got in that group is, well, slow. Another group we work
with has adopted LDM to get weather model data and share other data
between sites. They employ a standard minimum software "stack" (which
does include Globus... but not a random version, rather, they specify
which is required) and publish guidelines for porting code to different
resources. Their results are a bit better.
Really, the high energy physics guys have it right, though: Standardize
ALL the software and require it be kept reasonably up to date, then
distribute the applications they expect second and third tier
researchers to use. In other words, they control the OS, utilities,
compilers, and applications. It's a homogeneous computing environment,
which for grid enablement, is a good environment.
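The HEP approach above boils down to a published software manifest that
every node must satisfy before it accepts grid work. A minimal sketch in
Python, where the required-tools list is a hypothetical stand-in (a real
site would publish its own manifest, pinned to specific versions):

```python
# Sketch of a node "stack compliance" check in the spirit of the HEP
# approach: a node complies only if every tool in the published
# manifest is installed. The manifest contents here are illustrative.
import shutil

REQUIRED_TOOLS = ["gcc", "python3", "tar"]  # hypothetical standard stack

def node_complies(required=REQUIRED_TOOLS):
    """Return (ok, missing): ok is True iff every required tool is on PATH."""
    missing = [tool for tool in required if shutil.which(tool) is None]
    return (not missing, missing)

ok, missing = node_complies()
print("compliant" if ok else f"non-compliant, missing: {missing}")
```

A real check would also pin versions and the OS release, but even this
much would have avoided the "wrong distribution" standoff described
earlier.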
We've looked at similar approaches for the atmospheric sciences: What
about a common-hardware, common-software, 100 TF distributed environment
(that's 0.1 petaflops, a significant chunk of computing horsepower)?
What if it were set up to handle on-demand, near-real-time
forecasting, e.g., LEAD-ish event-driven models, triggered when SPC were
to issue a mesoscale discussion, or a watch, and updated in a RUC-ish
manner. What if that were available for competitive scheduling for YOUR
weather model run? Oh, you don't run a model? Select from extant
models and request a graphics run. You don't have a project that needs
that much computing time but you're developing a proposal? Ask for a
small development allocation.
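The event-driven triggering imagined above can be sketched very simply:
watch a product feed, and when a triggering product appears, submit a
deadline-bound model run. The product strings, the 30-minute deadline,
and the feed stub below are all illustrative assumptions, not a real SPC
or LEAD interface:

```python
# Sketch of event-driven model triggering: map incoming SPC products
# to deadline-bound run requests. Product names and deadlines are
# hypothetical stand-ins for a real feed interface.
import time

TRIGGERING_PRODUCTS = ("mesoscale_discussion", "watch")

def triggered_runs(products):
    """Pure mapping from incoming products to (run, deadline_min) requests."""
    return [(p, 30) for p in products if p in TRIGGERING_PRODUCTS]

def poll_feed():
    """Stand-in for reading an LDM queue or other product feed."""
    return []

def watch_loop(poll_seconds=60, max_polls=None):
    """Poll the feed and submit runs; max_polls bounds the loop for testing."""
    polls = 0
    while max_polls is None or polls < max_polls:
        for run, deadline in triggered_runs(poll_feed()):
            print(f"submit {run} run, results due in {deadline} min")
        polls += 1
        time.sleep(poll_seconds)
```

Keeping the trigger mapping pure (no I/O) makes the policy easy to test
and change independently of whatever feed actually delivers products.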
Using a homogeneous approach, grid enablement becomes much more
manageable. Standardizing the hardware and interconnect between nodes
makes the software interactions easier to manage. But the investment is
still non-trivial, and who's going to do it is still an open question.
One other thing unmentioned in the article is scheduling. In the
scenario I described above, the idea of preemptive, prioritized, and
reservation-enabled scheduling is implied... and pretty well mandatory.
Today's implementation of "the grid", and the implementations I
anticipate for the next 5 years or more, are well suited to batch jobs
with no element of urgency. You enqueue your job and when you get the
result, you analyze it. You might enqueue several jobs, and
post-process the results all together. You care about the job(s)
running to completion and being programmatically sound, in that, if you
run the job several times, with the same input, you get the same output.
You don't need that job to start right now, run fast on enough CPUs
to complete in minutes, and then get out. In batch mode, you're not in
a hurry. That model works for retrospective analysis in our field, but
not really well for forecasting. Forecasting has a deadline by which
the job has to be done, post-processing completed, and a human has
reviewed the results. Batch processing has no such guarantees (today).
There are metaschedulers (e.g., Spruce) that purport to do things like
this for you, but they honestly need a lot more work to be "right".
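The deadline requirement above can be made concrete with a simple
admission check: given the nodes that preemption could free, will the
run finish inside the deadline with time left over for post-processing
and human review? The job model, numbers, and safety margin below are
illustrative assumptions:

```python
# Sketch of a deadline-aware admission check, the piece that plain
# batch queues lack: accept an urgent job only if it can finish before
# its deadline, leaving margin for post-processing and review.
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    node_hours: float      # total work the run requires
    nodes_available: int   # nodes we could free by preemption
    deadline_hours: float  # wall clock until results are needed

def can_meet_deadline(job, safety_margin=0.8):
    """True if runtime fits inside the deadline times the safety margin."""
    runtime = job.node_hours / job.nodes_available
    return runtime <= job.deadline_hours * safety_margin

rush = Job("event-triggered forecast run", node_hours=120,
           nodes_available=64, deadline_hours=3)
print(can_meet_deadline(rush))  # 120/64 = 1.875 h vs 3 * 0.8 = 2.4 h -> True
```

A batch scheduler answers "will this job eventually run?"; a forecast
scheduler has to answer "will it finish by the deadline?", and reject or
preempt accordingly.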
So: Will we all be using "the grid" in the future? Maybe, almost
certainly, yes. (As to why, I suspect that at some point NSF will say,
"Enough!" to every project buying a sub-teraflop cluster to do specific
project computation, and then to be aged out at the end of the
project... say, in three years... rather than being sustained.) I doubt
it will be TeraGrid facilities in the near-term, although some will try
to support this sort of usage, and NSF will likely foster it. But they
are used to, and understand, batch processing, and they lack our sense
of real-time urgency. A dedicated infrastructure for Atmospheric
Sciences (or Ocean/Atmosphere)? Makes sense to me, but NSF has denied
this once already.
Short form is, I think we have a long way to go to make "the grid" our
standard computing environment.
Gerry Creager -- gerry.creager@xxxxxxxx
Texas Mesonet -- AATLT, Texas A&M University
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843