[ldm-users] the grid

Gerry Creager gerry.creager at tamu.edu
Tue Apr 8 07:28:59 MDT 2008


patrick wrote:
> an interesting article about 'the grid,' which
> we will likely be using in the future.
> 
> http://www.techtree.com/India/News/The_Grid_to_Render_the_Web_Obsolete/551-88299-643.html

Or maybe not.  Neither grid, nor cloud, computing is nearly so easy to 
enable/enact as our friends at CERN would have you believe.  The concept 
of ubiquitous data apparently local to you is something we should 
investigate, and there are several decent (grid-related!) projects of 
note.  However, CERN researchers fail to mention that they've worked with 
their less fortunate partners (do you know how much CERN and the EU 
spend on basic research, compared to the US, especially in physics?  Or, 
for that matter, weather prediction?) to standardize software across all 
the "grid" nodes wherever they may be.  This to the point that, not 
knowing that Scientific Linux, already based on recompiling RedHat 
Enterprise, had decided to go with CentOS, researchers on our campus 
flatly stated they'd not use any resources my organization stood up, 
because it didn't meet their software requirements.  All they had to 
remember was, "Scientific Linux" to make CERN compliance happy...

Our experiences with "grid" enablement, so far, have been both good and 
bad.  On one hand, we've been working with a group that feels they can 
make any app work on any system as long as the Globus Tool Kit is 
installed.  Let's just say that their progress to date on the grid 
resources we've got in that group is, well, slow.  Another group we work 
with has adopted LDM to get weather model data and share other data 
between sites.  They employ a standard minimum software "stack" (which 
does include Globus... but not a random version, rather, they specify 
which is required) and publish guidelines for porting code to different 
resources.  Their results are a bit better.

Really, the high energy physics guys have it right, though: Standardize 
ALL the software and require it be kept reasonably up to date, then 
distribute the applications they expect second and third tier 
researchers to use.  In other words, they control the OS, utilities, 
compilers, and applications.  It's a homogeneous computing environment, 
which, for grid enablement, is a good environment.

We've looked at similar approaches for the atmospheric sciences: What 
about a common-hardware, common-software, 100 TF distributed environment 
(that's 0.1 petaFlop, or a significant chunk of computing horsepower)? 
  What if it were set up to handle on-demand, near-real-time 
forecasting, e.g., LEAD-ish event-driven models, triggered when SPC were 
to issue a mesoscale discussion, or a watch, and updated in a RUC-ish 
manner?  What if that were available for competitive scheduling for YOUR 
weather model run?  Oh, you don't run a model?  Select from extant 
models and request a graphics run.  You don't have a project that needs 
that much computing time but you're developing a proposal?  Ask for a 
development allocation.

Using a homogeneous approach, grid enablement becomes much more 
manageable.  Commonizing the hardware and interconnect between nodes 
makes the software interactions easier to manage.  But the investment is 
still non-trivial and who's gonna do it is still a problem.

One other thing unmentioned in the article is scheduling.  In the 
scenario I described above, the idea of preemptive, prioritized, and 
reservation-enabled scheduling is implied... and pretty well mandatory. 
  Today's implementation of "the grid", and the implementations I 
anticipate for the next 5 years or more, are well suited to batch jobs 
with no element of urgency.  You enqueue your job and when you get the 
result, you analyze it.  You might enqueue several jobs, and 
post-process the results all together.  You care about the job(s) 
running to completion and being programmatically sound, in that, if you 
run the job several times, with the same input, you get the same output. 
  You don't want that job to get in right now, run fast on enough CPUs 
to complete in minutes, and then get out.  In batch mode, you're not in 
a hurry.  That model works for retrospective analysis in our field, but 
not really well for forecasting.  Forecasting has a deadline by which 
the job has to be done, post-processing completed, and a human has 
reviewed the results.  Batch processing has no such guarantees (today).
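
To make the batch-versus-forecast distinction concrete, here is a toy
sketch in Python.  It is not how any real grid scheduler works.  The job
names, the priority scale, the 45-minute deadline, and the single shared
machine with free suspend/resume are all assumptions made up for
illustration, and reservations are not modeled at all.  It only shows why
a plain FIFO batch queue misses a forecast deadline that a preemptive,
prioritized queue can meet.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    arrival: int       # minute the job is submitted
    runtime: int       # minutes of machine time the job needs
    priority: int = 0  # higher means more urgent (hypothetical scale)


def simulate(jobs, preemptive):
    """Return {job name: completion minute} on one shared machine.

    preemptive=False: FIFO batch queue, each job runs to completion.
    preemptive=True: every minute the highest-priority submitted job
    runs, suspending whatever was running before (suspend/resume is
    assumed to cost nothing, which is optimistic).
    """
    remaining = {j.name: j.runtime for j in jobs}
    finished = {}
    clock = 0
    current = None
    while len(finished) < len(jobs):
        ready = [j for j in jobs
                 if j.arrival <= clock and j.name not in finished]
        if not ready:
            clock += 1  # machine idles until the next submission
            continue
        if preemptive:
            # Highest priority wins; ties go to the earliest arrival.
            current = max(ready, key=lambda j: (j.priority, -j.arrival))
        elif current is None or current.name in finished:
            # Batch mode: only pick a new job when the machine is free.
            current = min(ready, key=lambda j: j.arrival)
        remaining[current.name] -= 1
        clock += 1
        if remaining[current.name] == 0:
            finished[current.name] = clock
    return finished


# Hypothetical workload: two retrospective jobs already queued, then an
# event-driven forecast submitted at minute 10 with a hard deadline.
jobs = [
    Job("reanalysis", arrival=0, runtime=120),
    Job("climo-stats", arrival=0, runtime=60),
    Job("severe-wx-forecast", arrival=10, runtime=20, priority=10),
]
deadline = 45  # minute by which the forecast must be finished (assumed)

batch = simulate(jobs, preemptive=False)
urgent = simulate(jobs, preemptive=True)

for label, result in (("batch", batch), ("preemptive", urgent)):
    done = result["severe-wx-forecast"]
    status = "meets" if done <= deadline else "misses"
    print(f"{label}: forecast done at minute {done}, {status} deadline")
```

In this made-up workload the forecast finishes at minute 200 under FIFO
and at minute 30 with preemption, and the retrospective jobs still run to
completion, just later.  The hard parts a real system has to deal with
(checkpointing cost, multi-node allocation, honoring reservations across
sites) are exactly what this sketch leaves out.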

There are metaschedulers (Spruce) that purport to do things like this 
for you, but they honestly need a lot more work to be "right".

So:  Will we all be using "the grid" in the future?  Maybe, almost 
certainly, yes.  (As to why, I suspect that at some point NSF will say, 
"Enough!" to every project buying a sub-teraflop cluster to do specific 
project computation, and then to be aged out at the end of the 
project... say, in three years... rather than being sustained.)  I doubt 
it will be TeraGrid facilities in the near-term, although some will try 
to support this sort of usage, and NSF will likely foster it.  But they 
are used to, and understand, batch processing, and lack our sense of 
real-time urgency.  A dedicated infrastructure for Atmospheric Sciences 
(or Ocean/Atmosphere)?  Makes sense to me, but NSF has denied this once, 
at least.

Short form is, I think we have a long way to go to make "the grid" our 
standard computing environment.

gerry
-- 
Gerry Creager -- gerry.creager at tamu.edu
Texas Mesonet -- AATLT, Texas A&M University	
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843

