Dear Roy et al.,
Sorry for coming late to the party … Roy asked for some feedback from GDS
administrators on how server-side analysis is being used.
On Jul 1, 2012, at 4:13 PM, Roy Mendelssohn wrote:
> ... That is why I would like to hear more from people who are running F-TDS
> and GDS - how many requests do they get for server side functions,
I did a quick 'grep' on our GDS log files (100 individual months) and
calculated an average of 5585 server-side analysis requests per month, which is
less than 1% of the total number of data requests to the server. Many months had
0; the maximum was 247811. Most of these were for the real time GFS forecast data; we
are not serving a whole lot of climate data on our GDS. The complexity of the
analysis expressions is pretty broad -- some examples are basic subsets (which
I would describe as the user misunderstanding the purpose of server-side analysis),
simple expressions to get the wind speed and direction from vector components,
SLP differences at two grid points, time series of area averages, ensemble
averages, and variance of ensemble averages (this uses the cached result from
the ensemble average calculation).
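
For anyone who wants to repeat this kind of count on their own logs, here is a
minimal Python sketch of the tally (not the actual 'grep' I ran). It rests on
two assumptions that need checking against your own server: that each log line
carries an Apache-style timestamp such as [01/Jul/2012:16:13:00 -0400], and that
server-side analysis requests can be recognized by the string "_expr_" in the
request URL. The function names and the sample log lines are hypothetical.

```python
import re
from collections import Counter

# Assumption: each log line carries an Apache-style timestamp such as
# [01/Jul/2012:16:13:00 -0400]; adjust the pattern to the real log layout.
TIMESTAMP = re.compile(r"\[\d{2}/(?P<month>[A-Za-z]{3})/(?P<year>\d{4}):")

# Assumption: server-side analysis requests are recognizable by this marker.
ANALYSIS_MARKER = "_expr_"


def monthly_counts(lines):
    """Return (all_requests, analysis_requests) Counters keyed by (year, month)."""
    total = Counter()
    analysis = Counter()
    for line in lines:
        match = TIMESTAMP.search(line)
        if match is None:
            continue  # skip lines without a recognizable timestamp
        key = (int(match["year"]), match["month"])
        total[key] += 1
        if ANALYSIS_MARKER in line:
            analysis[key] += 1
    return total, analysis


def summarize(total, analysis):
    """Mean and maximum analysis requests per month, and their share of all requests."""
    months = len(total)  # months with no analysis requests still count here
    n_analysis = sum(analysis.values())
    n_total = sum(total.values())
    return {
        "mean_per_month": n_analysis / months if months else 0.0,
        "max_per_month": max(analysis.values(), default=0),
        "share_of_total": n_analysis / n_total if n_total else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical log lines, for illustration only.
    sample = [
        '10.0.0.1 - - [01/Jul/2012:16:13:00 -0400] "GET /dods/_expr_{gfs}{mag(u,v)}{0:0} HTTP/1.1" 200 512',
        '10.0.0.2 - - [02/Jul/2012:09:00:00 -0400] "GET /dods/gfs.dods HTTP/1.1" 200 2048',
        '10.0.0.3 - - [05/Aug/2012:11:30:00 -0400] "GET /dods/gfs.info HTTP/1.1" 200 128',
    ]
    print(summarize(*monthly_counts(sample)))
```

Months that have ordinary data requests but no analysis requests are included
in the average, which is what pulls the mean down when many months have 0.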
> what is the usual response time and download for these request,
It would take some clever parsing of the log entries to get an average time,
but a cursory glance suggests most are less than 10 seconds.
> how large are the usual expressions?
If by 'large' you mean 'lots of characters in the expression', here are some
examples (1 short, 2 long):
The size of a request in terms of data volume can be constrained by server
configuration. The third example above is from the GDS documentation, and a lot
of users try it out and then modify it to suit their needs. It's more of a
climate analysis kind of expression: it calculates the mean 500 mb height
anomaly associated with warm tropical SST anomalies.
> … I would welcome people who are using some of these other approaches to
> describe what they have done, the benefits of doing things that way, and what
> it means for a client.
I would say server-side analysis (of the kind employed by our GDS users) is
useful on a small scale -- individuals who desire forecast information at their
particular location. For hard-core climate research that requires the analysis
of BIG data, we haven't yet been able to exploit the power of server-side
analysis (moving the analysis to the data). At COLA, we generate a lot of data
at remote supercomputer centers (e.g. NCAR), but then we move a lot of it back
to our own disks to analyze it with our favorite tools, or else we log in with
accounts at the remote locations where our data reside and use the analysis
servers set up there for users to access their data. For CMIP5, it is just not
practical to try to automate remote analysis of data that are so widely
distributed, with subtle differences between each data server, and a data
structure that is highly granular. Nobody at COLA is interested in using a
browser to do any data analysis; it must be programmable to be useful.
Jennifer M. Adams
4041 Powder Mill Road, Suite 302
Calverton, MD 20705