Re: [wcsplus] more on asynchronous response

NOTE: The wcsplus mailing list is no longer active. The list archives are made available for historical reasons.

To: "Tandy, Jeremy" <jeremy.tandy@xxxxxxxxxxxxxxxx>
Subject: Re: [wcsplus] more on asynchronous response
From: "Jon Blower" <jdb@xxxxxxxxxxxxxxxxxxxx>
Date: Fri, 26 Oct 2007 15:51:57 +0100

Dear all,

Good discussion, sorry I'm joining late.  (I agree with Dom's comment
regarding use of the WPS ExecuteResponse document by the way and was
about to post essentially the same message!)

Regarding caching of requests and particularly Jeremy's comment:

> This is pretty easy to resolve; if you want to cache the response for
> multiple usages I recommend creating an MD5 hash of the 'canonicalised'
> request & using this as the key in a key-value-pair map; i.e. you use
> the request hash to look up a previous response (if any).


I've applied a limited portion of my limited brainpower to this in the
past for another project, and unfortunately building an effective cache
is a little more complicated than this (although this is an
implementation issue and I guess doesn't affect the WCS+ specification
per se).  There are two complicating factors:

1) In OGC services, parameter names are case-insensitive, values are
not.  Also many parameters are optional, and still others have no
relevance to the data extraction itself (e.g. what if only the output
format is different?  Could you cache the raw data and convert on the
fly?  This is probably a good idea and works well for my WMS
implementation).  All these things mean that you can't simply hash the
query string map without a little extra logic.

2) The BBOX parameter causes issues because slightly different BBOX
parameters might lead to identical data extractions (if the difference
in the BBOX values is smaller than a grid cell for example).  If you
simply do a string comparison you'll end up missing these cases, which
are very common in practice.
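
To illustrate the "little extra logic" in point 1, here is a minimal
Python sketch, not a definitive implementation.  It assumes that FORMAT
is the only parameter with no bearing on the extraction (the
PRESENTATION_ONLY set and the cache_key name are made up for this
example), and it does nothing about optional parameters with default
values or about the BBOX problem in point 2.

```python
import hashlib
from urllib.parse import parse_qsl

# Assumption: parameters that only select the output encoding, not the data.
PRESENTATION_ONLY = {"format"}


def cache_key(query_string: str) -> str:
    """Return an MD5 key that is identical for equivalent query strings."""
    pairs = parse_qsl(query_string, keep_blank_values=True)
    # Parameter names are case-insensitive; values are kept exactly as given.
    normalised = [(name.lower(), value) for name, value in pairs]
    # Drop parameters that do not affect which data is extracted, then sort
    # so that parameter order no longer matters.
    relevant = sorted(p for p in normalised if p[0] not in PRESENTATION_ONLY)
    canonical = "&".join(f"{name}={value}" for name, value in relevant)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()
```

With this, "COVERAGE=sst&FORMAT=netcdf" and "format=geotiff&coverage=sst"
map to the same key, whereas "coverage=SST" does not, because values
remain case-sensitive.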

So you need a custom cache if you want it to be optimal (a naive cache
might do the job in some cases though).  My approach was to convert
the query string into a low-level set of data extraction parameters
(i.e. the parameters that are passed to NetCDF libraries for example,
to extract a block of data) and cache these low-level parameters
instead.  These parameters typically consist of a file name, internal
variable id and a set of indices for each axis in the data file.  Your
system will then parse the query string into these low-level
parameters and check for identical parameters in the cache.  BTW I
would recommend caching the raw data array to allow people to download
the same data in different formats without doing the extraction twice.
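
A rough Python sketch of that idea, under stated assumptions: a regular
grid described by a hypothetical dict (x0, dx, nx, y0, dy, ny), cells
treated as half-open intervals starting at the origin, and a BBOX given
as (min_x, min_y, max_x, max_y).  All names here are invented for
illustration and are not taken from any NetCDF library.

```python
import math
from typing import NamedTuple, Tuple


class ExtractionKey(NamedTuple):
    """Low-level extraction parameters, usable directly as a cache key."""
    file_name: str
    variable_id: str
    x_range: Tuple[int, int]
    y_range: Tuple[int, int]


def axis_indices(lo, hi, origin, spacing, size):
    """Inclusive index range of the cells touched by [lo, hi] on one axis."""
    first = max(0, math.floor((lo - origin) / spacing))
    last = min(size - 1, math.floor((hi - origin) / spacing))
    return first, last


def extraction_key(file_name, variable_id, bbox, grid):
    """Convert a BBOX into axis indices so near-identical boxes coincide."""
    min_x, min_y, max_x, max_y = bbox
    return ExtractionKey(
        file_name,
        variable_id,
        axis_indices(min_x, max_x, grid["x0"], grid["dx"], grid["nx"]),
        axis_indices(min_y, max_y, grid["y0"], grid["dy"], grid["ny"]),
    )
```

Two BBOX values that differ by less than a grid cell then yield the same
ExtractionKey, so a plain dict mapping keys to raw data arrays would
serve both requests from a single extraction.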


Moving on, I'm not sure about the use of HTTP response codes and
RESTful paradigms to manage the asynchronous download (I'm a fan of
REST in general by the way).  I would recommend thinking carefully
about the complexities that this design would impose on the design of
clients (the same goes for a "serverDecide" option in the asynchronous
parameter of the WCS request).  Sorry I don't have time to elucidate
but every little bit of extra complexity required of a client would
drastically reduce the number of clients that get developed.  One
server, many clients: keep the client simple.


Jon

On 10/26/07, Tandy, Jeremy <jeremy.tandy@xxxxxxxxxxxxxxxx> wrote:

All -- I found the discussion between Ethan & Paolo pretty interesting!
Thanks for putting it on the wcsplus list.

I have a couple of comments that I hope don't confuse the issues ...

1) You say:
> Yes I think that if two users make the same request then the server
> has to do the same processing twice. (Obviously a smart server could
> recognize that the requests are the same and make use of a sort of
> internal cache, but this is an implementation problem. By the way, it
> is not easy to recognize that two requests are the same, in particular
> due to the query string which is made of non-hierarchical parameters.
> E.g. two requests could differ only in the order of their parameters.)

This is pretty easy to resolve; if you want to cache the response for
multiple usages I recommend creating an MD5 hash of the 'canonicalised'
request & using this as the key in a key-value-pair map; i.e. you use
the request hash to look up a previous response (if any). This is how
standard web proxies (like the open source 'Squid') work, I think.
The issues are (1) how many requests to store, & (2) how you know when
the cache expires.

2) You suggest a 202 Accepted response to the asynchronous request ...
Another option is:
If you consider the 'transaction' created by the async request as a
resource (in the RESTful sense), one could CREATE the transaction
resource by POSTing a request. POST is the correct method, as you would
be creating a subordinate resource (of unknown ID); the server is
responsible for identifying the URI of the resource.
Given those assumptions, you could respond with a '201 Created' & use
the 'Location' response header to direct the client application to the
'status monitor' page ...
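
As a sketch only, the exchange might look like the following, using
Python's standard http.server module.  The /jobs/<id> URI layout, the
in-memory JOBS table and the "pending" status string are assumptions
made for this example; nothing here is defined by WCS or WPS.

```python
import itertools
from http.server import BaseHTTPRequestHandler, HTTPServer

_job_ids = itertools.count(1)  # hypothetical source of transaction ids
JOBS = {}                      # transaction id -> status string


class AsyncRequestHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the request body; a real server would queue it for processing.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        job_id = next(_job_ids)
        JOBS[job_id] = "pending"
        host, port = self.server.server_address[:2]
        self.send_response(201)  # Created
        # The server chooses the URI of the new resource; RFC 2616 asks for
        # an absolute URI in the Location header.
        self.send_header("Location", f"http://{host}:{port}/jobs/{job_id}")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        # The 'status monitor': report the state of an existing transaction.
        try:
            status = JOBS[int(self.path.rsplit("/", 1)[-1])]
        except (KeyError, ValueError):
            self.send_error(404)
            return
        body = status.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet
```

Running HTTPServer(("127.0.0.1", 8080), AsyncRequestHandler).serve_forever()
would let a client POST the request, read the Location header from the
201 response, and then GET that URI to poll the status.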

See: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html

Quote:
"14.30 Location

The Location response-header field is used to redirect the recipient to
a location other than the Request-URI for completion of the request or
identification of a new resource. For 201 (Created) responses, the
Location is that of the new resource which was created by the request.
For 3xx responses, the location SHOULD indicate the server's preferred
URI for automatic redirection to the resource. The field value consists
of a single absolute URI.

       Location       = "Location" ":" absoluteURI

An example is:

       Location: http://www.w3.org/pub/WWW/People.html

      Note: The Content-Location header field (section 14.14) differs
      from Location in that the Content-Location identifies the original
      location of the entity enclosed in the request. It is therefore
      possible for a response to contain header fields for both Location
      and Content-Location. Also see section 13.10 for cache
      requirements of some methods."

Cheers, Jeremy

--------------------------------------------------------------
Dr Jon Blower              Tel: +44 118 378 5213 (direct line)
Technical Director         Tel: +44 118 378 8741 (ESSC)
Reading e-Science Centre   Fax: +44 118 378 6413
ESSC                       Email: jdb@xxxxxxxxxxxxxxxxxxxx
University of Reading
3 Earley Gate
Reading RG6 6AL, UK
--------------------------------------------------------------
