Re: [wcsplus] more on asynchronous response

All -- I found the discussion between Ethan & Paolo pretty interesting!
Thanks for putting it on the wcsplus list.

I have a couple of comments that I hope don't confuse the issues ...

1) You say:
Yes I think that if two users make the same request then the server
has to do the same processing twice. (Obviously a smart server could
recognize that the requests are the same and make use of a sort of
internal cache, but this is an implementation problem. By the way, it
is not easy to recognize that two requests are the same, in particular
due to the query string which is made of non-hierarchical parameters.
E.g. two requests could differ only in the order of their parameters.)

This is pretty easy to resolve: if you want to cache the response for
multiple uses, I recommend creating an MD5 hash of the 'canonicalised'
request and using this as the key in a key-value map; i.e. you use
the request hash to look up a previous response (if any). This is how
standard web proxies (like the open source 'Squid') work, I think.
The issues are (1) how many requests to store, and (2) how you know when
the cache expires.
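
As a rough illustration of the canonicalise-then-hash idea, here is a
minimal Python sketch. It assumes that parameter names are
case-insensitive and that parameter order carries no meaning; the
function name canonical_request_key and the response_cache map are
hypothetical, and this is not a description of how Squid itself
builds its keys.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit


def canonical_request_key(url):
    """Return an MD5 hex digest of a canonicalised form of the request URL."""
    parts = urlsplit(url)
    # Assumption: parameter names are case-insensitive and order is irrelevant,
    # so lower-case the names and sort the (name, value) pairs.
    params = sorted(
        (name.lower(), value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
    )
    canonical = "{}://{}{}?{}".format(
        parts.scheme.lower(), parts.netloc.lower(), parts.path, urlencode(params)
    )
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


# Hypothetical in-memory cache: request key -> previously computed response.
response_cache = {}

key_a = canonical_request_key(
    "http://someserver.net/coverages/foo?format=netcdf&bbox=0,0,10,10"
)
key_b = canonical_request_key(
    "http://someserver.net/coverages/foo?BBOX=0,0,10,10&FORMAT=netcdf"
)
assert key_a == key_b  # same request, different parameter order and case
```

This does nothing about the two open issues above (cache size and
expiry); it only shows that differing parameter order need not defeat
the lookup.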

2) You suggest a 202 Accepted response to the asynchronous request ...
Another option is:
If you consider the 'transaction' created by the async request as a
resource (in the RESTful sense), one could CREATE the transaction
resource by POSTing a request. POST is the correct method, as you would
be creating a subordinate resource (of unknown ID) ... The server is
responsible for assigning the URI of the resource.
Given those assumptions, you could respond with a '201 Created' & use
the 'Location' response header to direct the client application to the
'status monitor' page ...
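
To make the POST / '201 Created' idea concrete, here is a minimal
Python sketch of the server side. It is framework-free and only
returns the status code and headers; the transactions store, the
create_transaction name and the /transactions base URI are all
hypothetical.

```python
import uuid

# Hypothetical store of transactions: id -> the request that created it.
transactions = {}


def create_transaction(base_uri, request_body):
    """Handle a POST that creates a transaction resource.

    Returns (status_code, headers). The server, not the client, chooses
    the ID and therefore the URI of the new subordinate resource.
    """
    transaction_id = uuid.uuid4().hex
    transactions[transaction_id] = {"request": request_body, "status": "processing"}
    location = "{}/{}".format(base_uri.rstrip("/"), transaction_id)
    # 201 Created, with Location pointing at the new 'status monitor' resource.
    return 201, {"Location": location}


status, headers = create_transaction(
    "http://someserver.net/transactions", "bbox=0,0,10,10"
)
```

The client would then GET the URI given in the Location header to
follow the progress of the transaction.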

See: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html

Quote:
"14.30 Location

The Location response-header field is used to redirect the recipient to
a location other than the Request-URI for completion of the request or
identification of a new resource. For 201 (Created) responses, the
Location is that of the new resource which was created by the request.
For 3xx responses, the location SHOULD indicate the server's preferred
URI for automatic redirection to the resource. The field value consists
of a single absolute URI.

      Location       = "Location" ":" absoluteURI

An example is:

      Location: http://www.w3.org/pub/WWW/People.html

     Note: The Content-Location header field (section 14.14) differs
     from Location in that the Content-Location identifies the original
     location of the entity enclosed in the request. It is therefore
     possible for a response to contain header fields for both Location
     and Content-Location. Also see section 13.10 for cache
     requirements of some methods."

Cheers, Jeremy


-----Original Message-----
From: wcsplus-bounces@xxxxxxxxxxxxxxxx
[mailto:wcsplus-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Ethan Davis
Sent: 24 October 2007 01:11
To: Paolo Mazzetti
Cc: wcsplus@xxxxxxxxxxxxxxxx
Subject: Re: [wcsplus] more on asynchronous response

Hi Paolo,

Paolo Mazzetti wrote:
Hi Ethan,

I am trying to summarize our respective positions and find a common
point of view useful to finalize a discussion paper. These are my
opinions and tentative conclusions. Since I think that these issues
involve too much technical detail for the mailing list, I am sending my
thoughts directly to you (and Stefano in cc).

I hope you don't mind, I'm CCing the list because I think a number of
others would be interested in these details. Also, the other reason for
the list is to archive the discussions and I'd really like to keep all
of this conversation in one place.

(Sorry to make any uninterested parties hit the "delete" key more than
necessary. If anyone really wants this conversation taken off-list, let
us know.)

a) On resources and representation. I agree with your interpretation
of what resources and representations are in the WCS domain in the
sense that different subsets, interpolation, etc. identify different
resources and not simply different representations. This means that
the query string parameters are not the set of input parameters for a
single processing service resource, but actually parts of different
resource identifiers. (Indeed only the parameter FORMAT should be
considered as affecting the representation and not identifying the
resource. In a perfect REST world its content should be provided in
the Accept header field.) Our (Stefano's and mine) previous note
speaking of 'representation' storage was misleading.
In my opinion, what is provided by the possible redirection is not a
new resource but a (temporary) URI which is an alias of the original
URI for the same resource (a resource can have more than one URI). For
example the resource http://someserver.net/coverages/foo?bbox=... is
assigned a temporary identifier
http://someserver.net/coverages/temp/xyz. Anyway the resource is still
retrievable at the original (and authoritative) URI. This alias is
useful because, for example, during its period of validity
retrieving the resource representation from it could be faster than
retrieving it from the original (canonical) URI.

b) On creation and redirection. Taking into account also the previous
interpretation, I still prefer the redirection response (302 code). In
particular, I think that a GET should not create any resource. RFC
1945 (HTTP/1.0) explicitly stated that "/Of the methods defined by
this specification, only POST can create a resource./" In HTTP/1.1
this statement was removed, I suppose, because of the introduction of
methods other than GET, HEAD and POST, but I think that its original
meaning (GET and HEAD methods cannot create resources) should remain
valid. Moreover I think that 302 responses could be cached and the URI
provided used more than once. The RFC says that "/Since the
redirection might be altered on occasion, the client SHOULD continue
to use the Request-URI for future requests. This response is only
cacheable if indicated by a Cache-Control or Expires header field./"
(Upper case as in the original). I interpret it in a weak sense such
as "If you are not sure about the validity of the redirection then use
the original URI", but if the server knows the redirection validity it
can provide it in the header and the client can refer to it.

I think dealing with asynchronous responses requires a flexible view of
GET vs POST, creation, and "resource". An asynchronously created
resource is, in general, only temporarily available and so doesn't
affect the long-term state of the system. Even if the response is stored
more permanently it still does not change the state of the system as the
stored resource could be requested again with the original URI.

The key point in my thinking then is the intent of the request. The
intent is to retrieve a resource and not change the underlying data (or
cause any other "side-effects"). So, I think the intent of the request
is both "safe" and "idempotent" in which case GET seems appropriate.

Of course, that is for a server-determined asynchronous response. When a
client makes a "store=true" request, the intent of the request is to
create a new (though possibly temporary) resource. [Idempotent but not
safe?] So, maybe a POST is more appropriate in this case.

Concerning the other two points that you touched in your last email,
these are my opinions:

1) delayed/non-stored/pull case
What happens if two users make the same request around the same time?
Does the server have to do the same processing twice?
Yes I think that if two users make the same request then the server
has to do the same processing twice. (Obviously a smart server could
recognize that the requests are the same and make use of a sort of
internal cache, but this is an implementation problem. By the way, it
is not easy to recognize that two requests are the same, in particular
due to the query string which is made of non-hierarchical parameters.
E.g. two requests could differ only in the order of their parameters.)

And even worse, a small difference in a BBOX value might result in the
same resource.

Why would anyone prefer the delayed/non-stored/pull case over
delayed/stored/pull?
From the client's point of view, the non-stored use case has only the
(really small) advantage of avoiding the redirection. But the server
has other advantages (especially in terms of simplicity) and could
decide not to support the stored use case for all or some of its
resources.

Ah ha. Upon re-reading the "202 Accepted" section of the HTTP spec, I
realize that there is nothing in the spec that says anything about
the results of the accepted processing. The 202 response seems to
have been targeted only at requests for processing where knowing it
has been completed is all that is important. Not, as I have
interpreted it, that processing is done and may have resulted in a
new resource (all encoded in the body of the response or the results
of a status monitor). I think our interpretation of the 202 response
is the root of the difference in some of our responses. Though I
still find the 202 response the cleanest mapping to an asynchronous
response. Whether the accepted processing results in an externally
accessible artifact or not, the 202 response seems to capture what is
going on. It is up to the body of the 202 response and any response
to the "status monitor" to communicate information about any
artifacts of the accepted processing.
Yes, the 202 specification is very plain. By sending a 202 the server
informs the client that the request has been accepted but gives no other
information about the processing. It simply avoids keeping the
connection open for long-running processes. It seems to be designed as
the minimal basis for allowing asynchronous interaction over HTTP. It
can be used as is for a polling approach. More meaningful semantics
are left to the body content. This is the reason we should define a
(XML?) schema for providing information about processing
status/result.

I definitely agree that we need to define some XML schema to provide
this information.

Taking into account all the previous points we could consider the
following approach for asynchronous operations:

a) the client performs a GET on URI Ures
b) if the availability is delayed, the server sends a 202 providing a
link to a status monitor resource (identified by the URI Ustatus)
c) the client observes the status monitor (by polling or with a push
approach in the future)
d) when the resource is available the status monitor responds:
   d1) 200 and content if storage is not required
   d2) 302 with redirection to alias URI U2 and expiration information
       (if storage is required)
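
Step d) of the list above could be sketched on the server side roughly
as follows, in Python. The job dictionary and its keys (done, stored,
content, alias_uri, valid_seconds) are hypothetical, and so is the
choice to repeat the 202 while processing continues, which the list
above does not specify. The expiration information is carried in an
Expires header.

```python
import time
from email.utils import formatdate


def status_monitor_response(job, now=None):
    """Return (status_code, headers, body) for a GET on the status monitor.

    `job` is a hypothetical dict with the keys:
    done, stored, content, alias_uri, valid_seconds.
    """
    now = time.time() if now is None else now
    if not job["done"]:
        # Assumption: while processing continues, the monitor repeats the 202.
        return 202, {}, b""
    if not job["stored"]:
        # d1) storage not required: return the content directly.
        return 200, {}, job["content"]
    # d2) storage required: redirect to the alias URI (U2) and say how long
    # the alias stays valid, so the client may keep using it until then.
    headers = {
        "Location": job["alias_uri"],
        "Expires": formatdate(now + job["valid_seconds"], usegmt=True),
    }
    return 302, headers, b""
```

With an explicit Expires value the client no longer has to guess
whether the redirection is still valid, which is the weak reading of
the RFC text described in point b).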

I think that this approach could be considered really close to what
RFC says. Let me know what you think.

That sounds good. Though I think of the status monitor as an extension
of the body of the 202 response (which is the XML document mentioned
above that we need to define). Perhaps this is part of why I had not
thought of using redirects. I see this status monitor XML document as
removing the asynchronous response from the realm of the HTTP
specification (sort of) and instead moving it into the xlink:href world.

So, rather than the status monitor response code redirecting us to the
new resource, the body of the status monitor response would indicate the
new resource was available and provide a link to the new resource. So,
here's my take:

a) client GETs the Ures URI
b) if delayed, response 202 code with
b1) Location header providing status monitor URI (Ustatus)
b2) Body containing XML document with status, estimate of completion,
and link to status monitor URI (Ustatus)
c) client GETs the Ustatus URI:
c1) if still not available, response code 200 with XML document same as
response b2 (maybe without Ustatus link).
c2) if available, response code 200 with XML document (similar to
response b2?) that indicates the resource is ready and provides a link
to the resulting resource.
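
Seen from the client, steps a) to c2) could look roughly like this
Python sketch. The fetch callable is a hypothetical stand-in for a
real HTTP GET and must return (status, headers, body); the element
and attribute names follow the simple XML possibilities in this
message, and the sketch assumes the documents declare the usual XLink
namespace.

```python
import time
import xml.etree.ElementTree as ET

# Assumption: xlink is bound to the standard XLink namespace URI.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def wait_for_resource(fetch, resource_uri, poll_seconds=5.0, max_polls=100):
    """Return the URI at which the requested resource can be retrieved.

    `fetch(uri)` is a hypothetical callable returning (status, headers, body).
    """
    # a) GET the resource URI (Ures).
    status, headers, _ = fetch(resource_uri)
    if status != 202:
        # Not delayed: the original URI already serves the resource.
        return resource_uri
    # b1) the Location header carries the status monitor URI (Ustatus).
    status_uri = headers["Location"]
    for _ in range(max_polls):
        # c) GET the status monitor and inspect the XML document.
        _, _, body = fetch(status_uri)
        doc = ET.fromstring(body)
        if doc.get("status") == "done":
            # c2) the document links to the resulting resource.
            return doc.find("generatedResource").get(XLINK_HREF)
        # c1) still processing: wait and poll again.
        time.sleep(poll_seconds)
    raise TimeoutError("resource not ready after {} polls".format(max_polls))
```

A real client would probably use the completion estimate from the
document to choose the polling interval instead of a fixed delay.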

Some very simple XML possibilities ...
For b)
<asynchResponse status="processing"
completionEstimate="2007-10-24T02:34">
 <statusMonitor xlink:href="some URI" />
</asynchResponse>

For c1)
<asynchResponse status="processing"
completionEstimate="2007-10-24T02:34" />

For c2)
<asynchResponse status="done">
 <generatedResource xlink:href="some URI" />
</asynchResponse>
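
For what it's worth, the three documents above could be produced on
the server with something like the following Python sketch. The
function name build_asynch_response is hypothetical, and the sketch
assumes the xlink prefix is bound to the standard XLink namespace,
which the snippets above leave undeclared.

```python
import xml.etree.ElementTree as ET

# Assumption: the xlink prefix refers to the standard XLink namespace.
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK_NS)
XLINK_HREF = "{%s}href" % XLINK_NS


def build_asynch_response(status, completion_estimate=None,
                          status_monitor=None, generated_resource=None):
    """Serialise an asynchResponse document as a string."""
    root = ET.Element("asynchResponse", {"status": status})
    if completion_estimate is not None:
        root.set("completionEstimate", completion_estimate)
    if status_monitor is not None:
        # Response b): link to the status monitor URI.
        ET.SubElement(root, "statusMonitor", {XLINK_HREF: status_monitor})
    if generated_resource is not None:
        # Response c2): link to the resulting resource.
        ET.SubElement(root, "generatedResource", {XLINK_HREF: generated_resource})
    return ET.tostring(root, encoding="unicode")


# b) processing, with estimate and status monitor link (hypothetical URI)
doc_b = build_asynch_response(
    "processing", "2007-10-24T02:34",
    status_monitor="http://someserver.net/status/xyz",
)
# c2) done, with link to the generated resource
doc_c2 = build_asynch_response(
    "done", generated_resource="http://someserver.net/coverages/temp/xyz"
)
```

Unlike the hand-written snippets, the serialised output carries an
xmlns:xlink declaration, so the documents parse with a
namespace-aware XML parser.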