Re: [wcsplus] asynchronous response [was: Re: WCS 1.0+]

Hi Stefano,

Stefano Nativi wrote:
Hi Ethan,

Sorry for the delay of my answer; I had to prepare my term in Padua.

I'll try to answer inline.

I agree "Asynchronous Access" is not very clear. "Store" is much clearer. Does "Persistent Storage" add some specific meaning for you to "storage"?

We used the term "persistent" to express that the store lasts for a certain period of time, allowing more than one download. Perhaps "persistent" is too strong.

I think "storage" already implies keeping items for some amount of time. Is the key point for you how long the data is stored ("persistent" or "long-term"), that it doesn't go away after the first access, or that more than one client can request the stored data ("shared storage")?

I agree that "Push" capabilities might be handy. However, I think it is loaded with difficulties. The main issue is that there will be many firewall issues to get around on the client side. Most firewalls are set up to allow outgoing HTTP requests and incoming responses but not incoming requests. The other issue is that a client also needs to be an HTTP server so that it can accept HTTP POST requests. If a lot of clients are trying to receive PUSH responses, we are back to the firewall, which may have only one or two machines, each with one or two ports open for HTTP. There are ways to deal with all these issues (e.g., you mention the possibility of an upload server (proxy?) that then deals with the client), but the PUSH issue adds a lot of complexity to the already quite complex asynchronous issue. Because of this, I think we (WCS 1.0+) should skip the PUSH issue for now.

First of all, we agree that the "push" approach should be avoided for now.

As for your comment, it is possible to use a proxy application server to avoid the firewall issue.

Agreed on both counts.

One comment on the response codes: I think we should use the 201 (Created) HTTP response code rather than 302 (Found) for the immediate/store case. I think the meaning of the 201 code (the request has caused a new resource to be created and it can be found at the given URI) more closely matches this case than the meaning of the 302 code (the resource is not at the request URI but can be found at the given redirect URI). A subtle difference, perhaps, but I think it is important to be careful that our mapping matches the standard meaning of the HTTP response codes.

Actually, there's a subtle difference here; it mainly depends on the interface used. Let's consider the following use cases:

a) case where to use "302 Found" with Location: U2
    > I send a GET request to URI U1 to retrieve the U1 resource
    > the U1 resource representation is available at URI U2
    > the authoritative address of the resource is still U1 (clients must send their subsequent requests to URI U1)
b) case where to use "201 Created" with Location: U2
    > I send a POST request to URI U1 to create a new resource U2
    > the created U2 resource is available at URI U2
    > the authoritative address of the resource is U2 (clients must send their subsequent GET requests to URI U2)

In summary, a GET request should be used to retrieve a "representation" of an existing resource, while a POST request is used to "create" a new resource along with its authoritative address.
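
Seen from the client side, the two cases differ only in which URI is used for later requests. A minimal sketch of that rule follows; the function name `follow_up_uri` and its parameters are illustrative assumptions, not part of WCS or of any HTTP library.

```python
def follow_up_uri(status: int, request_uri: str, location: str) -> str:
    """Return the URI a client should use for its subsequent requests.

    Hypothetical helper: it encodes only the two cases discussed above.
    """
    if status == 302:
        # 302 Found: the representation is at `location` for now,
        # but the request URI remains the authoritative address.
        return request_uri
    if status == 201:
        # 201 Created: a new resource exists and `location` is
        # its authoritative address.
        return location
    raise ValueError(f"status {status} is not covered by this sketch")
```

With U1 as the request URI and U2 as the Location header, the 302 case keeps the client on U1, while the 201 case moves it to U2.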

That is a pretty strict interpretation of the line between resource and representation and between GET and POST. Since a WCS response can be quite a complex (possibly never to be repeated) "representation" of a resource (subset, remap, interpolate, etc.), I really think of each one as a new resource rather than a representation.

Also, for a stored WCS response, clients should be able to share the resulting URI with others and access it multiple times. The HTTP spec is very clear that the 302 response should not be cached or used multiple times, and that any repeated access should go to the original URI.

So, I still feel that 201 is a more appropriate fit for this case.

I don't understand your delayed/non-stored/pull case. Isn't it implicitly the same as the delayed/stored/pull case? The server starts processing on the first request, ignores any requests until it is done, and once finished stores the data until it is requested again. Why not use the 202 response to send information about when to check again, along with all the other information recommended as content in the 202 response body?

The difference is about the redirection aspect:

a) delayed/non-stored/pull case:
    > client requests U1 resource
    > server returns the 202 Accepted (with status ?)
    > .....
    > client requests U1 resource
    > server returns the 200 OK with the resource representation
    > following requests require the entire processing again.

b) delayed/stored/pull case:
    > client requests U1 resource
    > server returns the 302 Found with Location: U2
    > client requests U2 resource
    > server returns the 202 Accepted (with status ?)
    > .....
    > client requests U2 resource
    > server returns the 200 OK with the resource representation
    > following requests may be directed to U2, accessing the existing resource representation (persistent store).
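
The two flows above can be compared with a small in-memory simulation. This is only a sketch under stated assumptions: `ToyServer`, `Response`, `fetch`, the `polls_needed` parameter, and the `runs` counter are all hypothetical names, there is no networking, and "processing" is modelled as a fixed number of polls that are answered with 202.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Response:
    status: int
    location: Optional[str] = None
    body: Optional[str] = None


class ToyServer:
    """Hypothetical server: a request is answered with 202 `polls_needed` times, then 200."""

    def __init__(self, polls_needed: int, stored: bool) -> None:
        self.polls_needed = polls_needed
        self.stored = stored
        self.progress = {}  # uri -> number of 202 answers given so far
        self.store = {}     # uri -> stored result (stored case only)
        self.runs = 0       # how many times the full processing was carried out

    def get(self, uri: str) -> Response:
        if self.stored and uri == "U1":
            # Stored case: redirect the client to the stored-result URI.
            return Response(302, location="U2")
        if uri in self.store:
            # Result already stored: no reprocessing needed.
            return Response(200, body=self.store[uri])
        seen = self.progress.get(uri, 0) + 1
        if seen <= self.polls_needed:
            self.progress[uri] = seen
            return Response(202, body="processing")
        # Processing finished: deliver the result and reset progress.
        self.progress.pop(uri, None)
        self.runs += 1
        result = "coverage-data"
        if self.stored:
            self.store[uri] = result
        return Response(200, body=result)


def fetch(server: ToyServer, uri: str, max_polls: int = 10) -> Tuple[str, List[int]]:
    """Poll until 200; return the final URI and the status codes seen."""
    statuses = []
    for _ in range(max_polls):
        response = server.get(uri)
        statuses.append(response.status)
        if response.status == 302:
            uri = response.location
        elif response.status == 200:
            return uri, statuses
    raise RuntimeError("no result within max_polls")
```

Calling `fetch` twice against a non-stored server increments `runs` twice, whereas a stored server processes once and answers the second request for U2 directly from its store.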

I'm still not sure I understand. What happens if two users make the same request around the same time? Does the server have to do the same processing twice? Why would anyone prefer the delayed/non-stored/pull case over delayed/stored/pull?



Ah ha. Upon re-reading the "202 Accepted" section of the HTTP spec, I realize that nothing in the spec says anything about the results of the accepted processing. The 202 response seems to have been targeted only at requests for processing where knowing it has been completed is all that is important. It does not say, as I had interpreted it, that processing is done and may have resulted in a new resource (all encoded in the body of the response or the results of a status monitor). I think our differing interpretations of the 202 response are the root of the difference in some of our responses.

Still, I find the 202 response the cleanest mapping to an asynchronous response. Whether the accepted processing results in an externally accessible artifact or not, the 202 response seems to capture what is going on. It is up to the body of the 202 response and any response from the "status monitor" to communicate information about any artifacts of the accepted processing.
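
As a rough illustration of what such a 202 body might carry, here is a sketch that builds one as JSON. The HTTP spec only recommends that the body indicate the current status and give either a pointer to a status monitor or an estimate of when the request will be fulfilled; the JSON encoding and the field names `state`, `statusMonitor`, `retryAfterSeconds`, and `result` are assumptions made for this example, not anything defined by HTTP or WCS.

```python
import json
from typing import Optional


def accepted_body(state: str, monitor_uri: str, retry_after_s: int,
                  result_uri: Optional[str] = None) -> str:
    """Build a hypothetical JSON body for a 202 response or a status-monitor reply."""
    body = {
        "state": state,                      # current status of the request
        "statusMonitor": monitor_uri,        # where to ask for progress
        "retryAfterSeconds": retry_after_s,  # hint for when to check again
    }
    if result_uri is not None:
        # Present only if the processing leaves an accessible artifact.
        body["result"] = result_uri
    return json.dumps(body, sort_keys=True)
```

A server with no stored artifact would simply omit `result`, so the same body shape covers both the stored and the non-stored case.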

In my opinion, this discussion and the related documentation are very interesting and I'd like to consolidate them in an OGC discussion paper. What do you think? Who is interested in co-authoring this document?

I'm not familiar with OGC discussion papers but I'm ok with consolidating the discussion for review by others outside the WCS 1.0.0+ group.

Ethan


--Stefano

--
Ethan R. Davis                                Telephone: (303) 497-8155
Software Engineer                             Fax:       (303) 497-8690
UCAR Unidata Program Center                   E-mail:    edavis@xxxxxxxx
P.O. Box 3000
Boulder, CO  80307-3000                       http://www.unidata.ucar.edu/
---------------------------------------------------------------------------