Dear all,

Good discussion, sorry I'm joining late. (I agree with Dom's comment regarding use of the WPS ExecuteResponse document by the way and was about to post essentially the same message!)

Regarding caching of requests and particularly Jeremy's comment:
> This is pretty easy to resolve; if you want to cache the response for multiple usages I recommend creating an MD5 hash of the 'canonicalised' request & using this as the key in a key-value-pair map; i.e. you use the request hash to look up a previous response (if any).
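
For concreteness, the hashing approach quoted above might look roughly like the sketch below. This is only an illustration: the function name `request_cache_key` and the `cache` dictionary are made up, and "canonicalised" is assumed to mean nothing more than sorting the key-value pairs so that parameter order does not matter.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode


def request_cache_key(query_string: str) -> str:
    """Return the MD5 hex digest of a query string with its pairs sorted.

    Assumption: sorting is the only canonicalisation step applied here.
    """
    pairs = parse_qsl(query_string, keep_blank_values=True)
    canonical = urlencode(sorted(pairs))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


# Hypothetical key-value map from request hash to a previously built response.
cache = {}


def lookup(query_string: str):
    """Return the cached response for this request, or None if there is none."""
    return cache.get(request_cache_key(query_string))
```

With this, `lookup("coverage=sst&format=netcdf")` and `lookup("format=netcdf&coverage=sst")` consult the same cache entry, because both query strings sort to the same canonical form.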
I've applied a limited portion of my limited brainpower to this in the past for another project, and unfortunately building an effective cache is a little more complicated than this (although this is an implementation issue and I guess doesn't affect the WCS+ specification per se). There are two complicating factors:

1) In OGC services, parameter names are case-insensitive but values are not. Also, many parameters are optional, and still others have no relevance to the data extraction itself (e.g. what if only the output format is different? Could you cache the raw data and convert on the fly? This is probably a good idea and works well for my WMS implementation). All these things mean that you can't simply hash the query string map without a little extra logic.

2) The BBOX parameter causes issues because slightly different BBOX values might lead to identical data extractions (for example, if the difference between the BBOX values is smaller than a grid cell). If you simply do a string comparison you'll miss these cases, which are very common in practice.

So you need a custom cache if you want it to be optimal (a naive cache might do the job in some cases though). My approach was to convert the query string into a low-level set of data extraction parameters (i.e. the parameters that are passed to NetCDF libraries, for example, to extract a block of data) and to cache on these low-level parameters instead. They typically consist of a file name, an internal variable id and a set of indices for each axis in the data file. Your system then parses the query string into these low-level parameters and checks for identical parameters in the cache. BTW I would recommend caching the raw data array, to allow people to download the same data in different formats without doing the extraction twice.

Moving on, I'm not sure about the use of HTTP response codes and RESTful paradigms to manage the asynchronous download (I'm a fan of REST in general by the way).
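
(Going back to the cache for a moment: the low-level key idea could be sketched along the following lines. This is a rough illustration, not my actual implementation. It assumes a regularly spaced two-dimensional grid and selects the cells whose centres fall inside the BBOX; the names `ExtractionKey`, `index_range`, `extraction_key` and the `grid` dictionary, as well as the file and variable names, are all invented for the example.)

```python
import math
from typing import Dict, NamedTuple, Tuple


class ExtractionKey(NamedTuple):
    """Low-level extraction parameters, used directly as the cache key."""
    file_name: str
    variable_id: str
    x_range: Tuple[int, int]  # (first, last) index along the x axis
    y_range: Tuple[int, int]  # (first, last) index along the y axis


def index_range(lo: float, hi: float, origin: float,
                spacing: float, size: int) -> Tuple[int, int]:
    """Indices of the regularly spaced cell centres lying within [lo, hi].

    Assumption: cell centre i sits at origin + i * spacing. An empty
    selection (first > last) is not handled in this sketch.
    """
    first = max(0, math.ceil((lo - origin) / spacing))
    last = min(size - 1, math.floor((hi - origin) / spacing))
    return (first, last)


def extraction_key(file_name: str, variable_id: str,
                   bbox: Tuple[float, float, float, float],
                   grid: Dict[str, float]) -> ExtractionKey:
    """Convert a BBOX (minx, miny, maxx, maxy) into axis index ranges."""
    minx, miny, maxx, maxy = bbox
    return ExtractionKey(
        file_name,
        variable_id,
        index_range(minx, maxx, grid["x0"], grid["dx"], int(grid["nx"])),
        index_range(miny, maxy, grid["y0"], grid["dy"], int(grid["ny"])),
    )


# Hypothetical one-degree global grid and a cache of raw data arrays.
grid = {"x0": -180.0, "dx": 1.0, "nx": 360, "y0": -90.0, "dy": 1.0, "ny": 181}
raw_data_cache = {}

key = extraction_key("sst.nc", "sst", (-10.2, 40.1, 10.3, 50.4), grid)
raw_data_cache[key] = "raw data array would be stored here"
```

Two requests whose BBOX values differ by less than a grid cell, such as (-10.2, 40.1, 10.3, 50.4) and (-10.4, 40.3, 10.1, 50.2) on this grid, map to the same `ExtractionKey` and therefore hit the same cache entry. The output format never enters the key, so the cached raw array can be converted to whatever format is requested.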
I would recommend thinking carefully about the complexities that an asynchronous design of this kind would impose on the design of clients (the same goes for a "serverDecide" option in the asynchronous parameter of the WCS request). Sorry I don't have time to elucidate, but every little bit of extra complexity required of a client would drastically reduce the number of clients that get developed. One server, many clients: keep the client simple.

Jon

On 10/26/07, Tandy, Jeremy <jeremy.tandy@xxxxxxxxxxxxxxxx> wrote:
All --

I found the discussion between Ethan & Paolo pretty interesting! Thanks for putting it on the wcsplus list. I have a couple of comments that I hope don't confuse the issues ...

1) You say:

> Yes I think that if two users make the same request then the server
> has to do the same processing twice. (Obviously a smart server could
> recognize that the requests are the same and make use of a sort of
> internal cache, but this is an implementation problem. By the way, it
> is not easy to recognize that two requests are the same, in particular
> due to the query string which is made of non-hierarchical parameters.
> E.g. two requests could differ only in the order of the parameters.)

This is pretty easy to resolve; if you want to cache the response for multiple usages I recommend creating an MD5 hash of the 'canonicalised' request & using this as the key in a key-value-pair map; i.e. you use the request hash to look up a previous response (if any). This is how standard web proxies (like the open source 'Squid') work, I think. Issues are (1) how many requests to store, & (2) how do you know when the cache expires.

2) You suggest a '202 Accepted' response to the asynchronous request ...

Another option is: if you consider the 'transaction' created by the async request as a resource (in the RESTful sense), one could CREATE the transaction resource by POSTing a request. POST is the correct method, as you would be creating a subordinate resource (of unknown ID) ... The server is responsible for identifying the URI of the resource. Given those assumptions, you could respond with a '201 Created' & use the 'Location' response header to direct the client application to the 'status monitor' page ...

See: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html

Quote:

"14.30 Location

The Location response-header field is used to redirect the recipient to a location other than the Request-URI for completion of the request or identification of a new resource. For 201 (Created) responses, the Location is that of the new resource which was created by the request. For 3xx responses, the location SHOULD indicate the server's preferred URI for automatic redirection to the resource. The field value consists of a single absolute URI.

    Location = "Location" ":" absoluteURI

An example is:

    Location: http://www.w3.org/pub/WWW/People.html

Note: The Content-Location header field (section 14.14) differs from Location in that the Content-Location identifies the original location of the entity enclosed in the request. It is therefore possible for a response to contain header fields for both Location and Content-Location. Also see section 13.10 for cache requirements of some methods."

Cheers, Jeremy
--------------------------------------------------------------
Dr Jon Blower              Tel: +44 118 378 5213 (direct line)
Technical Director         Tel: +44 118 378 8741 (ESSC)
Reading e-Science Centre   Fax: +44 118 378 6413
ESSC                       Email: jdb@xxxxxxxxxxxxxxxxxxxx
University of Reading
3 Earley Gate
Reading RG6 6AL, UK
--------------------------------------------------------------