Re: [wcsplus] WCS REST (was Design of asynchronous request in DEWS WCS)

Dear all,

(I'm replying to Stefano's email under John's new thread title since
this makes more sense!  I've copied Stefano's email below my message.)

Stefano - this is a really interesting email, thanks.  I must admit
I'm not sure I've fully grasped all the implications of what you say
but perhaps you can help me understand a couple of points:

1) I agree that the REST approach gives a neater and more logical URI
syntax, which can also apply to WMS, WFS etc.  It seems logical to
have a URI that represents a coverage (or Layer in a WMS), and to use
this URI as the base for data and metadata queries.  However I'm
afraid I'm still struggling to see the gains beyond this:

2) The web is highly scalable (largely) because it is
document-oriented and these documents (being generally "not very big")
can easily be cached in various places (in your browser, in proxy
servers, at ISPs etc) so that the load on the primary servers is
reduced.  However, dynamic content is usually not cached (caching
would destroy the dynamic nature) so web servers that serve dynamic
content don't benefit too much from this scalability.  Serving of
"large" files is usually made scalable, not through the caching
system, but through the creation of mirror sites (i.e. an application
outside the architecture of the Web).

3) Web servers are easy to implement because, for static content,
there is a very simple mapping between a filesystem on disk and a URI
hierarchy.  Servers that serve dynamic content are harder to
implement.

Given that OWS servers are highly dynamic and will often deal with
large datasets, I can't yet see how a REST approach brings benefits to
efficiency or ease of implementation.  The server still has to do
basically the same tasks of query parsing, data extraction and data
formatting.  To take the example of caching - what do we cache?
Entire coverages (e.g. http://someserver.net/coverages/foo)?  I don't
think this is feasible in the general case.  In any case you will need
most or all of your business logic to exist on all cache servers in
the system to support subsetting.
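To make the caching point concrete, here is a minimal Python sketch of a cache keyed on the exact request URI. The class name and URIs are hypothetical and purely illustrative. It shows that a URI-keyed cache only pays off when a query is repeated verbatim. A bounding box that differs by one degree is a miss, and answering it from a cached neighbour would require subsetting logic in the cache itself.

```python
from collections import OrderedDict


class SubsetCache:
    """LRU cache keyed on the exact request URI (hypothetical illustration)."""

    def __init__(self, max_entries=128):
        self.max_entries = max_entries
        self._store = OrderedDict()

    def get(self, uri):
        """Return the cached payload for this exact URI, or None on a miss."""
        if uri not in self._store:
            return None
        self._store.move_to_end(uri)  # mark as most recently used
        return self._store[uri]

    def put(self, uri, payload):
        self._store[uri] = payload
        self._store.move_to_end(uri)
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict the least recently used entry


cache = SubsetCache()
cache.put("http://someserver.net/coverages/foo?bbox=0,0,10,10", b"subset A")
hit = cache.get("http://someserver.net/coverages/foo?bbox=0,0,10,10")
# Overlapping region, but a different URI, so the cache cannot help:
miss = cache.get("http://someserver.net/coverages/foo?bbox=0,0,10,11")
```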

It is of course highly desirable to make OWS servers more efficient
and easier to implement.  However, I'm not sure the analogy with the
Web is valid.  I still think that OWS are query-oriented rather than
resource-oriented: a general WCS server will have "a few" resources
(coverages) but will need to be able to serve an infinity of possible
subsetting queries.  I think a better analogy is with data-driven
dynamic websites, which can only scale up to a large number of
simultaneous users through "clustering" the back-end, which of course
adds to the complexity.

Perhaps I'm missing your point though.  It is Monday morning after all! ;-)

Jon

On 11/2/07, Stefano Nativi <nativi@xxxxxxxxxxx> wrote:
Dear all,

I really appreciate this discussion which touches several of the
issues we have been discussing and facing in our research and
development activity.

We have been developing OWS on SOAP; recently, we decided to experiment
with some REST implementations (especially for asynchronous
interactions). Therefore, I'd like to add some comments stemming from
our understanding of REST and our experience with it. Please forgive
the length of this email; it combines Paolo's comments and mine :-)


Let me distinguish between the REST approach (the architectural
style) and the RESTful implementation (the current technological
solutions for implementing REST).

The REST approach has proved to be highly scalable and sufficiently
flexible in many contexts, primarily the Web infrastructure but also
DB and filesystem access. In all these cases resources are
individually addressed through a uniform interface.

Indeed the possible REST actions are limited by the uniform interface,
which typically maps to the simple CRUD (create, retrieve, update and
delete) paradigm. Simplicity often means generality and flexibility
(see the netCDF data model case); in fact, this simplicity was one of
the reasons for the Web's pervasive success and for its scalability.
On the other hand, advanced semantic actions (e.g. resource processing
actions) must be mapped onto the basic CRUD vocabulary.
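A minimal sketch of the conventional mapping between the HTTP uniform interface, the CRUD paradigm and SQL verbs, assuming the usual textbook correspondence (in practice PUT can also create, so this is a simplification). The dictionary and function names are hypothetical. Anything outside the four verbs, such as an INTERPOLATE action, has no method of its own and has to be expressed through them.

```python
# Conventional (simplified) correspondence: HTTP method -> (CRUD operation, SQL verb).
UNIFORM_INTERFACE = {
    "POST": ("create", "INSERT"),
    "GET": ("retrieve", "SELECT"),
    "PUT": ("update", "UPDATE"),
    "DELETE": ("delete", "DELETE"),
}


def crud_for(method):
    """Return the CRUD operation for an HTTP method, or raise if it has none."""
    try:
        return UNIFORM_INTERFACE[method.upper()][0]
    except KeyError:
        raise ValueError(f"{method!r} is not part of the uniform interface") from None


print(crud_for("get"))  # retrieve
```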

For example, in the DB domain we can use SQL: a DB is the resource
domain; the uniform interface is made of the
SELECT/INSERT/CREATE/UPDATE/DELETE commands; resource-IDs are all the
possible SQL "WHERE" clauses.
For the Web (which may be seen as a globally distributed DB),
resource-IDs are the Web URIs (i.e. the "Web clauses").
In both cases the resource-ID may become really complex (e.g. very
long KVP strings, or complex SQL SELECTs with JOINs) and, hence, it
may be difficult to manage these IDs efficiently. For a REST WCS
implementation (at the abstract level, without implementation
details), resource-IDs are the GetCoverage clauses (analogous to the
content of a "SELECT" request).
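To illustrate the analogy, this hypothetical sketch expresses the same bbox selection on a coverage "foo" both as a KVP URI and as an SQL-like query. The function names, the table layout and the column names (lon, lat) are assumptions for illustration only. The SQL string is built by plain formatting, so it is not safe for untrusted input.

```python
from urllib.parse import urlencode


def as_uri(coverage, bbox):
    """Resource ID as a KVP URI; bbox is (minx, miny, maxx, maxy)."""
    query = urlencode({"name": coverage, "bbox": ",".join(str(v) for v in bbox)}, safe=",")
    return f"http://someserver.net/wcs?{query}"


def as_sql(coverage, bbox):
    """The same selection as an SQL-like query (illustration only: not injection-safe)."""
    minx, miny, maxx, maxy = bbox
    return (f"SELECT * FROM {coverage} "
            f"WHERE lon BETWEEN {minx} AND {maxx} AND lat BETWEEN {miny} AND {maxy}")


print(as_uri("foo", (-180, -90, 180, 90)))
# http://someserver.net/wcs?name=foo&bbox=-180,-90,180,90
print(as_sql("foo", (-180, -90, 180, 90)))
# SELECT * FROM foo WHERE lon BETWEEN -180 AND 180 AND lat BETWEEN -90 AND 90
```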

In our opinion, this is the real asset/limitation of REST: the
application business logic must be faced and partially addressed at
the interaction level (the protocol level), leaving the rest of the
business logic to the server, which consequently may be simpler
(almost any institution can manage a Web server today). With the
service-oriented approach, the entire application business logic is
left to the server (i.e. the service provider), which implements an
even simpler interaction: exchange/send an electronic document. Thus
SOA guarantees high flexibility, but the server (the service provider)
has to deal with all the resource-related issues (e.g. resource
caching, identification, creation, encoding, etc.) anyway.

Thus REST focuses on a uniform interface and resource addressing, not
on the nature of resources (discrete, existing, etc.). If we can
provide a uniform interface and complete resource addressing, we can
adopt a REST architecture.
In our opinion WCS seems to be implicitly based on a uniform
interface (since we GET coverages, GET coverage descriptions and GET
server capabilities, and we do not explicitly define other actions
such as INTERPOLATE, SUBSET, etc.), which allows each resource to be
addressed. Hence, a REST architecture seems an effective choice for
this domain.
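A sketch of what such a GET-only uniform interface could look like, assuming a hypothetical URI layout (/capabilities, /coverages/{name}, /coverages/{name}/description). Subsetting and interpolation are not methods here; they would appear as query parameters on the coverage URI. In this sketch any other HTTP method is rejected with 405 Method Not Allowed.

```python
from http import HTTPStatus


def route(method, path):
    """Dispatch a request on a hypothetical GET-only WCS URI layout."""
    if method.upper() != "GET":
        return HTTPStatus.METHOD_NOT_ALLOWED, None
    segments = [s for s in path.split("/") if s]
    if segments == ["capabilities"]:
        return HTTPStatus.OK, ("capabilities",)
    if len(segments) == 2 and segments[0] == "coverages":
        return HTTPStatus.OK, ("coverage", segments[1])
    if len(segments) == 3 and segments[0] == "coverages" and segments[2] == "description":
        return HTTPStatus.OK, ("description", segments[1])
    return HTTPStatus.NOT_FOUND, None


print(route("GET", "/coverages/foo/description"))
```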


As to RESTful implementation for Geospatial resources, several issues
must be considered.

First of all we should define what "resource" and "resource
representation" mean in this domain. We could decide that a dataset
is the resource and that all the features extracted from it through
interpolation, subsetting and resampling are simply different
representations. In that case we would only need to address the
dataset with a known URI, possibly creating new resources if required.
Alternatively, we could consider each feature extracted from a dataset
as a different resource. In that case we should address each feature
with a different URI.

Presently, we are working on this second approach for two reasons:
theoretical consistency (according to the Web architecture, a
representation should only affect formats) and implementation
(different URIs could support server-side caching).

Concerning the addressing problem, we do not need to explicitly define
URIs for each possible feature. We can simply provide a functional
mapping between a URI space and resource representations. In OWS, the
URL-encoding of the KVP string in a GET request IS the resource
addressing. The fact that the feature is dynamically created is not
an architectural problem but an implementation issue, which might
require smart caching servers.
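One way to realise such a functional mapping is to normalise each KVP URI into a canonical resource ID, so that equivalent requests share one cache key. This is a sketch under stated assumptions: the function name is hypothetical, parameter names are lower-cased (OWS KVP parameter names are case-insensitive), values are left untouched, and parameters are sorted. A real server would probably also need to normalise values, e.g. different numeric spellings of the same bbox.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def canonical_resource_id(uri):
    """Normalise a KVP request URI into a canonical resource ID (sketch)."""
    parts = urlsplit(uri)
    # Lower-case parameter names (case-insensitive in OWS KVP); keep values as given.
    params = sorted((k.lower(), v) for k, v in parse_qsl(parts.query, keep_blank_values=True))
    # The host name is case-insensitive too; the path is kept unchanged.
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path}?{urlencode(params, safe=',')}"


print(canonical_resource_id("http://SomeServer.net/wcs?BBOX=-180,-90,180,90&name=foo"))
# http://someserver.net/wcs?bbox=-180,-90,180,90&name=foo
```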

For example:

http://someserver.net/wcs?name=foo&bbox=-180,-90,180,90&...

is the URI for the feature extracted from the coverage named "foo"
with the interpolation, subsetting and resampling defined by the bbox
(and other) parameters.
(A better URI could be defined by leaving only the non-hierarchical
parameters in the query part of the URI, something like:

http://someserver.net/coverages/foo?bbox=-180,-90,180,90&...

)
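A sketch of the rewrite from the KVP form to the hierarchical form, assuming the coverage name is carried by a `name` parameter (as in the example URIs) and that coverages live under a hypothetical /coverages/ path at the server root. The function name is illustrative.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit


def to_hierarchical(kvp_uri, collection="coverages"):
    """Move the coverage name from the query into the path (hypothetical layout)."""
    parts = urlsplit(kvp_uri)
    params = parse_qsl(parts.query, keep_blank_values=True)
    names = [v for k, v in params if k.lower() == "name"]
    if not names:
        raise ValueError("no 'name' parameter to turn into a path segment")
    # Everything except the name stays in the query part.
    rest = [(k, v) for k, v in params if k.lower() != "name"]
    base = f"{parts.scheme}://{parts.netloc}/{collection}/{quote(names[0], safe='')}"
    return f"{base}?{urlencode(rest, safe=',')}" if rest else base


print(to_hierarchical("http://someserver.net/wcs?name=foo&bbox=-180,-90,180,90"))
# http://someserver.net/coverages/foo?bbox=-180,-90,180,90
```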

When the request is encoded in a POST, it should be considered as a
query to the root resource, which responds with the representation of
the target resource. This could also be viewed as an
extraction-from-dataset service; however, this may introduce needless
complexity, since the request is still a GET action. In fact, there
exists an implicit hierarchy of our features, and the root feature
(the "foo" coverage in our example) supports not only its own GET
operation but also the selection of its children via a POST operation.
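A minimal sketch of the POST-as-query idea, assuming the request body has already been decoded into a dict of selection parameters and that `extract` is a caller-supplied (hypothetical) subsetting function. The response carries the child's representation plus an HTTP Content-Location header naming the equivalent GET URI. This is one way to tell clients and caches which child resource the representation belongs to, not something the WCS specification defines.

```python
from urllib.parse import urlencode


def handle_post_selection(coverage_uri, selection, extract):
    """POST a selection to a root coverage; answer with the child's representation.

    `selection` is the decoded request body (hypothetical dict form) and
    `extract` is a caller-supplied function that performs the subsetting.
    """
    # The child resource gets the same URI a plain GET would have used.
    child_uri = f"{coverage_uri}?{urlencode(sorted(selection.items()), safe=',')}"
    body = extract(selection)
    headers = {"Content-Location": child_uri}
    return 200, headers, body


status, headers, body = handle_post_selection(
    "http://someserver.net/coverages/foo",
    {"bbox": "-180,-90,180,90"},
    lambda sel: b"subset",
)
```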

These considerations seem to be valid not only for WCS but for all
the data access services (WCS, WFS and WMS). They conform to a
resource-oriented approach and can be implemented in a RESTful
architecture with "minimal" modifications of existing specifications.
Besides, the RESTful implementation might be easily adopted by data
providers, since it would be based on well-known technologies.

The case of WPS and WCTS seems to be different. In fact, they don't
define a uniform interface for the many operations they should
support; on the contrary, they introduce a uniform interface to
receive a message that contains specific operation requests. In this
case we should use the POST method as the extension point for
interacting with HTTP-based services that create new addressable
resources (a sort of end point in the SOA view). In this way we would
have the advantages of pervasive and scalable data provision (through
the RESTful implementation) and of modular and composable processing
(through the service-oriented architecture).
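A sketch of the POST-as-extension-point pattern for processing services, which also touches the asynchronous case that started this thread. The POST creates a new job resource, returned with 201 Created and a Location header, and the client then GETs that URI to follow its status. The class, the /jobs path and the status strings are hypothetical, and the HTTP exchange is simulated in memory rather than served.

```python
import itertools


class JobStore:
    """In-memory stand-in for a processing service that creates job resources."""

    def __init__(self, base="http://someserver.net/jobs"):
        self.base = base
        self._ids = itertools.count(1)
        self._jobs = {}

    def post(self, request_document):
        """Accept a processing request and create a new addressable job resource."""
        job_id = next(self._ids)
        self._jobs[job_id] = {"status": "accepted", "request": request_document}
        # 201 Created + Location: the new job is itself a resource clients can GET.
        return 201, {"Location": f"{self.base}/{job_id}"}

    def get(self, job_id):
        """Retrieve the current status of a job, or 404 if it does not exist."""
        job = self._jobs.get(job_id)
        if job is None:
            return 404, None
        return 200, job["status"]

    def complete(self, job_id):
        """Simulate the processing back-end finishing the job."""
        self._jobs[job_id]["status"] = "succeeded"


store = JobStore()
status, headers = store.post("process request")
print(status, headers["Location"])  # 201 http://someserver.net/jobs/1
```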


Some possible conclusions:

A RESTful implementation is valuable for scalability and
extensibility (derived from the REST architectural style) as well as
for simplicity (the implementation is simple since it is based on
well-known technology and only simple operations must be supported
server-side).

The RESTful implementation seems feasible for data access services
because they are typically resource-based.

The RESTful architecture must interact with a service-oriented
architecture for basic and advanced processing. XML and HTTP are the
key technologies for bridging the two.




Thank you for your patience,

Stefano and Paolo

--
--------------------------------------------------------------
Dr Jon Blower              Tel: +44 118 378 5213 (direct line)
Technical Director         Tel: +44 118 378 8741 (ESSC)
Reading e-Science Centre   Fax: +44 118 378 6413
ESSC                       Email: jdb@xxxxxxxxxxxxxxxxxxxx
University of Reading
3 Earley Gate
Reading RG6 6AL, UK
--------------------------------------------------------------