Re: [thredds] Station Data Subset Service vs. OPeNDAP constraint expressions

At LASP, we have been doing some work on a Time Series Server which
independently implements the DAP2 spec. The architecture is built around
the CDM and NcML. It accepts relational constraints like those stated
here and returns results as a DAP Sequence or any one of a number of
formats. We have focused on time series (scalar, vector, and spectral)
data. This simplified problem domain has allowed an
interesting design to emerge. We plan to extend the data model to
support more general types and scientific "feature" types.

You can check out version 1 of the code and see the poster we presented
last year at AGU on SourceForge:

http://sourceforge.net/projects/tsds/

or see it in action (with test data) at:

http://tsds.net/tsds/

We are currently working on version 2, known as LaTiS, with a modified
data model to better capture our dataset abstraction and to avoid some
workarounds we resorted to in version 1. This new model builds on the
idea of coordinate variables and captures the functional semantics
(i.e., independent and dependent variables) that scientific datasets
typically represent. I presented some of these ideas at the HDF
Workshop last month:

http://hdfeos.org/workshops/ws14/presentations/day2/LaTiS_Common_Data_Model_WS14.pptx
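As a toy illustration of that functional view (this is not the actual LaTiS API; the variable names are invented), a dataset can be modeled as a discrete function from an independent variable to a dependent one:

```python
# A toy illustration (not the LaTiS API) of treating a dataset as a
# discrete function from an independent variable to a dependent variable.
time = [0.0, 1.0, 2.0]           # independent variable (coordinate)
flux = [10.2, 11.0, 9.8]         # dependent variable (hypothetical)
dataset = dict(zip(time, flux))  # time -> flux: a sampled function
print(dataset[1.0])
```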

I hope to have a release of LaTiS out by December in time for AGU where
I will be presenting a talk about promoting interoperability by using
common data interfaces in the middle tier of data access systems.

We don't immediately aim to address John's concerns about a semantic API at the service level. Our immediate needs are satisfied by providing a "smart" IDL API and Web clients (e.g., JavaScript). The DAP spec does allow for "functions" in the request; it is possible to capture some higher-level semantics there. Another approach is to use a data descriptor (like NcML) to rename the variables to a standard name that can then be used in an OPeNDAP request.
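A minimal sketch of the NcML renaming approach (the file name and the original variable name here are invented for illustration; the `orgName` attribute is what performs the rename):

```xml
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
        location="example.nc">
  <!-- expose the dataset's "obs_time" variable under the standard name "time" -->
  <variable name="time" orgName="obs_time"/>
</netcdf>
```

A client can then issue an OPeNDAP request against `time` without knowing the dataset's native variable name.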

I welcome any questions or comments about our work.

Doug

P.S. Maybe this thread should be cross-posted to netcdf-java?

On 10/13/10 9:52 PM, John Caron wrote:
On 10/13/2010 5:00 PM, Bob Simons wrote:
John,

I read about the new Station Data Subset Service (I'll call it SDSS
in this email), version 0.2, which lists you as the contact:

http://www.unidata.ucar.edu/projects/THREDDS/tech/interfaceSpec/StationDataSubsetService.html

I understand that the UAF group is considering using SDSS to deal with station data.

I noticed that SDSS queries are very similar to OPeNDAP constraint
expression queries (
http://www.opendap.org/user/guide-html/guide_33.html ). Yet, SDSS
seems limited to one type of dataset (stations with time, latitude,
longitude, ... data, because it uses specific variable names, e.g.,
stn, north, south, west, east, time for the constraints) while
OPeNDAP constraint expressions can be used with a much broader
range of datasets, notably, any dataset that can be represented as
a database-like table, because it isn't tied to any specific
variable names. And OPeNDAP's bigger set of operators (=,<,<=,>,>=,
!=, =~) can be applied to any variable, not just
longitude/latitude/depth/time/stn.

The sample queries in the SDSS documentation can easily be
converted to OPeNDAP constraint expression queries, for example:

SDSS:    ?north=17.3&south=12.088&west=140.2&east=160.0
OPeNDAP: ?latitude<=17.3&latitude>=12.088&longitude>=140.2&longitude<=160.0

SDSS:    ?stn=KDEN
OPeNDAP: ?stn="KDEN"

SDSS:    ?stn=KDEN&stn=KPAL&stn=SDOL
OPeNDAP: ?stn=~"KDEN|KPAL|SDOL"
(=~ lets you specify a regular expression to be matched)

SDSS:    ?time_start=2007-03-29T12:00:00Z&time_end=2007-03-29T13:00:00Z
OPeNDAP: ?time>="2007-03-29T12:00:00Z"&time<="2007-03-29T13:00:00Z"
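The translations above are mechanical: an OPeNDAP constraint expression is just relational selections joined with `&`. A minimal sketch (the server URL is hypothetical):

```python
# Build an OPeNDAP constraint expression by joining relational selections.
# The endpoint URL below is hypothetical, for illustration only.
def opendap_ce(selections):
    """Join relational selections into one OPeNDAP constraint expression."""
    return "&".join(selections)

ce = opendap_ce([
    'time>="2007-03-29T12:00:00Z"',
    'time<="2007-03-29T13:00:00Z"',
])
url = "http://example.com/opendap/stations.dods?" + ce
print(url)
```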

SDSS's accept=mime_type could be mimicked by having the OPeNDAP
server support file extensions in addition to .dods and .asc (or by
some other means if necessary). MIME types also have a problem when
two file types share the same MIME type.

OPeNDAP's sequence data type is well-suited to this type of data
query and to the API described at
http://www.unidata.ucar.edu/software/netcdf-java/reference/FeatureDatasets/PointFeatures.html

I have worked quite a lot with OPeNDAP constraint expressions and I
have found them to be:
* Very flexible (well-suited to a wide range of datasets and queries),
* Very easy for non-programmers to read, write, and understand,
* Easy to convert into queries for other types of data servers (e.g., SQL, SOS, OBIS),
* Easy for data servers to handle and optimize.
They are sort of like a nice subset of SQL with a really simple syntax.


All of this discussion leads up to this: I'm very curious: why did
you decide to define a new protocol instead of using the existing
standard OPeNDAP constraint expression protocol? And/or, would you
consider switching to the OPeNDAP constraint expression protocol?

Instead of creating a new service with one server implementation
(THREDDS) and one client implementation (netcdf-java), switching to
OPeNDAP constraint expressions would hook your service into the
realm of other servers and clients that already support OPeNDAP
constraint expressions.

And supporting OPeNDAP constraint expressions in THREDDS seems like
a logical extension for a data server which already supports
OPeNDAP grid/hyperslab queries.

I am very curious to hear your thoughts on this.

Thanks for considering this.


Sincerely,

Bob Simons
IT Specialist
Environmental Research Division
NOAA Southwest Fisheries Science Center
1352 Lighthouse Ave
Pacific Grove, CA 93950-2079
Phone: (831)658-3205
Fax: (831)648-8440
Email: bob.simons@xxxxxxxx

The contents of this message are mine personally and do not
necessarily reflect any position of the Government or the National
Oceanic and Atmospheric Administration.

Hi Bob:

The original motivation of the Netcdf Subset Service was to provide
subsets of gridded data in netCDF-CF format. The subsetting request
is specified in coordinate (lat/lon/alt/time) space, so that it could
be done from a web form, or from a simple wget script. The service
has continued to evolve, and it's time to evaluate where it is and
where it should go, so your question comparing it to OPeNDAP is
timely.

Background

The NetCDF Subset Services (NCSS) are a family of experimental web
protocols for making queries in coordinate space (rather than index
space), against CDM "Feature Type" datasets; see:

http://www.unidata.ucar.edu/projects/THREDDS/tech/interfaceSpec/NetcdfSubsetService.html

Functionally, they are intended to be a simplified version of the
OGC protocols, and are most directly an alternative to OGC web
services. In order to support queries in coordinate space, the data
model has to have a general notion of coordinates; in particular,
the use case I want to cover is space/time subsetting. The data
models of OPeNDAP, netCDF, and HDF5 have only partially handled
coordinate systems; see:

http://www.unidata.ucar.edu/software/netcdf-java/CoordinateSystemsNeeded.htm

This is one reason why the OGC protocols have the mind share that
they do (plus lots of $$$ and commercial effort, etc.). This is also
the reason that the CDM is an extension of OPeNDAP, netCDF, and
HDF5, rather than just their union; see:

http://www.unidata.ucar.edu/software/netcdf-java/CDM/index.html

As I mentioned, NCSS are intended to return results in commonly used
formats (netCDF, CSV, XML, etc.) that can be used in other
applications directly, rather than requiring a smart client that can
convert binary dods objects.

OPeNDAP

To answer your specific questions:

Yet, SDSS seems limited to one type of dataset (stations with time,
latitude, longitude, ... data, because it uses specific variable
names, e.g., stn, north, south, west, east, time for the
constraints) while OPeNDAP constraint expressions can be used with
a much broader range of datasets, notably, any dataset that can be
represented as a database-like table, because it isn't tied to any
specific variable names.  And OPeNDAP's bigger set of operators
(=,<,<=,>,>=, !=, =~) can be applied to any variable, not just
longitude/latitude/depth/time/stn.

"stn, north, south, west, east, time" are not variable names, they
are names for those semantic concepts, and dont depend on those names
being present in the dataset. In that sense they are more general
than an OPeNDAP request, where you have to know what the actual names
of the variables are.

OPeNDAP constraint expressions are very powerful but they have two
major problems:

1) They operate at the syntactic level, so, for example, they don't
know that lon == longitude, and so can't deal with the longitude seam
at +/- 180 (or wherever it is). Another example: if your dataset does
not include lat/lon variables, but instead is on a projection, your
client has to know how to do the projective geometry math.
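The seam problem can be made concrete: a bounding box that wraps across +/-180 must be split into two range constraints, and a purely syntactic server cannot know to do this, so a smart client must. A minimal sketch:

```python
def lon_ranges(west, east):
    """Split a longitude bounding box into query ranges, handling a
    box that wraps across the +/-180 seam. A syntactic-level server
    cannot do this itself; a smart client has to."""
    if west <= east:
        return [(west, east)]            # ordinary box: one range
    return [(west, 180.0), (-180.0, east)]  # box wraps across the seam

print(lon_ranges(170.0, -170.0))
```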

2) It's hard to efficiently implement the full relational constraint
expressions unless you are using an RDBMS. For that reason, you
rarely see them implemented in OPeNDAP servers. The NCSS implements
only space, time, and variable subsetting. This is hard enough to do
in a general way, but not as hard as supporting relational
constraints on all fields. (OTOH, the relational queries are very
nice to use; it's just the server implementation that's hard.)
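Without an RDBMS, the fallback for a file-backed server is a linear scan over records, evaluating each constraint per record; fine for small data, costly at scale. A minimal sketch (the record fields are invented):

```python
import operator

# Linear-scan evaluation of one relational constraint over records:
# the fallback a file-backed (non-RDBMS) server has. The station
# records below are invented for illustration.
OPS = {"<=": operator.le, ">=": operator.ge, "<": operator.lt,
       ">": operator.gt, "!=": operator.ne, "=": operator.eq}

def select(records, var, op, value):
    """Return the records where `record[var] <op> value` holds."""
    return [r for r in records if OPS[op](r[var], value)]

records = [{"stn": "KDEN", "temp": 12.5}, {"stn": "KPAL", "temp": 13.5}]
print(select(records, "temp", "<=", 13.0))
```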

I have made various suggestions to James over the years on what
extensions to OPeNDAP could be used for this use case, but there's no
point in Unidata creating non-standard OPeNDAP implementations, since
the whole point of OPeNDAP is interoperability between clients and
servers.  If a standard OPeNDAP way to do coordinate space subsetting
emerged, we would be willing to implement it. The "DAPPER protocol",
for example, seems to be the best fit that I've seen for the "Station
Data Subset Service" use case; essentially, DAPPER is a small set of
conventions on top of OPeNDAP. These need to be clarified and
extended a bit, IMO, to be generally useful, but they are a good
start. (BTW, are you using it?)

In the meanwhile, it's much faster for us to roll our own, since we
own both the server and the client stack, so we can experiment with
what works without worrying about breaking OPeNDAP or OGC standards.
Most of the work is in the server implementation, so if there were a
different but functionally equivalent query protocol, we could easily
switch to it. So I'm pretty confident that the software we have been
implementing can be used, no matter what protocol clients eventually
want us to support. I am aware of the dangers of proprietary
protocols, but also of the frustration of complex standards and of
ones that don't move for 10 years.

Smart clients like the ones you have been writing can do a lot on top
of OPeNDAP, but dumb(er) clients can't. We need to push as many of
those smarts into the server as possible, and to do that, we need to
operate on "higher level semantic" objects rather than indexed
arrays. In the CDM, these objects are intended to be the "Feature
Types". The "Grid" Feature Type allows the TDS to support the OGC WCS
and WMS protocols, which are becoming more important for getting our
data out to a wider community. Those have the problem of being overly
complex. The NCSS protocols are looking for the sweet spot of
functionality and simplicity.

would you consider switching to the OPeNDAP constraint expression
protocol?

I'd be willing to add something like DAPPER as another way the
Station Data Subset Service can deliver data, if there were an
important class of clients that needed it and could use it. OTOH, if
your software is using the CDM stack, do you care how the objects are
delivered to it?

switching to OPeNDAP constraint expressions would hook your service
into the realm of other servers and clients that already support
OPeNDAP constraint expressions.

I'd be interested in knowing which clients can handle relational
constraint expressions. The NetCDF clients cannot, because that falls
outside of the data model and API. I know you guys do a lot with
relational databases, so it's not surprising if your software does.
I've been working almost exclusively on top of collections of files
(netCDF, HDF, GRIB, BUFR, etc.). I have been on the lookout for new
solutions, but for now it seems that people need services that run on
top of those file collections.

Comments, please

I'm looking forward to an extended discussion of these issues and of
where remote access protocols should evolve. Anyone who would like to
comment, please feel free. Note that I've cross-posted to 2 groups;
beware of cross-posting if you're not on both. (Now that I think of
it, I'm not sure that I'm on both.)

John Caron

_______________________________________________ thredds mailing list
thredds@xxxxxxxxxxxxxxxx For list information or to unsubscribe,
visit: http://www.unidata.ucar.edu/mailing_lists/


