Re: [thredds] Station Data Subset Service vs. OPeNDAP constraint expressions

 Hi John,

The high-level description of our adapted DAPPER convention is given at: https://gforge.ifremer.fr/wiki/doku.php?id=oceanotron:opendap_dapper_frontdesk

We do not have a publicly accessible server yet; it is still under validation. I hope we will be able to publish one in the coming weeks. I'll let you know then.

Thomas



On 10/18/2010 05:48 PM, John Caron wrote:
Hi Thomas:

Have you published a description of your version of the DAPPER convention? Do you have any URLs that could be used to test server/client compatibility?

John

On 10/14/2010 11:20 AM, Thomas.Loubrieu@xxxxxxxxxx wrote:
Dear all,

I would just like to mention that we are developing an OPeNDAP server
at Ifremer which is dedicated to in-situ datasets (points, profiles,
time series and trajectories). It is now called Oceanotron, and it is
the successor to the server we previously called Dap4cor.

It is intended to be fully compliant with the Dapper server (but reads
other file formats) and relies on OPeNDAP relational
constraints to filter the dataset.

Pydap is our favorite API for testing the server.

To sum up, the purpose of my email is to say that we very much like
relational constraints in OPeNDAP.

Best regards,

Thomas

John Caron <caron@xxxxxxxxxxxxxxxx> wrote:

On 10/13/2010 5:00 PM, Bob Simons wrote:
John,

I read about the new Station Data Subset Service (I'll call it
SDSS in this email)
http://www.unidata.ucar.edu/projects/THREDDS/tech/interfaceSpec/StationDataSubsetService.html
(version 0.2), which lists you as the contact. I understand that
the UAF group is considering using SDSS to deal with station data.

I noticed that SDSS queries are very similar to OPeNDAP
constraint expression queries (
http://www.opendap.org/user/guide-html/guide_33.html ). Yet, SDSS
seems limited to one type of dataset (stations with time,
latitude, longitude, ... data, because it uses specific variable
names, e.g., stn, north, south, west, east, time for the
constraints) while OPeNDAP constraint expressions can be used
with a much broader range of datasets, notably, any dataset that
can be represented as a database-like table, because it isn't
tied to any specific variable names. And OPeNDAP's bigger set of
operators (=, <, <=, >, >=, !=, =~) can be applied to any
variable, not just longitude/latitude/depth/time/stn.

The sample queries in the SDSS documentation can easily be
converted to OPeNDAP constraint expression queries, for example:

SDSS: ?north=17.3&south=12.088&west=140.2&east=160.0
OPeNDAP: ?latitude<=17.3&latitude>=12.088&longitude>=140.2&longitude<=160.0

SDSS: ?stn=KDEN
OPeNDAP: ?stn="KDEN"

SDSS: ?stn=KDEN&stn=KPAL&stn=SDOL
OPeNDAP: ?stn=~"KDEN|KPAL|SDOL"
(=~ lets you specify a regular expression to be matched)

SDSS: ?time_start=2007-03-29T12:00:00Z&time_end=2007-03-29T13:00:00Z
OPeNDAP: ?time>="2007-03-29T12:00:00Z"&time<="2007-03-29T13:00:00Z"
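
As an illustration of the conversions above, here is a minimal Python sketch that rewrites the sample SDSS queries as OPeNDAP selection strings. The function name sdss_to_opendap and the default variable names (latitude, longitude, time, stn) are assumptions made for this example only; a real dataset may use different names, and a real request URL would also need percent-encoding.

```python
import re
from urllib.parse import parse_qsl

# (SDSS parameter, role of the target variable, OPeNDAP operator)
_BOUNDS = [
    ("north", "lat", "<="),
    ("south", "lat", ">="),
    ("west", "lon", ">="),
    ("east", "lon", "<="),
    ("time_start", "time", ">="),
    ("time_end", "time", "<="),
]


def sdss_to_opendap(query, lat="latitude", lon="longitude", time="time", stn="stn"):
    """Rewrite an SDSS-style query string as an OPeNDAP selection string.

    The variable names are assumptions: the caller must supply the names
    actually used by the dataset being queried.
    """
    names = {"lat": lat, "lon": lon, "time": time}
    params = parse_qsl(query.lstrip("?"))
    # "stn" may be repeated, so collect it separately from the other keys.
    stations = [value for key, value in params if key == "stn"]
    single = {key: value for key, value in params if key != "stn"}

    clauses = []
    for key, role, op in _BOUNDS:
        if key in single:
            value = single[key]
            if role == "time":
                # Time stamps are quoted, as in the examples above.
                value = '"' + value + '"'
            clauses.append(names[role] + op + value)

    if len(stations) == 1:
        clauses.append(stn + '="' + stations[0] + '"')
    elif stations:
        # Several stations become one regular-expression match.
        pattern = "|".join(re.escape(name) for name in stations)
        clauses.append(stn + '=~"' + pattern + '"')

    return "?" + "&".join(clauses)


print(sdss_to_opendap("?stn=KDEN&stn=KPAL&stn=SDOL"))  # ?stn=~"KDEN|KPAL|SDOL"
```

Note that the translation only works once the dataset's variable names are known, which is the point John raises in his reply.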

SDSS' accept=mime_type could be mimicked by having the OPeNDAP
server support file extensions in addition to .dods and .asc (or
by some other means if necessary). Mime types also become ambiguous
when two file types share the same mime type.

OPeNDAP's sequence data type is well-suited to this type of data
query and to the API described at:
http://www.unidata.ucar.edu/software/netcdf-java/reference/FeatureDatasets/PointFeatures.html

I have worked quite a lot with OPeNDAP constraint expressions and
I have found them to be:
* Very flexible (well-suited to a wide range of datasets and queries),
* Very easy for non-programmers to read, write, and understand,
* Easy to convert into queries for other types of data servers
  (e.g., SQL, SOS, OBIS),
* Easy for data servers to handle and optimize.
They are sort of like a nice subset of SQL with a really simple syntax.
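
To make the "subset of SQL" comparison concrete, the following Python sketch turns a constraint expression into a parameterized SQL statement. It is only an illustration under stated assumptions: the function name constraint_to_sql, the table name "obs" and the variable "temp" are hypothetical, and mapping =~ to REGEXP assumes a database that provides that operator (MySQL does; others differ).

```python
import re

# Two-character operators come first so "<=" is not read as "<".
_SELECTION = re.compile(r"^(\w+)(=~|<=|>=|!=|=|<|>)(.+)$")
# Assumption: the target database offers a REGEXP operator.
_SQL_OPERATOR = {"=~": "REGEXP", "!=": "<>"}


def _parse_value(text):
    """Quoted values become strings; everything else is read as a number."""
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        return text[1:-1]
    return float(text)


def constraint_to_sql(constraint, table):
    """Return (sql, values) for a constraint such as ?a,b&c<=1&d="x"."""
    columns, conditions, values = [], [], []
    for part in constraint.lstrip("?").split("&"):
        if not part:
            continue
        match = _SELECTION.match(part)
        if match:
            name, op, raw = match.groups()
            conditions.append(name + " " + _SQL_OPERATOR.get(op, op) + " ?")
            values.append(_parse_value(raw))
        else:
            # Anything that is not a selection is treated as the projection.
            columns.extend(part.split(","))

    # Identifiers are interpolated into the SQL text, so restrict them.
    for identifier in columns + [table]:
        if not re.fullmatch(r"\w+", identifier):
            raise ValueError("unsupported identifier: " + identifier)

    column_list = ", ".join(columns) if columns else "*"
    sql = "SELECT " + column_list + " FROM " + table
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, values


sql, values = constraint_to_sql('?time,temp&latitude<=17.3&stn="KDEN"', "obs")
print(sql)     # SELECT time, temp FROM obs WHERE latitude <= ? AND stn = ?
print(values)  # [17.3, 'KDEN']
```

The values are returned separately so that they can be passed as query parameters instead of being pasted into the SQL text.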


All of this discussion leads up to this: I'm very curious: why
did you decide to define a new protocol instead of using the
existing standard OPeNDAP constraint expression protocol? And/or,
would you consider switching to the OPeNDAP constraint expression
protocol?

Instead of creating a new service with one server implementation
(THREDDS) and one client implementation (netcdf-java), switching
to OPeNDAP constraint expressions would hook your service into
the realm of other servers and clients that already support
OPeNDAP constraint expressions.

And supporting OPeNDAP constraint expressions in THREDDS seems
like a logical extension for a data server which already supports
OPeNDAP grid/hyperslab queries.

I am very curious to hear your thoughts on this.

Thanks for considering this.


Sincerely,

Bob Simons
IT Specialist
Environmental Research Division
NOAA Southwest Fisheries Science Center
1352 Lighthouse Ave
Pacific Grove, CA 93950-2079
Phone: (831)658-3205
Fax: (831)648-8440
Email: bob.simons@xxxxxxxx

The contents of this message are mine personally and do not
necessarily reflect any position of the Government or the
National Oceanic and Atmospheric Administration.
<>< <>< <>< <>< <>< <>< <>< <>< <><

Hi Bob:

The original motivation of the NetCDF Subset Service was to provide
subsets of gridded data in netCDF-CF format. The subsetting request
is specified in coordinate (lat/lon/alt/time) space, so that it
could be done from a web form, or from a simple wget script. The
service has continued to evolve, and it's time to evaluate where it
is and where it should go, so your question comparing it to OPeNDAP
is timely.

Background

The NetCDF Subset Services (NCSS) are a family of experimental web
protocols for making queries in coordinate space (rather than index
space), against CDM "Feature Type" datasets; see:

http://www.unidata.ucar.edu/projects/THREDDS/tech/interfaceSpec/NetcdfSubsetService.html



Functionally, they are intended to be a simplified version of the OGC protocols, and are most directly an alternative to OGC web services. In order to support queries in coordinate space the data model has to have a general notion of coordinates, and in particular, the use case I want to cover is to support space/time subsetting. The data models of OPeNDAP, netCDF and HDF5 have only partially handled coordinate systems; see:

http://www.unidata.ucar.edu/software/netcdf-java/CoordinateSystemsNeeded.htm



This is one reason why the OGC protocols have the mind share that they do (plus lots of $$$ and commercial effort, etc). This is also the reason that the CDM is an extension of OPeNDAP, netCDF and HDF5, rather than just their union; see:

http://www.unidata.ucar.edu/software/netcdf-java/CDM/index.html

As I mentioned, NCSS are intended to return results in commonly
used formats (netCDF, CSV, XML, etc) that can be used in other
applications directly, rather than having to have a smart client
that can convert binary dods objects.

OPeNDAP

To answer your specific questions:

Yet, SDSS seems limited to one type of dataset (stations with
time, latitude, longitude, ... data, because it uses specific
variable names, e.g., stn, north, south, west, east, time for the
constraints) while OPeNDAP constraint expressions can be used
with a much broader range of datasets, notably, any dataset that
can be represented as a database-like table, because it isn't
tied to any specific variable names. And OPeNDAP's bigger set of
operators (=, <, <=, >, >=, !=, =~) can be applied to any
variable, not just longitude/latitude/depth/time/stn.

"stn, north, south, west, east, time" are not variable names, they
are names for those semantic concepts, and don't depend on those
names being present in the dataset. In that sense they are more
general than an OPeNDAP request, where you have to know what the
actual names of the variables are.

OPeNDAP constraint expressions are very powerful but they have two
major problems:

1) they operate on the syntactic level, so, for example, they don't
know that lon == longitude, and so can't deal with the longitude
seam at +/- 180 (or wherever it is). Another example: if your
dataset does not include lat/lon variables, but instead is on a
projection, your client has to know how to do the projective
geometry math.

2) it's hard to efficiently implement the full relational constraint
expressions unless you are using an RDBMS. For that reason, you
rarely see them implemented in OPeNDAP servers. The NCSS only
implements space, time, and variable subsetting. This is hard
enough to do in a general way, but not as hard as supporting
relational constraints on all fields. (OTOH, the relational queries
are very nice to use; it's just the server implementation that's
hard.)
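
The longitude seam in point 1 can be sketched in a few lines of Python. This is a rough illustration, not part of any server: the function names are hypothetical, and it assumes the dataset stores longitudes in [-180, 180] under a variable called "longitude". Because the selection clauses in a constraint expression are combined with AND, a box that crosses the seam cannot be written as one pair of longitude bounds, so the client has to issue two requests and merge the results itself.

```python
def normalize_lon(lon):
    """Map a longitude in degrees into [-180, 180]; in-range values are kept."""
    if -180.0 <= lon <= 180.0:
        return lon
    return (lon + 180.0) % 360.0 - 180.0


def lon_constraints(west, east, lon_name="longitude"):
    """Return the selection strings needed to cover west..east.

    One string is enough for an ordinary box; two are needed when the
    box crosses the seam at +/-180. The variable name and the
    [-180, 180] storage range are assumptions about the dataset.
    """
    west, east = normalize_lon(west), normalize_lon(east)
    if west <= east:
        return [f"{lon_name}>={west}&{lon_name}<={east}"]
    return [
        f"{lon_name}>={west}&{lon_name}<=180.0",
        f"{lon_name}>=-180.0&{lon_name}<={east}",
    ]


print(lon_constraints(140.2, 160.0))
# ['longitude>=140.2&longitude<=160.0']
print(lon_constraints(170.0, 190.0))
# ['longitude>=170.0&longitude<=180.0', 'longitude>=-180.0&longitude<=-170.0']
```

A service that works on the semantic level can hide this split from the client, since it knows which variable is the longitude and where its seam lies.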

I have made various suggestions to James over the years on what
extensions to OPeNDAP could be used for this use case, but there's
no point in Unidata creating non-standard OPeNDAP implementations,
since the whole point of OPeNDAP is interoperability between
clients and servers. If a standard OPeNDAP way to do coordinate
space subsetting emerged, we would be willing to implement it. The
"DAPPER protocol" for example seems to be the best fit that I've
seen for the "Station Data Subset Service" use case; essentially
DAPPER is a small set of conventions on top of OPeNDAP. These need
to be clarified and extended a bit IMO to be generally useful, but
are a good start. (BTW, are you using it?)

In the meantime, it's much faster for us to roll our own, since we
own both the server and the client stack, so we can experiment with
what works without worrying about breaking OPeNDAP or OGC
standards. Most of the work is in the server implementation, so if
there was a different but functionally equivalent query protocol,
we could easily switch to it. So I'm pretty confident that the
software we have been implementing can be used, no matter what
protocol clients eventually want us to support. I am aware of the
dangers of proprietary protocols, but also the frustration of
complex standards and ones that don't move for 10 years.

Smart clients like the ones you have been writing can do a lot on
top of OPeNDAP, but dumb(er) clients can't. We need to push as much
of those smarts into the server as possible, and in order to do
that, we need to operate on "higher level semantic" objects than
indexed arrays. In the CDM, these objects are intended to be the
"Feature Types". The "Grid" Feature Type allows the TDS to support
the OGC WCS and WMS protocols, which are becoming more important to
getting our data out to a wider community. Those have the problem
of being overly complex. The NCSS protocols are looking for the
sweet spot of functionality and simplicity.

would you consider switching to the OPeNDAP constraint expression
protocol?

I'd be willing to add something like DAPPER as another way that the
Station Data Subset Service can deliver data, if there was an
important class of clients that needed it and could use it. OTOH,
if your software is using the CDM stack, do you care how the
objects are delivered to it?

switching to OPeNDAP constraint expressions would hook your
service into the realm of other servers and clients that already
support OPeNDAP constraint expressions.

I'd be interested in knowing which clients can handle relational
constraint expressions. The NetCDF clients cannot, because it falls
outside of the data model and API. I know you guys do a lot with
relational databases, so it's not surprising if your software does.
I've been working almost exclusively on top of collections of files
(netcdf, hdf, grib, bufr, etc). I have been on the lookout for new
solutions, but for now it seems that people need services that run
on top of those file collections.

Comments, please

I'm looking forward to an extended discussion of these issues and
where remote access protocols should evolve. Anyone who would like
to comment, please feel free. Note that I've cross-posted to 2
groups; beware of cross-posting if you're not on both. (Now that I
think of it, I'm not sure that I'm on both.)

John Caron

_______________________________________________
thredds mailing list
thredds@xxxxxxxxxxxxxxxx
For list information or to unsubscribe, visit: http://www.unidata.ucar.edu/mailing_lists/




--


-------------------------------------------------------------
Thomas LOUBRIEU
IFREMER IDM/ISI
BP70
29280 Plouzane
FRANCE

email: Thomas.Loubrieu@xxxxxxxxxx
WWW  : http://www.coriolis.eu.org/cdc
Tel.:  (+33) (0)2 98 22 48 53
Fax:   (+33) (0)2 98 22 46 44

-------------------------------------------------------------



