Re: THREDDS/DLESE Connections slides

Subject: Re: THREDDS/DLESE Connections slides
From: John Caron <caron@xxxxxxxx>
Date: Tue, 18 Dec 2001 10:14:12 -0700



Peter Cornillon wrote:

Just to make sure i understand your terminology:

files = physical files
YUP
datasets = logical files we want the user to see
I don't think about datasets in a file concept. It could be a group of
files, a single file,... I guess that the reason that I don't think
about it that way is that the data need not be in digital form to be
grouped in a data set. Beach profiles that have been collected over
the past 50 years and consist of pages of numbers - monthly values of
depth below mean low water at specified distances from a marker in a
given direction would qualify. I suppose that your definition is
correct from a computer perspective, I just don't think of it that way.



ok, i didn't really mean to use the word "file". how about:

"a dataset is a logical grouping of data, associated in some meaningful way from the user's perspective."


In a DODS server, a dataset is something you can get a DAS and DAP from.

in THREDDS, a "collection" is a collection of datasets, for which the above definition also works just fine. so what's the difference between a dataset and a collection? this is the same issue that Benno has pointed out: in his DODS server, there is no distinction between collections and datasets, because the server seamlessly moves between collections, physical files, and the fields in the files, presenting a uniform API of datasets with their DAP and DAS.
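To make the dataset/collection question concrete, here is a minimal Python sketch, not anything from THREDDS or DODS themselves. It assumes a collection is just a node that holds datasets or other collections, so a client can walk either kind uniformly. The class names, the `leaf_datasets` helper, and the example.org URIs are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Dataset:
    """A logical grouping of data, identified by a URI."""
    name: str
    uri: str


@dataclass
class Collection:
    """A grouping of datasets and/or nested collections."""
    name: str
    members: List[Union["Collection", Dataset]] = field(default_factory=list)


def leaf_datasets(node):
    """Walk a collection tree depth first and return every dataset in it."""
    if isinstance(node, Dataset):
        return [node]
    found = []
    for member in node.members:
        found.extend(leaf_datasets(member))
    return found


# Hypothetical content; the names and URIs are invented.
sst = Collection("sst", [
    Dataset("sst-2001-12-17", "http://example.org/sst/20011217"),
    Dataset("sst-2001-12-18", "http://example.org/sst/20011218"),
])
root = Collection("satellite", [sst, Dataset("winds", "http://example.org/winds")])
names = [d.name for d in leaf_datasets(root)]
```

In this sketch the only difference between the two is that a collection has members and a dataset has a URI, which is one way of reading the "no distinction" behaviour described above.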

(I am not going to try to answer the question of what's the difference between a catalog and a collection yet; hopefully others might have some ideas)

in THREDDS, a dataset has a URI, and is the smallest choosable thing in the catalog. our goal as middleware is to present the list of dataset choices to the user very quickly, without having to actually contact the server. once the user selects a dataset, then the user can expect some delay while a connection is made to the server, and the "real" dataset metadata is collected. This implies that the catalog metadata may not be exactly right at all times (eg the list of available times of the dataset), which makes life easier for implementors.
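The split between "fast listing from the catalog" and "slow, authoritative metadata from the server" can be sketched roughly as follows. This is an illustration under assumptions, not the THREDDS implementation: `CatalogEntry`, `Catalog`, `fake_server`, and the URI are hypothetical, and the server request is replaced by a plain function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CatalogEntry:
    """Catalog-side description of one dataset (hypothetical structure)."""
    uri: str
    cached_times: List[str]  # possibly stale list of available times


class Catalog:
    def __init__(self, entries: List[CatalogEntry],
                 fetch_times: Callable[[str], List[str]]):
        self._entries: Dict[str, CatalogEntry] = {e.uri: e for e in entries}
        self._fetch_times = fetch_times  # stands in for a real server request
        self.server_calls = 0

    def list_choices(self) -> List[str]:
        """Answer from the catalog alone; never contacts the server."""
        return sorted(self._entries)

    def select(self, uri: str) -> List[str]:
        """Ask the server for the real metadata and refresh the cached copy."""
        self.server_calls += 1
        fresh = self._fetch_times(uri)
        self._entries[uri].cached_times = list(fresh)
        return fresh


def fake_server(uri: str) -> List[str]:
    # Made-up data: the server knows one more time step than the catalog.
    return ["2001-12-17T00:00Z", "2001-12-18T00:00Z"]


catalog = Catalog(
    [CatalogEntry("http://example.org/dods/sst", ["2001-12-17T00:00Z"])],
    fake_server,
)
choices = catalog.list_choices()           # fast, catalog only
calls_after_listing = catalog.server_calls
times = catalog.select("http://example.org/dods/sst")  # slow path
```

The catalog copy of the time list is allowed to lag behind the server; it is only corrected when the user actually selects the dataset.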

inventory = listing of datasets
No, a listing of datasets is what I refer to as a directory (not a
directory on a computer). The GCMD is an example of same. An
inventory is a listing of elements in a data set, it could be a
list of times for satellite images in an archive along with the
physical location of the data (tape C18341 on a rack, or
N861230147.hat in a computer directory on my machine) or a list
of times and locations of each XBT in an XBT archive.

so is an inventory an internal thing that the server uses to construct the datasets that are visible to the outside world?
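For reference, an inventory in the sense quoted above (a list of the elements of one data set, each with a time and a physical location) might be modelled like this in Python. It is only a sketch: `InventoryEntry` and `entries_between` are hypothetical names, the two locations are the ones mentioned in the quoted text, and the times attached to them are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class InventoryEntry:
    """One element of a data set: when it was taken and where it lives."""
    time: str      # ISO 8601 string, so plain string comparison orders by time
    location: str  # a tape label, a file name, or any other physical locator


def entries_between(inventory: List[InventoryEntry],
                    start: str, end: str) -> List[InventoryEntry]:
    """Return the entries whose time falls in [start, end], in inventory order."""
    return [e for e in inventory if start <= e.time <= end]


# Locations come from the message; the times are made up for the example.
inventory = [
    InventoryEntry("1986-12-29T14:00:00Z", "tape C18341"),
    InventoryEntry("1986-12-30T14:00:00Z", "N861230147.hat"),
]
hits = entries_between(inventory, "1986-12-30T00:00:00Z", "1986-12-31T00:00:00Z")
```

Note that nothing here requires the data to be online or even digital; the location is just a label.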

question:
what does it mean to "group files into data sets"? like the agg server?


One might say that all images in this projection, from this satellite,
processed this way form a data set. Or one could say that all images in
this projection, from this suite of satellites processed this way
form a data set. Or... This is the trouble with data sets, different
people call different groupings of the data a data set. This caused
a lot of blood letting between NASA and NOAA a number of years back.
The idea is NOT to call every granule or every file in the system a
data set, you know the difference between lumpers and splitters. In
order for us to make progress, we have to back off a bit and look at
the big picture, grouping things into data sets allows us to do that.
This is exactly the problem that the DODS crawler has. When it crawls
a site such as our satellite archive, it ends up with thousands of
entries and the system or the person viewing the results struggles
with a data overload, more information than s/he/it (humm... have
to be careful with these gender neutral versions) wants or needs to
locate the group of files that define the object of interest. Given
that there is no precise definition for how to group files into a
data set, I think that we can reduce the amount of information that
we have to deal with to a reasonable view of all the data on the
system without losing much if anything. The crawler is likely to group
the files slightly differently in some cases than the human would, but
one could probably discover this pretty quickly and steer the crawler
if necessary.
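The lumper/splitter choice described above can be pictured as choosing which attributes make up the grouping key. The Python sketch below is an assumption about how such grouping could work, not a description of the DODS crawler; `Granule`, `group_into_datasets`, the attribute values, and the URLs are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Sequence, Tuple


class Granule(NamedTuple):
    url: str
    projection: str
    satellite: str
    processing: str


def group_into_datasets(
    granules: Sequence[Granule],
    key_fields: Sequence[str] = ("projection", "satellite", "processing"),
) -> Dict[Tuple[str, ...], List[str]]:
    """Group granule URLs by the chosen attributes; each group is one data set."""
    groups = defaultdict(list)
    for g in granules:
        key = tuple(getattr(g, name) for name in key_fields)
        groups[key].append(g.url)
    return dict(groups)


# Invented granules from two satellites, same projection and processing.
granules = [
    Granule("http://example.org/a1", "mercator", "sat-A", "level3"),
    Granule("http://example.org/a2", "mercator", "sat-A", "level3"),
    Granule("http://example.org/b1", "mercator", "sat-B", "level3"),
]

split = group_into_datasets(granules)                               # per satellite
lumped = group_into_datasets(granules, ("projection", "processing"))  # whole suite
```

Steering the crawler, in this picture, amounts to changing `key_fields` for a site where the default grouping does not match what a person would choose.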

ok, this seems to be similar to the "collections" vs "datasets" issue above. I think i need to hear Steve's tech presentation before I can understand this any deeper.

Generating "inventories of granules in data sets" makes sense in the context of
an agg server, but is there also meaning to it in the context of a normal DODS
server?

Not sure exactly what you mean here. We have file servers which are
inventories of granules in data sets. Actually the terminology is a
bit loose here also. The server in this case is a DODS FreeForm server.
It serves a table that contains a list of URLs with the characteristic(s)
that differentiate one URI from another, time in the case of our satellite
archives.

i think some of the problem is that i think of DODS narrowly as a specific client/server protocol, and you include services and extensions that have been built with or use that protocol.
