[THREDDS #BTY-229773]: THREDDS - catalog generation of remote sites



Hi Katherine,

> Please note - new at this...
> We have our data spread out at many sites and are hoping to use
> THREDDS to create a centralized catalog of it all, in one place.
> 
> Having read your documentation, I am still unclear as to how to
> do this. I was hoping to do something like this
> --------
[snip]
> <datasetScan name="Pressure Data at Computer One" ID="ggap19972001"
>              path="ggap19972001"
>              location="http://onecomputer.ac.uk/ecmwfPressure/"
>              harvest="true">
[snip]
> <datasetScan name="Pressure Data at Computer Two" ID="ggap199720012"
>              path="ggap199720012"
>              location="http://twocomputer.ac.uk/ecmwfPressure/"
>              harvest="true">
[snip]

Currently, the TDS can only scan and serve local datasets. (There is code to 
scan OPeNDAP server HTML index pages, but it is somewhat fragile at the moment 
and we don't encourage its use.) Scanning remote sites is a capability we are 
interested in; we just haven't had a pressing use case for it.
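
For comparison, a datasetScan over a local directory looks roughly like the 
sketch below. The directory /data/ecmwfPressure/, the *.nc filter, and the 
service name "odap" are assumptions for illustration only; adjust them to match 
your own setup. The point is that location is a directory on the TDS machine's 
own disk, not a URL.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<catalog name="Pressure data (local scan sketch)"
         xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         xmlns:xlink="http://www.w3.org/1999/xlink">

  <!-- Assumed service definition; name and base depend on your TDS setup. -->
  <service name="odap" serviceType="OpenDAP" base="/thredds/dodsC/"/>

  <!-- location is a local directory path (hypothetical here), not a URL. -->
  <datasetScan name="Pressure Data at Computer One" ID="ggap19972001"
               path="ggap19972001"
               location="/data/ecmwfPressure/"
               harvest="true">
    <metadata inherited="true">
      <serviceName>odap</serviceName>
    </metadata>
    <!-- Assumed file pattern; include only the files you want cataloged. -->
    <filter>
      <include wildcard="*.nc"/>
    </filter>
  </datasetScan>
</catalog>
```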

What is your current setup like? What do you get back from the location URLs 
above? A web index page that lists data files accessible over HTTP? It would 
certainly be possible for a TDS to scan such pages, generate catalogs from 
them, and serve the data. It would take some coding, but we've tried to allow 
for this kind of extensibility. The problem, as with the OPeNDAP scanning I 
mentioned above, is that web index pages do not follow a standard format. They 
are HTML, but they vary from server to server, so it is hard to write code that 
handles all variants.

> -----------
> But I can't seem to get that to work.  Do I need to install
> opendap and thredds at each site? And then make a catalog at
> each site, and then make a centralized "motherlode" catalog
> as the central site?

This would be an easy solution: a THREDDS Data Server (TDS) at each site, and a 
central catalog that uses a catalogRef element to reference the main catalog 
for each site.
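
As a rough sketch, the central catalog could look something like the following. 
The host names are taken from your example; the /thredds/catalog.xml path is an 
assumption (it is the usual default for a TDS top-level catalog, but your port 
and context path may differ), and the catalog and dataset names are just 
placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<catalog name="Central catalog (sketch)"
         xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         xmlns:xlink="http://www.w3.org/1999/xlink">

  <dataset name="ECMWF pressure data, all sites">

    <!-- Each catalogRef points at the main catalog served by one site's TDS.
         The href values below are assumed URLs, not tested ones. -->
    <catalogRef xlink:title="Pressure Data at Computer One"
                xlink:href="http://onecomputer.ac.uk/thredds/catalog.xml"
                name=""/>

    <catalogRef xlink:title="Pressure Data at Computer Two"
                xlink:href="http://twocomputer.ac.uk/thredds/catalog.xml"
                name=""/>

  </dataset>
</catalog>
```

No datasetScan is needed in the central catalog itself; each site's TDS scans 
its own local directories, and the central catalog only links to the results.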

There are also performance advantages to this route. A single, central TDS 
would take a hit from accessing the data it serves over HTTP rather than from 
local disk, whereas the distributed TDSs would each have local access to the 
data they serve.

> Sorry if this is basic stuff. I do have a local site up and
> running just fine.

No problem. Serving distributed data while providing a central view of it is 
definitely not basic.

Hope this helps. Let me know if you have more questions.

Ethan


Ticket Details
===================
Ticket ID: BTY-229773
Department: Support THREDDS
Priority: Normal
Status: Open