
Re: Need THREDDS metadata catalog info



Shishir S. Bharathi wrote:

OK. I assumed that the PICats were also services. This clarifies things. What I meant by mapping was: at what level is the actual search on the required keywords performed?

I'm trying to summarize how to get from a set of keywords to a data item (or set) that satisfies those conditions. Is this what happens ?

1. The data arrives from its source and is stored on a storage device.

yes. a lot of data is archival data, so it doesn't need to arrive.

2. Catalog generators mine this data and generate PICats (and also Dataset catalogs ? Are these different ?)

PICats are all the various THREDDS XML documents, including catalogs, aka "dataset catalogs". The Catalogs are pretty well defined, the other PICats we are still experimenting with.

2.1. Since the data can be of different forms, you generate metadata according to different schema, but the PICat itself adheres to a single schema.

yes. there are a lot of details here we are still prototyping.

3. PICat servers pull this information from the PICats
So what do PICat servers store ? XML documents like InvCatalog.0.6.xml, which is the PICat itself ?

Currently our prototype "PICAT Server", now called "Dataset Searcher", replicates the entire catalog, so it creates an in-memory database. Obviously this won't scale, and we will probably revisit it when scalability becomes an issue. We are considering relational databases, simple BTrees, and text indexing tools such as Lucene.

4. Query the PICat server with the keywords required
5. PICat server looks at the PICats and returns id of a Dataset Catalog
  How is this done ?

Currently we just look for keyword matches. That part is easy. The space/time filtering is a bit harder. Our prototype just fits it all in memory, so scanning everything is no big deal. We are considering how to make this scalable for the next funding cycle.

we return a catalog of matches.

6. Query the dataset catalog if needed.

same step as 5.


Is this about right ?

yup.
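
For illustration only, the keyword match plus space/time filtering over an in-memory copy of the catalog could look roughly like the Python sketch below. This is not the Dataset Searcher code. The record layout, the field names, and the search() function are all made up for the example. It also assumes that every requested keyword must match, that bounding boxes are (west, south, east, north) in degrees and never cross the date line, and it returns dataset ids rather than a catalog document.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class DatasetRecord:
    # Hypothetical in-memory record; field names are illustrative only.
    dataset_id: str
    keywords: frozenset   # lower-case keyword strings
    bbox: tuple           # (west, south, east, north) in degrees
    start: datetime
    end: datetime


def boxes_overlap(a, b):
    """True if two (west, south, east, north) boxes intersect.

    Assumption: neither box crosses the date line.
    """
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(records, keywords, bbox=None, time_range=None):
    """Scan every record and return the ids of those that match.

    Assumption: a record matches only if it carries all requested keywords.
    """
    wanted = {k.lower() for k in keywords}
    matches = []
    for rec in records:
        if not wanted <= rec.keywords:
            continue
        if bbox is not None and not boxes_overlap(rec.bbox, bbox):
            continue
        if time_range is not None:
            t0, t1 = time_range
            # Skip records whose time coverage lies entirely outside the range.
            if rec.end < t0 or rec.start > t1:
                continue
        matches.append(rec.dataset_id)
    return matches


# Made-up sample data, standing in for the replicated catalog.
records = [
    DatasetRecord("ds1", frozenset({"temperature", "forecast"}),
                  (-110, 30, -100, 45),
                  datetime(2003, 1, 1), datetime(2003, 1, 31)),
    DatasetRecord("ds2", frozenset({"radar"}),
                  (-90, 25, -80, 35),
                  datetime(2003, 2, 1), datetime(2003, 2, 28)),
]

result = search(records, ["Temperature"],
                bbox=(-105, 35, -95, 50),
                time_range=(datetime(2003, 1, 15), datetime(2003, 2, 15)))
print(result)
```

A full scan like this is linear in the number of datasets, which is fine while everything fits in memory. It is the part that a relational database, BTree, or text index would replace.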


Thanks,
Shishir