Re: Dealing with large archives

To: Tennessee Leeuwenburg <t.leeuwenburg@xxxxxxxxxx>
Subject: Re: Dealing with large archives
From: Ethan Davis <edavis@xxxxxxxxxxxxxxxx>
Date: Thu, 03 Feb 2005 18:21:33 -0700

Hi Tennessee,

Tennessee Leeuwenburg wrote:

Secondly:
I am trying to work out how to structure my data by date. I will have a number of data sets (NWP Models) which will get updated daily, or even multiple times per day. Quite quickly I will reach the point where I will have hundreds of data sets published. Even a week's worth of data at 2 per day across 3 sources is 42 data sets.
I have two tasks - one would be to automate the updating of the configuration files so that new data sets get incorporated as they become available, and the other would be structuring the data pages in a sensible way for users to access.

The THREDDS catalog generation tool can automate generation of catalogs but it does not generate aggregation server config files. Actually, it can generate the parts that aren't aggregations, i.e., the plain THREDDS catalog parts of the config file. I've always wanted to extend it to deal with the aggregation part of the aggServer config but have never gotten around to doing so.

We're currently working on the next release of the THREDDS server. The OPeNDAP netCDF server side of that should be quite a bit easier to configure (e.g., give it a directory and it serves all the files in that directory that match a certain pattern). The configuration for the aggregation part of the server is still up in the air but it will very likely be different from the current configuration syntax. This should get ironed out in the next 3-6 months. In the meantime, you might take a look at the catalog generator (http://www.unidata.ucar.edu/projects/THREDDS/tech/cataloggen/index.html) and see if that helps any.
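
To illustrate the "directory plus pattern" idea and the by-date structuring asked about above, here is a minimal sketch in Python. It is not the THREDDS server or the catalog generator; the file naming convention (<model>_<YYYYMMDD>_<HHMM>.nc), the model names, and the function name group_by_date are all assumptions made up for the example.

```python
import fnmatch
import re
from collections import defaultdict

# Hypothetical naming convention (an assumption, not a THREDDS rule):
#   <model>_<YYYYMMDD>_<HHMM>.nc, e.g. gfs_20050203_1200.nc
NAME_RE = re.compile(
    r"^(?P<model>[A-Za-z0-9]+)_(?P<date>\d{8})_(?P<run>\d{4})\.nc$"
)


def group_by_date(filenames, pattern="*.nc"):
    """Return {date: [filenames]} for names that match both the glob
    pattern and the hypothetical naming convention above."""
    groups = defaultdict(list)
    for name in sorted(filenames):
        # Step 1: keep only names matching the glob pattern.
        if not fnmatch.fnmatchcase(name, pattern):
            continue
        # Step 2: pull the date out of the name; skip names that
        # do not follow the convention.
        match = NAME_RE.match(name)
        if match is None:
            continue
        groups[match.group("date")].append(name)
    return dict(groups)
```

Calling group_by_date(os.listdir(some_directory)) from a scheduled job would give one group per day, which could then be written out as one catalog entry or page per date. New files would be picked up on the next run without hand-editing a config file.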

I was wondering what practices people might have adopted or found successful in the past with regards to handling large amounts of data? Have people typically arranged archive data as aggregations, or linked to archive catalogs from the top-level catalog? What have people found best?

For some of our large and/or rapidly changing data collections, we have set up a data collection subsetting capability. Basically, we have a document that defines the set of allowed subsetting queries for that collection and then a service that responds to those queries, generally with a THREDDS catalog of the requested subset. This is pretty alpha stuff and we haven't really advertised it much but we find it useful. Some rough documentation on this is available at http://www.unidata.ucar.edu/projects/THREDDS/tech/dqc/DqcStatus.html.
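
The general shape of that approach (a description of the allowed queries, plus a service that validates a query and returns the matching subset) can be sketched roughly as below. This is only an illustration in Python: the selector names, the dataset records, and the subset function are hypothetical and do not reflect the actual document format or service interface described at the link above.

```python
# Hypothetical description of the queries allowed for one collection:
# each selector name maps to the set of values a client may ask for.
ALLOWED = {
    "model": {"gfs", "eta"},
    "run": {"0000", "1200"},
}

# Hypothetical records for the datasets in the collection.
DATASETS = [
    {"name": "gfs_20050203_0000.nc", "model": "gfs", "run": "0000"},
    {"name": "gfs_20050203_1200.nc", "model": "gfs", "run": "1200"},
    {"name": "eta_20050203_0000.nc", "model": "eta", "run": "0000"},
]


def subset(datasets, allowed, query):
    """Validate a query against the allowed selectors, then return the
    datasets that match every selector in the query."""
    for key, value in query.items():
        if key not in allowed:
            raise ValueError(f"unsupported selector: {key}")
        if value not in allowed[key]:
            raise ValueError(f"unsupported value for {key}: {value}")
    return [
        d for d in datasets
        if all(d[key] == value for key, value in query.items())
    ]
```

For example, subset(DATASETS, ALLOWED, {"model": "gfs", "run": "1200"}) selects the single matching record. A real service would return the result as a THREDDS catalog rather than a Python list.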


Ethan

--
Ethan R. Davis                                Telephone: (303) 497-8155
Software Engineer                             Fax:       (303) 497-8690
UCAR Unidata Program Center                   E-mail:    edavis@xxxxxxxx
P.O. Box 3000
Boulder, CO  80307-3000                       http://www.unidata.ucar.edu/
---------------------------------------------------------------------------

Follow-Ups:
- Re: Dealing with large archives
  - From: Tennessee Leeuwenburg

References:
- Dealing with large archives
  - From: Tennessee Leeuwenburg
