Re: Dealing with large archives

Tennessee,
At NOAA's National Climatic Data Center, our NOMADS project (URL below) deals with terabytes of data. We use the GrADS Data Server (and OPeNDAP), wgrib, and other routines to organize and index our data by day (hour/forecast projection) and by model as they come in (about 150k grids/day), since we archive all of them. You may contact Dan Swank here at NCDC to discuss some specifics; he is cc'ed. Feel free to navigate the site for some additional info regarding our organization... Glenn
http://nomads.ncdc.noaa.gov/data-access.html
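As an illustration of the kind of per-day indexing Glenn describes, here is a minimal Python sketch (not NCDC's actual code) that buckets GRIB records by initialization date using wgrib's short inventory; the file path is hypothetical, and the field layout assumes the classic GRIB1 wgrib output:

    import subprocess
    from collections import defaultdict

    def index_grib_by_date(path):
        """Bucket GRIB records by initialization date via `wgrib -s`.

        Assumes the classic GRIB1 wgrib tool is on PATH; each short
        inventory line is colon-separated and carries a "d=YYMMDDHH"
        date field, e.g. "1:0:d=05013100:HGT:...".
        """
        inventory = subprocess.run(
            ["wgrib", "-s", path],
            capture_output=True, text=True, check=True,
        ).stdout
        index = defaultdict(list)
        for line in inventory.splitlines():
            fields = line.split(":")
            init_date = fields[2]          # "d=05013100" -> init date/hour
            index[init_date].append(line)  # keep the whole record entry
        return index

    # Hypothetical usage, one inventory per incoming model file:
    # index = index_grib_by_date("gfs_20050131_00z.grb")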

James Gallagher wrote:

Tennessee,

Did you ever get a reply (besides this one :-)?

James

On Jan 31, 2005, at 6:01 PM, Tennessee Leeuwenburg wrote:

Hi guys,

Firstly :

I have "solved" the problem with the bad characters. The problem is that the NetCDF reader that thredds uses makes use itself of the "urlPath" specification when coming back with the DDS and DAS. As such, if use the "=" character (among others) in the urlPath (even if it's in the path rather than the simple filename), it gets inserted into the DDS/DAS by the NetCDF reader, which causes errors down the track in the parser.

I have worked around the problem by having a separate internalService for each dataset. The "base" section can contain the illegal characters without polluting the DDS/DAS of files read by the NetCDF reader. For the moment this is fine, but it is less than ideal. I may return to it after dealing with more pressing issues. In future I will look at encoding the illegal characters as escaped strings or in some other way, but it's tricky to be sure that you've covered all of the cases when thinking about those techniques.
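A minimal sketch of the escaping idea mentioned above, assuming standard RFC 3986 percent-encoding (the example path is hypothetical):

    from urllib.parse import quote, unquote

    def encode_url_path(raw_path):
        # Percent-encode everything outside the RFC 3986 unreserved set,
        # keeping "/" so the path stays readable; "=" becomes "%3D" and
        # never reaches the DDS/DAS verbatim.
        return quote(raw_path, safe="/")

    encoded = encode_url_path("models/run=20050131/forecast.nc")
    print(encoded)           # models/run%3D20050131/forecast.nc
    print(unquote(encoded))  # round-trips to the original path

The catch, as noted, is making sure every component that touches the path decodes it consistently.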

Maybe once everything goes XML the problem will simply disappear, and I can just wait it out :)

Secondly :

I am trying to work out how to structure my data by date. I will have a number of data sets (NWP Models) which will get updated daily, or even multiple times per day. Quite quickly I will reach the point where I will have hundreds of data sets published. Even a week's worth of data at 2 per day across 3 sources is 42 data sets.

I have two tasks: one would be to automate the updating of the configuration files so that new data sets get incorporated as they become available, and the other would be to structure the data pages in a sensible way for users to access.
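For the first task, here is a minimal sketch of generating per-day catalog entries; the model names, paths, and attributes are hypothetical, the exact element layout should be checked against the THREDDS catalog schema in use, and a real XML library would be preferable in production:

    from datetime import date, timedelta
    from xml.sax.saxutils import quoteattr

    MODELS = ["modelA", "modelB", "modelC"]  # three hypothetical sources
    RUNS = ("00", "12")                      # two runs per day

    def catalog_fragment(day):
        # Emit one <dataset> element per model run for the given day.
        lines = []
        for model in MODELS:
            for run in RUNS:
                name = f"{model} {day.isoformat()} {run}Z"
                url_path = f"{model}/{day:%Y%m%d}/run{run}.nc"
                lines.append(f"  <dataset name={quoteattr(name)} "
                             f"urlPath={quoteattr(url_path)}/>")
        return "\n".join(lines)

    # e.g. regenerate yesterday's entries from a daily cron job:
    print(catalog_fragment(date.today() - timedelta(days=1)))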

I was wondering what practices people might have adopted or found successful in the past with regard to handling large amounts of data? Have people typically arranged archive data as aggregations, or linked to archive catalogs from the top-level catalog? What have people found best?

Cheers,
-Tennessee

--
James Gallagher                jgallagher at opendap.org
OPeNDAP, Inc                   406.783.8663


--
Glenn K. Rutledge
Meteorologist / Physical Scientist
National Oceanic and Atmospheric Administration
National Climatic Data Center
151 Patton Ave
Asheville, North Carolina 28801
(828) 271-4097


