Re: GRIB Aggregation
Hundreds of thousands of files will be an interesting test of the TDS (maybe
too interesting; it may not work!).
In principle we can aggregate GRIB files just fine; in practice there are some
problems. Current aggregation assumes that the files are completely
homogeneous: each has exactly the same variables and coordinate systems. In
practice, GRIB files often have missing records, which screws things up.
We are working on this problem in a new "forecast model run" aggregation that
will tolerate missing records. We hope to have something to try by the end of
July. How homogeneous do you think your archive is?
With such a large number of files, you probably don't want to use a scan
element (too slow). It may be best to explicitly list all the files in the
aggregation (the catalog would just point to the NcML document). How often are
files added to the archive?
For the random example I looked at, it looks like each file has only one time
coordinate (?). The CDM already sees the time dimension, so you would use a
joinExisting, and you don't need to list the variables (all the ones with time
will be used).
<aggregation dimName="time" type="joinExisting">
  <netcdf location="file://test/temperature/jan.nc" ncoords="1"/>
</aggregation>
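Since a scan over hundreds of thousands of files would be slow, the explicit listing could look like the following NcML sketch. The jan.nc path is from the example above; the other filenames are hypothetical, just to show the pattern:

```xml
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation dimName="time" type="joinExisting">
    <!-- one <netcdf> element per file; ncoords="1" declares that each
         file contributes one time coordinate, so the files do not have
         to be opened just to size the aggregated time dimension -->
    <netcdf location="file://test/temperature/jan.nc" ncoords="1"/>
    <netcdf location="file://test/temperature/feb.nc" ncoords="1"/>
    <netcdf location="file://test/temperature/mar.nc" ncoords="1"/>
  </aggregation>
</netcdf>
```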
Another issue: it looks like your time coordinate unit changes from file to
file. Let me run a few tests to see how this works here.
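If the units really do differ between files, the NcML joinExisting aggregation has a timeUnitsChange attribute that asks the CDM to convert each file's time values to a common unit; a sketch, reusing the jan.nc path from the earlier example (feb.nc is hypothetical):

```xml
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <!-- timeUnitsChange="true" tells the aggregation to read each file's
       time values and rewrite them into the first file's units -->
  <aggregation dimName="time" type="joinExisting" timeUnitsChange="true">
    <netcdf location="file://test/temperature/jan.nc" ncoords="1"/>
    <netcdf location="file://test/temperature/feb.nc" ncoords="1"/>
  </aggregation>
</netcdf>
```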
The NARR is a reanalysis, so it doesn't have forecast times. It would be a
simple 03-hr chain (00-hr fct time) spanning 26 years.
See an existing GDS subset aggregation:
This will give a sense for the nature of the beast.
The directory structure is set up as such:
Here's the TDS aggregation I set up while experimenting yesterday, on a
<dataset name="OceanWinds Test Daily Aggregation">
  <aggregation dimName="time" type="joinNew">
    <variableAgg name="wind"/>
    <variable name="time" orgName="time">
      <attribute name="long_name" value="Days"/>
      <attribute name="units" value="days since 1987-07-09"/>
    </variable>
  </aggregation>
</dataset>
Would this automatically detect that the source data were GRIB rather than
NetCDF? And it seems like you need to set the <scan> on each individual
directory... Doing so, the way NARR is set up, would create one chunky
configuration file. Is there any way to have this scan a pattern
(YYYYMM/YYYYMMDD) of directories?
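For what it's worth, the NcML <scan> element can recurse into subdirectories and filter by a regular expression, which might avoid writing one <scan> per directory. A sketch, with the location and pattern purely hypothetical (and note the earlier caveat that scanning this many files may be slow):

```xml
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation dimName="time" type="joinExisting">
    <!-- subdirs="true" walks the YYYYMM/YYYYMMDD tree;
         regExp keeps only the files that look like GRIB -->
    <scan location="/data/narr/" regExp=".*\.grb$" subdirs="true"/>
  </aggregation>
</netcdf>
```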
I understand GRIB requires a certain amount of "supplemented" metadata
for compliance. Where do you enter this?
Ethan Davis wrote the following on 6/13/2006 1:21 PM:
Aggregation should work the same for GRIB as for netCDF files. The issue
would be how your GRIB files are structured and how you want to
aggregate them. Our GRIB files each contain one full model run (all
parameters, all forecast times). We haven't tried aggregating beyond that.
We have started tracking what is available for the NCEP models on our
server. This is from the TDS 3.8 announcement (with links updated):
We also are now tracking detailed inventory of NCEP model output, eg:
These are all linked from the "collection dataset" pages. For example,
choose the top "CONUS_12_km" link, then choose "Available Inventory".
One idea for this work is to eventually provide access to alternate
datasets, for instance, a dataset that contains all the 3hr forecast
times from the different runs, or one that contained all the 12Z valid
times from the different runs. Tracking these detailed inventories is
just the first step but aggregation and alternate groupings of the data
is pretty interesting to think about.
How are your GRIB files structured, and what kind of aggregation were
you thinking about?
I've been tinkering with the TDS aggregation capabilities, and they work
quite well for NetCDF data; however, I can't seem to find anything in
the docs regarding aggregating GRIB.
We want to get the NARR dataset, which we have here at NCDC-NOMADS, onto the
TDS. It consists of hundreds of thousands of 50+ MB GRIB files in a
Just scouting for a quick answer here:
Is aggregating the NARR GRIB feasible with the current release of TDS?
If so, do any docs exist which could give me a starting point?
Converting it to NetCDF will not be possible (volume).
NOTE: All email exchanges with Unidata User Support are recorded in the
Unidata inquiry tracking system and then made publicly available
through the web. If you do not want to have your interactions made
available in this way, you must let us know in each email you send to us.