
Re: GRIB Aggregation



Hi Dan,

Hundreds of thousands of files will be an interesting test of the TDS (maybe
too interesting: it may not work!).

In principle we can aggregate GRIB files just fine; in practice there are some
problems. The current aggregation assumes that the files are completely
homogeneous: each has exactly the same variables and coordinate systems. In
practice, GRIB files often have missing records, which breaks that assumption
(a file with missing records can end up with different variables or
coordinates than its neighbors).

We are working on this problem in a new "forecast model run" aggregation that
will tolerate missing records. We hope to have something to try by the end of July.
How homogeneous do you think your archive is?

With such a large number of files, you probably don't want to use a scan element
(too slow). It may be best to list all the files explicitly in the
aggregation (the catalog would just point to the NcML document). How often are
files added to the archive?
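A rough sketch of the explicit-listing approach, written as a catalog entry in the same form as Dan's test dataset further down. Everything specific here is an assumption for illustration: the NARR files are taken to be mounted locally under a hypothetical /data/narr/ directory that mirrors the web tree, the dataset name, ID, and urlPath are made up, serviceName reuses Dan's allTest service, and the second file name is an extrapolation of the naming pattern in the sample URL. Each file is assumed to hold a single time step, hence ncoords="1".

```xml
<dataset name="NARR 3-hourly Test Aggregation"
    ID="narr/test" urlPath="narr/test">
  <serviceName>allTest</serviceName>
  <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <!-- joinExisting: each file already has a time dimension (length 1 assumed) -->
    <aggregation dimName="time" type="joinExisting">
      <!-- hypothetical local paths; files listed in chronological order -->
      <netcdf location="/data/narr/200605/20060520/narr-a_221_20060520_0900_000.grb" ncoords="1"/>
      <netcdf location="/data/narr/200605/20060520/narr-a_221_20060520_1200_000.grb" ncoords="1"/>
      <!-- ...one <netcdf> element per remaining file... -->
    </aggregation>
  </netcdf>
</dataset>
```

Supplying ncoords tells the aggregation how long each file is along time without opening the file to find out, which matters with this many files.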

For the random example I looked at
(http://nomads.ncdc.noaa.gov/data/narr/200605/20060520/narr-a_221_20060520_0900_000.grb),
it looks like each file has only one time coordinate (?).

The CDM already sees the time dimension, so you would use a joinExisting aggregation, and you don't need to list the variables (all the ones with a time dimension will be used).

    <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
      <aggregation dimName="time" type="joinExisting">
        <netcdf location="file://test/temperature/jan.nc" ncoords="1"/>
        <!-- ...one <netcdf> element per file, in time order... -->
      </aggregation>
    </netcdf>

Another issue is that your time coordinate units appear to change from file to file.

Let me run a few tests to see how this works here.


dan.swank wrote:
Ethan:

The NARR is a reanalysis, so it doesn't have forecast times. It would be a
simple 3-hourly chain (00 hr forecast time) spanning 26 years.

See an existing GDS subset aggregation:
http://nomads.ncdc.noaa.gov:9091/dods/NCEP_NARR_DAILY/narr-a_221_tmpprs.subset.info
This will give a sense of the nature of the beast.

The directory structure is set up as follows:
http://nomads.ncdc.noaa.gov/data/narr/


Here's the TDS aggregation I set up while experimenting yesterday, on an unrelated dataset:

  <dataset name="OceanWinds Test Daily Aggregation"
      ID="test/dailyagg" urlPath="test/agg">
    <serviceName>allTest</serviceName>
    <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
      <aggregation dimName="time" type="joinNew">
        <variableAgg name="wind" />
        <scan dateFormatMark="#yyyyMMdd"
              location="/eclipse1a/ftp/pub/seawinds/SI/daily/netcdf/1980s/"
              suffix=".nc" />
        <scan dateFormatMark="#yyyyMMdd"
              location="/eclipse1a/ftp/pub/seawinds/SI/daily/netcdf/1990s/"
              suffix=".nc" />
        <scan dateFormatMark="#yyyyMMdd"
              location="/eclipse1a/ftp/pub/seawinds/SI/daily/netcdf/2000s/"
              suffix=".nc" />
      </aggregation>
      <variable name="time" orgName="time">
        <attribute name="long_name" value="Days"/>
        <attribute name="units" value="days since 1987-07-09" />
      </variable>
    </netcdf>
  </dataset>


Would this automatically detect that the source data were GRIB rather than NetCDF? Also, it seems like you need to set a <scan> on each individual directory. Doing so the way NARR is set up would create one chunky configuration file. Is there any way to have this scan a pattern of directories (YYYYMM/YYYYMMDD)?
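On the directory-pattern question, a minimal sketch, assuming the NcML version on the server supports the scan element's subdirs attribute, which tells a scan to descend into subdirectories. A single scan on the top of the tree could then stand in for one scan per directory. The local path /data/narr/ is hypothetical, and joinExisting follows the suggestion in the reply above. Whether a recursive scan over hundreds of thousands of files is fast enough is exactly the concern raised there.

```xml
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation dimName="time" type="joinExisting">
    <!-- one recursive scan over the whole YYYYMM/YYYYMMDD tree;
         /data/narr/ is a hypothetical local mount of the archive -->
    <scan location="/data/narr/" suffix=".grb" subdirs="true" />
  </aggregation>
</netcdf>
```

Because the directory and file names embed yyyyMMdd_HHmm, ordering the scanned files by path name should also put them in chronological order, assuming the scan orders them that way.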

I understand GRIB requires a certain amount of "supplemental" metadata
for compliance. Where do you enter this?
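One place such metadata could go is the NcML itself: attributes can be added or overridden in the same <netcdf> element that defines the aggregation, just as Dan's test dataset overrides the time variable's long_name and units. A sketch with placeholder values, assuming the same hypothetical /data/narr/ path; which attributes a given convention actually requires is not settled in this thread.

```xml
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <!-- global attributes added on top of what the GRIB reader provides;
       values are placeholders -->
  <attribute name="title" value="North American Regional Reanalysis (NARR)" />
  <attribute name="institution" value="NCDC-NOMADS" />
  <aggregation dimName="time" type="joinExisting">
    <netcdf location="/data/narr/200605/20060520/narr-a_221_20060520_0900_000.grb" ncoords="1"/>
    <!-- ...remaining files... -->
  </aggregation>
</netcdf>
```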

-Dan


Ethan Davis wrote the following on 6/13/2006 1:21 PM:

Hi Dan,

Aggregation should work the same for GRIB as for netCDF files. The issue
would be how your GRIB files are structured and how you want to
aggregate them. Our GRIB files each contain one full model run (all
parameters, all forecast times). We haven't tried aggregating beyond that.

We have started tracking what is available for the NCEP models on our
server. This is from the TDS 3.8 announcement (with links updated):

We are also now tracking a detailed inventory of NCEP model output, e.g.:
http://motherlode.ucar.edu:8080/thredds/modelInventory/model/NCEP/NAM/CONUS_12km/


These are all linked from the "collection dataset" pages. For
example, from
http://motherlode.ucar.edu:8080/thredds/catalog/model/NCEP/NAM/CONUS_12km/catalog.html
choose the top "CONUS_12_km" link, then choose "Available Inventory".

One idea for this work is to eventually provide access to alternate
datasets: for instance, a dataset that contains all the 3hr forecast
times from the different runs, or one that contains all the 12Z valid
times from the different runs. Tracking these detailed inventories is
just the first step, but aggregation and alternate groupings of the data
are pretty interesting to think about.

How are your GRIB files structured, and what kind of aggregation were
you thinking about?

Ethan

dan.swank wrote:


Hello,

I've been tinkering with the TDS aggregation capabilities, and they work
quite well for NetCDF data; however, I can't seem to find anything in
the docs regarding aggregating GRIB.
We want to get the NARR dataset, which we have here at NCDC-NOMADS, onto
the TDS. It consists of hundreds of thousands of 50+ MB GRIB files in a
YYYYMM/YYYYMMDD tree.
Just scouting for a quick answer here:
Is aggregating the NARR GRIB feasible with the current release
of TDS? If so, do any docs exist which could give me a starting point?
Converting it to NetCDF will not be possible (volume).






NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.