Hi Valentijn,

A few problems here.

First off, the scan location in an NcML aggregation only supports local directories; it does not support remote URLs. It does not use the CrawlableDataset framework. At some point we hope to use CrawlableDatasets to implement the scan functionality, but that is still a ways off.

NcML aggregation can aggregate any kind of dataset the netCDF-java library can read (this includes netCDF, GRIB, and a few other data formats). If each dataset is explicitly listed (in a netcdf element location attribute), they can be local files, OPeNDAP datasets, or HTTP-served netCDF. However, the location specified in a scan element must be a local file directory.

Second, netcdf/aggregation elements and datasetScan elements cannot be nested. They are handled by separate pieces of code (NcML for aggregation and TDS for datasetScan) and so don't know how to work together.

I would suggest you try aggregating a small number of the remote datasets you are working with and see how that goes. You will have to list them individually in the aggregation like:

<netcdf location="http://data.nodc.noaa.gov/cgi-bin/nph-dods/pathfinder/Version5.0/Monthly/1985/198501.s04m1pfv50-sst-16b.hdf" />

Not a great long-term solution for a collection this size. Of course, we are still trying to figure out how aggregation will scale on large collections (and we're looking at scaling on local datasets). So, as things stand, you probably wouldn't want to aggregate all the data in these collections anyway.

Ethan

PS I'm going to be out of the office for the next two weeks or so (my wife and I are expecting a baby on Friday). So, John will probably be answering questions for a while.

> Dear Ethan,
>
> I have read the documentation for aggregation:
> http://www.unidata.ucar.edu/software/netcdf/ncml/v2.2/Aggregation.html
> but cannot get it to work.
> Bas and I have spent considerable time over the last couple of
> months getting the new remote dataserver implementation in
> THREDDS 3.4 to be more intuitive, but to no avail. I hope you can
> help us further.
>
> Below is the datasetScan that causes problems. In fact, nothing out of
> the netcdf section has any effect.
> Our intention with the netcdf section is the following (1-4 can be done
> with the local NcML wrapper which you sent us some time ago):
> 1. Rename variable sst to sea surface temperature, and its unit from
>    temp to degC
> 2. Rename variable lat to latitude
> 3. Rename variable lon to longitude
> 4. Rename attribute add_off to add_offset
> 5. Aggregate over time. Time is not an existing variable in the dataset,
>    so we should make a new one. The value of the new time variable is
>    extracted from the filename
>
> <datasetScan name="Pathfinder" path="pathfinder"
>     location="http://data.nodc.noaa.gov/cgi-bin/nph-dods/pathfinder"
>     ID="pathfinderTest" addDatasetSize="true" addLatest="true">
>   <filter>
>     <include wildcard="*-sst*.hdf"/>
>     <include wildcard="*qual*.hdf"/>
>   </filter>
>   <crawlableDatasetImpl className="thredds.crawlabledataset.CrawlableDatasetDods" />
>   <metadata inherited="true">
>     <serviceName>remoteopendap3</serviceName>
>   </metadata>
>   <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
>       enhance="true">
>     <variable name="sst">
>       <attribute name="long_name" type="String" value="sea surface temperature" />
>       <attribute name="units" type="String" value="degC" />
>       <attribute name="add_offset" orgName="add_off" />
>       <!--attribute name="missing_value" type="short" value="0" /-->
>     </variable>
>     <variable name="lat">
>       <attribute name="long_name" type="String" value="latitude" />
>       <attribute name="units" type="String" value="degrees_north" />
>     </variable>
>     <variable name="lon">
>       <attribute name="long_name" type="String" value="longitude" />
>       <attribute name="units" type="String" value="degrees_east" />
>     </variable>
>     <dimension name="time" length="0" />
>     <variable name="time" type="int" shape="time">
>       <attribute name="units" value="secs since 1970-01-01 00:00:00" />
>       <attribute name="_CoordinateAxisType" value="time" />
>     </variable>
>     <aggregation dimName="time" type="JoinNew">
>       <variableAgg name="sea surface temperature"/>
>       <scan location="http://data.nodc.noaa.gov/cgi-bin/nph-dods/pathfinder/Version5.0/Monthly/1985/"
>           suffix=".hdf" dateFormatMark="#yyyyMM" />
>     </aggregation>
>   </netcdf>
> </datasetScan>
>
>
> Q1. Do datasetScan and crawlableDataset allow for aggregating
> over time, and updating/enhancing existing variables? If not, can you
> make this work in the foreseeable future (and provide us with an example
> catalog)?
>
> I also found a support request that covers aggregation in some
> detail, but the example I copied below doesn't work either. I also
> observe that CrawlableDatasetDods is not used in this example, yet
> dirLocations are used to store URLs:
> http://www.unidata.ucar.edu/support/help/MailArchives/netcdf/msg03368.html
>
> From this I quote:
>
> > I have enclosed a full xml-file, but here is how it looks now:
> ...
> <dataset name="MERSEA CLASS 1 Aggregated files">
>   <service name="this" serviceType="OpenDAP" base="" />
>   <service name="TOPAZ" serviceType="OpenDAP" base="/thredds/dodsC/">
>     <datasetRoot path="topaz"
>         dirLocation="http://nerscweb.bccs.uib.no/nersc/nph-dods/mersea-ip/nat/mersea-class1/" />
>   </service>
>   <metadata inherited="true">
>     <serviceName>this</serviceName>
>     <dataType>Grid</dataType>
>   </metadata>
>   <dataset name="Best estimate - Atlantic"
>       ID="mersea-ip-topaz-class1-nat-be"
>       urlPath="topaz/mersea-ip-topaz-class1-nat-be">
>     <serviceName>TOPAZ</serviceName>
>     <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
>       <dimension name="time" length="0" />
>       <variable name="time" type="int" shape="time">
>         <attribute name="units" value="secs since 1970-01-01 00:00:00" />
>         <attribute name="_CoordinateAxisType" value="time" />
>       </variable>
>       <aggregation dimName="time" type="JoinNew">
>         <netcdf
>             location="http://nerscweb.bccs.uib.no/nersc/nph-dods/mersea-ip/nat/mersea-class1//topaz_V2_mersea_nat_grid1to8_da_class1_b20050706_f200506299999.nc"
>             coordValue="1120003200" />
> ...
>
> In the above example, the referenced remote server serves netCDF
> data files that do not have the coordinate dimension "Time", but the
> aggregation adds this dimension (probably based on the filename).
>
>
> Q2. How would you add "Extracting date coordinates from the
> filename (joinNew)" to this catalog config?
>
>
> Cheers, valentijn

Ticket Details
===================
Ticket ID: ETD-820941
Department: Support THREDDS
Priority: Normal
Status: Open
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.
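
As a rough sketch of the explicit listing suggested in the reply above, a joinNew aggregation over two of the remote Pathfinder files might look something like the fragment below. It borrows the time dimension and variable definitions from the quoted MERSEA example. Several things here are assumptions rather than tested configuration: the second file name (198502...) is guessed by analogy with the first and has not been checked against the server; the aggregated variable is assumed to be called sst in the files, as the question implies; and the coordValue numbers are simply the first day of each month expressed in seconds since 1970-01-01 00:00:00 (473385600 for 1985-01-01, 476064000 for 1985-02-01). The type is written as joinNew, the spelling used in the NcML aggregation documentation, where the quoted configs wrote JoinNew.

```xml
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <!-- New time dimension and coordinate variable, as in the quoted MERSEA example -->
  <dimension name="time" length="0" />
  <variable name="time" type="int" shape="time">
    <attribute name="units" value="secs since 1970-01-01 00:00:00" />
    <attribute name="_CoordinateAxisType" value="time" />
  </variable>

  <aggregation dimName="time" type="joinNew">
    <!-- Assumed name of the variable to stack along the new time dimension -->
    <variableAgg name="sst" />

    <!-- Each remote dataset is listed explicitly; no scan element is used -->
    <netcdf location="http://data.nodc.noaa.gov/cgi-bin/nph-dods/pathfinder/Version5.0/Monthly/1985/198501.s04m1pfv50-sst-16b.hdf"
        coordValue="473385600" />
    <!-- Hypothetical second file, named by analogy with the first -->
    <netcdf location="http://data.nodc.noaa.gov/cgi-bin/nph-dods/pathfinder/Version5.0/Monthly/1985/198502.s04m1pfv50-sst-16b.hdf"
        coordValue="476064000" />
  </aggregation>
</netcdf>
```

Because every member carries its own coordValue, nothing has to be parsed from the file names, which is the part that a scan element with dateFormatMark would otherwise do for a local directory.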