I find it amazing that things work on that large of an NcML or even FMRC collection. Just goes to show what I know. Anyway, I'm about to embark on studying where the bottlenecks are. The code isn't so much poorly written as it simply wasn't designed with high scalability in mind. The solution is to write persistent "index" files so that, once indexed, the logical "collection datasets" can be accessed very quickly. I'm going to take what I have been doing in GRIB and apply it to netCDF, and to GRID data in general.

An NcML aggregation like a joinExisting may be specified inside the catalog config, or outside in a separate NcML file that is referenced in a dataset or datasetScan. In both cases, nothing is done until a user requests the dataset. At that point, if the dataset has already been constructed, is in the TDS cache, and doesn't need updating, then it's fast.

A featureCollection has a new set of functionality to update the dataset in the background. FMRC does some extra "persistent caching" (it makes some of the info persist between TDS restarts). Still not enough, but better than NcML. GRIB collections now do this well. However, if the collection is changing, a separate process (the TDM) handles updating and notifying the TDS. That keeps the code from getting too complex and greatly simplifies getting the object caching right.

Read-optimized netCDF-4 files are an elegant solution indeed. Dave, maybe sometime you could share your workflow in some place we could link to in our documentation?

On Sat, Mar 14, 2015 at 10:47 AM, Signell, Richard <rsignell@xxxxxxxx> wrote:
> John,
>
> > NcML Aggregations should only be used for small collections of files (a few
> > dozen?), because they are created on-the-fly.
>
> The HFRADAR data is using a joinExisting aggregation in a THREDDS
> catalog. Is that what you are calling NcML aggregation?
> I was thinking that NcML aggregation referred to the practice of
> writing an NcML file and dropping that into a folder along with the
> data files where it can be picked up by a DatasetScan.
>
> > FMRC does a better job of
> > caching information so things go quicker. It handles the case of a single
> > time dimension as a special case of a Forecast model collection. However,
> > they too are limited in how much they will scale up, (< 100 ?)
> >
> > So how many files and variables are in the HF Radar collection?
>
> There are currently 27,986 NetCDF files in the aggregation, each with
> a single time record containing the HF radar data for the hour. It
> seems that the FMRC is handling this just fine, with reliable WMS
> response times of about one second.
>
> As Dave Blodgett points out, a better approach here might be to
> periodically combine a bunch of these hourly files into, say, monthly
> files, which would result in higher performance, less utilization of
> disk space, and quicker aggregation.
>
> I still don't understand what is happening with the joinExisting
> aggregation, however -- why it periodically (but not regularly) takes
> 50 seconds or more to respond.
>
> --
> Dr. Richard P. Signell (508) 457-2229
> USGS, 384 Woods Hole Rd.
> Woods Hole, MA 02543-1598
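For readers following along: the joinExisting aggregation being discussed is typically declared in NcML roughly as below. This is a hedged sketch only; the directory path, file suffix, and recheck interval are illustrative, not taken from the actual HFRADAR catalog.

```xml
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <!-- join all matching files along their existing "time" dimension -->
  <aggregation dimName="time" type="joinExisting" recheckEvery="15 min">
    <!-- hypothetical location; each hourly file contributes one time record -->
    <scan location="/data/hfradar/hourly/" suffix=".nc" />
  </aggregation>
</netcdf>
```

With ~28,000 files, the cost comes from opening each file to read its time coordinate when the aggregation is (re)built, which is consistent with the intermittent slow responses described above.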
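The monthly-rollup idea mentioned above amounts to bucketing the hourly files by month and then concatenating each bucket along the time dimension into one monthly file. A minimal sketch of the bucketing step in Python follows; the filename pattern (a YYYYMMDDHH stamp) is an assumption for illustration, not the real HFRADAR naming convention, and the actual concatenation would be done with a NetCDF tool of your choice.

```python
import re
from collections import defaultdict

def group_hourly_by_month(paths):
    """Bucket hourly NetCDF file paths by their YYYY-MM stamp.

    Assumes each filename embeds a 10-digit YYYYMMDDHH timestamp,
    e.g. 'hfradar_2015031410.nc' (hypothetical naming convention).
    """
    buckets = defaultdict(list)
    for path in paths:
        m = re.search(r"(\d{4})(\d{2})\d{2}\d{2}", path)
        if m is None:
            continue  # skip files that don't match the expected pattern
        buckets["%s-%s" % m.group(1, 2)].append(path)
    # each bucket would then be concatenated along "time" into one monthly file
    return dict(buckets)

files = ["hfradar_2015031410.nc", "hfradar_2015031411.nc",
         "hfradar_2015040100.nc"]
print(group_hourly_by_month(files))
```

Aggregating ~720 monthly files instead of ~28,000 hourly ones cuts the number of file opens per rebuild by roughly a factor of 30, which is where the performance win comes from.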