[thredds] Comments regarding GribCollection Best dataset

To: THREDDS community <thredds@xxxxxxxxxxxxxxxx>
Subject: [thredds] Comments regarding GribCollection Best dataset
From: Ryan May <rmay@xxxxxxxx>
Date: Wed, 11 Feb 2015 11:24:51 -0700

Greetings!

The recent upgrades by NCEP to the time range of the operational GFS
half-degree output have raised an issue with the Best time series dataset
on Grib Collections in TDS. Best is designed to take all the forecast times
available in the Grib Collection and, for each one, use the GRIB record
that has the smallest forecast offset (i.e. the forecast closest to the
start of its model run).
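
As a rough illustration of that selection rule (a minimal sketch, not the
TDS implementation; the function name select_best and the
(run_time, offset_hours) record layout are assumptions made for this
example):

```python
from datetime import datetime, timedelta

def select_best(records):
    """Map each valid time to the (run_time, offset_hours) pair with the
    smallest forecast offset. `records` is an iterable of such pairs."""
    best = {}
    for run_time, offset_hours in records:
        valid_time = run_time + timedelta(hours=offset_hours)
        current = best.get(valid_time)
        # Keep the record closest to the start of its model run.
        if current is None or offset_hours < current[1]:
            best[valid_time] = (run_time, offset_hours)
    return dict(sorted(best.items()))

# Two hypothetical runs, each with 0, 6 and 12 hour forecasts.
run_00z = datetime(2015, 2, 11, 0)
run_06z = datetime(2015, 2, 11, 6)
records = [(run, h) for run in (run_00z, run_06z) for h in (0, 6, 12)]
best = select_best(records)
```

Here the 06Z valid time is served by the 06Z run's 0 hour record rather
than the 00Z run's 6 hour forecast.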

The problem occurs when, for example, NCEP puts out GFS runs every 6
hours, but in the later part of each forecast the time interval within a
single run's output is 12 hours. The result is that, even with no missing
data, the Best time series eventually has output every 6 hours by
interleaving two different forecasts. Given that forecasts this far into
the future can vary widely from run to run, this can yield very surprising
and confusing results. Note that this issue is not limited to the GFS; it
affects any model collection where time steps in the output become larger
than the time between individual model runs.
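
The interleaving can be reproduced with a toy collection. This is only a
sketch: the offsets below are made-up values (not the real GFS output
schedule), times are plain hours from an arbitrary reference, and
best_sources is a hypothetical helper that applies the smallest-offset
rule.

```python
# Toy schedule: 6-hourly output early in the run, 12-hourly later on.
OFFSETS = [0, 6, 12, 24, 36]

def best_sources(run_hours, offsets):
    """Return (valid_hour, run_hour, offset) tuples chosen by the
    smallest-forecast-offset rule, sorted by valid time."""
    best = {}
    for run in run_hours:
        for offset in offsets:
            valid = run + offset
            if valid not in best or offset < best[valid][1]:
                best[valid] = (run, offset)
    return [(valid, *best[valid]) for valid in sorted(best)]

# Two runs, 6 hours apart (think 00Z and 06Z).
series = best_sources([0, 6], OFFSETS)
for valid, run, offset in series:
    print(f"valid +{valid:02d}h  from run {run:02d}Z  offset {offset}h")
```

In this toy case the last four valid times (+24h, +30h, +36h, +42h) are
taken alternately from the 00Z and 06Z runs, even though no data is
missing.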

There is also a second issue regarding a model run having multiple fields
with a 0 hour forecast offset. It turns out that (at least ideally) NCEP
will put out one version of the 0 hour forecast with the analysis flag set,
denoting the field as the one to use to initialize the model. Then a second
field is put out as 0 hour forecast (without the analysis flag) that
actually corresponds to a time after a single model integration step. This
non-analysis 0 hour forecast contains the full collection of parameters
that are available throughout the forecast, whereas the analysis only
contains the fields that result from data assimilation. Currently, the Best
time series will contain whichever version of the 0 hour forecast comes
first in the file.

To address the first issue and make the data easier to use, we propose to
modify Best such that when a set of forecast runs is combined, the
forecasts included from a run will be consecutive in time. This implies
that any missing forecasts within a run will not be filled by a previous
run; however, an older forecast can be used to fill in gaps when a
collection is missing entire run(s). This behavior ensures that the Best
time series will no longer alternate repeatedly between different forecast
solutions, but rather only jump once from one set of forecasts (model run)
to another.
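
One possible reading of this proposal, sketched below, is that each run
contributes its forecasts only up to the start of the next available run,
and the newest run contributes everything it has. This is an assumption
about how the rule could work, not a description of the eventual TDS code;
best_consecutive and the dictionary layout are hypothetical, and times are
plain hours from an arbitrary reference.

```python
def best_consecutive(runs):
    """`runs` maps run hour -> list of forecast offsets (hours).
    Returns (valid_hour, run_hour, offset) tuples in which each run
    supplies one consecutive block of valid times."""
    run_hours = sorted(runs)
    series = []
    for i, run in enumerate(run_hours):
        next_run = run_hours[i + 1] if i + 1 < len(run_hours) else None
        for offset in sorted(runs[run]):
            valid = run + offset
            # Hand over to the next available run once it has started.
            if next_run is not None and valid >= next_run:
                break
            series.append((valid, run, offset))
    return series

offsets = [0, 6, 12, 24, 36]
# Runs at 00Z and 06Z: a single hand-over from one run to the next.
two_runs = best_consecutive({0: offsets, 6: offsets})
# 06Z run missing entirely: the 00Z run fills the gap until 12Z.
missing_run = best_consecutive({0: offsets, 12: offsets})
```

Under this reading, a forecast missing from inside a run simply leaves a
gap, while a missing run is covered by the previous run's forecasts.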

To address the second issue, we propose an additional modification to Best
such that, at the 0 hour offset, only forecast fields (and not analysis
fields) are used to make up Best. Additionally, we propose to add two more
virtual datasets: Analysis and Complete. Analysis will contain the
collection of all analysis fields from the Grib Collection. Complete
behaves just as Best currently does, containing every possible time, using
the smallest forecast offset. In this case, the forecasts included from a
run will not be guaranteed to be consecutive in time, as they could be
mixed between various runs as long as the smallest forecast offset
condition is met.
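
A small sketch of how records might be divided among these datasets
follows. The record layout (dictionaries with "run", "offset" and
"analysis" keys, times in plain hours) and the partition function are
assumptions for illustration only. Ties at the same offset keep the record
seen first, mirroring the current "first in the file" behavior.

```python
def partition(records):
    """Split records into Analysis records, the non-analysis records that
    the proposed Best would draw from, and a Complete mapping of
    valid hour -> record with the smallest forecast offset."""
    analysis = [r for r in records if r["analysis"]]
    forecasts = [r for r in records if not r["analysis"]]
    complete = {}
    for r in records:
        valid = r["run"] + r["offset"]
        kept = complete.get(valid)
        # Strict comparison: on a tie, the earlier record is kept.
        if kept is None or r["offset"] < kept["offset"]:
            complete[valid] = r
    return analysis, forecasts, complete

records = [
    {"run": 0, "offset": 0, "analysis": True},
    {"run": 0, "offset": 0, "analysis": False},
    {"run": 0, "offset": 6, "analysis": False},
    {"run": 6, "offset": 0, "analysis": False},
    {"run": 6, "offset": 0, "analysis": True},
]
analysis, forecasts, complete = partition(records)
```

With this input, Complete holds the analysis record at hour 0 and the
non-analysis record at hour 6, purely because of record order, which is
the ambiguity the separate Analysis dataset is meant to remove.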

We are seeking comments on these solutions to see if they represent a
useful set of features given the problems outlined above.

Thanks!

The TDS development team

Follow-Ups:
- Re: [thredds] Comments regarding GribCollection Best dataset
  - From: Don Murray (NOAA Affiliate)
