That is a very good question, and I left that out of my response.
Long-term access for users in archives means we constantly have to work to
fully document and understand the data, track down any data provenance
issues, and, to a lesser degree, verify the data: that what it says it is,
it actually is. It's just a form of quality assurance for users. Data
providers, especially 'real-time' ones, don't necessarily concern
themselves with these issues. They make a product and move on. I'll bet you
are fully aware that the WOC/Gateway does not even provide a complete DTG
(date-time group) in the file name for many NWP products! I used to work
with John Stackpole (great guy), the original developer of GRIB. He
designed GRIB as a compact communications protocol, not, as I'll bet you
are also aware, as an archive format.
NOMADS has more than 1 petabyte to manage for users; we served a growing
550 TB last year, and we need to scale. By aggregating the data users
request most (common state variables, the most popular products, etc.), we
can stream files/records far more efficiently to the 50K+ users behind the
~300 million downloads per year on NOMADS, using methods such as
pre-staging/caching the most requested data on disk from tape.
What John is attempting to do will facilitate access for multiple users
requesting multiple files, using aggregations and other streaming/caching
techniques (I don't quite understand the details there). Right now, we
can't even ascertain with any degree of confidence what is what, which we
need to know before we can aggregate at all, let alone feel comfortable
about the accuracy of the data we are serving to users.
It does not really help users find data, per se. It will help users have
more confidence that an aggregated monthly mean product from CFSR is the
mean for each cycle (0, 6, 12, ...) over the individual days of the month
(the diurnals), rather than a typical monthly mean averaged over the
entire day.
Hope that makes sense. I'm not sure what other impacts this will have for
us here: LAS? Our TDS-to-ESGF capabilities? It's kind of scary, but I do
know that John's radical change looks to solve a major archive problem. I
will tell you that we will run 4.2 and 4.3 in parallel for some time.
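To illustrate the difference between the two kinds of monthly mean, here is
a minimal Python sketch. It assumes one month of 6-hourly (valid time,
value) records. The function name `monthly_means` and the sample values are
hypothetical and do not reflect how CFSR monthly products are actually
generated.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def monthly_means(records):
    """Return (per-cycle means keyed by UTC hour, overall monthly mean).

    `records` is an iterable of (valid_time, value) pairs, assumed here to be
    6-hourly analyses from a single month (hypothetical input layout).
    """
    by_cycle = defaultdict(list)  # hour of day -> values from that cycle
    all_values = []
    for valid_time, value in records:
        by_cycle[valid_time.hour].append(value)
        all_values.append(value)
    # Diurnal-style product: one monthly mean per cycle hour.
    per_cycle = {hour: mean(vals) for hour, vals in sorted(by_cycle.items())}
    # "Typical" product: every cycle of every day averaged together.
    overall = mean(all_values)
    return per_cycle, overall


# Hypothetical sample: two days of 6-hourly values (not real CFSR data).
start = datetime(2012, 1, 1)
sample = []
for i in range(8):
    t = start + timedelta(hours=6 * i)
    sample.append((t, 280.0 + t.hour / 6 + (t.day - 1)))

per_cycle, overall = monthly_means(sample)
print(per_cycle)  # {0: 280.5, 6: 281.5, 12: 282.5, 18: 283.5}
print(overall)    # 282.0
```

With this sample, the per-cycle means differ from one another (280.5 to
283.5), while the all-hours mean collapses them into a single number
(282.0). A user cannot tell which of the two a file holds unless the
product says so.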
Best regards, Glenn
On Tue, Feb 28, 2012 at 2:19 PM, Don Murray <don.murray@xxxxxxxx> wrote:
> Hi Glenn-
> On 2/28/12 11:43 AM, Glenn Rutledge wrote:
>> John and Community-
>> While I do not represent the NCDC Archive, for the NCDC NOMADS systems
>> and our users, I must agree that the changes John is proposing will
>> facilitate the long-term use of GRIB data. While painful for existing
>> client software and decoders, the proposed change will give our users a
>> more scalable way to better find and use our data. I'll suggest that if
>> this is adopted, NOMADS servers could provide both 4.2 and 4.3 versions
>> to give software developers time to adapt the client side.
> Could you elaborate on how you see that the new variable names will allow
> the users to better find and use your data versus the human readable names?
> For example, if I want to get the 500 hPa heights from a model in your
> archive, how will the new names facilitate that?
> Don Murray
> NOAA/ESRL/PSD and CIRES
Glenn K. Rutledge
NOMADS Team Leader
National Climatic Data Center
Asheville, NC 28801