Unidata’s TDS Keeps Humans in the Loop for Metadata Quality Assurance
NSF Unidata’s THREDDS Data Server (TDS) provides students, educators, and researchers with coherent access to a large collection of real-time and archived datasets from a variety of environmental data sources. Since the project’s inception in the early part of the 21st century, the quantity and variety of data available to users of the TDS has grown significantly, advancing users’ ability to access and analyze observational data and numerical model output. But with growth comes growing pains: not all datasets include high-quality metadata.
Metadata, or data about data, supplies essential reference information about the numeric quantities provided by scientific data servers like the TDS. As an example, an observational data record might include a variable wv with a value of 16.3. Without associated metadata, the meaning of this variable is open to many interpretations, but with the following XML metadata record the meaning becomes clear:
<variable name="wv" type="float">
  <attribute name="long_name" value="Wind Speed" />
  <attribute name="units" value="m/s" />
</variable>
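To illustrate how a client might put such a record to use, here is a minimal sketch in Python using only the standard library's XML parser. The record string below simply mirrors the example above; it is not an actual TDS catalog response, and a real client would typically use a netCDF library rather than parsing XML by hand.

```python
import xml.etree.ElementTree as ET

# The metadata record from the example above, embedded as a string
record = """
<variable name="wv" type="float">
  <attribute name="long_name" value="Wind Speed" />
  <attribute name="units" value="m/s" />
</variable>
"""

# Parse the record and collect its attributes into a dictionary
var = ET.fromstring(record)
attrs = {a.get("name"): a.get("value") for a in var.findall("attribute")}

# With the metadata in hand, the bare number becomes meaningful
value = 16.3
print(f"{attrs['long_name']}: {value} {attrs['units']}")
# prints: Wind Speed: 16.3 m/s
```

The same value that was previously just "16.3" can now be labeled and unit-annotated automatically, which is exactly the service well-formed metadata provides to downstream tools.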
But how to add modern metadata records to datasets with obscure, outdated, or missing documentation? While some data providers are looking to technologies like Large Language Models to augment metadata, the NSF Unidata TDS team is taking a different approach.
“For the THREDDS Data Server, we want to make sure we keep humans in the loop,” says TDS lead developer Sean Arms. “Machine learning and AI chatbots are great and all, but everyone knows they’re prone to making stuff up. We can’t take chances like that with our metadata.”
As part of a pilot project, NSF Unidata is enlisting the aid of hundreds of undergraduate Earth Systems Science students across the country to verify and update metadata for datasets served by publicly accessible THREDDS Data Servers running at institutions around the world. Students manually inspect datasets and use their budding scientific skills to clarify unhelpful metadata records. For example, by inspecting a dataset with an undocumented variable t, undergrads in the program were able to build out a working metadata record, complete with indications of uncertainty and making use of new (soon to be proposed) CF attributes:
<variable name="t" type="isaNumber">
  <attribute name="likely_name" value="Probably Temperature" />
  <attribute name="wag_details" value="Temperature (outdoor)" />
  <attribute name="units" value="degree (maybe Kelvin?)" />
</variable>
“We like to think of this initiative as harkening back to the early days of numerical weather forecasting, when ‘computers’ were people in a room using slide rules to calculate predicted future values based on real-time weather information,” says Arms.
NSF Unidata Director Mohan Ramamurthy sees an added value to the TDS team’s choice to involve human undergraduates in the metadata refinement process. “Supporting undergraduate education has always been an important priority for Unidata,” he says. “Here, the students become deeply involved with important datasets used around the world. And in addition, the program provides a unique mechanism for the National Science Foundation to support undergraduates by providing them with Work Study jobs related to their field of study.”
Arms chimes in with another area where the TDS project is making an impact. “The environmental costs of employing students are far lower than running a datacenter to power an LLM to do this work,” he says. Where a datacenter might require millions of gallons of water for cooling, “a few pizzas and a couple of cases of beer go a long way with science undergraduates.”
Assuming the pilot project is as successful as it already appears to be, NSF Unidata Program Center staff are brainstorming other ways to involve students in Program offerings. One promising example: employ graphic design students to optimize color tables used in campus "weather wall" display rooms, helping to ensure students stay awake during early morning weather briefings.
Data Hallway Update
In the wake of the National Science Foundation’s decision to find a new operator for the NCAR Wyoming Supercomputing Center, many in the community have wondered whether NSF had similar plans for the Unidata Data Hallway. Long the hallmark of quality data services, the Unidata Data Hallway provides modern, efficient data services to the university community.
“As of now, we’ve heard nothing to indicate that NSF is searching for a new operator for the Data Hallway,” says NSF Unidata systems administrator Mike Schmidt. “Or even,” he adds, “that they’re aware of its existence.”
In-person applications for TDS metadata verifiers will be accepted between 8:00-8:15am on April Fool’s Day, 2026. All applicants will receive a complimentary tour of the Unidata Data Hallway.