Articles tagged: Data

Mar 4, 2024
Fred Rogers

K Nearest Neighbors (KNN) is a supervised machine learning method that 'memorizes' (stores) an entire dataset, then relies on the concepts of proximity and similarity to make predictions about new data. The basic idea is that if a new data point is in some sense 'close' to existing data points, its value is likely to be similar to the values of its neighbors. In the Earth Systems Sciences, such techniques can be useful for small- to moderate-scale classification and regression problems.
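To make the "memorize, then vote among neighbors" idea concrete, here is a minimal sketch of a KNN classifier in plain Python. The function name `knn_predict`, the toy temperature/humidity values, and the "dry"/"humid" labels are all hypothetical and chosen only for illustration. The sketch assumes Euclidean distance and a simple majority vote, which is one common choice among several.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # "Training" is just storage: measure the distance from the query
    # to every stored point (Euclidean distance is assumed here).
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbors.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # The most common label among those neighbors is the prediction.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (temperature in deg C, relative humidity in %).
train_points = [
    (30.0, 20.0), (32.0, 25.0), (28.0, 30.0),
    (18.0, 85.0), (20.0, 90.0), (17.0, 80.0),
]
train_labels = ["dry", "dry", "dry", "humid", "humid", "humid"]

prediction = knn_predict(train_points, train_labels, (19.0, 88.0), k=3)
print(prediction)
```

In practice, features measured on very different scales are usually normalized first so that no single variable dominates the distance calculation.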

Feb 12, 2024

Your idea of what's entailed in setting up a supervised Machine Learning (ML) project as an Earth Systems scientist is probably not as fanciful as what an image generation algorithm came up with. But there are many little decisions ML practitioners make along the way when starting an Earth Systems Science (ESS) ML project. This article provides some tips and ideas to consider as you're getting started. These tips are not in any particular order, and like all things related to ML projects they depend on the specific types of data and project goals.

Dec 20, 2023
Datasaurus plot

Regression analysis is a fundamental concept in machine learning (ML): it helps establish relationships among variables by estimating how one variable affects another.

The coefficient of determination, R² (pronounced “R squared”), is a measure that provides information about how well the regression line suggested by a numerical model approximates the actual data (often referred to as “goodness of fit”).
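As a rough illustration, the coefficient of determination is commonly computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below assumes that standard definition. The helper name `r_squared` and the sample values are made up for the example.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: how far the predictions are from the data.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: how far the data are from their own mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Hypothetical observations and model predictions.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

score = r_squared(observed, predicted)
print(round(score, 3))
```

A perfect fit gives a value of 1, while a model that does no better than always predicting the mean of the observations gives 0.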

Dec 8, 2023
Representation of Self Organizing Map
Representation of nodes in a Self Organizing Map.

A self-organizing map (SOM), sometimes known as a Kohonen map after its originator the Finnish professor Teuvo Kohonen, is an unsupervised machine learning technique used to produce a low-dimensional representation of a higher dimensional data set. SOMs are a specific type of artificial neural network, but use a different training strategy compared to more traditional artificial neural networks (ANNs). SOMs can be used for clustering, dimensionality reduction, feature extraction, and classification — all of which suggest that they can be important tools for understanding large Earth Systems Science (ESS) datasets.
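The different training strategy can be sketched in a few lines. Instead of backpropagating an error, a SOM typically uses competitive learning: for each sample it finds the closest node (the "best matching unit") and nudges that node and its grid neighbors toward the sample. The code below is a simplified sketch under several assumptions, not a reference implementation: a small rectangular grid, a Gaussian neighborhood, exponentially decaying learning rate and radius, and hypothetical names (`train_som`, `best_matching_unit`) and data.

```python
import math
import random


def best_matching_unit(weights, sample):
    """Return the grid position of the node whose weights are closest to sample."""
    return min(weights, key=lambda node: math.dist(weights[node], sample))


def train_som(data, rows=3, cols=3, epochs=50, lr0=0.5, radius0=1.5, seed=0):
    """Train a small SOM; returns a dict mapping (row, col) to a weight vector."""
    rng = random.Random(seed)
    dim = len(data[0])
    # One weight vector per grid node, initialized randomly in [0, 1).
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for epoch in range(epochs):
        # Learning rate and neighborhood radius both shrink over time.
        decay = math.exp(-epoch / epochs)
        lr, radius = lr0 * decay, radius0 * decay
        # Visit the samples in a shuffled order each epoch.
        for sample in rng.sample(data, len(data)):
            bmu = best_matching_unit(weights, sample)
            for node, w in weights.items():
                # Nodes near the BMU on the grid are pulled harder.
                grid_dist_sq = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                influence = math.exp(-grid_dist_sq / (2 * radius ** 2))
                for i in range(dim):
                    w[i] += lr * influence * (sample[i] - w[i])
    return weights


# Hypothetical samples with two features, already scaled to [0, 1].
data = [
    [0.10, 0.20], [0.15, 0.10], [0.20, 0.25],
    [0.80, 0.90], [0.90, 0.85], [0.85, 0.95],
]
som = train_som(data)
print(best_matching_unit(som, [0.1, 0.2]))
```

After training, mapping each sample to its best matching unit gives the low-dimensional (here, two-dimensional grid) representation that makes SOMs useful for clustering and visualization.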

Nov 28, 2023
NSF logo

The National Science Foundation (NSF) is seeking public input from the science and engineering research and education community on implementing the NSF Public Access Plan 2.0.

The Public Access Plan 2.0 is an update to NSF's current public access requirements in response to recent White House Office of Science and Technology Policy guidance; among other things, it addresses potential equity impacts of public access requirements.

Feb 8, 2023
COSMIC logo

The Unidata Program Center is partnering with UCAR's COSMIC program to provide radio occultation data from Spire Global. The products described below are now available via the Internet Data Distribution (IDD) network. Data are on the EXP feed with a typical total volume of 80-110 MB per hour.

Apr 1, 2022
Hurricane Katrina NFT

Everyone loves to talk about the weather. But until now, serious collectors of weather memorabilia have been left on the sidelines. Oh, a lucky few manage to save enormous hailstones in their freezers, but most are limited to screen shots of satellite or radar imagery, or maybe articles clipped from the local newspaper.

But never fear: Unidata is preparing to bring weather collectibles into the twenty-first century by minting a series of Non-Fungible Tokens (NFTs) based on significant weather events. Our inaugural series will consist of 902 distinct NFTs of Hurricane Katrina, one for each millibar of the storm's lowest recorded atmospheric pressure.

Jul 6, 2021
SSEC

The Unidata program and the University of Wisconsin–Madison's Space Science and Engineering Center (SSEC) have a long history of collaboration and cooperation to serve the needs of Unidata community members. The SSEC Satellite Data Services (SDS) group, which provides access to and distribution of real-time and archive weather satellite data, makes limited amounts of archive satellite data available to Unidata's academic community members at no cost via the “Multi-format Client-agnostic File Extraction Through Contextual HTTP” (MCFETCH) system.

May 20, 2021
CODATA logo

The Committee on Data (CODATA) of the Paris-based International Science Council promotes open data policies, working to advance the interoperability and usability of research data. The Committee is committed to supporting FAIR data principles to improve the Findability, Accessibility, Interoperability, and Reuse of digital assets.

Within the CODATA organizational umbrella, Unidata software developer Steven Emmerson has joined the Digital Representation of Units of Measure (DRUM) Task Group, which aims to raise the profile of the digital representation of units of measure in research communities, representative and governing bodies, and with funders. DRUM takes the position that support for consistent digital representations of units of measurement is of far-reaching importance for science, technology, industry, and trade.

Apr 1, 2021
Ever Given in Suez Canal
A container ship blocking the Suez Canal

For 6 days, 3 hours, and 38 minutes in late March, the Golden-class container ship Ever Given blocked the Suez Canal, leaving more than 400 vessels piled up on either end of the canal as they waited for the stranded container ship to be refloated. While media coverage of the incident has focused on potential shortages of goods like petroleum, food, and bathroom tissue, little attention has been paid to the potential for worldwide data shortages as a result of the reduction in shipping capacity.