At NSF Unidata, we have successfully implemented several global AI-NWP (Artificial Intelligence-Numerical Weather Prediction) models (FourCastNet, Pangu) and reused their weights using the NVIDIA earth2mip package. We can confirm that these models are open source and can be reused on high-end, but increasingly standard, HPC hardware. While traditional numerical weather prediction requires massive supercomputing resources, these AI models can potentially deliver similar or better results using standard GPU hardware for inference.
The StatQuest Illustrated Guide to Neural Networks and AI: With hands-on examples in PyTorch!!! strikes an excellent balance between accessibility and technical depth. Josh Starmer, PhD, builds on his previous work while making neural networks approachable for both students and practitioners. This book has a similar feel to his previous book, The StatQuest Illustrated Guide to Machine Learning.
The Emerging Pedagogies Summit is an annual event hosted by the Learning Innovation and Lifetime Education (LILE) group at Duke University, and I, Nicole Corbin, instructional designer at NSF Unidata, had the pleasure of attending. This year's event was packed with thoughtfully curated topics relevant to the NSF Unidata higher education community, including AI and workforce development.
At NSF Unidata, we have been supporting and developing netCDF standards and packages since the original release of netCDF in 1990. We strongly believe in the usefulness of the netCDF Common Data Model for Earth Systems Science data, and for other types of data! NetCDF files can be used efficiently in machine learning modeling applications and can be accessed as virtual Zarr datasets.
NSF Unidata has been urged by our community to investigate options to allow netCDF to work more easily with modern cloud-based infrastructure. Based on the strong interest and rapid adoption of Zarr by the community, the netCDF team decided to begin working with the Zarr community to ensure that these two widely used data storage mechanisms can interoperate if necessary.
Convolutional Neural Networks (CNNs) are a powerful class of deep learning models widely applied in Earth science for image analysis, classification, and regression problems. Using the Keras framework in Python, you can build CNNs that efficiently process and extract spatial features from 2D and 3D remote sensing, model output, and other Earth Systems Science (ESS) data types.
The Keras package is an open-source library that provides a Python interface for deep learning. Keras is intended to be a user-friendly, modular, and extensible way to enable fast experimentation with deep neural networks. With Keras version 3, the package provides APIs for using three backends: TensorFlow, JAX, and PyTorch.
K Nearest Neighbors (KNN) is a supervised machine learning method that 'memorizes' (stores) an entire dataset, then relies on the concepts of proximity and similarity to make predictions about new data. The basic idea is that if a new data point is in some sense 'close' to existing data points, its value is likely to be similar to the values of its neighbors. In the Earth Systems Sciences, such techniques can be useful for small- to moderate-scale classification and regression problems.
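The proximity idea described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function name `knn_predict`, the toy temperature and humidity values, and the labels are all made up for the example, and Euclidean distance with majority voting is just one common choice.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every stored training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep only the k closest neighbors
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the labels of those neighbors
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (temperature in deg C, relative humidity in %) -> label
train_points = [
    (30.0, 20.0), (28.0, 25.0), (31.0, 30.0),
    (18.0, 90.0), (20.0, 85.0), (17.0, 95.0),
]
train_labels = ["dry", "dry", "dry", "rain", "rain", "rain"]

print(knn_predict(train_points, train_labels, (19.0, 88.0), k=3))  # prints "rain"
```

Note that the sketch "trains" by simply keeping the data around, which is the memorization step described above. In practice, features with different units are usually rescaled first so that no single variable dominates the distance.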
Your idea of what's entailed in setting up a supervised Machine Learning (ML) project as an Earth Systems scientist is probably not as fanciful as what an image generation algorithm came up with. But there are many little decisions ML practitioners make along the way when starting an Earth Systems Science (ESS) ML project. This article provides some tips and ideas to consider as you're getting started. These tips are not in any particular order, and like all things related to ML projects they depend on the specific types of data and project goals.
Regression analysis is a fundamental concept in machine learning (ML): it establishes relationships among variables by estimating how changes in one variable affect another.
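As a small illustration of estimating how one variable affects another, here is a sketch of ordinary least squares for a single predictor. The function name `fit_line` and the sample values are hypothetical; simple linear regression is only one of many regression techniques.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    # The fitted line always passes through the point of means
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # prints 2.0 1.0
```

The slope is the estimated change in the response for a one-unit change in the predictor.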
The coefficient of determination, R² (pronounced “R squared”), is a measure that provides information about how well the regression line suggested by a numerical model approximates the actual data (often referred to as “goodness of fit”).
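One common way to compute the coefficient of determination is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below assumes that definition; the function name `r_squared` and the sample values are made up for illustration.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: variation the model fails to explain
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: variation of the observations around their mean
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical observations and model predictions
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

score = r_squared(observed, predicted)
print(round(score, 2))
```

Under this definition, a perfect fit gives a value of 1, while a model that always predicts the mean of the observations gives 0.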
A self-organizing map (SOM), sometimes known as a Kohonen map after its originator, the Finnish professor Teuvo Kohonen, is an unsupervised machine learning technique used to produce a low-dimensional representation of a higher-dimensional data set. SOMs are a specific type of artificial neural network, but they use a different training strategy than more traditional artificial neural networks (ANNs). SOMs can be used for clustering, dimensionality reduction, feature extraction, and classification, all of which suggest that they can be important tools for understanding large Earth Systems Science (ESS) datasets.
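To give a feel for how SOM training differs from training a conventional network, here is a minimal sketch in plain Python. For each sample it finds the closest grid node (the "best matching unit") and nudges that node and its grid neighbors toward the sample, with the learning rate and neighborhood radius shrinking over time. All names, the grid size, the linear decay schedules, and the toy data are assumptions chosen for illustration, not a reference implementation.

```python
import math
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def best_matching_unit(weights, sample):
    """Return the grid coordinate of the node whose weights are closest to `sample`."""
    return min(weights, key=lambda node: squared_distance(weights[node], sample))

def train_som(data, rows=3, cols=3, epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small SOM; returns a dict mapping (row, col) to a weight vector."""
    rng = random.Random(seed)
    dim = len(data[0])
    # One weight vector per grid node, initialized randomly in [0, 1)
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for sample in rng.sample(data, len(data)):  # shuffled pass over the data
            # Learning rate and neighborhood radius both shrink linearly
            decay = 1.0 - step / total_steps
            lr = lr0 * decay
            sigma = sigma0 * decay + 1e-3
            bmu = best_matching_unit(weights, sample)
            for node, w in weights.items():
                # Gaussian influence based on distance from the BMU on the grid
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                influence = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for i in range(dim):
                    w[i] += lr * influence * (sample[i] - w[i])
            step += 1
    return weights

# Hypothetical toy data: two well-separated clusters in 2D, scaled to [0, 1]
data = [(0.1, 0.1), (0.15, 0.1), (0.1, 0.2),
        (0.9, 0.9), (0.85, 0.9), (0.9, 0.8)]

som = train_som(data)
print(best_matching_unit(som, (0.1, 0.1)), best_matching_unit(som, (0.9, 0.9)))
```

After training, samples that are similar in the original data space tend to map to the same or neighboring grid nodes, which is what makes the map useful for clustering and visualization.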