News@UnidataUnidata newshttps://www.unidata.ucar.edu/blogs/news/feed/entries/atom2024-03-06T11:18:50-07:00Apache Rollerhttps://www.unidata.ucar.edu/blogs/news/entry/k-nearest-neighborsK Nearest NeighborsUnidata News2024-03-04T08:24:00-07:002024-03-04T08:24:00-07:00<div class="img_l" style="width: 150px;">
<img width="150" src="/blog_content/images/2024/20240219_ml_neighbor.png" alt="Fred Rogers" />
</div>
<p class="byline">
By Thomas Martin, AI/ML Software Engineer
</p>
<div class="img_l" style="width: 150px;">
<a class="lightbox" title="Fred Rogers, famous for asking people to be his neighbor (image: Wikipedia)" href="/blog_content/images/2024/20240219_ml_neighbor.png">
<img width="150" src="/blog_content/images/2024/20240219_ml_neighbor.png" alt="Fred Rogers" />
</a>
<div class="caption">
Fred Rogers, famous for asking people to be his neighbor<br>(Click to enlarge)
</div>
</div>
<p>
<strong>K Nearest Neighbors</strong> (KNN) is a supervised machine learning method that
"memorizes" (stores) an entire dataset, then relies on the concepts of proximity and
similarity to make predictions about new data. The basic idea is that if a new data
point is in some sense "close" to existing data points, its value is likely to
be similar to the values of its neighbors. In the Earth Systems Sciences, such
techniques can be useful for small- to moderate-scale classification and regression
problems; one <a href="https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2004WR003444">example</a>
uses KNN techniques to derive local-scale information about
precipitation and temperature from regional- or global-scale numerical weather
prediction model output.
</p>
<p>
When using a KNN algorithm, you select the number of "neighbors" to consider
(<strong>K</strong>), and potentially a way of calculating the "distance" between
data points. KNN algorithms can be used for both classification and regression
problems. For regression problems, KNN predicts the target variable by using an
averaging scheme, such as the mean of the neighbors' values. For classification
problems it takes the <em>mode</em> (the most common class) of the nearest
neighbors; because an even <strong>K</strong> can produce tied votes in two-class
problems, it is generally recommended that the value of <strong>K</strong> be an
odd number. Effective use of KNN often requires some
experimentation to determine the best value for <strong>K</strong>.
</p>
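<p>
The voting and averaging steps described above can be sketched in a few lines of
plain Python. This is a minimal illustration, not scikit-learn's implementation:
the function names, the toy temperature/humidity data, and the use of Euclidean
distance are all assumptions made for the example.
</p>

```python
import math
from collections import Counter
from statistics import mean


def k_nearest(train_X, query, k):
    """Return the indices of the k training points closest to `query`."""
    # Euclidean distance is an assumption; other distance measures are possible.
    distances = sorted((math.dist(x, query), i) for i, x in enumerate(train_X))
    return [i for _, i in distances[:k]]


def knn_classify(train_X, train_y, query, k=3):
    """Predict a class label as the mode (majority vote) of the k neighbors."""
    neighbors = k_nearest(train_X, query, k)
    votes = Counter(train_y[i] for i in neighbors)
    return votes.most_common(1)[0][0]


def knn_regress(train_X, train_y, query, k=3):
    """Predict a numeric value as the mean of the k neighbors' values."""
    neighbors = k_nearest(train_X, query, k)
    return mean(train_y[i] for i in neighbors)


# Hypothetical toy data: (temperature in deg C, relative humidity in %)
X = [(2.0, 90.0), (3.0, 85.0), (1.0, 95.0),
     (25.0, 30.0), (27.0, 35.0), (24.0, 40.0)]
precip_type = ["snow", "snow", "snow", "none", "none", "none"]
precip_mm = [4.0, 3.0, 5.0, 0.0, 0.0, 0.0]

label = knn_classify(X, precip_type, (2.5, 88.0), k=3)   # majority vote
amount = knn_regress(X, precip_mm, (2.5, 88.0), k=3)     # neighbor average
```

<p>
Note that the two features here are on different scales, so the humidity values
dominate the distance calculation; in practice the inputs would usually be
scaled first.
</p>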
<div class="img_l" style="width: 500px;float:none;display:block;margin:auto;">
<a class="lightbox" title="Comparing the decision boundary between using 1 neighbor vs 20, from <a href='https://kevinzakka.github.io/2016/07/13/k-nearest-neighbor/'>Kevin Zakka’s blog</a>." href="/blog_content/images/2024/20240219_ml_k_neighbors.png">
<img width="500" src="/blog_content/images/2024/20240219_ml_k_neighbors.png" alt="Comparing decision boundary" />
</a>
<div class="caption">
Comparing the decision boundary between using 1 neighbor vs 20, from <a href='https://kevinzakka.github.io/2016/07/13/k-nearest-neighbor/'>Kevin Zakka’s blog</a>.
</div>
</div>
<p>
KNN is sometimes called a "lazy learning" method. This is because it does not
generate a new explicit model, but rather memorizes the dataset in its entirety.
Although scikit-learn's KNN estimators have a <code>.fit()</code> method, it mainly
stores the training data; the method exists largely to match the rest of the
scikit-learn API.
</p>
<h3>Why you might use KNN for your ML project</h3>
<ol>
<li>It's simple. Because KNN is a lazy learner, there is no complex model and only
limited math is needed to understand the inner workings.
</li>
<li>It's adaptable to different data distributions. Because KNN makes no
assumptions about the shape of the data distribution, it works well with
irregular or skewed data.
</li>
<li>It's good for smaller datasets. Because no model is being constructed, KNNs can
be a good choice for smaller datasets.
</li>
</ol>
<h3>Some Downsides to KNN</h3>
<ol>
<li>
It's sensitive to outliers and poor feature selection. Unlike decision tree
models, KNN does not do any automatic feature selection. KNN models
can struggle in high-dimensional space, both when there are many input
features and when those features contain outliers.
</li>
<li>
It has a relatively high computational cost. While the analog/sample matching
behavior of KNNs is great from an explainability point of view
(model-free ML is great!), for large datasets the cost of storing the entire
dataset and searching it for every prediction can be enormous.
</li>
<li>
It needs a complete dataset. Like many other ML models, KNNs do not handle
missing data or NaN (Not a Number) values. If your dataset is not complete, you'll
need to impute the missing values before using a KNN.
</li>
</ol>
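<p>
As a rough sketch of the imputation step mentioned in the last item, the snippet
below fills each missing value with the mean of its column. Mean imputation is
only one possible strategy, chosen here for brevity; the function name and the
sample observations are hypothetical. (scikit-learn provides ready-made imputers
in its <code>sklearn.impute</code> module.)
</p>

```python
import math
from statistics import mean


def impute_column_means(rows):
    """Replace NaN entries with the mean of the non-missing values in that column."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        present = [row[j] for row in rows if not math.isnan(row[j])]
        col_means.append(mean(present))
    return [
        [col_means[j] if math.isnan(value) else value for j, value in enumerate(row)]
        for row in rows
    ]


nan = float("nan")
# Hypothetical observations: (temperature in deg C, pressure in hPa)
observations = [
    [10.0, 1000.0],
    [nan, 1010.0],
    [14.0, nan],
]

filled = impute_column_means(observations)
```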
<p>
KNNs have been discussed previously on MetPy Mondays here:
<a href="https://www.youtube.com/watch?v=Z08TSSVWcAM">MetPy Mondays #183 - Predicting Rain with Machine Learning - Using KNN</a>
</p>
<p>
KNNs are a great supervised ML model to try out if your dataset is on the smaller
side. Happy modeling! What ML model should I cover in an upcoming blog?
</p>
<h3>More reading and resources</h3>
<ul>
<li><a href="https://github.com/NCAR/ML_workshop2023/blob/main/tutorials/Day2_lesson1_supervised_knn_tree.ipynb">A short notebook that uses KNNs</a></li>
<li><a href="https://scikit-learn.org/stable/modules/neighbors.html">scikit-learn Nearest Neighbors documentation</a></li>
<li><a href="https://arxiv.org/pdf/1708.04321.pdf">Effects of Distance Measure Choice on KNN Classifier Performance</a></li>
<li><a href="https://neptune.ai/blog/knn-algorithm-explanation-opportunities-limitations">The KNN Algorithm - Explanation, Opportunities, Limitations</a> </li>
<li><a href="https://kevinzakka.github.io/2016/07/13/k-nearest-neighbor/">A Complete Guide to K-Nearest-Neighbors with Applications in Python and R</a> </li>
<li><a href="https://scott.fortmann-roe.com/docs/BiasVariance.html">Understanding the Bias-Variance Tradeoff</a> </li>
<li><a href="https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2004WR003444">Statistical downscaling using K-nearest neighbors</a></li>
</ul>
<div class="highlight_box">
<p>
Thomas Martin is an AI/ML Software Engineer at the NSF Unidata Program Center. Have questions?
Contact <a href="mailto:support-ml@unidata.ucar.edu">support-ml@unidata.ucar.edu</a>
or book an office hours meeting with Thomas on his
<a href="https://calendar.app.google/ZsM8dLHLa65eGAr39">Calendar</a>.
</p>
</div>
https://www.unidata.ucar.edu/blogs/news/entry/quick-tips-for-ess-machineQuick Tips for ESS Machine Learning ProjectsUnidata News2024-02-12T09:10:00-07:002024-02-12T09:10:00-07:00<div class="img_l" style="width: 100px;">
<a class="lightbox" title="One AI’s view of what a geologist does when running ML models (courtesy of Bing)." href="/blog_content/images/2024/20240212_ml_generated.png">
<img width="100" src="/blog_content/images/2024/20240212_ml_generated.png" alt="Generated image of geologist" />
</a>
</div>
<p class="byline">
By Thomas Martin, AI/ML Software Engineer
</p>
<div class="img_l" style="width: 150px;">
<a class="lightbox" title="One AI’s view of what a geologist does when running ML models (courtesy of Bing)." href="/blog_content/images/2024/20240212_ml_generated.png">
<img width="150" src="/blog_content/images/2024/20240212_ml_generated.png" alt="Generated image of geologist" />
</a>
<div class="caption">
Generated image of a geologist using ML models<br>(Click to enlarge)
</div>
</div>
<p>
Your idea of what's entailed in setting up a supervised Machine Learning (ML)
project as an Earth Systems scientist is probably not as fanciful as what an image
generation algorithm came up with (see image at left!). But there are many little
decisions ML practitioners make along the way when starting an Earth Systems Science
(ESS) ML project. This article provides some tips and ideas to consider as you're
getting started. These tips are not in any particular order, and like all things
related to ML projects they depend on the specific types of data and project goals.
(If you have any questions about your particular project, feel free to book a
meeting with me — my contact details are at the end of this article.)
</p>
<h3>Try a Few Models</h3>
<div class="img_r" style="width: 200px;">
<a class="lightbox" title="A high level comparison of different scikit-learn classifiers" href="/blog_content/images/2024/20240212_ml_classifier_comparison_001.png">
<img width="200" src="/blog_content/images/2024/20240212_ml_classifier_comparison_001.png" alt="A high level comparison of different scikit-learn classifiers" />
</a>
<div class="caption">
(Click to enlarge)
</div>
</div>
<p>
Even if you're sure that you need a deep learning model for your
project, I always recommend using some ‘shallow’ learning models (such as those in
scikit-learn), either as a baseline or to aid with interpretation of input
features. This is one
thing I look for when I review applied ML papers. The two links below
compare different classification and regression models on different datasets.
</p>
<ul>
<li>My own <a href="https://github.com/ThomasMGeo/regressor_compare/blob/main/Regressor_compare.ipynb">scikit-learn regressor comparison</a></li>
<li><a href="https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html">A high level comparison of different scikit-learn classifiers</a></li>
</ul>
<h3>Scale Your Data</h3>
<p>
Most ML models (not all!) require pre-processing and normalization. Decision
tree models do not require scaled inputs, but scaling might still be a good idea for your
particular dataset and use case. The scikit-learn library has a great suite of pre-processors,
and these are even useful for non-ML use cases. Lately I have been using the
<a href="https://scikit-learn.org/stable/modules/preprocessing.html#non-linear-transformation">quantile
transformer</a> for many of my workflows, but this is very much dataset and
model dependent.
</p>
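<p>
To make the idea of scaling concrete, here is a small plain-Python sketch of two
common transforms: standardization (zero mean, unit variance) and a simplified
rank-based quantile transform. The quantile function below is a deliberately
simplified stand-in, not scikit-learn's quantile transformer, and the
precipitation values are made up for illustration.
</p>

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def quantile_uniform(values):
    """Map each value to its empirical quantile in [0, 1] based on its rank.

    Simplification: tied values all receive the rank of their first
    occurrence in sorted order; no interpolation is performed.
    """
    ordered = sorted(values)
    n = len(values)
    return [ordered.index(v) / (n - 1) for v in values]


# Hypothetical, strongly skewed daily precipitation totals (mm)
precip = [0.0, 0.2, 0.5, 1.0, 80.0]

z_scores = standardize(precip)      # the 80 mm day remains an extreme value
quantiles = quantile_uniform(precip)  # the 80 mm day is simply the top rank
```

<p>
Standardization preserves the shape of a skewed distribution, while the
rank-based transform spreads values evenly, which is one reason the choice of
scaler depends on the dataset and the model.
</p>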
<h3>Testing, Training, and Validation Datasets</h3>
<p>
While training and testing are crucial, the often-overlooked key to robust analysis
lies in a third, independent validation dataset. This independent set serves as a
critical reality check, ensuring your model generalizes well beyond the training
data and isn't simply overfitting. However, for environmental and geoscience data,
blindly applying random sampling for validation can be a recipe for disaster.
Spatial and temporal correlations inherent in these data can lead to misleading
results if not accounted for. For a deeper dive into best practices for well-based
geoscience data validation, you're welcome to read this paper I wrote as part of my doctoral
work: <a href="https://thesedimentaryrecord.scholasticahq.com/article/36638-digitalization-of-legacy-datasets-and-machine-learning-regression-yields-insights-for-reservoir-property-prediction-and-submarine-fan-evolution-a-sub">Digitalization of Legacy Datasets and Machine Learning Regression Yields
Insights for Reservoir Property Prediction and Submarine-Fan Evolution: A Subsurface
Example From the Lewis Shale, Wyoming</a>
</p>
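<p>
One way to respect spatial correlation is to split by group (for example, by
well or station) rather than by individual sample, so that no group contributes
to both subsets. The sketch below illustrates that idea in plain Python; it is
an assumption about one reasonable approach, not the method used in the paper
above, and the well identifiers are hypothetical. (scikit-learn offers
group-aware splitters such as <code>GroupKFold</code> in
<code>sklearn.model_selection</code>.)
</p>

```python
import random


def grouped_split(group_ids, test_fraction=0.25, seed=0):
    """Split sample indices so that every group lands entirely in one subset."""
    groups = sorted(set(group_ids))
    rng = random.Random(seed)
    rng.shuffle(groups)
    # Hold out at least one whole group
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train_idx = [i for i, g in enumerate(group_ids) if g not in test_groups]
    test_idx = [i for i, g in enumerate(group_ids) if g in test_groups]
    return train_idx, test_idx


# Hypothetical well identifier for each sample
wells = ["A", "A", "A", "B", "B", "C", "C", "C", "D", "D"]
train_idx, test_idx = grouped_split(wells, test_fraction=0.25)

train_wells = {wells[i] for i in train_idx}
test_wells = {wells[i] for i in test_idx}
```

<p>
The same function can be applied a second time to the training indices to carve
out a validation set that is also separated by group.
</p>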
<h3>Drop Unnecessary Data</h3>
<p>
If, after some exploratory data analysis and some training and testing of
various models, a few input features do not seem to improve
performance, it’s best practice to remove (or drop) them before doing your final
analysis. Within the scikit-learn ecosystem, you can do this automatically using <a href="https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFE.html">
Recursive Feature Elimination
</a> (RFE), depending on the model. RFE not only simplifies and speeds up model
training, but also identifies the features that are potentially most impactful, giving you
better insights into your data.
</p>
<h3>Your Performance Metric Matters</h3>
<p>
In a <a href="https://www.unidata.ucar.edu/blogs/news/entry/r-sup-2-sup-downsides">previous post</a>,
I discussed the potential overuse of R<sup>2</sup> as a metric for regression
problems. Accuracy can also be a misleading metric for classification problems with
unbalanced, multi-class datasets, which are common in ESS. This is
especially true for ML models that are trying to predict relatively rare events.
It's worth experimenting with a couple of different performance metrics, and
reporting more than one metric! Within the
scikit-learn ecosystem, there are many options besides R<sup>2</sup> and accuracy.
(See "<a href="https://scikit-learn.org/stable/modules/model_evaluation.html">Metrics
and scoring: quantifying the quality of predictions</a>.")
</p>
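<p>
A small worked example shows why accuracy can mislead for rare events. Below, a
hypothetical "model" that never predicts hail is scored with plain accuracy and
with balanced accuracy (the mean of per-class recall). The labels and counts are
invented for illustration; the functions are hand-written sketches, although
scikit-learn provides equivalents in <code>sklearn.metrics</code>.
</p>

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(y_pred[i] == cls for i in idx)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical rare-event dataset: 5 hail days out of 100
y_true = ["no_hail"] * 95 + ["hail"] * 5
# A useless "model" that always predicts the majority class
y_pred = ["no_hail"] * 100

acc = accuracy(y_true, y_pred)            # 0.95, which looks excellent
bal = balanced_accuracy(y_true, y_pred)   # 0.5, no better than guessing
```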
<h3>Visualize Your Data</h3>
<p>
Visualizing your data is not just something to do at the end of a project; it's a
critical sanity check throughout any quantitative analysis. Datasets like Anscombe's
quartet and the Datasaurus Dozen have shown how statistics alone do not tell the
whole story of datasets. As a data scientist, I find that visualizing data at every
step of the ML workflow, even if the plots don't make it into the final report,
helps me identify potential issues and refine my workflow. Don't underestimate the
power of a simple visualization. (See the <a href="https://www.unidata.ucar.edu/blogs/news/entry/r-sup-2-sup-downsides">previous post</a>
for some visual examples.)
</p>
<div class="highlight_box">
<p>
Thomas Martin is an AI/ML Software Engineer at the NSF Unidata Program Center. Have questions?
Contact <a href="mailto:support-ml@unidata.ucar.edu">support-ml@unidata.ucar.edu</a>
or book an office hours meeting with Thomas on his
<a href="https://calendar.app.google/ZsM8dLHLa65eGAr39">Calendar</a>.
</p>
</div>
https://www.unidata.ucar.edu/blogs/news/entry/r-sup-2-sup-downsidesR<sup>2</sup>: Downsides and Potential Pitfalls for ESS ML PredictionUnidata News2023-12-20T09:29:10-07:002023-12-20T09:29:10-07:00<div class="img_l" style="width: 200px;">
<img width="200" src="/blog_content/images/2023/20231220_ml_Dino_static.gif" alt="Datasaurus plot" />
<div class="caption">
Always plot your data!
</div>
</div>
<p class="byline">
By Thomas Martin, AI/ML Software Engineer
</p>
<div class="img_l" style="width: 200px;">
<a class="lightbox" title="Visualizations of a variety of datasets that have the same summary statistics as the Datasaurus but very different distributions." href="/blog_content/images/2023/20231220_ml_DinoSequentialSmaller.gif">
<img width="200" src="/blog_content/images/2023/20231220_ml_Dino_static.gif" alt="Datasaurus plot" />
</a>
<div class="caption">
Always plot your data!<br>(Click to see why.)
</div>
</div>
<p>
<em>Regression analysis</em> is a fundamental concept in the field of machine learning (ML),
in that it helps establish relationships among
variables by estimating how one variable affects another.
</p>
<p>
The <em>coefficient of determination</em>, R<sup>2</sup> (pronounced “R squared”),
is a measure that provides information about how well the regression line suggested
by a numerical model approximates the actual data (often referred to as “goodness of
fit”).
</p>
<p>Quick aside: Here are a couple of datasets to ponder while reading through this blog
post: <a href="https://en.wikipedia.org/wiki/Anscombe%27s_quartet">Anscombe’s Quartet</a>
and <a href="https://jumpingrivers.github.io/datasauRus/">Datasaurus Dozen</a>.
</p>
<p>R<sup>2</sup> is often one of the initial metrics introduced in predictive
regression analysis, and while it is commonly reported, I've found it to be less
suitable for some ML applications in Earth Systems Science (ESS),
for the following reasons:
</p>
<div class="warning_box">
<h3>R<sup>2</sup> is best suited for Gaussian distributions</h3>
<p>While you can calculate R<sup>2</sup> for nonlinear models, it is less appropriate for
variables with non-Gaussian distributions.
</p>
<h3>R<sup>2</sup> without slope does not tell the entire story</h3>
<p>
The R<sup>2</sup> value provides information about the proportion of variance explained, but
it does not provide insights into the direction or strength of the relationships
between variables.
</p>
<p>
It is crucial to consider the slope of the regression line. A high R<sup>2</sup> with a small
or insignificant slope may indicate a weak relationship or lack of practical
significance.
</p>
<h3>R<sup>2</sup> is sensitive to outliers</h3>
<p>
R<sup>2</sup> is sensitive to outliers in the data, meaning that extreme values can
disproportionately influence the R<sup>2</sup> value.
</p>
<p>
Outliers can significantly impact the regression line and, consequently, the
proportion of variance explained by the model.
</p>
</div>
<p>
While R<sup>2</sup> can be useful for normally distributed prediction problems in ESS,
especially for data exploration or quick feature selection workflows, I recommend
using additional prediction <a href="https://scikit-learn.org/stable/modules/model_evaluation.html#regression-metrics">metrics</a>
(particularly <a href="https://en.wikipedia.org/wiki/Mean_absolute_error">Mean
Absolute Error</a>) for day-to-day ML work to ensure a more robust and accurate
assessment of ESS ML model performance. <strong>
Plotting your data is always a necessary step no matter what metric you use!
</strong>
</p>
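<p>
As a rough illustration of the outlier sensitivity described above, the sketch
below computes R<sup>2</sup> and Mean Absolute Error by hand for a small set of
made-up observations and predictions, then adds a single extreme value that the
"model" happens to predict equally well. The data and function names are
hypothetical; R<sup>2</sup> is computed as one minus the ratio of the residual
sum of squares to the total sum of squares.
</p>

```python
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Average size of the prediction errors, in the units of the data."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


# Hypothetical observations and predictions; every prediction is off by 0.5
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 1.5, 3.5, 3.5, 5.5]

r2_base = r2_score(observed, predicted)
mae_base = mean_absolute_error(observed, predicted)

# Add one extreme observation, also predicted with an error of 0.5
observed_out = observed + [50.0]
predicted_out = predicted + [49.5]

r2_out = r2_score(observed_out, predicted_out)
mae_out = mean_absolute_error(observed_out, predicted_out)
```

<p>
In this example the typical error is identical in both cases, so the Mean
Absolute Error does not change, yet the single extreme point inflates the total
variance and pushes R<sup>2</sup> much closer to 1.
</p>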
<p>
By way of illustration, I've put together a short Jupyter notebook working through
some basic examples of places where R<sup>2</sup> might fall short:
<a href="https://github.com/Unidata/MLscratchpad/blob/main/BlogNotebooks/R2_playground.ipynb">R2 Playground</a>
</p>
<h3>Further Reading</h3>
<p>
If you're interested in learning more about the possible pitfalls of R<sup>2</sup>,
try these:
</p>
<ul>
<li><a href="https://en.wikipedia.org/wiki/Coefficient_of_determination">Coefficient of Determination Wikipedia page</a></li>
<li><a href="https://library.virginia.edu/data/articles/is-r-squared-useless">Is R-squared Useless?</a></li>
<li><a href="https://statisticsbyjim.com/regression/r-squared-invalid-nonlinear-regression/">R-squared Is Not Valid for Nonlinear Regression</a></li>
<li><a href="https://gmd.copernicus.org/articles/15/5481/2022/">Root-mean-square error (RMSE) or mean absolute error (MAE): when to use them or not</a></li>
<li><a href="https://www.stat.cmu.edu/~cshalizi/mreg/15/lectures/10/lecture-10.pdf">Lecture 10: F -Tests, R<sup>2</sup>, and Other Distractions</a></li>
<li><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2892436/">An evaluation of R<sup>2</sup> as an inadequate measure for nonlinear models in pharmacological and biochemical research: a Monte Carlo approach</a></li>
<li><a href="https://towardsdatascience.com/avoid-r-squared-to-judge-regression-model-performance-5c2bc53c8e2e">Avoid R-squared to judge regression model performance</a></li>
</ul>
<div class="highlight_box">
<p>
Thomas Martin is an AI/ML Software Engineer at the Unidata Program Center. Have questions?
Contact <a href="mailto:support-ml@unidata.ucar.edu">support-ml@unidata.ucar.edu</a>
or book an office hours meeting with Thomas on his
<a href="https://calendar.app.google/ZsM8dLHLa65eGAr39">Calendar</a>.
</p>
</div>
https://www.unidata.ucar.edu/blogs/news/entry/self-organizing-maps-for-earthSelf Organizing Maps for Earth Systems ScienceUnidata News2023-12-08T12:10:36-07:002023-12-08T12:10:36-07:00<div class="img_r" style="width: 200px;">
<img width="200" src="/blog_content/images/2023/20231208_ml_som_01.png" alt="Representation of Self Organizing Map" />
<div class="caption">
Representation of nodes in a Self Organizing Map.
</div>
</div>
<p class="byline">
By Thomas Martin, AI/ML Software Engineer
</p>
<div class="img_r" style="width: 200px;">
<a class="lightbox" title="Self-Organizing Maps are a lattice or grid of neurons (or nodes) that accept and respond to a set of input signals. (Image from Seth, Beginners Guide to Anomaly Detection Using Self-Organizing Maps)" href="/blog_content/images/2023/20231208_ml_som_01.png">
<img width="200" src="/blog_content/images/2023/20231208_ml_som_01.png" alt="Representation of Self Organizing Map" />
</a>
<div class="caption">
Representation of nodes in a Self Organizing Map<br>(click to enlarge).
</div>
</div>
<p>
A <a href="https://ieeexplore.ieee.org/abstract/document/58325">self-organizing map</a>
(SOM), sometimes known as a Kohonen map after its originator the Finnish professor
Teuvo Kohonen, is an unsupervised machine learning technique used to produce a
low-dimensional representation of a higher dimensional data set.
SOMs are a specific type of
artificial neural network, but use a different training strategy compared to more
traditional artificial neural networks (ANNs). SOMs
can be used for clustering, dimensionality reduction, feature extraction, and
classification — all of which suggest that they can be important tools for
understanding large Earth Systems Science (ESS) datasets. ESS applications are
explored in some detail by Liu and Weisberg in <a href="https://doi.org/10.5772/13146">
A Review of Self-Organizing Map Applications
in Meteorology and Oceanography
</a>; see the end of this post for other interesting papers.
</p>
<p>
If you'd like to try this technique with your datasets, the Python package
<a href="https://github.com/JustGlowing/minisom">MiniSom</a> is a minimalistic,
NumPy-based implementation of Self Organizing Maps. MiniSom appears to be the most
actively maintained and widely used SOM package within the ESS community, but it’s
also possible to write your
own implementation using <a href="https://numpy.org/doc/stable/">NumPy</a>
in under a hundred lines of code.
</p>
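<p>
To give a feel for how compact a SOM can be, here is a minimal sketch written
with only the Python standard library (plain lists stand in for NumPy arrays).
It is not MiniSom's implementation: the function names, the linear decay of the
learning rate and neighborhood radius, the Gaussian neighborhood function, and
the toy two-cluster data are all assumptions made for illustration.
</p>

```python
import math
import random


def winner(grid, x):
    """Return the (row, col) of the node whose weights are closest to x."""
    best, best_dist = (0, 0), math.inf
    for r, row in enumerate(grid):
        for c, weights in enumerate(row):
            d = math.dist(weights, x)
            if d < best_dist:
                best, best_dist = (r, c), d
    return best


def train_som(data, rows=3, cols=3, n_iter=500, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small SOM; grid[r][c] holds one weight per input feature."""
    rng = random.Random(seed)
    n_features = len(data[0])
    # Random initial weights in [0, 1); assumes inputs are scaled similarly
    grid = [[[rng.random() for _ in range(n_features)] for _ in range(cols)]
            for _ in range(rows)]
    for t in range(n_iter):
        # Learning rate and neighborhood radius shrink linearly over time
        decay = 1.0 - t / n_iter
        lr = lr0 * decay
        sigma = max(sigma0 * decay, 0.1)
        x = rng.choice(data)
        br, bc = winner(grid, x)
        for r in range(rows):
            for c in range(cols):
                # Gaussian neighborhood based on distance on the map grid
                grid_dist2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_dist2 / (2.0 * sigma ** 2))
                weights = grid[r][c]
                for j in range(n_features):
                    # Pull each node toward the sample, scaled by lr and h
                    weights[j] += lr * h * (x[j] - weights[j])
    return grid


# Hypothetical two-cluster data, already scaled to the range [0, 1]
samples = [(0.05, 0.10), (0.10, 0.05), (0.12, 0.12),
           (0.90, 0.95), (0.95, 0.90), (0.88, 0.88)]

som = train_som(samples)
node = winner(som, (0.1, 0.1))  # map position assigned to a new sample
```

<p>
For real datasets, a maintained package is likely a better choice than a
hand-rolled version like this one, which omits conveniences such as
quantization-error reporting and alternative neighborhood functions.
</p>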
<p>
Below are links to two short tutorials for using MiniSom with atmospheric datasets. The
first one was written by Talia Kurtz of the University of North Dakota. The second tutorial
was created by Kevin Goebbert of Valparaiso University and me, based on workflows
described in a paper by
<a href="https://doi.org/10.1029/2021JD036198">Ramseyer <em>et al.</em> (2022)</a>.
</p>
<ul>
<li>Kurtz: <a href="https://github.com/taliakurtz/MiniSOM_tutorial/tree/main">MiniSOM
Tutorial for 2-D Atmospheric Data and Example Using Mean Sea Level Pressure Data</a>
</li>
<li>
Goebbert and Martin: <a href="https://github.com/ThomasMGeo/MinisomTutorial_Atmo/tree/main">Two-notebook
tutorial on the MiniSom package with atmospheric data examples</a>
</li>
</ul>
<p>
Both of these examples ran smoothly on Jetstream2 Jupyterhub instances running in
the Unidata Science Gateway. If you have questions or feedback on this article, or
ideas for future topics, send me an email or feel free to book an office hour
(see box below for contact details). What other Self Organizing Map
research should I read up on?
</p>
<h3>Some other interesting papers on using SOMs in ESS</h3>
<p>
If you're interested in pursuing the use of this technique, check out:
</p>
<ul>
<li>
<a href="https://doi.org/10.1029/2005JC003117">Performance evaluation of the
self-organizing map for feature extraction</a>
</li>
<li>
<a href="https://doi.org/10.1175/2009JCLI2645.1">Attribution of Projected Changes in
Atmospheric Moisture Transport in the Arctic: A Self-Organizing Map Perspective</a>
</li>
<li>
<a href="https://doi.org/10.1177/0309133310397582">The self-organizing map in
synoptic climatological research</a>
</li>
<li>
<a href="https://doi.org/10.5772/54299">Self-Organizing Maps: A Powerful Tool for the
Atmospheric Sciences</a>
</li>
<li>
<a href="https://doi.org/10.1016/j.advwatres.2020.103676">Tools for enhancing the
application of self-organizing maps in water resources research and engineering</a>
</li>
<li>
<a href="https://doi.org/10.1002/joc.4950">Reanalysing the impacts of atmospheric
teleconnections on cold-season weather using multivariate surface weather types
and self-organizing maps</a>
</li>
<li>
<a href="https://www.analyticsvidhya.com/blog/2021/09/beginners-guide-to-anomaly-detection-using-self-organizing-maps/">Beginners
Guide to Anomaly Detection Using Self-Organizing Maps</a>
</li>
</ul>
<div class="highlight_box">
<p>
Thomas Martin is an AI/ML Software Engineer at the Unidata Program Center. Have questions?
Contact <a href="mailto:support-ml@unidata.ucar.edu">support-ml@unidata.ucar.edu</a>
or book an office hours meeting with Thomas on his
<a href="https://calendar.app.google/ZsM8dLHLa65eGAr39">Calendar</a>.
</p>
</div>
https://www.unidata.ucar.edu/blogs/news/entry/nsf-seeks-input-on-publicNSF Seeks Input on Public Access PlanUnidata News2023-11-28T11:38:24-07:002023-11-28T11:38:24-07:00<div class="img_l" style="width: 100px;">
<img width="100" src="/images/logos/nsf_logo_transparent.png" alt="NSF logo"/>
</div>
<p>
The National Science Foundation (NSF) is seeking public input from the science
and engineering research and education community on implementing the NSF
<a href="https://www.nsf.gov/pubs/2023/nsf23104/nsf23104.pdf">Public Access Plan 2.0</a>.
</p>
<p>
The Public Access Plan 2.0 is an update to NSF's current public access requirements
in response to recent White House Office of Science and Technology Policy
guidance; among other things, it addresses potential equity impacts of public
access requirements. More information is available in the U.S. Federal Register as
<a href="https://www.govinfo.gov/app/details/FR-2023-11-16/2023-25267">88 FR 78796
- Request for Information (RFI) on NSF Public Access Plan 2.0: Ensuring Open,
Immediate, and Equitable Access to National Science Foundation Funded
Research</a>. The relevant section of the Federal Register,
which includes the list of questions from the survey, is available in
<a href="https://www.govinfo.gov/content/pkg/FR-2023-11-16/pdf/2023-25267.pdf">U.S. Federal Register pages 78796-78798</a>.
</p>
<p>
Members of the public can comment on the Public Access Plan 2.0 via an online survey:
<a href="https://www.surveymonkey.com/r/NSFpublicaccessplan">Request for
Information (RFI) on NSF Public Access Plan 2.0: Ensuring Open, Immediate, and
Equitable Access to National Science Foundation Funded Research</a>. Responses will
be accepted until 11:59 p.m. (EST) on <strong>January 2, 2024</strong>.
</p>
https://www.unidata.ucar.edu/blogs/news/entry/radio-occultation-data-from-cosmicRadio Occultation Data from COSMIC Available in the IDDUnidata News2023-02-08T09:37:09-07:002023-02-08T09:37:09-07:00<div class="img_l" style="width: 150px;margin-bottom: 0;">
<img width="150" src="/images/logos/cosmic.png" alt="COSMIC logo"/>
</div>
<p><style>
h4 { margin-top:1.5em; }
</style></p>
<p>
The Unidata Program Center is partnering with UCAR's <a
href="https://cosmic.ucar.edu/">COSMIC</a> program to distribute radio occultation data
provided by <a href="https://www.spire.com/">Spire Global</a>. The products
described below are now available via the <a
href="https://www.unidata.ucar.edu/projects/index.html#idd">
Internet Data Distribution (IDD)
</a> network. Data are on the <a
href="https://docs.unidata.ucar.edu/ldm/current/basics/feedtypes/">EXP feed</a> with
a typical total volume of 80-110 MB per hour.
</p>
<p>
From the COSMIC announcement of the availability of these products:
</p>
<p class="quoteroman">
The NOAA <a href="https://www.space.commerce.gov/business-with-noaa/commercial-weather-data-pilot-cwdp/">Commercial
Data Program</a> Delivery Order 5 (DO5) implements unrestricted data rights for
the radio occultation data processed by UCAR COSMIC’s <a href="https://www.cosmic.ucar.edu/what-we-do/data-processing-center">CDAAC</a> in
near real time, with the goal of making data products available to a wider
audience. For DO5 Spire Global provides observation data from its satellite
constellation for processing into higher level neutral atmosphere (about 3300
profiles/day) and ionosphere products. The DO5 <a href="https://data.cosmic.ucar.edu/gnss-ro/spire/noaa/">data</a> are made
available on a next day basis at CDAAC and will now additionally be distributed
via Unidata’s Internet Data Distribution (IDD) service in near-real time.
</p>
<p>
The products covered by DO5 being made available via the IDD are listed below,
along with examples of the Product IDs, REQUEST lines and PQACT entries you would
use to configure your Local Data Manager (<a href="https://www.unidata.ucar.edu/software/ldm/">LDM</a>)
to receive the products. (In the REQUEST line examples,
replace <em>upstream_IDD_relay</em> with the host name of your site's upstream LDM.)
</p>
<h4>Atmospheric profile without moisture information (atmPrf)</h4>
<p>
Full resolution profiles of physical parameters including bending angle,
refractivity, dry pressure, dry temperature, impact parameter, etc. versus
geometric height above mean sea level. Reference:
<a href="https://cdaac-www.cosmic.ucar.edu/cdaac/cgi_bin/fileFormats.cgi?type=atmPrf">https://cdaac-www.cosmic.ucar.edu/cdaac/cgi_bin/fileFormats.cgi?type=atmPrf</a>.
</p>
<h5>Example LDM/IDD Product ID:</h5>
<pre>spireert/level2/atmPrf/2023.031/atmPrf_S115.2023.031.17.59.G07_0001.0001_nc</pre>
<h5>Example LDM Configuration file REQUEST line:</h5>
<pre>REQUEST EXP "spireert/level2/atmPrf" <em>upstream_IDD_relay</em></pre>
<h5>Example LDM pattern-action file action:</h5>
<pre>
EXP spireert/level2/atmPrf/([0-9]{4}).([0-9]{3})/(atmPrf_.*_nc)
FILE -close -overwrite
/data/ldm/pub/native/COSMIC/spireert/atmPrf/\3
</pre>
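<p>
Before deploying a pattern-action entry, it can help to test the pattern against a
sample Product ID. The sketch below is not part of the LDM; it uses Python's
<code>re</code> module as a stand-in for the POSIX-style regular expressions the LDM
uses, and the <code>DEST_DIR</code> name and value are illustrative assumptions. For
a simple pattern like this one the two dialects should agree, but treat the result as
a sanity check only.
</p>

```python
import re

# Pattern and Product ID copied from the atmPrf examples above.
PATTERN = r"spireert/level2/atmPrf/([0-9]{4}).([0-9]{3})/(atmPrf_.*_nc)"
PRODUCT_ID = (
    "spireert/level2/atmPrf/2023.031/"
    "atmPrf_S115.2023.031.17.59.G07_0001.0001_nc"
)

# Illustrative destination directory (an assumption); adjust to your layout.
DEST_DIR = "/data/ldm/pub/native/COSMIC/spireert/atmPrf"

match = re.search(PATTERN, PRODUCT_ID)
year, day_of_year, filename = match.groups()

# \3 in the pattern-action entry refers to the third parenthesized group,
# which here is the file name portion of the Product ID.
dest_path = f"{DEST_DIR}/{filename}"

print(year, day_of_year)
print(dest_path)
```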
<h4>Atmospheric occultation profile with moisture information included (wetPf2)</h4>
<p>
Atmospheric occultation profile with moisture information included. A gridded
analysis or short-term forecast is used to separate the pressure, temperature and
moisture contributions to refractivity. This file is interpolated to 100 meter
height levels. References:
<a href="https://cdaac-www.cosmic.ucar.edu/cdaac/cgi_bin/fileFormats.cgi?type=wetPf2">https://cdaac-www.cosmic.ucar.edu/cdaac/cgi_bin/fileFormats.cgi?type=wetPf2</a> and
<a href="https://doi.org/10.3390/rs14215614">https://doi.org/10.3390/rs14215614</a>.
</p>
<h5>Example LDM/IDD Product ID:</h5>
<pre>spireert/level2/wetPf2/2023.031/wetPf2_S115.2023.031.17.59.G07_0001.0001_nc</pre>
<h5>Example LDM Configuration file REQUEST line:</h5>
<pre>REQUEST EXP "spireert/level2/wetPf2" <em>upstream_IDD_relay</em></pre>
<h5>Example LDM pattern-action file action:</h5>
<pre>
EXP spireert/level2/wetPf2/([0-9]{4}).([0-9]{3})/(wetPf2.*_nc)
FILE -close -overwrite
/data/ldm/pub/native/COSMIC/spireert/wetPf2/\3
</pre>
<h4>WMO BUFR (bfrPrf)</h4>
<p>
A low resolution (200 meter) atmospheric profile in the BUFR format. Contains data
taken from the atmPhs, atmPrf and wetPrf files, including: header information,
occultation location and time; bending angle versus impact parameter profiles;
refractivity versus mean sea level geometric height; pressure, temperature, and
moisture versus geopotential height profiles. References:
<a href="https://cdaac-www.cosmic.ucar.edu/cdaac/cgi_bin/fileFormats.cgi?type=bfrPrf">https://cdaac-www.cosmic.ucar.edu/cdaac/cgi_bin/fileFormats.cgi?type=bfrPrf</a> and
<a href="https://preop.romsaf.org/romsaf_bufr.pdf">https://preop.romsaf.org/romsaf_bufr.pdf</a>.
</p>
<h5>Example LDM/IDD Product ID:</h5>
<pre>spireert/level2/bfrPrf/2023.031/bfrPrf_S115.2023.031.17.59.G07_0001.0001_bufr</pre>
<h5>Example LDM Configuration file REQUEST line:</h5>
<pre>REQUEST EXP "spireert/level2/bfrPrf" <em>upstream_IDD_relay</em></pre>
<h5>Example LDM pattern-action file action:</h5>
<pre>
EXP spireert/level2/bfrPrf/([0-9]{4}).([0-9]{3})/(bfrPrf.*_bufr)
FILE -close -overwrite
/data/ldm/pub/native/COSMIC/spireert/bfrPrf/\3
</pre>
<h4>Absolute total electron content (podTec)</h4>
<p>
Absolute total electron content (TEC) and auxiliary data in netCDF format. Each file
contains TEC track data from one receiver-transmitter satellite pair. Reference:
<a href="https://cdaac-www.cosmic.ucar.edu/cdaac/cgi_bin/fileFormats.cgi?type=podTec">https://cdaac-www.cosmic.ucar.edu/cdaac/cgi_bin/fileFormats.cgi?type=podTec</a>.
</p>
<h5>Example LDM/IDD Product ID:</h5>
<pre>spireert/level1b/podTec/2023.031/podTec_S149.2023.031.18.45.0014.G23.00_0001.0001_nc</pre>
<h5>Example LDM Configuration file REQUEST line:</h5>
<pre>REQUEST EXP "spireert/level1b/podTec" <em>upstream_IDD_relay</em></pre>
<h5>Example LDM pattern-action file action:</h5>
<pre>
EXP spireert/level1b/podTec/([0-9]{4}).([0-9]{3})/(podTec.*_nc)
FILE -close -overwrite
/data/ldm/pub/native/COSMIC/spireert/podTec/\3
</pre>
<h4>To request all products:</h4>
<h5>Example LDM Configuration file REQUEST line:</h5>
<pre>REQUEST EXP "spireert" <em>upstream_IDD_relay</em></pre>
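<p>
The shorter request covers every product because the REQUEST pattern is a regular
expression that only needs to match somewhere in the Product ID. As a rough check,
the sketch below uses Python's <code>re</code> module as a stand-in for the LDM's
own matching (an assumption made for illustration) and applies the pattern to the
four example Product IDs listed above.
</p>

```python
import re

# Example Product IDs copied from the sections above.
PRODUCT_IDS = [
    "spireert/level2/atmPrf/2023.031/atmPrf_S115.2023.031.17.59.G07_0001.0001_nc",
    "spireert/level2/wetPf2/2023.031/wetPf2_S115.2023.031.17.59.G07_0001.0001_nc",
    "spireert/level2/bfrPrf/2023.031/bfrPrf_S115.2023.031.17.59.G07_0001.0001_bufr",
    "spireert/level1b/podTec/2023.031/podTec_S149.2023.031.18.45.0014.G23.00_0001.0001_nc",
]

ALL_PRODUCTS = "spireert"                  # pattern from the request-all line
ONLY_PODTEC = "spireert/level1b/podTec"    # pattern from the podTec section

# An unanchored search succeeds if the pattern appears anywhere in the ID.
matched_all = [pid for pid in PRODUCT_IDS if re.search(ALL_PRODUCTS, pid)]
matched_podtec = [pid for pid in PRODUCT_IDS if re.search(ONLY_PODTEC, pid)]

print(len(matched_all), "of", len(PRODUCT_IDS), "match the broad pattern")
print(len(matched_podtec), "of", len(PRODUCT_IDS), "match the podTec pattern")
```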
<h4>For more information</h4>
<p>For additional information regarding the data, please contact COSMIC at <a
href="mailto:cosmicops@ucar.edu">cosmicops@ucar.edu</a>.</p>
<p>If you need assistance in configuring your LDM, or have other questions/comments
regarding the LDM or IDD, please contact <a href="mailto:support-idd@unidata.ucar.edu">support-idd@unidata.ucar.edu</a>. </p>
https://www.unidata.ucar.edu/blogs/news/entry/unidata-to-mint-nfts-ofUnidata to Mint NFTs of Popular StormsUnidata News2022-04-01T08:10:00-06:002022-04-02T10:16:27-06:00<div class="img_l" style="width: 150px;">
<img width="150" src="/blog_content/images/2022/20220401_katrina.png" alt="Hurricane Katrina NFT" />
</div>
<p>
Everyone loves to talk about the weather. But until now, serious collectors of
weather memorabilia have been left on the sidelines. Oh, a lucky few manage to save
enormous hailstones in their freezers, but most are limited to screenshots of satellite or radar imagery,
or maybe articles clipped from the local newspaper.
</p>
<p>
But never fear: Unidata is
preparing to bring weather collectibles into the twenty-first century by minting
a series of Non-Fungible Tokens (NFTs) based on significant weather events. Our
inaugural series will consist of 902 distinct NFTs of Hurricane Katrina, one for
each millibar of the storm's lowest recorded atmospheric pressure.
</p>
<div class="img_l" style="width: 200px;">
<a class="lightbox" title="One of 902 potential NFTs of hurricane Katrina." href="/blog_content/images/2022/20220401_katrina_anim.gif"> <img width="200" src="/blog_content/images/2022/20220401_katrina.png" alt="Hurricane Katrina NFT" /> </a>
<div class="caption">
Artist's rendering of a Hurricane Katrina NFT<br>
(click to enlarge)
</div>
</div>
<h5>Historic Storms on the Blockchain</h5>
<p>
Weather enthusiasts have traditionally been early adopters of new technologies, and
the move to collectible NFTs is no different. “We feel compelled to help our
community move smoothly into this exciting new technological space,” says
Unidata Director Mohan Ramamurthy, explaining the program's initiatives to reduce
<em>time to crypto</em> by providing systems that demonstrate modern blockchain
workflows for the geosciences. “By encapsulating Unidata's high-quality
data streams in NFT form, scientists, educators, and students can be sure they have
access to verified unique data about historical weather events,” he continues.
</p>
<p>
While NFTs like the
<a href="https://opensea.io/assets/0xd07dc4262bcdbf85190c01c996b4c06a461d2430/322068">Pringles CryptoCrisp</a>
or real estate in the <a href="https://superrare.com/artwork-v2/mars-house-21383">Metaverse</a>
may give the impression that the whole concept is just silly, Unidata hopes to demonstrate (see the
artist's rendering above) techniques for putting the <em>smart</em> in smart contracts,
and to show that blockchain technology can be used for serious science.
</p>
<h5>Avoiding Environmental Impacts</h5>
<p>
Seeking to avoid the environmental impacts associated with cryptocurrency operations,
Unidata engineers have pioneered an NFT creation workflow that greatly reduces the
need for computing power and the associated energy requirements. The new process consists of:
</p>
<ol>
<li>A member of the Unidata Program Center staff locating a netCDF file containing
relevant historical storm data in Unidata's archive.
</li>
<li>
Applying a simple initial typographical transformation, inverting the case of the
letters "netCDF" to produce "NETcdf."
</li>
<li>
Applying a second typographical transformation, removing the lowest arm of the "E"
to produce an "NFTcdf."
</li>
<li>
Uploading the resulting NFTcdf to the soon-to-be implemented Climate-and-Forecasting-focused
blockchain trading platform OpenCF.
</li>
</ol>
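<p>
For readers who would like to verify the workflow independently, steps 2 and 3 can
be sketched in a few lines of Python. This is a minimal illustration only, not the
Program Center's production code, and the variable names are invented for the
example.
</p>

```python
# Step 2: invert the case of each letter in the format name.
original = "netCDF"
inverted = original.swapcase()

# Step 3: removing the lowest arm of an "E" leaves an "F".
minted = inverted.replace("E", "F")

print(original, "to", inverted, "to", minted)
```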
<div class="img_l" style="width: 200px;">
<a class="lightbox" title="Unidata's Data Hallway: Up to the task." href="/blog_content/images/2016/20160401_datahallway.jpg"> <img width="200" src="/blog_content/images/2016/20160401_datahallway.jpg" alt="Data Hallway" /> </a>
<div class="caption">
Unidata Data Hallway: Plenty of computing power.
</div>
</div>
<p>
Note: Unidata's process is currently not well suited to conversion of
<a href="https://www.unidata.ucar.edu/blogs/news/entry/winds_of_change">GRIB</a>
or BUFR files.
</p>
<p>
Unidata Program Center systems administrator Mike Schmidt is confident that the
computing power gathered in Unidata's
<a href="https://www.unidata.ucar.edu/blogs/news/entry/the_hallmark_of_quality_data">Data
Hallway</a>
will be sufficient for the task. “We upgraded our
<a href="https://www.unidata.ucar.edu/blogs/news/entry/big-data-get-small">NUC</a>
equipment to an Intel
i7-based system during the <a href="https://www.unidata.ucar.edu/blogs/news/entry/world-data-flows-return-to">data blockage of 2021</a>,”
says Schmidt. “We're pretty sure it can handle changing a few letters around
to create NFTcdfs.”
</p>
<p>
Unidata is hopeful that the OpenCF trading platform will progress beyond its current
conceptual stage quickly, and that the Hurricane Katrina NFTcdf series will be available
by <em>April Fool's Day, 2022</em>.
</p>
<p><a class="lightbox" title="Happy April Fool's Day from the Unidata Program Center." href="http://www.unidata.ucar.edu/blog_content/images/2013/20130401_aprilfools.png"></a></p>
https://www.unidata.ucar.edu/blogs/news/entry/mcfetch-provides-free-satellite-archiveMCFETCH Provides Free Satellite Archive Data Access for the Unidata Academic CommunityUnidata News2021-07-06T08:30:00-06:002021-07-27T15:20:30-06:00<div class="img_l" style="width: 150px;">
<img width="150" src="https://www.unidata.ucar.edu/blog_content/images/logos/SSEC_logo_small.png" alt="SSEC"/>
</div>
<div class="img_l" style="width: 150px;">
<a href="https://www.ssec.wisc.edu/datacenter/" target="_blank"><img width="150" src="https://www.unidata.ucar.edu/blog_content/images/logos/SSEC_logo_small.png" alt="SSEC"/></a>
</div>
<p>
The Unidata program and the University of Wisconsin–Madison's
Space Science and Engineering Center (SSEC) have a long history of collaboration
and cooperation to serve the needs of Unidata community members.
The <a href="http://www.ssec.wisc.edu/datacenter/" target="_blank">SSEC Satellite Data Services</a>
(SDS) group, which provides access to and distribution
of real-time and archive weather satellite data, makes limited amounts of archive
satellite data available to Unidata's academic community members at no cost via
the “Multi-format Client-agnostic File Extraction Through Contextual HTTP”
(<a href="https://mcfetch.ssec.wisc.edu/">MCFETCH</a>) system.
</p>
<div class="img_r" style="width: 200px;">
<a class="lightbox" title="MCFETCH web interface" href="/blog_content/images/2021/20210706_mcfetch_01.png">
<img width="200" src="/blog_content/images/2021/20210706_mcfetch_01.png" alt="MCFETCH web interface"/>
</a>
<div class="caption">
MCFETCH web interface (click to enlarge)
</div>
</div>
<p><a class="lightbox" title="MCFETCH web interface" href="/blog_content/images/2021/20210706_mcfetch_02.png"></a></p>
<p>
The SSEC SDS archive is one of the largest online geostationary satellite data
archives in the world. As of 2021, the archive is over 1.5 PB in size, contains data from every
GOES satellite, and spans more than 40 years. The archive also holds 20 years of international
geostationary satellite data.
</p>
<h5>How it Works</h5>
<p>
Any Unidata community member with a <code>.edu</code> e-mail address can request
access to the archive dataset for non-commercial use. (Please read the community
archive <a href="http://www.ssec.wisc.edu/datacenter/openarchive/unidata/disclaimer.html">terms of use</a>.)
Registered users can access up
to 1 GB of data per day (with a daily quota of no more than 1000 transactions).
Archive satellite data
older than 30 days are available.
</p>
<p>
To access these datasets, community members must register via the
<a href="https://mcfetch.ssec.wisc.edu/#register">MCFETCH registration page</a>.
After registration has been processed by SDS, registrants will
receive a MCFETCH API key, which can then be used to request archive data.
</p>
<p>
The following information is required to register:
</p>
<ul>
<li>
Name
</li>
<li>
Institution
</li>
<li>
E-mail address
</li>
</ul>
<p>
With the MCFETCH API key in hand, data users construct a data URL specifying the
satellite, band, date/time, location, and other relevant parameters, then use the
URL to retrieve the data. The MCFETCH interface provides mechanisms to retrieve
data in a variety of formats, including
McIDAS AREA format, binary, GeoTIFF, GIF, JPG, list, netCDF, and text.
</p>
<p>A properly constructed MCFETCH URL looks something like:</p>
<pre>https://mcfetch.ssec.wisc.edu/cgi-bin/mcfetch?dkey=00000000-0000-0000-0000-000000000000&satellite=GOES15&band=1&output=JPG&date=20121115&time=20:15&lat=37.21+111.73&size=1024+1024&mag=-2+-2</pre>
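<p>
As a rough sketch of how such a URL might be assembled in a script, the Python below
rebuilds the example above from its individual parameters using only the standard
library. The parameter names and values are taken from the example URL; the all-zero
key is a placeholder, and the variable names are illustrative. Consult the MCFETCH
website for the authoritative parameter lists.
</p>

```python
from urllib.parse import urlencode

BASE_URL = "https://mcfetch.ssec.wisc.edu/cgi-bin/mcfetch"

# Parameter names and values copied from the example URL above.
params = {
    "dkey": "00000000-0000-0000-0000-000000000000",  # placeholder API key
    "satellite": "GOES15",
    "band": "1",
    "output": "JPG",
    "date": "20121115",
    "time": "20:15",
    "lat": "37.21 111.73",   # two values; the space is encoded as "+"
    "size": "1024 1024",
    "mag": "-2 -2",
}

# safe=":" leaves the colon in the time value unescaped, as in the example.
url = BASE_URL + "?" + urlencode(params, safe=":")
print(url)
```

<p>
Keeping the parameters in a dictionary makes it straightforward to change one value
(the date or band, for instance) without editing the URL string by hand.
</p>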
<div class="img_l" style="width: 200px;padding-top:1em;">
<a class="lightbox" title="GOES15 data from November 2012 displayed in the IDV" href="http://www.unidata.ucar.edu/blog_content/images/2013/20130128_ssec_01.gif"> <img width="200" src="http://www.unidata.ucar.edu/blog_content/images/2013/20130128_ssec_01.gif" alt="Description"/> </a>
<div class="caption">
Click to enlarge
</div>
</div>
<p>
The MCFETCH website includes a
<a href="https://mcfetch.ssec.wisc.edu/#interface">web interface</a>
that allows data users to interactively
build a URL, along with a viewing interface for quick visualization of the requested
data. Other parts of the website offer a “wizard” interface with
a step-by-step tutorial on constructing valid MCFETCH URLs, examples of how to
retrieve data from MCFETCH servers using programming languages such as Python,
and comprehensive lists of valid parameters for different data types.
</p>
<h5>Available Data</h5>
<p>
The <a href="https://inventory.ssec.wisc.edu/inventory/">SDS Inventory</a> page
provides an easy mechanism to search for available data
by satellite, date, location, and a variety of other parameters. Visit the Inventory
for a complete listing of currently-available data.
</p>
<h5>Support</h5>
<p>
Questions regarding data access will be handled by the
SSEC Data Center (<a href="mailto:sds@ssec.wisc.edu">sds@ssec.wisc.edu</a>). Questions regarding the
use of Unidata McIDAS-X, the IDV, or satellite data will be redirected to
<a href="mailto:support@unidata.ucar.edu">Unidata Support</a>.
</p>
https://www.unidata.ucar.edu/blogs/news/entry/toward-standardized-digital-representations-ofToward Standardized Digital Representations of Units of MeasurementUnidata News2021-05-20T08:00:00-06:002021-05-20T08:00:00-06:00<div class="img_l" style="width: 150px;">
<img width="150" src="/blog_content/images/logos/codata.png" alt="CODATA logo" />
</div>
<p>
The Committee on Data (<a href="https://codata.org/">CODATA</a>) of the
Paris-based <a href="https://council.science/">International Science Council</a>
promotes open data policies, working to advance the interoperability and usability
of research data. The Committee is committed to supporting <a href="https://www.go-fair.org/fair-principles/">FAIR data principles</a> to improve
the Findability, Accessibility, Interoperability, and Reuse of digital assets.
</p>
<p>
Within the CODATA organizational umbrella, Unidata software developer Steven
Emmerson has joined the Digital Representation of Units of Measure (DRUM) Task Group,
which aims to raise the profile of the digital representation of units of measure in
research communities, representative and governing bodies, and with funders. DRUM
takes the position that support for consistent digital representations of units of
measurement is of far-reaching importance for science, technology, industry, and
trade.
</p>
<p>The DRUM Task Group summarizes the effort this way:</p>
<p class="quoteroman">FAIR encapsulates some important principles to better enable computer-facilitated
scientific discovery and large scale data analysis. Interoperability and Reusability
depend fundamentally on standardised definitions and digital representations of units,
as well as machine-referenceable conversions. This in turn requires the mobilisation
and the input of the scientific community and the various domains represented by the
International Scientific Unions and Associations (ISUs/ISAs).</p>
<p class="quoteroman">DRUM aims to facilitate and coordinate the engagement of ISUs/ISAs with the issue
and to develop greater understanding of the use of units in different domains and the
issues around definition, digital representation, and conversion.</p>
<p>
For deeper insight into the motivation for the DRUM Task Group, you can read their
“manifesto”:
<a href="https://doi.org/10.5281/zenodo.4081656">Units of Measure for Humans and
Machines: Making Units Clear for Machine Learning and Beyond</a>
</p>
<p>
At Unidata, Emmerson is the primary developer of the <a href="https://www.unidata.ucar.edu/software/udunits/">UDUNITS software library</a>,
which supports arithmetic manipulation of units of physical quantities and conversion
of numeric values between compatible units. He was invited to join the DRUM group in
early 2021, and has begun to take part in the group’s meetings and other activities,
one of which is a “Survey on the Use of Units in Scientific Domains.”</p>
<p>The survey aims to help the group better understand existing practices and
needs relating to the use of units in different scientific domains. To provide your
input to the DRUM Task Group, take the survey:</p>
<p><a href="https://forms.gle/w4c2uM9ottnSYJUy7">Survey on the Use of Units in Scientific Domains</a></p>
https://www.unidata.ucar.edu/blogs/news/entry/world-data-flows-return-toWorld Data Flows Return to Normal after Suez IncidentUnidata News2021-04-01T08:30:00-06:002021-04-02T07:45:40-06:00<div class="img_l" style="width: 200px;">
<img width="200" src="/blog_content/images/2021/20210401_ship-suez.jpg"
alt="Ever Given in Suez Canal" />
<div class="caption">
A container ship blocking the Suez Canal
</div>
</div>
<p>
For 6 days, 3 hours, and 38 minutes in late March, the Golden-class container ship
<a
href="https://www.vesselfinder.com/vessels/EVER-GIVEN-IMO-9811000-MMSI-353136000">Ever
Given</a> blocked the Suez Canal, leaving more than 400 vessels piled up on either
end of the canal as they waited for the stranded container ship to be refloated.
While media coverage of the incident has focused on potential shortages of goods
like petroleum, food, and bathroom tissue, little attention has been paid to the
potential for worldwide data shortages as a result of the reduction in shipping
capacity.
</p>
<div class="img_l" style="width: 200px;">
<a class="lightbox" title="The container ship Ever Given blocked the Suez Canal for more than six days."
href="/blog_content/images/2021/20210401_ship-suez.jpg"> <img width="200" src="/blog_content/images/2021/20210401_ship-suez.jpg"
alt="Ever Given in Suez Canal" /> </a>
<div class="caption">
A container ship blocking the Suez Canal<br>
(click to enlarge)
</div>
</div>
<p>
Most readers are likely familiar with the data storage
facilities of <a href="https://docs.docker.com/storage/">Docker containers</a>, but a
smaller number may have considered the data storage and transmission features of
physical <a href="https://en.wikipedia.org/wiki/Intermodal_container">shipping
containers</a>.
<a href="https://news.ycombinator.com/item?id=1437169">Conservative estimates</a>
suggest that a single intermodal container could hold roughly 60 petabytes of data
(assuming use of 2 TB, 2.5-inch hard drives and adequate padding). Traveling at an average
speed of 24 knots, container ships average 30 days from Hong Kong to Rotterdam, the
Netherlands (the Ever Given’s final destination). Thus, if the ship were carrying a
single such “data container,” and experienced no delays, it would have a rough
throughput of 23 GB/s. The owners of the stranded container ship did not
release a cargo manifest, but it is known that the ship can carry 18,300 standard
shipping containers. Assuming a mere 0.1% of these contained data, the bandwidth
would increase to roughly 420 GB/s.
</p>
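<p>
The back-of-the-envelope figures above can be reproduced in a few lines of Python.
This sketch assumes decimal units (1 PB = 1,000,000 GB) and uses illustrative
variable names; the inputs are the estimates quoted in the preceding paragraph.
</p>

```python
# Inputs quoted in the text above.
CONTAINER_CAPACITY_PB = 60   # data held by one intermodal container
TRANSIT_DAYS = 30            # Hong Kong to Rotterdam
SHIP_CAPACITY = 18_300       # standard containers the ship can carry
DATA_FRACTION = 0.001        # 0.1% of containers assumed to hold data

GB_PER_PB = 1_000_000        # decimal units (assumption)
SECONDS_PER_DAY = 86_400

transit_seconds = TRANSIT_DAYS * SECONDS_PER_DAY

# Throughput is total data delivered divided by total time in transit.
per_container = CONTAINER_CAPACITY_PB * GB_PER_PB / transit_seconds
whole_ship = per_container * SHIP_CAPACITY * DATA_FRACTION

print(f"one container: {per_container:.0f} GB/s")
print(f"whole ship:    {whole_ship:.0f} GB/s")
```

<p>
The calculation gives about 23 GB/s for a single container and about 424 GB/s for the
ship, in line with the rounded figures above.
</p>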
<div class="img_r" style="width: 200px;">
<a class="lightbox" title="This intermodal shipping container is climate controlled, ideal for use with temperature-sensitive data."
href="/blog_content/images/2021/20210401_intermodal.jpg"> <img width="200" src="/blog_content/images/2021/20210401_intermodal.jpg"
alt="Intermodal shipping container" /> </a>
<div class="caption">
An intermodal shipping container
</div>
</div>
<p>
While such bandwidth numbers are truly staggering, one must also take into account
<em>latency</em>, or the delay before a transfer of data begins following an
instruction for its transfer. Large container ships take up to three business days to
load and unload, and latency increases with every hour the container sits in traffic
on land, while being trucked from port to its final destination. Time spent unloading
the truck and installing the hard drives also affects the latency of the data
transmission.
</p>
<p>
In contrast, Unidata’s Internet Data Distribution (<a
href="https://www.unidata.ucar.edu/projects/index.html#idd">IDD</a>) system takes
advantage of the ultra-reliable and very low latency <a
href="https://internet2.edu/">Internet2</a> system where possible, along with high-speed
public internet channels for interoperability with universities not connected to I2. Routing
issues faced by computers running the Local Data Manager (<a
href="https://www.unidata.ucar.edu/software/ldm/">LDM</a>) software are resolved
almost instantaneously by the network, and never involve waiting for physical
objects weighing over 200,000 tonnes to be shifted by tugboats. Resolution of
internet routing problems does not depend on tides or phases of the moon.
</p>
<div class="img_l" style="width: 200px;">
<a class="lightbox" title="UCAR building rules keep Unidata's Data Hallway ship-free."
href="/blog_content/images/2016/20160401_datahallway.jpg"> <img width="200" src="/blog_content/images/2016/20160401_datahallway.jpg"
alt="Data Hallway" /> </a>
<div class="caption">
Unidata Data Hallway: no ships allowed
</div>
</div>
<p>
For those worried about the possibility of a similar <em>physical</em> blockage
affecting the transmission of geoscience data they depend on, please rest assured
that Unidata is taking every reasonable precaution. While our <a
href="https://www.unidata.ucar.edu/blogs/news/entry/the_hallmark_of_quality_data">Data
Hallway</a> is only six feet wide (less than three percent of the width of
the Suez Canal, and thus an unlikely candidate for a container ship accident),
UCAR rules set to take effect on <em>April Fool's Day, 2021</em>
prohibit ships of <em>any</em> size inside the Unidata Program Center.
</p>
<p><a class="lightbox" title="Happy April Fool's Day from the Unidata Program Center."
href="http://www.unidata.ucar.edu/blog_content/images/2013/20130401_aprilfools.png"></a></p>