<p>News@Unidata: Unidata news</p>
<h2><a href="https://www.unidata.ucar.edu/blogs/news/entry/exploring-indigenous-data-sovereignty">Exploring Indigenous Data Sovereignty</a></h2>
<p>Unidata News, 2024-03-06</p>
<div class="img_l" style="width: 200px;">
<img width="200" src="/blog_content/images/2024/20240306_SDNworkshop10a.png" alt="view through met tower" />
</div>
<p>
The concept of Indigenous Data Sovereignty (IDS) asserts that data generated by
Indigenous peoples, including data generated from their land and resources, should
be governed by the people themselves. Environmental observations collected on
native lands are one small part of the IDS context, and they were the subject
of a recent workshop hosted by the Southwestern Indian Polytechnic Institute
(SIPI) in Albuquerque, New Mexico.
</p>
<p class="byline">
By Jeff Weber, NSF Unidata Community Services
</p>
<div class="img_l" style="width: 200px;">
<a class="lightbox" title="Looking up through an environmental monitoring tower at SIPI." href="/blog_content/images/2024/20240306_SDNworkshop10a.png">
<img width="200" src="/blog_content/images/2024/20240306_SDNworkshop10a.png" alt="view through met tower" />
</a>
<div class="caption">
Data monitoring tower at SIPI (click to enlarge)
</div>
</div>
<p>
The workshop, titled <a href="https://ncar.ucar.edu/exploring-data-sovereignty-workshop">Exploring Data
Sovereignty and the Sovereign Data Network</a>, was held February 13-15, 2024. More
than 40 individuals from Tribal Colleges and Universities (TCUs), R1 universities,
and other agencies and groups associated with IDS and Earth System Science (ESS)
attended, including several participants affiliated with NSF Unidata.
</p>
<p>
The workshop was supported by an NSF NCAR <a href="https://ncar.ucar.edu/who-we-are/diversity-inclusion/core-awards/indigenous-data-sovereignty">
Collaborative Opportunities for Research Engagement (CORE) award</a>, which aims
to “...develop a set of
considerations for how Earth system science programs can implement data
sovereignty principles in any efforts that involve collaborations with Indigenous
communities or data collected on Indigenous lands or related to Indigenous
resources.”
</p>
<div class="img_r" style="width: 200px;">
<a class="lightbox" title="Patrick Freeland discussing Project Red Bus and RVCC." href="/blog_content/images/2024/20240306_SDNworkshop05a.png">
<img width="200" src="/blog_content/images/2024/20240306_SDNworkshop05a.png" alt="Patrick Freeland"/>
</a>
<div class="caption">
Patrick Freeland discussing Project Red Bus and RVCC
</div>
</div>
<p><a class="lightbox" title="Jim Sanovia discussing ESIIL." href="/blog_content/images/2024/20240306_SDNworkshop06a.png"></a></p>
<p>
Day one of the workshop focused on the topic of Indigenous Data Sovereignty and
Governance, exploring the background and motivations for Data Sovereignty and
Indigenous Data Management. There were also reports from organizations working in
the IDS field, including the <a href="https://esiil.org/">Environmental Data Science Innovation &amp; Inclusion Lab</a>
(ESIIL), <a href="https://github.com/patrickfreeland83/ProjectRedBus/blob/main/ProjectRedBus_OnePager.pdf">ProjectRedBus</a>,
and the <a href="https://www.rvcchub.org/">Rising Voices, Changing Coasts</a> hub.
</p>
<div class="img_l" style="width: 200px;">
<a class="lightbox" title="NSF Unidata's Stonie Cooper discussing Mesonets on Tribal lands." href="/blog_content/images/2024/20240306_SDNworkshop07a.png">
<img width="200" src="/blog_content/images/2024/20240306_SDNworkshop07a.png" alt="Stonie Cooper"/>
</a>
<div class="caption">
NSF Unidata's Stonie Cooper discussing Mesonets on Tribal lands.
</div>
</div>
<p><a class="lightbox" title="Jeff McWhirter discussing RAMADDA for use in the SDN." href="/blog_content/images/2024/20240306_SDNworkshop08a.png"></a></p>
<p>
Day two discussions and activities centered on the <em>Sovereign Data Network</em>
(SDN) project, a collaboration begun by SIPI, Navajo Technical University (NTU), and
NSF Unidata and supported by a U.S. National Science Foundation (NSF) pilot grant.
There were descriptions of the pilot project and discussions about how other groups
can become part of the network as it expands. There was also a data workshop with
demonstrations and training on the use of repository storage systems including
<a href="https://ramadda.org/about">RAMADDA</a>, which was initially created at NSF Unidata and is now supported by
<a href="https://geodesystems.com/">Geode Systems</a>.
</p>
<div class="img_r" style="width: 200px;">
<a class="lightbox" title="Observing tower at SIPI." href="/blog_content/images/2024/20240306_SDNworkshop09a.png">
<img width="200" src="/blog_content/images/2024/20240306_SDNworkshop09a.png" alt="Observing tower at SIPI."/>
</a>
<div class="caption">
Observing tower at SIPI.
</div>
</div>
<p>
Day three was focused on “hands on data wrangling” and systems (servers and
dataloggers) support, and included a visit to the environmental monitoring station
on the SIPI campus, which includes a data collection tower constructed as part of
the SDN pilot project.
</p>
<p>
As a result of the workshop, several TCUs and other organizations including
</p>
<ul>
<li>Oglala Community College</li>
<li>Aaniiih Nakoda College</li>
<li>Keweenaw Bay Ojibwa College</li>
<li>The Haskell Foundation</li>
<li>The American Indian Higher Education Consortium (AIHEC)</li>
<li>Rising Voices, Changing Coasts (RVCC)</li>
</ul>
<p>
expressed interest in joining the SDN and participating in future organizing and
funding efforts.
</p>
<h2><a href="https://www.unidata.ucar.edu/blogs/news/entry/awips-tips-exploring-satellite-imagery">AWIPS Tips: Exploring Satellite Imagery using Python-AWIPS</a></h2>
<p>Shay Carter, 2024-03-06</p>
<div class="img_l" style="width: 100px;margin-top:0;">
<img width="100" src="https://www.unidata.ucar.edu/blog_content/images/logos/awips-tips.png" alt="AWIPS Tips" />
</div>
<p>Welcome back to AWIPS Tips! </p>
<p>This week we’re going to dive into a little bit of <a href="https://unidata.github.io/python-awips/index.html" target="_blank">python-awips</a> to learn more about what satellite data our EDEX has to offer. If this is your first time joining us, it may be helpful to take a quick glance over some of <a href="https://unidata.github.io/awips2/appendix/educational-resources/#python-awips" target="_blank">our previous AWIPS Tips blogs about python-awips</a>. To take a deeper look into satellite data, we’ll be highlighting some of the features and cells of the <a href="https://unidata.github.io/python-awips/examples/generated/Satellite_Imagery.html" target="_blank">Satellite Imagery example notebook</a>. </p>
<p>All other example notebooks can be found on our website as well, and are available when downloading the <a href="https://unidata.github.io/python-awips/index.html#source-code-with-examples-install" target="_blank">python-awips source code</a> and running Jupyter locally.</p>
<p>Similar to other python-awips notebooks, this example begins by creating an EDEX connection (using our public EDEX - edex.cloud.unidata.ucar.edu) and setting the <i>datatype</i>. To access satellite data, the <i>datatype</i> is set to <b>satellite</b>. If you knew exactly what data you wanted, you could proceed to refine the <a href="https://unidata.github.io/python-awips/api/IDataRequest.html" target="_blank">DataRequest</a> by setting additional filters like the time, location name, parameters, or other modifiers. For this notebook though, we want to investigate what modifiers are actually available first before requesting data.</p>
<p>In the <a href="https://unidata.github.io/python-awips/examples/generated/Grid_Levels_and_Parameters.html" target="_blank">Grid Levels and Parameters notebook</a> we go over how to investigate what is available for <i>locations</i>, <i>parameters</i>, <i>levels</i>, and <i>times</i> of the <b>grid</b> <i>datatype</i>. Here we’re going to look at a new modifier, called optional identifiers. Now, as the name suggests, these are optional, so many <i>datatypes</i> may not have any identifiers. For satellite data, we take a look at the identifiers in <a href="https://unidata.github.io/python-awips/examples/generated/Satellite_Imagery.html#investigate-available-data" target="_blank">section 4 of the notebook</a> and see this output:</p>
<p class="highlight_box" style="font-family: courier;padding-left:2em;padding-top:1em;padding-bottom:1em">Available Identifiers:<br>
- source<br>
- physicalElement<br>
- creatingEntity<br>
- sectorID</p>
<p>From here, we can then take a look at each individual identifier and see what values are available. For example, we can look at the <b>source</b> with the following:</p>
<p class="highlight_box" style="font-family: courier;padding-left:2em;padding-top:1em;padding-bottom:1em">identifier = "source"<br>
sources = DataAccessLayer.getIdentifierValues(request, identifier)<br>
print(identifier + ":")<br>
print(list(sources))</p>
<p>Which then produces the following output:</p>
<p class="highlight_box" style="font-family: courier;padding-left:2em;padding-top:1em;padding-bottom:1em">source:<br>
['GTD01', 'RAMMB', 'WCDAS', 'RBU', 'UCAR', 'NSOF', 'McIDAS']</p>
<p>The notebook continues with the remaining identifiers and shows you their available values. Once we have an idea of what data is available, we can select identifier values to narrow the result to exactly what we’re looking for.</p>
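<p>The identifier survey described above can be wrapped in a small helper. This is a minimal sketch rather than code from the notebook: the function name <code>survey_identifiers</code> and its <code>dal</code> parameter are hypothetical, and it assumes <code>dal</code> is the python-awips <code>DataAccessLayer</code> module (or any object exposing the same functions).</p>

```python
def survey_identifiers(dal, datatype="satellite"):
    """Return {identifier: sorted list of available values} for one datatype.

    `dal` is assumed to behave like python-awips' DataAccessLayer module.
    The helper name and signature are illustrative, not part of python-awips.
    """
    # Build a request that is restricted only by datatype.
    request = dal.newDataRequest()
    request.setDatatype(datatype)

    survey = {}
    # Ask which optional identifiers exist, then list the values of each one.
    for identifier in dal.getOptionalIdentifiers(request):
        values = dal.getIdentifierValues(request, identifier)
        survey[identifier] = sorted(str(value) for value in values)
    return survey
```

<p>With python-awips installed, this could be called as <code>survey_identifiers(DataAccessLayer)</code> after importing <code>DataAccessLayer</code> from <code>awips.dataaccess</code> and pointing it at an EDEX server with <code>DataAccessLayer.changeEDEXHost("edex.cloud.unidata.ucar.edu")</code>.</p>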
<p>For this notebook, both GOES East mesoscale sectors are chosen as the desired data for plotting. A simple for-loop is used to draw both mesoscale 1 and mesoscale 2 images. Channel 13 of the GOES imager was chosen for this example, and produces plots like these:</p>
<p><center><img src="https://www.unidata.ucar.edu/blog_content/images/2024/20240306_meso1.png"/></center></p>
<p><center><img src="https://www.unidata.ucar.edu/blog_content/images/2024/20240306_meso2.png"/></center></p>
<p>Thanks for joining us and check back in two weeks for the next blog post.</p>
<p><em>To view archived blogs, visit the <a href="https://www.unidata.ucar.edu/blogs/news/tags/awipstips" target="_blank">AWIPS Tips blog tag</a>, and get notified of the latest updates from the AWIPS team by signing up for the <a href ="https://www.unidata.ucar.edu/support/index.html#mailinglists" target="_blank">AWIPS mailing list</a>. Questions or suggestions for the team on future topics? Let us know at <a href="mailto:support-awips@unidata.ucar.edu" target="_blank">support-awips@unidata.ucar.edu</a></em></p>
<p class="highlight_box" style="text-align:center;font-style:italic;font-size:10px;padding-left:2em;padding-top:1em;padding-bottom:1em">
This blog was posted in reference to v20.3.2-1 of NSF Unidata AWIPS</p>
<h2><a href="https://www.unidata.ucar.edu/blogs/news/entry/nsf-unidata-2024-community-equipment">NSF Unidata 2024 Community Equipment Awards: Deadline Extended</a></h2>
<p>Unidata News, 2024-03-04</p>
<div class="img_l" style="width: 150px;">
<img width="150" src="/community/img/equipaward_circuitboard.png" alt="Equipment Awards" />
</div>
<p>
As a result of changes in the spring 2024 meeting schedule for the NSF Unidata Users
Committee, we are able to extend the submission deadline for this year's Community
Equipment Awards solicitation until <span class="highlight_muted">March 29, 2024</span>. All other aspects
of the 2024 program remain as described in the original announcement.
</p>
<div class="img_l" style="width: 200px;">
<img width="200" src="/community/img/equipaward_circuitboard.png" alt="Equipment Awards" />
<div class="caption">
Unidata offers computer equipment grants to support a variety of projects
</div>
</div>
<p style="font-weight: bold; font-style: italic;">
The text of the original announcement follows:
</p>
<p>
The NSF Unidata Program Center is pleased to announce the opening of the 2024 NSF Unidata
Community Equipment Awards solicitation. Created under the sponsorship of
the National Science Foundation, Unidata equipment awards are intended to
encourage new members from diverse disciplinary backgrounds in the Earth Systems Sciences
to join the NSF Unidata community, and to encourage existing members to continue
their active participation, enhancing the community process. For 2024, a
total of $100,000 is available for awards; proposals for amounts up to $20,000
will be considered.
</p>
<p>
Past recipients of NSF Unidata equipment awards have used the grants to procure equipment
for data sharing, to create interactive data visualization laboratories,
and to encourage the use of NSF Unidata software packages in research and education.
The proposal process is designed to be as low-impact as possible, with just a few
required elements. You can view previously accepted proposals, and take advantage
of a short proposal template if you choose. Staff at the NSF Unidata Program Center are
happy to help you define the hardware requirements for your project or answer any
other questions you may have while preparing a proposal.
</p>
<p>
<strong>Note:</strong><br />
In keeping with NSF Unidata's <a href="/publications/directorspage/proposals/2024_proposal_narrative_final.pdf">most
recent proposal</a> to the National Science Foundation for
continued program funding, additional emphasis will be placed on providing support for
institutions serving populations that are underrepresented in the broad geoscience
community. NSF Unidata is dedicated to broadening participation by minority serving
institutions, and we particularly encourage small institutions, academic departments that
have not previously submitted proposals to this program, and programs outside NSF Unidata's
traditional atmospheric sciences community to apply.
</p>
<p>
The deadline for submitting proposals is
<span class="highlight_muted"><span style="text-decoration: line-through;">March 15</span> March 29, 2024</span>. The Unidata Program Center expects to
notify submitters
of award status by May 2024. For a complete description of the Unidata Community
Equipment Awards program, the proposal format, and the proposal review criteria,
see the
<a href="/community/equipaward/RFP2024.html">Call for Proposals</a>.
</p>
<p>
For more on the Community Equipment Awards, along with previous years' accepted proposals,
see the
<a href="/community/equipaward">Equipment Awards</a> page.
</p>
<h2><a href="https://www.unidata.ucar.edu/blogs/news/entry/k-nearest-neighbors">K Nearest Neighbors</a></h2>
<p>Unidata News, 2024-03-04</p>
<div class="img_l" style="width: 150px;">
<img width="150" src="/blog_content/images/2024/20240219_ml_neighbor.png" alt="Fred Rogers" />
</div>
<p>
<strong>K Nearest Neighbors</strong> (KNN) is a supervised machine learning method that
"memorizes" (stores) an entire dataset, then relies on the concepts of proximity and
similarity to make predictions about new data. The basic idea is that if a new data
point is in some sense "close" to existing data points, its value is likely to
be similar to the values of its neighbors. In the Earth Systems Sciences, such
techniques can be useful for small- to moderate-scale classification and regression
problems.
</p>
<p class="byline">
By Thomas Martin, AI/ML Software Engineer
</p>
<div class="img_l" style="width: 150px;">
<a class="lightbox" title="Fred Rogers, famous for asking people to be his neighbor (image: Wikipedia)" href="/blog_content/images/2024/20240219_ml_neighbor.png">
<img width="150" src="/blog_content/images/2024/20240219_ml_neighbor.png" alt="Fred Rogers" />
</a>
<div class="caption">
Fred Rogers, famous for asking people to be his neighbor<br>(Click to enlarge)
</div>
</div>
<p>
One <a href="https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2004WR003444">example</a>
uses KNN techniques to derive local-scale information about
precipitation and temperature from regional- or global-scale numerical weather
prediction model output.
</p>
<p>
When using a KNN algorithm, you select the number of "neighbors" to consider
(<strong>K</strong>), and potentially a way of calculating the "distance" between
data points. KNN algorithms can be used for both classification and regression
problems. For regression problems, KNN predicts the target variable by averaging
the values of the nearest neighbors. For classification problems it takes the
<em>mode</em> (the most common class) of the nearest neighbors; as a result, it is
generally recommended that the value of <strong>K</strong> be an odd number, which
avoids tied votes when there are two classes. Effective use of KNN often requires some
experimentation to determine the best value for <strong>K</strong>.
</p>
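<p>The two prediction modes described above can be sketched in a few lines of plain Python. This is an illustration only, not the scikit-learn implementation: the function names, the Euclidean distance choice, and the toy data are all assumptions made for the example.</p>

```python
import math
from collections import Counter

def nearest_neighbors(train_X, train_y, query, k):
    """Return the targets of the k training points closest to `query`."""
    # Rank every stored point by Euclidean distance to the query point.
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: math.dist(pair[0], query))
    return [target for _, target in ranked[:k]]

def knn_classify(train_X, train_y, query, k=3):
    """Classification: take the mode (most common label) of the neighbors."""
    votes = Counter(nearest_neighbors(train_X, train_y, query, k))
    return votes.most_common(1)[0][0]

def knn_regress(train_X, train_y, query, k=3):
    """Regression: average the target values of the neighbors."""
    neighbors = nearest_neighbors(train_X, train_y, query, k)
    return sum(neighbors) / len(neighbors)

# Hypothetical toy data: two features per point, a class label, and an amount.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 5.5), (6.5, 7.0)]
labels = ["dry", "dry", "dry", "rain", "rain", "rain"]
amounts = [0.0, 0.1, 0.0, 4.0, 5.0, 6.0]

predicted_label = knn_classify(points, labels, (6.2, 6.1), k=3)
predicted_amount = knn_regress(points, amounts, (6.2, 6.1), k=3)
```

<p>Note that nothing is "trained" here: every prediction sorts the full stored dataset, which is why the choice of <strong>K</strong> and of the distance function are the main things to experiment with.</p>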
<div class="img_l" style="width: 500px;float:none;display:block;margin:auto;">
<a class="lightbox" title="Comparing the decision boundary between using 1 neighbor vs 20, from <a href='https://kevinzakka.github.io/2016/07/13/k-nearest-neighbor/'>Kevin Zakka’s blog</a>." href="/blog_content/images/2024/20240219_ml_k_neighbors.png">
<img width="500" src="/blog_content/images/2024/20240219_ml_k_neighbors.png" alt="Comparing decision boundary" />
</a>
<div class="caption">
Comparing the decision boundary between using 1 neighbor vs 20, from <a href='https://kevinzakka.github.io/2016/07/13/k-nearest-neighbor/'>Kevin Zakka’s blog</a>.
</div>
</div>
<p>
KNN is sometimes called a "lazy learning" method. This is because it does not
generate a new explicit model, but rather memorizes the dataset in its entirety.
While the scikit-learn API uses a <code>.fit()</code> method, this is largely to
match the rest of the scikit-learn API.
</p>
<h3>Why you might use KNN for your ML project</h3>
<ol>
<li>It's simple. Because KNN is a lazy learner, there is no complex model and only
limited math is needed to understand the inner workings.
</li>
<li>It's adaptable to different data distributions. KNN works well with odd
distributions of data.
</li>
<li>It's good for smaller datasets. Because no model is being constructed, KNNs can
be a good choice for smaller datasets.
</li>
</ol>
<h3>Some Downsides to KNN</h3>
<ol>
<li>
It's sensitive to outliers and poor feature selection. KNN does not do any
automatic feature selection the way decision tree models do. KNN models
can struggle in high-dimensional space, both with a large number of input
features and with outliers within those features.
</li>
<li>
It has a relatively high computational cost. While the analog/sample matching
behavior of KNNs is great from an explainability point of view
(model-free ML is great!), for large datasets the cost of storing and searching
the entire dataset can be enormous.
</li>
<li>
It needs a complete dataset. Like many other ML models, KNNs do not handle
missing data or NaN (Not a Number) values. If your dataset is not complete, you'll
need to impute the missing values before using a KNN.
</li>
</ol>
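<p>As a rough illustration of the last point, missing values can be filled in before any distances are computed. The sketch below uses column-mean imputation, which is only one simple option and is an assumption of this example rather than a recommendation; the function name and sample values are hypothetical.</p>

```python
import math

def impute_column_means(rows):
    """Replace NaN entries with the mean of the observed values in that column.

    Assumes every column has at least one observed (non-NaN) value;
    a column that is entirely NaN would raise ZeroDivisionError.
    """
    n_cols = len(rows[0])
    means = []
    for col in range(n_cols):
        observed = [row[col] for row in rows if not math.isnan(row[col])]
        means.append(sum(observed) / len(observed))
    # Build new rows so the caller's original data is left untouched.
    return [
        [means[col] if math.isnan(value) else value
         for col, value in enumerate(row)]
        for row in rows
    ]

# Hypothetical observations: temperature and relative humidity, with gaps.
nan = float("nan")
observations = [[10.0, 80.0], [12.0, nan], [nan, 60.0]]
completed = impute_column_means(observations)
```

<p>Libraries such as scikit-learn provide more capable imputers; whichever method is used, the imputed values influence which neighbors are "closest", so the choice is worth checking against your data.</p>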
<p>
KNNs have been discussed previously on MetPy Mondays here:
<a href="https://www.youtube.com/watch?v=Z08TSSVWcAM">MetPy Mondays #183 - Predicting Rain with Machine Learning - Using KNN</a>
</p>
<p>
KNNs are a great supervised ML model to try out if your dataset is on the smaller
side. Happy modeling! What ML model should I cover in an upcoming blog?
</p>
<h3>More reading and resources</h3>
<ul>
<li><a href="https://github.com/NCAR/ML_workshop2023/blob/main/tutorials/Day2_lesson1_supervised_knn_tree.ipynb">A short notebook that uses KNNs</a></li>
<li><a href="https://scikit-learn.org/stable/modules/neighbors.html">Scikit Learn</a></li>
<li><a href="https://arxiv.org/pdf/1708.04321.pdf">Effects of Distance Measure Choice on KNN Classifier Performance</a></li>
<li><a href="https://neptune.ai/blog/knn-algorithm-explanation-opportunities-limitations">The KNN Algorithm - Explanation, Opportunities, Limitations</a> </li>
<li><a href="https://kevinzakka.github.io/2016/07/13/k-nearest-neighbor/">A Complete Guide to K-Nearest-Neighbors with Applications in Python and R</a> </li>
<li><a href="https://scott.fortmann-roe.com/docs/BiasVariance.html">Understanding the Bias-Variance Tradeoff</a> </li>
<li><a href="https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2004WR003444">Statistical downscaling using K-nearest neighbors</a></li>
</ul>
<div class="highlight_box">
<p>
Thomas Martin is an AI/ML Software Engineer at the NSF Unidata Program Center. Have questions?
Contact <a href="mailto:support-ml@unidata.ucar.edu">support-ml@unidata.ucar.edu</a>
or book an office hours meeting with Thomas on his
<a href="https://calendar.app.google/ZsM8dLHLa65eGAr39">Calendar</a>.
</p>
</div>
<h2><a href="https://www.unidata.ucar.edu/blogs/news/entry/nsf-unidata-update-february-2024">NSF Unidata Update: February 2024</a></h2>
<p>Unidata News, 2024-03-01</p>
<p>
In case you missed it, here's a recap of news from the NSF Unidata Program Center for
the month of <a href="https://www.unidata.ucar.edu/blogs/news/date/202402">February, 2024</a>.
</p>
<div>
<p>
Upcoming deadlines to be aware of:
</p>
<table class="simple">
<tr>
<td class="highlight_muted">15 March</td>
<td>Proposals for <a href="https://www.unidata.ucar.edu/blogs/news/entry/call-for-proposals-unidata-2024">2024 NSF Unidata Community Equipment Award grants</a> are due.
</td>
</tr>
<tr>
<td class="highlight_muted">15 March</td>
<td>Applications for the <a href="https://www.unidata.ucar.edu/blogs/news/entry/esip-raskin-scholarship4">ESIP Raskin Scholarship</a> are due.
</td>
</tr>
<tr>
<td class="highlight_muted">3 May</td>
<td>Nominations for the <a href="https://www.unidata.ucar.edu/blogs/news/entry/2024-desouza-award-nominations">2024 Russell L. DeSouza Award</a> are due.
</td>
</tr>
</table>
</div>
<p>
Highlights from last month:
</p>
<h3 class="wrapup"><a href="https://www.unidata.ucar.edu/blogs/news/entry/2024-desouza-award-nominations">2024 DeSouza Award Nominations</a></h3>
<div class="img_l">
<img src="https://www.unidata.ucar.edu/blog_content/images/desouza_students.jpg" alt="Russell DeSouza with students" />
</div>
<p>
The Unidata Users Committee invites you to submit nominations for the Russell L.
DeSouza Award for Outstanding Community Service. This Community Service Award honors
individuals whose energy, expertise, and active involvement enable the Unidata
Program to better serve the geosciences.
</p>
<h3 class="wrapup"><a href="https://www.unidata.ucar.edu/blogs/news/entry/awips-tips-new-raws-data">AWIPS Tips: New RAWS Data</a></h3>
<div class="img_l">
<img src="https://www.unidata.ucar.edu/blog_content/images/logos/awips-tips.png" alt="AWIPS Tips" />
</div>
<p>
This installment of AWIPS Tips describes a relatively new dataset that was added
following a request by a community member.
</p>
<h3 class="wrapup"><a href="https://www.unidata.ucar.edu/blogs/news/entry/quick-tips-for-ess-machine">Quick Tips for ESS Machine Learning Projects</a></h3>
<div class="img_l">
<img src="https://www.unidata.ucar.edu/blog_content/images/2024/20240212_ml_generated.png" alt="One AI’s view of what a geologist does when running ML models (courtesy of Bing)." />
</div>
<p>
There are many little decisions ML practitioners make along the way when starting an
Earth Systems Science (ESS) ML project. This article provides some tips and ideas to
consider as you're getting started.
</p>
<h3 class="wrapup"><a href="https://www.unidata.ucar.edu/blogs/news/entry/esip-raskin-scholarship4">ESIP: Raskin Scholarship</a></h3>
<div class="img_l">
<img src="https://www.unidata.ucar.edu/blog_content/images/logos/ESIP.png" alt="ESIP logo" />
</div>
<p>
The Federation of Earth Science Information Partners (ESIP) is an open networked
community that brings together science, data and information technology
practitioners around Earth science issues. The ESIP Raskin Scholarship
(application deadline March 15) is open to
current graduate students in Earth or computer sciences who have an interest in
community evolution of Earth Science data systems. Preference is given to
applicants who can demonstrate a connection to ESIP-related activities.
</p>
<h3 class="wrapup"><a href="https://www.unidata.ucar.edu/blogs/news/entry/ams-2024-conference-highlights-from">AMS 2024 Conference Highlights from the NSF Unidata Staff</a></h3>
<div class="img_l">
<img src="https://www.unidata.ucar.edu/blog_content/images/logos/ams_2024_logo_small.png" alt="AMS 2024 Annual Meeting logo" />
</div>
<p>
This year's annual American Meteorological Society meeting was held 27 January - 1
February 2024 in Baltimore, MD. Several NSF Unidata staff members were able to
travel to Baltimore to lead workshops, visit with students, present papers and
posters, and otherwise take part in the conference. As always, staff members spent
some time meeting with community members at UCAR's exhibit hall booth. This article
presents some conference highlights from the perspective of NSF Unidata staff.
</p>
<h3 class="wrapup"><a href="https://www.unidata.ucar.edu/blogs/news/entry/awips-tips-awips-20-3">AWIPS 20.3.2-2 Software Release</a></h3>
<div class="img_l">
<img src="https://www.unidata.ucar.edu/images/logos/awips-75x75.png" alt="AWIPS logo" />
</div>
<p>
NSF Unidata AWIPS 20.3.2-2 has been released, incorporating many updates
and fixes from the 20.3.2-1 release. This release includes installers for CAVE
(CentOS7, Windows, VMware Player, and MacOS), and for EDEX (CentOS7).
</p>
<h3 class="wrapup"><a href="https://www.unidata.ucar.edu/blogs/news/entry/netcdf-operators-nco-version-517">NetCDF operators (NCO) version 5.2.1</a></h3>
<div class="img_l">
<img src="https://www.unidata.ucar.edu/images/logos/netcdf-75x75.png" alt="netCDF logo" />
</div>
<p>
Version 5.2.1 of the netCDF Operators (NCO) has been released. NCO is an Open Source
package that consists of a dozen standalone, command-line programs that take netCDF
files as input, then operate (e.g., derive new data, average, print, hyperslab,
manipulate metadata) and output the results to screen or files in text, binary, or
netCDF formats.
</p>
<div id="developers">
<h2>On the Developer's Blog</h2>
<p>
NSF Unidata Program Center developers write regularly on technical topics on the
<a href="https://www.unidata.ucar.edu/blogs/developer/">Unidata Developer's
Blog</a>. The ongoing
<a href="https://www.unidata.ucar.edu/blogs/developer/en/tags/metpymonday">MetPy
Mondays</a> series looks at dealing with missing data.
</p>
</div>
<div id="committees">
<h2>Governing Committee News</h2>
<p>
NSF Unidata's
<a href="https://www.unidata.ucar.edu/committees/usercom/">Users Committee</a>
met November 1-3, 2023 in a joint meeting with the Strategic Advisory Committee
at the Unidata Program Center in Boulder, CO. The Users Committee will meet
next May 13-14, 2024 in Boulder, CO.
</p>
<p>
NSF Unidata's
<a href="https://www.unidata.ucar.edu/committees/stratcom/">Strategic Advisory Committee</a>
met November 1-3, 2023 in a joint meeting with the Users Committee
at the Unidata Program Center in Boulder, CO. The Strategic Advisory Committee
will meet next May 2-3, 2024 in Washington, DC.
</p>
</div>
<h2><a href="https://www.unidata.ucar.edu/blogs/news/entry/netcdf-operators-nco-version-517">NetCDF operators (NCO) version 5.2.1</a></h2>
<p>Unidata News, 2024-02-21</p>
<p>
Version 5.2.1 of the netCDF Operators (NCO) has been released. NCO is an Open
Source package that consists of a dozen standalone, command-line programs that
take netCDF files as input, then operate (e.g., derive new data, average, print,
hyperslab, manipulate metadata) and output the results to screen or files in text,
binary, or netCDF formats.
</p>
<p>
The NCO project is coordinated by Professor Charlie Zender of the Department of
Earth System Science, University of California, Irvine. More information about the
project, along with binary and source downloads, is available on the SourceForge
<a href="http://nco.sf.net/">project page</a>.
</p>
<p><style>
ol li {
padding-top: 1em;
}
</style></p>
<p>
From the release message:
</p>
<p class="quoteroman">
Version 5.2.1 fixes an issue with <code>ncremap</code> and <code>ncclimo</code> in MPI mode.
Another small fix to enables GCC compilation in pedantic mode.
No new features are implemented, but it was too late to recall 5.2.0.
</p>
<p class="quoteroman">
Version 5.2.0 includes four major new features and various fixes.
The features: 1) All operators append draft CF Convention behavior
for metadata to encode lossy compression. 2) <code>ncclimo</code> timeseries mode
now supports all input methods (including automatic filename
generation) long-supported by <code>climo</code> mode. 3) <code>ncremap</code> Make-Weight-File
(MWF) mode has been revamped and now support specifiable lists of
algorithms. Last but not least, 4) <code>ncks --s1d</code> now converts CLM/ELM
restart files from their native, inscrutable sparse 1-D (S1D) format
to normal-looking gridded files, without loss of information.
</p>
<h5>New Features</h5>
<ol style="list-style-type: upper-alpha;">
<li>
<code>ncks</code> can now help analyze initial condition and restart datasets
produced by the E3SM ELM and CESM CLM/CTSM land-surface models.
Whereas gridded history datasets from these ESMs use a standard
gridded data format, these land-surface "restart files" employ a
custom packing format that unwinds multi-dimensional data into
sparse, 1-D (S1D) arrays that are not easily visualized. <code>ncks</code> can
now convert these S1D files into gridded datasets where all dimensions
are explicitly declared (rather than unrolled or "packed").
Invoke this conversion feature with the <code>--s1d</code> option and point
<code>ncks</code> to a file that contains the horizontal coordinates (which
restart files do not explicitly contain) and the restart file.
The output file is the fully gridded input file, with no loss
of information:
<pre>ncks --s1d --hrz=elmv3_history.nc elmv3_restart.nc out.nc</pre>
The output file contains all input variables placed on a lat-lon or
unstructured grid, with new dimensions for Plant Functional Type (PFT)
and multiple elevation class (MEC).<br>
<a href="http://nco.sf.net/nco.html#s1d">http://nco.sf.net/nco.html#s1d</a>
</li>
<li>
<code>ncclimo</code> timeseries mode now supports all input methods (including
automatic filename generation) long-supported by <code>climo</code> mode. Previously
<code>ncclimo</code> (in timeseries mode) had to receive explicit lists of input
files, either from stdin or from the command line. Now <code>ncclimo</code> will
automatically generate the input file list for files that adhere to
common CESM/E3SM naming conventions (usually for monthly average
files). The syntax is identical to that long used in <code>climo</code> mode:
<pre>% ncclimo --split -c $caseid -s 2000 -e 2024 -i $drc_in -o $drc_out</pre>
<a href="http://nco.sf.net/nco.html#ncclimo">http://nco.sf.net/nco.html#ncclimo</a>
</li>
<li>
<code>ncremap</code> supports <code>--alg_lst=alg_lst</code>, a comma-separated list of the
algorithms that MWF-mode uses to create map-files. This option can
be used to shorten or alter the default list, which is
<code>'esmfaave,esmfbilin,ncoaave,ncoidw,traave,trbilin,trfv2,trintbilin'</code>.
Each name in the list should be the primary name of an algorithm,
not a synonym. For example, use <code>'esmfaave,traave'</code> not
<code>'aave,fv2fv_flx'</code> (the latter are backward-compatible synonyms
for the former). The algorithm list must be consistent with grid-types
supplied: ESMF algorithms work with meshes in ESMF, SCRIP, or UGRID
formats. NCO algorithms only work with meshes in SCRIP format.
TempestRemap algorithms work with meshes in ESMF, Exodus, SCRIP, or
UGRID formats. On output, <code>ncremap</code> inserts each algorithm name into the
output map-file name in this format: <code>map_src_to_dst_alg.date.nc</code>.
For example,
<pre>
% ncremap -P mwf --alg_lst=esmfnstod,ncoaave,ncoidw,traave,trbilin \
-s ocean.QU.240km.scrip.181106.nc -g ne11pg2.nc --nm_src=QU240 \
--nm_dst=ne11pg2 --dt_sng=20240201
...
% ls map*
map_QU240_to_ne11pg2_esmfnstod.20240201.nc
map_QU240_to_ne11pg2_ncoaave.20240201.nc
map_QU240_to_ne11pg2_ncoidw.20240201.nc
map_QU240_to_ne11pg2_traave.20240201.nc
map_QU240_to_ne11pg2_trbilin.20240201.nc
map_ne11pg2_to_QU240_esmfnstod.20240201.nc
map_ne11pg2_to_QU240_ncoaave.20240201.nc
map_ne11pg2_to_QU240_ncoidw.20240201.nc
map_ne11pg2_to_QU240_traave.20240201.nc
map_ne11pg2_to_QU240_trbilin.20240201.nc
</pre>
<a href="http://nco.sf.net/nco.html#alg_lst">http://nco.sf.net/nco.html#alg_lst</a><br>
<a href="http://nco.sf.net/nco.html#ncremap">http://nco.sf.net/nco.html#ncremap</a>
</li>
<li>
All NCO operators now support the draft CF Convention on encoding
metadata that describes lossy compression applied to the dataset.
See <a href="https://github.com/cf-convention/cf-conventions/issues/403">https://github.com/cf-convention/cf-conventions/issues/403</a>.
For example, all variables quantized by NCO now receive attributes
that contain the level of quantization and that point to a
container variable that describes the algorithm:
<pre>
% ncks -O -7 --cmp='btr|shf|zst' in.nc foo.nc
% ncks -m -v ts foo.nc
char compression_info ;
compression_info:family = "quantize" ;
compression_info:algorithm = "BitRound" ;
compression_info:implementation = "libnetcdf version 4.9.3-development" ;
float ts(time,lat,lon) ;
ts:standard_name = "surface_temperature" ;
ts:lossy_compression = "compression_info" ;
ts:lossy_compression_nsb = 9 ;
</pre>
<a href="http://nco.sf.net/nco.html#qnt">http://nco.sf.net/nco.html#qnt</a>
</li>
<li>
<code>ncks</code> supports a new flag, <code>--chk_bnd</code>, that reports whether all
coordinate variables in a file contain associated "bounds" variables.
This check complies with CF Conventions and with NASA's Dataset
Interoperability Working Group (DIWG) recommendations:
<pre>
$ ncks --chk_bnd ~/nco/data/in.nc
ncks: WARNING nco_chk_bnd() reports coordinate Lat does not contain
"bounds" attribute
ncks: WARNING nco_chk_bnd() reports coordinate Lon does not contain
"bounds" attribute
ncks: INFO nco_chk_bnd() reports total number of coordinates without
"bounds" attribute is 2
</pre>
<a href="http://nco.sf.net/nco.html#chk_bnd">http://nco.sf.net/nco.html#chk_bnd</a>
</li>
<li>
<code>ncremap</code> supports the TempestRemap <code>trfv2</code> algorithm, a second-order finite-volume (FV)
reconstruction that is cell-integrated on the target grid.
<pre>ncremap --alg_typ=trfv2 -s grd_src.nc -g grd_dst.nc --map=map.nc</pre>
<a href="http://nco.sf.net/nco.html#trfv2">http://nco.sf.net/nco.html#trfv2</a>
</li>
</ol>
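<p>
The map-file naming pattern described for <code>--alg_lst</code> (<code>map_src_to_dst_alg.date.nc</code>)
can be sketched in a few lines of Python, which may help when scripting around MWF-mode output. This is only
an illustration: the helper <code>mwf_map_names</code> is hypothetical and not part of NCO, and it assumes,
based on the listing above, that one map-file is written per algorithm in each direction.
</p>

```python
def mwf_map_names(nm_src, nm_dst, algorithms, date):
    """Return the map-file names suggested by the pattern map_src_to_dst_alg.date.nc.

    Hypothetical helper for illustration only; it is not an NCO function.
    """
    names = []
    # Assumption from the example listing: maps are produced in both directions.
    for src, dst in ((nm_src, nm_dst), (nm_dst, nm_src)):
        for alg in algorithms:
            names.append(f"map_{src}_to_{dst}_{alg}.{date}.nc")
    return names


names = mwf_map_names(
    "QU240",
    "ne11pg2",
    ["esmfnstod", "ncoaave", "ncoidw", "traave", "trbilin"],
    "20240201",
)
for name in names:
    print(name)
```

<p>
With the names and date string from the example above, the helper reproduces the ten file names shown in the
<code>ls map*</code> listing.
</p>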
<p>
Additional details are available in the
<a href="http://nco.sourceforge.net/ChangeLog">ChangeLog</a>.
</p>
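<p>
For readers unfamiliar with the quantization that the new lossy-compression metadata describes, the
following NumPy sketch shows the general idea of keeping a fixed number of significant bits (the
<code>lossy_compression_nsb</code> value) in 32-bit floats. It is a simplified illustration, not the NCO or
libnetcdf implementation: the function name <code>bitround_sketch</code> is hypothetical, and the rounding
rule (add half, then mask) is an assumption chosen for brevity.
</p>

```python
import numpy as np


def bitround_sketch(values, nsb):
    """Keep `nsb` explicit mantissa bits of float32 values.

    Simplified illustration only; not the NCO/libnetcdf BitRound code.
    """
    arr = np.ascontiguousarray(values, dtype=np.float32)
    drop = 23 - nsb  # float32 stores 23 explicit mantissa bits
    if drop not in range(1, 23):
        raise ValueError("nsb must be between 1 and 22 for float32")
    bits = arr.view(np.uint32)
    half = np.uint32(2 ** (drop - 1))                # half of the last kept bit
    mask = np.uint32(0xFFFFFFFF - (2 ** drop - 1))   # zeroes the dropped bits
    rounded = np.bitwise_and(bits + half, mask)
    return rounded.view(np.float32)


# Made-up surface temperatures in kelvin, quantized with nsb = 9 as in the example above.
ts = np.array([288.15, 273.15, 301.7], dtype=np.float32)
ts_q = bitround_sketch(ts, nsb=9)
print(ts_q)
```

<p>
The zeroed trailing bits are what make the quantized data compress better with a lossless codec; the
metadata attributes record how many bits were kept.
</p>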
https://www.unidata.ucar.edu/blogs/news/entry/awips-tips-awips-20-3AWIPS Tips: AWIPS 20.3.2-2 Software ReleaseTiffany Meyer2024-02-21T09:00:00-07:002024-02-21T09:33:11-07:00<p>Welcome back to AWIPS Tips!</p>
<p>We are excited to announce our release of 20.3.2-2 that incorporates many updates and fixes from the 20.3.2-1 release. This release includes installers for CAVE (CentOS7, Windows, VMware Player, and MacOS), and for EDEX (CentOS7).</p>
<h2>EDEX Updates</h2>
<ul>
<li>Fix syntax error in pqact.conf.priority and pqact.grids</li>
<li>Fix regex for satellite data so we don't request duplicate data in pqact.conf.priority and pqact.goesr</li>
<li>Replace checkFileTime.pl with checkFileTime.sh which can run without root permissions via cron</li>
<li>Replace ufpy package with NSF Unidata’s python-awips package</li>
<li>Update awips_install.sh installer to disableexcludes from yum.conf</li>
<li>Update derived parameters to include GFS20 to fix Surface Precipitation</li>
<li>Updates to satellite menu:
<ul><li>Update CPSD description file</li>
<li>Add Fog derived products to GOES WCONUS menu</li>
<li>Add "Rocket" to plume menu items</li>
<li>Update Meso and Full RGB menus to include all valid RGBs</li></ul></li>
<li>Small changes to modes to allow proper compressing of log files</li>
<li>Repair ACARS purging file</li>
</ul>
<h2>CAVE Updates</h2>
<ul>
<li>Remove GFE from Windows and Mac perspective menus (functionality not available on those Operating Systems currently)</li>
<li>Remove Edit Plot Attributes from METARs station plots (functionality also currently removed from NWS AWIPS because it’s not implemented correctly yet)</li>
</ul>
<p> <b>Windows CAVE Installers</b> <br></p>
<ul><li>Updated naming convention of the .exe file to avoid confusion </li></ul>
<p class="highlight_box"><span style="color:#06778F; font-weight: bold;">NOTE: </span>CAVE should be run using the CAVE.bat file to correctly use the packaged python</p>
<hr />
<h2>How to Install EDEX</h2>
<p>Visit our <a href="http://unidata.github.io/awips2/install/install-edex/" target="_blank">EDEX Installation Page</a> and follow the instructions for installing EDEX on a Linux machine.</p>
<h2>How to Install CAVE</h2>
<p>Visit our <a href="http://unidata.github.io/awips2/install/install-cave/" target="_blank">CAVE Installation Page</a> and see options for the Linux, Windows, MacOS or Virtual Machine installation methods.</p>
<hr />
<h2>EDEX Connection</h2>
<p>Select the server in the Connectivity Preferences dialog, or enter <strong>edex-cloud.unidata.ucar.edu</strong>.</p>
<p><img src="https://www.unidata.ucar.edu/blog_content/images/2023/20231213_edex_connection.png"></p>
<hr />
<h2>Functionality/Reporting</h2>
<p>If you come across any deficiencies or have any enhancement requests, please <a href="https://docs.google.com/forms/d/e/1FAIpQLSf6jyZtbh49g-GCBoAQYzTVwAIf_aKz0QOeAr7gDVFhPrjAmw/viewform?usp=sf_link">fill out our short reporting form</a>.</p>
<hr />
<p>For notifications of the latest updates from the AWIPS team, sign up for the <a href ="https://www.unidata.ucar.edu/support/index.html#mailinglists" target="_blank">AWIPS mailing list</a>. Have questions or suggestions for the team? Let us know at <a href="mailto:support-awips@unidata.ucar.edu" target="_blank">support-awips@unidata.ucar.edu</a>.</p>
<p>Check back in two weeks for the next blog post where we review how to access and plot satellite data in python-awips. </p>
<p><em>To view archived blogs, visit the <a href="https://www.unidata.ucar.edu/blogs/news/tags/awipstips" target="_blank">AWIPS Tips blog tag</a>, and get notified of the latest updates from the AWIPS team by signing up for the <a href ="https://www.unidata.ucar.edu/support/index.html#mailinglists" target="_blank">AWIPS mailing list</a>. Questions or suggestions for the team on future topics? Let us know at <a href="mailto:support-awips@unidata.ucar.edu" target="_blank">support-awips@unidata.ucar.edu</a></em></p>
https://www.unidata.ucar.edu/blogs/news/entry/ams-2024-conference-highlights-fromAMS 2024 Conference Highlights from the NSF Unidata StaffUnidata News2024-02-14T11:34:35-07:002024-02-14T11:34:35-07:00<div class="img_l" style="width: 150px; margin-top: 0;">
<img width="150" style="padding: 0.2em 0 0 0;" src="/blog_content/images/logos/ams_2024_logo_small.png" alt="AMS 2024 Annual Meeting" />
</div>
<p>
This year's annual American Meteorological Society meeting was held 27 January -
1 February 2024 in Baltimore, MD. Several NSF Unidata staff members were able to
travel to Baltimore to lead workshops, visit with students, present papers and
posters, and otherwise take part in the conference. As always, staff members spent
some time meeting with community members at UCAR's exhibit hall booth. The following
are some of the conference highlights from the perspective of NSF Unidata staff.
</p>
<p><style>
div.img_l,
div.img_r {
margin-top: 1em;
margin-bottom: 0;
}
h3 { line-height: 1.1em;}
</style></p>
<h3>23rd Student Conference</h3>
<div class="img_r" style="width: 200px;">
<a class="lightbox" title="Talking with students at the AMS 2024 Student Career Fair. (Photo: JT Thielen)" href="/blog_content/images/2024/20240215_ams_01.jpg">
<img width="200" src="/blog_content/images/2024/20240215_ams_01.jpg" alt="AMS 2024 Career Fair" />
</a>
<div class="caption">
Talking with students at the AMS 2024 Student Career Fair. (Click to enlarge.)
</div>
</div>
<p>
As we try to do at every AMS Annual Meeting, NSF Unidata had a table set up for the
Student Conference Career Fair, held Saturday and Sunday evenings before the main
conference exhibition hall opened. Our table attracted many visitors, with
students interested in data and software available from Unidata as well as Unidata's
Summer Internship program. We were fortunate that one of our 2023 summer interns,
Jessica Souza from Texas Tech University, was able to join Program Center staff at the
table to talk with students and describe her experiences working at NSF Unidata.
</p>
<p>
In addition to the Career Fair, the AMS Student Conference provides opportunities
for undergraduate and graduate students to learn new skills, meet and network with
mentors and colleagues in the weather, water, and climate field, and participate in
workshops to help with their professional development. This year, NSF Unidata led
<strong>two</strong> workshops during the Student Conference — one focused on
AWIPS and one focused on Python and MetPy.
</p>
<h3>Student Conference AWIPS Workshop</h3>
<div class="img_l" style="width: 200px;">
<a class="lightbox" title="NSF Unidata developer Tiffany Meyer presents at the AMS 2024 Student Conference AWIPS workshop. (Photo: Ryan May)" href="/blog_content/images/2024/20240215_ams_03.jpg">
<img width="200" src="/blog_content/images/2024/20240215_ams_03.jpg" alt="AMS 2024 Student Conference AWIPS workshop" />
</a>
<div class="caption">
Student Conference AWIPS workshop.
</div>
</div>
<p>
On Sunday, January 28<sup>th</sup>, NSF Unidata software engineer Tiffany Meyer
partnered with Victoria Elliott (a graduate student from Texas A&amp;M) to deliver an
in-person workshop on the Advanced Weather Interactive Processing System
(<a href="https://www.unidata.ucar.edu/software/awips2/">AWIPS</a>) for
72 Student Conference attendees. This afternoon session served as a high-level
overview of how AWIPS is structured, how it can be used in the classroom to prepare
students for careers as forecasters with the National Weather Service, and how it can best
benefit students' university programs. The workshop leaders provided demonstrations
of the Common AWIPS Visualization Environment (CAVE) and the python-awips data
access framework. The workshop took advantage of public-facing AWIPS Environmental
Data EXchange (EDEX) servers configured by NSF Unidata staff Julien Chastang and
Ana Espinoza on the NSF Jetstream2 cloud. NSF Unidata staff members Shay Carter and
Nicole Corbin helped plan and design the workshop activities.
</p>
<h3>Student Conference Python Workshop</h3>
<div class="img_r" style="width: 200px;">
<a class="lightbox" title="Max Grover helps a student at the AMS 2024 Student Conference Python workshop. (Photo: JT Thielen)" href="/blog_content/images/2024/20240215_ams_02a.jpg">
<img width="200" src="/blog_content/images/2024/20240215_ams_02a.jpg" alt="AMS 2024 Student Conference Python workshop" />
</a>
<div class="caption">
Student Conference Python workshop.
</div>
</div>
<p>
Also on Sunday, January 28<sup>th</sup>, NSF Unidata software engineer Drew Camron partnered with Max
Grover (DOE ANL/ARM, and a former NSF Unidata summer intern), Ryan May (NSF Unidata),
Jessica Souza (Texas Tech University, and a former NSF Unidata summer intern), JT Thielen
(Colorado State University, and a former NSF Unidata summer intern),
and Kevin Tyle (University at Albany), to deliver an in-person Python workshop
for 38 Student Conference attendees. This afternoon session served as an introduction to
Exploratory Data Analysis (EDA) in Python using tools like Pandas, xarray, and
<a href="https://www.unidata.ucar.edu/software/metpy/">MetPy</a>, and
concluded with a crash course in accessing HRRR model output on Amazon Web Services S3 cloud stores to
help forecast winds and rain in Baltimore. The workshop was supported by NSF Unidata
staff Julien Chastang and Ana Espinoza through provision of free Jupyter Lab computing
resources to students on the NSF Jetstream2 cloud.
</p>
<h3>MetPy Short Course</h3>
<div class="img_l" style="width: 200px;">
<a class="lightbox" title="NSF Unidata developer Drew Camron helps a participant at the AMS 2024 MetPy Short Course. (Photo: JT Thielen)" href="/blog_content/images/2024/20240215_ams_04.jpg">
<img width="200" src="/blog_content/images/2024/20240215_ams_04.jpg" alt="AMS 2024 MetPy Short Course" />
</a>
<div class="caption">
AMS 2024 MetPy Short Course.
</div>
</div>
<p><a class="lightbox" title="Valparaiso University professor Kevin Goebbert presents at the AMS 2024 MetPy Short Course. (Photo: JT Thielen)" href="/blog_content/images/2024/20240215_ams_05.jpg"></a></p>
<p>
In addition to the two student conference workshops, NSF Unidata staff led an AMS
Short Course titled <em>MetPy: Creating Meteorological Python Workflows from
Scratch</em>. For this short course, software engineer Drew Camron partnered with
Dr. Kevin Goebbert (Valparaiso University) and JT Thielen (Colorado State
University) to present a crash course in <a
href="https://www.unidata.ucar.edu/software/metpy/">MetPy</a>. The course helped
attendees develop Python workflows for obtaining, manipulating, and visualizing a
variety of weather data from NSF Unidata <a
href="https://www.unidata.ucar.edu/software/tds/">THREDDS Data Servers</a>
in a realistic and near-real-time way. Twenty-seven people participated in person,
representing universities (U.S. and international), national centers, forecasting
offices, and industry partners. Like the student conference Python workshop, this
course was supported by NSF Unidata staff Julien Chastang and Ana Espinoza through
provision of free Jupyter Lab computing resources to students on the NSF Jetstream2
cloud.
</p>
<h3>Talks and Presentations</h3>
<div class="img_r" style="width: 200px;">
<a class="lightbox" title="2023 NSF Unidata summer intern Jessica Souza from Texas Tech University presented on her internship work. (Photo: Drew Camron)" href="/blog_content/images/2024/20240215_ams_06.jpg">
<img width="200" src="/blog_content/images/2024/20240215_ams_06.jpg" alt="Jessica Souza" />
</a>
<div class="caption">
2023 summer intern Jessica Souza.
</div>
</div>
<p><a class="lightbox" title="Nicole Corbin participated remotely to give a talk about microlearning strategies. (Photo: Drew Camron)" href="/blog_content/images/2024/20240215_ams_07.jpg"></a></p>
<p>
Once the Annual Meeting got fully underway, NSF Unidata staff members participated
in a variety of sessions and made presentations both in person and remotely. Staff
members Drew Camron, Shay Carter, Julien Chastang, Nicole Corbin, Ana Espinoza,
Ryan May, Tiffany Meyer, and Yuan Ho all gave talks. 2023 summer student intern
Jessica Souza presented at two sessions! You can find online resources from Nicole
Corbin's talk <em>Microlearning Strategies for Data Readiness in the Classroom</em>
on the NSF Unidata eLearning site: see
<a href="https://elearning.unidata.ucar.edu/course/view.php?id=9">Multidimensional Data Structures</a>
and
<a href="https://elearning.unidata.ucar.edu/course/view.php?id=10">Getting Started with Siphon and THREDDS</a>.
A complete list of talks by NSF Unidata folks is included in
this post: <a href="https://www.unidata.ucar.edu/blogs/news/entry/unidata-staff-at-ams-2024">NSF Unidata Staff at AMS 2024 Meeting</a>.
</p>
<h3>Visiting with Community</h3>
<p>
In addition to interacting with community members at talks and poster sessions,
Unidata staff members spent time at the UCAR/NCAR booth in the AMS exhibit hall.
While Unidata's presence at the booth was low-key, we were happy to talk with
students and others who came by to learn about internships, software, and data
access.
</p>
<p>
What do you think of this arrangement? Were you able to find us in the AMS exhibition
space? Did you get to talk with Unidata staff members you wanted to contact? We'd
love to hear your thoughts on how we can best visit with you at AMS 2025! Drop us a
line at <a href="mailto:support@unidata.ucar.edu?subject=AMS%20booth%20thoughts">support@unidata.ucar.edu</a> if
you'd like to weigh in.
</p>
https://www.unidata.ucar.edu/blogs/news/entry/esip-raskin-scholarship4ESIP: Raskin ScholarshipUnidata News2024-02-14T10:27:26-07:002024-02-14T10:27:26-07:00<div class="img_l" style="width: 100px;">
<img width="100" src="/blog_content/images/logos/ESIP.png" alt="ESIP" />
</div>
<p>
The
Federation of Earth Science Information Partners (ESIP)
is an open networked community that brings together science, data and information
technology practitioners around Earth science issues.
</p>
<p>
The Raskin Scholarship is open to current graduate students in Earth or computer
sciences who have an interest in community evolution of Earth Science data systems.
Preference is given to applicants who can demonstrate a connection to ESIP-related
activities.
</p>
<div class="img_l" style="width: 100px;">
<img width="100" src="/blog_content/images/logos/ESIP.png" alt="ESIP" />
</div>
<p>
The
<a href="http://esipfed.org/">Federation of Earth Science Information Partners (ESIP)</a>
is an open networked community that brings together science, data and information
technology practitioners around Earth science issues.
</p>
<p>
The Raskin Scholarship is open to current graduate students in Earth or computer
sciences who have an interest in community evolution of Earth Science data systems.
Preference is given to applicants who can demonstrate a connection to ESIP-related
activities. Special attention will be given to applicants demonstrating an interest
in semantics, GIS, cyberinfrastructure, and computing in the geosciences. The
scholarship seeks to promote collaboration, research support, and exposure for
talented students and early career researchers in the Earth or computer sciences.
The Scholarship, which is awarded annually, provides a $5000 award and travel
support to the ESIP Summer Meeting, where the recipient will have an invited talk
covering their field of interest.
</p>
<p>
Applications are due
<span class="highlight_muted">March 15, 2024</span>. For full details see the
<a href="https://www.esipfed.org/get-involved/student-opportunities/raskin-scholarship">2024
Raskin Scholarship website</a>.
</p>
https://www.unidata.ucar.edu/blogs/news/entry/quick-tips-for-ess-machineQuick Tips for ESS Machine Learning ProjectsUnidata News2024-02-12T09:10:00-07:002024-02-12T09:10:00-07:00<div class="img_l" style="width: 100px;">
<a class="lightbox" title="One AI’s view of what a geologist does when running ML models (courtesy of Bing)." href="/blog_content/images/2024/20240212_ml_generated.png">
<img width="100" src="/blog_content/images/2024/20240212_ml_generated.png" alt="Generated image of geologist" />
</a>
</div>
<p>
Your idea of what's entailed in setting up a supervised Machine Learning (ML)
project as an Earth Systems scientist is probably not as fanciful as what an image
generation algorithm came up with. But there are many little
decisions ML practitioners make along the way when starting an Earth Systems Science
(ESS) ML project. This article provides some tips and ideas to consider as you're
getting started. These tips are not in any particular order, and like all things
related to ML projects they depend on the specific types of data and project goals.
</p>
<p class="byline">
By Thomas Martin, AI/ML Software Engineer
</p>
<div class="img_l" style="width: 150px;">
<a class="lightbox" title="One AI’s view of what a geologist does when running ML models (courtesy of Bing)." href="/blog_content/images/2024/20240212_ml_generated.png">
<img width="150" src="/blog_content/images/2024/20240212_ml_generated.png" alt="Generated image of geologist" />
</a>
<div class="caption">
Generated image of a geologist using ML models<br>(Click to enlarge)
</div>
</div>
<p>
Your idea of what's entailed in setting up a supervised Machine Learning (ML)
project as an Earth Systems scientist is probably not as fanciful as what an image
generation algorithm came up with (see image at left!). But there are many little
decisions ML practitioners make along the way when starting an Earth Systems Science
(ESS) ML project. This article provides some tips and ideas to consider as you're
getting started. These tips are not in any particular order, and like all things
related to ML projects they depend on the specific types of data and project goals.
(If you have any questions about your particular project, feel free to book a
meeting with me — my contact details are at the end of this article.)
</p>
<h3>Try a Few Models</h3>
<div class="img_r" style="width: 200px;">
<a class="lightbox" title="A high level comparison of different scikit-learn classifiers" href="/blog_content/images/2024/20240212_ml_classifier_comparison_001.png">
<img width="200" src="/blog_content/images/2024/20240212_ml_classifier_comparison_001.png" alt="A high level comparison of different scikit-learn classifiers" />
</a>
<div class="caption">
(Click to enlarge)
</div>
</div>
<p>
Even if you're sure that you need a deep learning model for your
project, it's always worth trying some ‘shallow’ learning models (such as those in
scikit-learn), either as a baseline or to aid with interpretation of input features. This is one
thing I look for when I review applied ML papers. The two links below
compare different classification and regression models on different datasets.
</p>
<ul>
<li>My own <a href="https://github.com/ThomasMGeo/regressor_compare/blob/main/Regressor_compare.ipynb">scikit-learn regressor comparison</a></li>
<li><a href="https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html">A high level comparison of different scikit-learn classifiers</a></li>
</ul>
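<p>
As a rough sketch of what "trying a few models" can look like in practice, the snippet below
cross-validates a trivial baseline and two shallow scikit-learn classifiers on the same data. The dataset
is synthetic (generated with <code>make_classification</code>) and stands in for a real ESS dataset; the
particular models and settings are assumptions chosen for illustration, not recommendations.
</p>

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a real dataset (assumption for illustration).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, random_state=0
)

models = {
    "dummy (most frequent)": DummyClassifier(strategy="most_frequent"),
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean 5-fold cross-validated accuracy for each candidate model.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name}: mean CV accuracy = {score:.3f}")
```

<p>
If a deep learning model cannot clearly beat baselines like these on your data, that is worth knowing
before investing in a more complex architecture.
</p>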
<h3>Scale Your Data</h3>
<p>
Most ML models (not all!) require pre-processing and normalization. Decision-tree
models don't strictly require scaling, but it might still be a good idea for your
particular dataset and use case. The scikit-learn library has a great suite of pre-processors,
and these are even useful for non-ML use cases. Lately I have been using the
<a href="https://scikit-learn.org/stable/modules/preprocessing.html#non-linear-transformation">quantile
transformer</a> for many of my workflows, but this is very much dataset and
model dependent.
</p>
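<p>
Here is a minimal sketch of the quantile transformer applied to a skewed feature. The input is synthetic
log-normal data standing in for something like precipitation amounts, and the parameter choices
(<code>n_quantiles=100</code>, a normal output distribution) are assumptions for illustration.
</p>

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(0)

# Strongly right-skewed synthetic feature, shaped (n_samples, n_features).
X = rng.lognormal(mean=0.0, sigma=1.0, size=(500, 1))

qt = QuantileTransformer(
    n_quantiles=100, output_distribution="normal", random_state=0
)
X_scaled = qt.fit_transform(X)

print("raw    mean/median:", X.mean(), np.median(X))
print("scaled mean/median:", X_scaled.mean(), np.median(X_scaled))
```

<p>
In a real workflow, fit the transformer on the training data only and reuse it (via
<code>qt.transform</code>) on the test and validation sets, so that no information leaks between them.
</p>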
<h3>Testing, Training, and Validation Datasets</h3>
<p>
While training and testing are crucial, the often-overlooked key to robust analysis
lies in a third, independent validation dataset. This independent set serves as a
critical reality check, ensuring your model generalizes well beyond the training
data and isn't simply overfitting. However, for environmental and geoscience data,
blindly applying random sampling for validation can be a recipe for disaster.
Spatial and temporal correlations inherent in these data can lead to misleading
results if not accounted for. For a deeper dive into best practices for well-based
geoscience data validation, you're welcome to read this paper I wrote as part of my doctoral
work: <a href="https://thesedimentaryrecord.scholasticahq.com/article/36638-digitalization-of-legacy-datasets-and-machine-learning-regression-yields-insights-for-reservoir-property-prediction-and-submarine-fan-evolution-a-sub">Digitalization of Legacy Datasets and Machine Learning Regression Yields
Insights for Reservoir Property Prediction and Submarine-Fan Evolution: A Subsurface
Example From the Lewis Shale, Wyoming</a>.
</p>
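<p>
One common way to avoid the random-sampling trap is to split by group, so that all samples from a given
well, station, or site land on the same side of the split. The sketch below uses scikit-learn's
<code>GroupKFold</code> with synthetic data; the site IDs and array shapes are made up for illustration,
and grouping by site is only one of several possible strategies.
</p>

```python
import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)

n_sites, n_per_site = 10, 30
# One site (e.g., well or station) ID per sample; synthetic for illustration.
groups = np.repeat(np.arange(n_sites), n_per_site)
X = rng.normal(size=(n_sites * n_per_site, 3))
y = rng.normal(size=n_sites * n_per_site)

splits = list(GroupKFold(n_splits=5).split(X, y, groups=groups))
for fold, (train_idx, test_idx) in enumerate(splits):
    held_out = np.unique(groups[test_idx])
    print(f"fold {fold}: held-out sites = {held_out.tolist()}")
```

<p>
Because whole sites are held out, each fold measures how well a model transfers to locations it has never
seen, which is usually the question that matters for spatially correlated data.
</p>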
<h3>Drop Unnecessary Data</h3>
<p>
If after you've done some exploratory data analysis and some training and testing of
various models there are a few input features that do not seem to improve
performance, it’s best practice to remove (or drop) them before doing your final
analysis. Within the scikit-learn ecosystem, you can do this automatically using <a href="https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFE.html">
Recursive Feature Elimination
</a> (RFE) depending on the model. RFE not only simplifies and speeds up model
training, but also identifies the potentially most impactful features, giving you
better insights into your data.
</p>
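<p>
A minimal RFE sketch follows, using synthetic regression data in which only three of ten features carry
signal. The estimator (a small random forest) and the choice to keep three features are assumptions for
illustration; in practice the number of features to keep is something to tune.
</p>

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE

# Synthetic data: 10 input features, only 3 of which are informative.
X, y = make_regression(
    n_samples=300, n_features=10, n_informative=3, noise=5.0, random_state=0
)

selector = RFE(
    RandomForestRegressor(n_estimators=50, random_state=0),
    n_features_to_select=3,
)
selector.fit(X, y)

# Column indices of the features RFE decided to keep.
kept = [i for i, keep in enumerate(selector.support_) if keep]
X_reduced = selector.transform(X)

print("kept feature columns:", kept)
print("feature ranking (1 = kept):", selector.ranking_.tolist())
```

<p>
The <code>ranking_</code> attribute records the order in which features were eliminated, which can be a
useful starting point for discussing feature importance.
</p>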
<h3>Your Performance Metric Matters</h3>
<p>
In a <a href="https://www.unidata.ucar.edu/blogs/news/entry/r-sup-2-sup-downsides">previous post</a>,
I discussed the potential overuse of R<sup>2</sup> as a metric for regression
problems. Accuracy for classification problems can also be an issue for unbalanced,
multi-class datasets, which are common in ESS. It's worth experimenting with a couple of
different performance metrics, and reporting more than one metric! Within the
scikit-learn ecosystem, there are many options besides R<sup>2</sup> and accuracy. This is
especially true for ML models that are trying to predict relatively rare events.
(See "<a href="https://scikit-learn.org/stable/modules/model_evaluation.html">Metrics
and scoring: quantifying the quality of predictions</a>")
</p>
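<p>
To make the rare-event problem concrete, consider a made-up dataset in which only 5 of 100 samples are
events, and a "model" that always predicts no event. The labels below are invented for illustration; the
metric functions are standard scikit-learn calls.
</p>

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score

# 95 non-events and 5 events (made-up labels for illustration).
y_true = np.array([0] * 95 + [1] * 5)
# A useless model that never predicts an event.
y_pred = np.zeros(100, dtype=int)

acc = accuracy_score(y_true, y_pred)
bal = balanced_accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred, zero_division=0)

print(f"accuracy          = {acc:.2f}")
print(f"balanced accuracy = {bal:.2f}")
print(f"F1 (event class)  = {f1:.2f}")
```

<p>
Plain accuracy looks excellent here even though the model never detects a single event, while balanced
accuracy and F1 expose the failure. Reporting more than one metric makes this kind of problem much harder
to miss.
</p>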
<h3>Visualize Your Data</h3>
<p>
Visualizing your data is not just something to do at the end of a project; it's a
critical sanity check throughout any quantitative analysis. Datasets like Anscombe's
quartet and the Datasaurus Dozen have shown how statistics alone do not tell the
whole story of datasets. As a data scientist, I find that visualizing data at every
step of the ML workflow, even if the plots don't make it into the final report,
helps me identify potential issues and refine my workflow. Don't underestimate the
power of a simple visualization. (See the <a href="https://www.unidata.ucar.edu/blogs/news/entry/r-sup-2-sup-downsides">previous post</a>
for some visual examples.)
</p>
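<p>
The first two datasets of Anscombe's quartet illustrate the point. The sketch below computes a few
summary statistics for each with NumPy and includes an optional plotting helper; the helper name
<code>plot_pair</code> is hypothetical, and it assumes Matplotlib is installed.
</p>

```python
import numpy as np

# Anscombe's quartet, datasets I and II (they share the same x values).
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y1 = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])
y2 = np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])


def summary(x, y):
    """Return (mean of y, sample variance of y, Pearson correlation of x and y)."""
    return (y.mean(), y.var(ddof=1), np.corrcoef(x, y)[0, 1])


def plot_pair(x, y_a, y_b):
    """Scatter-plot both datasets side by side (requires Matplotlib)."""
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, 2, sharey=True, figsize=(8, 3))
    for ax, y, title in zip(axes, (y_a, y_b), ("Dataset I", "Dataset II")):
        ax.scatter(x, y)
        ax.set_title(title)
        ax.set_xlabel("x")
    axes[0].set_ylabel("y")
    return fig


s1 = summary(x, y1)
s2 = summary(x, y2)
print("dataset I :", s1)
print("dataset II:", s2)
```

<p>
The summary statistics agree closely, yet calling <code>plot_pair(x, y1, y2)</code> shows one noisy
linear relationship and one smooth curve. Only the plot reveals that a straight-line model is
inappropriate for the second dataset.
</p>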
<div class="highlight_box">
<p>
Thomas Martin is an AI/ML Software Engineer at the NSF Unidata Program Center. Have questions?
Contact <a href="mailto:support-ml@unidata.ucar.edu">support-ml@unidata.ucar.edu</a>
or book an office hours meeting with Thomas on his
<a href="https://calendar.app.google/ZsM8dLHLa65eGAr39">Calendar</a>.
</p>
</div>