<p>Rosetta: A Data Transformation Tool for ASCII Files</p>
<p>Unidata News, published 2016-05-09, updated 2017-12-21<br/>
https://www.unidata.ucar.edu/blogs/news/entry/rosetta-a-data-tranformation-tool</p>
<div class="img_l" style="width: 200px;">
<img width="200" src="/blog_content/images/2016/20160506_rosetta_coord_vars.png" alt="Rosetta"/>
<div class="caption">
Rosetta's wizard interface lets the user specify metadata relating to coordinate variables in the data file.
</div>
</div>
<p class="byline">
By Larissa Gordon
</p>
<p>
Rosetta, one of Unidata's data transformation tools, is
helping the scientific community standardize its data.
Created by Unidata software engineer Sean Arms, Rosetta gives
researchers an easy way to add appropriate metadata to ASCII
files and then save the results either in an ASCII format
(e.g. .csv) or as Climate and Forecast (CF)-compliant netCDF
files. Most recently, Rosetta has helped Millersville University
transform weather balloon data collected as part of a
nationwide experiment.
</p>
<p>
Millersville University has been involved in an
experiment known as PECAN (Plains Elevated Convection at
Night). The experiment involves eight research laboratories
and fourteen universities, which share the common goal of
finding the cause of an increase in mesoscale convective
systems (MCSs) that occur at night during the summer
months.
</p>
<p style="font-style: italic;">
Editor's Note: This is part of a series of posts written by
Unidata communications intern Larissa Gordon, highlighting
new activities and interesting projects undertaken by software
developers at the Unidata Program Center.
</p>
<p>
As part of the data collection process,
Millersville students and faculty set out in the evenings
from June 1 to July 15, 2015, launching weather
balloons at
various PISA (PECAN Integrated Sounding Array) sites.
Each launch used a rawinsonde system, which consists of a
Vaisala MW41 data acquisition system and an RS41-SGP
radiosonde attached to a 200-gram Totex weather balloon.
</p>
<p>
The rawinsonde system
creates a profile of the lower to middle atmosphere,
measuring parameters such as temperature, wind direction and
speed, relative humidity, location, and pressure. Each profile
ends six to eight kilometers above the Earth's
surface.
</p>
<p>
Measurements were taken every second during
each launch, creating a large amount of data. The raw output
from the Vaisala data acquisition system was
sent to the Earth Observing Laboratory (EOL) as a text-based
data file bundled with a README text file containing metadata. At
EOL, the text files were formatted for consistency
and added to the project archives.
</p>
<p>
While data stored in text files can be opened and
analyzed in a spreadsheet program such as Microsoft Excel
or in a simple text editor, there is no generic,
programmatic way to access the data, because
the metadata describing the format of the text files
is not easily accessible.
</p>
<p>
If the data were stored in a format that allowed
programmatic access, however, a
user could request specific times, data variables, and data
usage metadata, even without being familiar with the dataset.
</p>
<p>
With
an experiment as large as PECAN, there must be an efficient
way to request data from a point source. The first step in
making point-source data easily available is converting
the data into a more useful format, such as a
CF-compliant netCDF file.
</p>
<p>
This is where Rosetta comes in. Prior to Rosetta, researchers would
have to know how to write a program to transform their text-based
data files into netCDF files, and they would
have to be familiar with the requirements for making the data
CF-compliant.
</p>
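<p>
To give a sense of the bookkeeping such a program involves, here is a minimal
Python sketch that reads a small text table and pairs each column with CF-style
attributes. The column names and sample values are made up for illustration, the
units are an assumption about how this example file was recorded, and the sketch
stops short of writing a netCDF file or checking full CF compliance. It is not
how Rosetta itself is implemented.
</p>

```python
import csv
import io

# Hypothetical column names for a sounding text file; a real file will use
# its own headers. The standard_name values come from the CF standard name
# table; the units are an assumption about this example file.
CF_ATTRIBUTES = {
    "time_s": {"long_name": "time since launch", "units": "s"},
    "pressure_hpa": {"standard_name": "air_pressure", "units": "hPa"},
    "temp_c": {"standard_name": "air_temperature", "units": "degC"},
    "rh_pct": {"standard_name": "relative_humidity", "units": "percent"},
}

# Made-up sample values, not real PECAN measurements.
SAMPLE_TEXT = """time_s,pressure_hpa,temp_c,rh_pct
0,965.2,24.1,71
1,964.6,24.0,71
2,964.0,23.9,72
"""


def read_columns(text):
    """Parse comma-separated text into a dict of column name -> list of floats."""
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(float(value))
    return columns


def describe(columns, attributes):
    """Pair each data column with its metadata, failing loudly if any is missing."""
    missing = [name for name in columns if name not in attributes]
    if missing:
        raise ValueError(f"no metadata for columns: {missing}")
    return {
        name: {"attributes": attributes[name], "values": values}
        for name, values in columns.items()
    }


dataset = describe(read_columns(SAMPLE_TEXT), CF_ATTRIBUTES)
print(dataset["temp_c"]["attributes"]["standard_name"])
```

<p>
Even in this toy form, the researcher has to know which attribute names and
standard names the conventions expect for every variable.
</p>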
<p>
Now, researchers can simply use
Rosetta's wizard-based interface to convert their data
into CF-compliant netCDF files. This
process takes no more effort than importing and
properly formatting a text file in Excel.
</p>
<p>
Rosetta currently offers users two workflows: conversion
of a well-known text file format, and conversion of a custom
text file format. If the text file type is well known to
Rosetta, such as the format into which EOL converted Millersville's
data, Rosetta will automatically convert the data into a CF-compliant netCDF format.
</p>
<p>
If the format is not well
known, Rosetta will guide users through a step-by-step
process of documenting the dataset so that Rosetta can generate
a CF-compliant netCDF file from it.
</p>
<p>
Rosetta also lets the user enter metadata for the specific dataset
being transformed; this information is attached to the dataset
and stored in the netCDF file. Additionally, Rosetta asks
how researchers would like to format their data files, for
example what they would like their headers to be, or how they
would like the values to be delimited.
</p>
<p>
During this wizard-driven process, Rosetta will save the user-provided
documentation of the data format and return it to users as a
template file.
When a user returns to Rosetta with new data in the same
format (for example, after a site visit where new data are
downloaded from an existing station), Rosetta will accept
the template file and re-populate the wizard interface.
Rosetta also allows the user to make any appropriate
corrections to the metadata, and will automatically convert the
new data with these changes applied in a matter of seconds.
</p>
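<p>
The idea of saving a format description once and reusing it can be sketched in
a few lines of Python. The template layout below (a JSON object with
<code>delimiter</code>, <code>header_lines</code>, and <code>columns</code>
keys) is hypothetical and chosen only for illustration; the article does not
describe Rosetta's actual template file format, and the data values are made up.
</p>

```python
import csv
import json

# Hypothetical template layout for illustration only; this is not
# Rosetta's real template format.
template = {
    "delimiter": ",",
    "header_lines": 1,
    "columns": [
        {"name": "time_s", "units": "s"},
        {"name": "temp_c", "units": "degC"},
    ],
}


def save_template(tmpl):
    """Serialize the format description so it can be handed back to the user."""
    return json.dumps(tmpl, indent=2)


def apply_template(template_text, data_text):
    """Use a saved template to parse a new file that shares the same format."""
    tmpl = json.loads(template_text)
    # Skip the header lines the template says the file starts with.
    lines = data_text.strip().splitlines()[tmpl["header_lines"]:]
    reader = csv.reader(lines, delimiter=tmpl["delimiter"])
    names = [col["name"] for col in tmpl["columns"]]
    return [dict(zip(names, (float(v) for v in row))) for row in reader]


saved = save_template(template)

# Made-up values standing in for data downloaded on a later site visit.
new_data = "time,temp\n0,21.5\n1,21.4\n"
records = apply_template(saved, new_data)
print(records[0])
```

<p>
Because the description lives in a separate file, correcting a unit or a column
name means editing the template once rather than redoing the whole setup for
each new data file.
</p>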
<p>
Once researchers have converted
their data into CF-compliant netCDF files and uploaded
the files to a data server, the data are accessible to tools such
as Python programs or the IDV. These tools in turn help
researchers analyze their data more efficiently.
</p>
<p>
Essentially, Rosetta makes standardization of data
easy. Rosetta's goal is to put the power of transforming
data into the hands of data collectors themselves, which increases the
availability and the usability of their data for the
scientific community and beyond.
</p>
<p>
Sean is helping not only
Millersville University but also many other
universities and researchers, both inside and outside of UCAR,
to make their data more accessible and effective.
</p>
<p>
Rosetta is still under development. If you have data that would benefit from
this transformation tool, please contact Sean Arms at
<a href="mailto:sarms@ucar.edu">sarms@ucar.edu</a>. For more information on Rosetta,
see the <a href="https://www.unidata.ucar.edu/software/rosetta/">Rosetta page</a> at
Unidata or watch a demo of <a href="https://www.youtube.com/watch?v=G5pLIjjnK00">Rosetta running on the ACADIS gateway</a> on
Unidata's YouTube site.
</p>
<p>
A beta-test version of Rosetta is currently
available for users to try, and can be accessed <a href="http://rosetta.unidata.ucar.edu">here</a>.
</p>