Rosetta: A Data Transformation Tool for ASCII Files

Editor's Note: This is part of a series of posts written by Unidata communications intern Larissa Gordon, highlighting new activities and interesting projects undertaken by software developers at the Unidata Program Center.

Rosetta's wizard interface

Rosetta, one of Unidata's data transformation tools, is helping the scientific community standardize its data. Created by Unidata software engineer Sean Arms, Rosetta provides an easy way to add appropriate metadata to raw ASCII files and to save the results either in an ASCII format (e.g., .csv) or as Climate and Forecast (CF)-compliant netCDF files. Most recently, Rosetta has helped Millersville University transform weather balloon data collected as part of a nationwide experiment.

Millersville University has been involved in an experiment known as PECAN (Plains Elevated Convection at Night). The experiment involves eight research laboratories and fourteen universities, which share the common goal of finding the cause of an increase in the mesoscale convective systems (MCSs) that occur at night during the summer months.

As part of the data collection process, Millersville students and faculty set out in the evenings from June 1st to July 15th, 2015, launching weather balloons at various PISA (PECAN Integrated Sounding Array) sites. Each launch used a rawinsonde system: a Vaisala MW41 data acquisition system with an RS41-SGP radiosonde attached to a 200-gram Totex weather balloon.

The rawinsonde system creates a profile of the lower to middle atmosphere, measuring parameters such as temperature, wind direction and velocity, relative humidity, location and pressure. It terminates six to eight kilometers above the earth's surface.

Measurements were taken every second during the launch, creating a large amount of data. The raw output from the Vaisala data acquisition system was sent to the Earth Observing Laboratory (EOL) as a text-based data file bundled with a metadata README text file. At EOL, the text files were formatted for consistency and added to the project archives.

While data stored in text files can be opened and analyzed in a spreadsheet program such as Microsoft Excel or in a simple text editor, there is no way to access the data in a generic, programmatic way because the metadata describing the format of the text files is not easily accessible.

If the data could be stored in a format that allowed the user to access it programmatically, however, the user could request specific times, data variables, and data usage metadata, even if he or she was unfamiliar with the dataset.
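To make the contrast concrete, here is a minimal Python sketch of what programmatic access can look like once the file layout and variable metadata are known. The column names, units, and sample values are invented for illustration; they are not the actual EOL sounding format.

```python
import csv
import io

# Hypothetical sample: one header line, then one row per second of flight.
SAMPLE = """time_s,pressure_hPa,temperature_C,rel_humidity_pct
0,965.2,24.1,61
1,964.6,24.0,61
2,964.0,23.9,62
3,963.4,23.9,62
"""

# Hypothetical metadata that a self-describing format would carry
# alongside the values themselves.
METADATA = {
    "time_s": {"units": "s", "long_name": "time since launch"},
    "pressure_hPa": {"units": "hPa", "long_name": "air pressure"},
    "temperature_C": {"units": "degC", "long_name": "air temperature"},
    "rel_humidity_pct": {"units": "%", "long_name": "relative humidity"},
}


def select(text, variable, t_start, t_end):
    """Return (time, value) pairs for one variable within a time window."""
    if variable not in METADATA:
        raise KeyError(f"unknown variable: {variable}")
    rows = csv.DictReader(io.StringIO(text))
    return [
        (float(row["time_s"]), float(row[variable]))
        for row in rows
        if t_start <= float(row["time_s"]) <= t_end
    ]


# Ask for temperature between seconds 1 and 2, plus its units.
result = select(SAMPLE, "temperature_C", 1, 2)
print(result, METADATA["temperature_C"]["units"])
```

The request names a variable and a time range rather than a column position, which is only possible because the description of the columns travels with the data.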

With an experiment as large as PECAN, there must be an efficient way to request point-source data. The first step in making such data easily available is converting it into a more useful format, such as a CF-compliant netCDF file.

This is where Rosetta comes in. Prior to Rosetta, researchers would have to know how to write a program to transform their text-based data files into netCDF files, and they would have to be familiar with the requirements for making the data CF-compliant.
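For a sense of what such a hand-written program involves, the sketch below renders a small time series as CDL, the text notation for netCDF files. This is not Rosetta's code. The function names, sample values, and title are hypothetical, and the output carries only a few CF-style attributes (`standard_name`, `units`, `Conventions`); a fully CF-compliant sounding would also need position and altitude information.

```python
def cdl_attr(owner, key, value):
    """Format one CDL attribute line; owner is '' for global attributes."""
    return f'        {owner}:{key} = "{value}" ;'


def to_cdl(name, time_units, times, variables, global_attrs):
    """Render one time series as CDL text.

    `variables` maps a variable name to a pair (attributes, values).
    """
    n = len(times)
    lines = [f"netcdf {name} {{", "dimensions:", f"    time = {n} ;", "variables:"]
    lines.append("    double time(time) ;")
    lines.append(cdl_attr("time", "standard_name", "time"))
    lines.append(cdl_attr("time", "units", time_units))
    for var, (attrs, values) in variables.items():
        if len(values) != n:
            raise ValueError(f"{var} has {len(values)} values, expected {n}")
        lines.append(f"    float {var}(time) ;")
        lines.extend(cdl_attr(var, k, v) for k, v in attrs.items())
    lines.append("// global attributes:")
    lines.extend(cdl_attr("", k, v) for k, v in global_attrs.items())
    lines.append("data:")
    lines.append("    time = " + ", ".join(str(t) for t in times) + " ;")
    for var, (_, values) in variables.items():
        lines.append(f"    {var} = " + ", ".join(str(v) for v in values) + " ;")
    lines.append("}")
    return "\n".join(lines) + "\n"


# Hypothetical three-second excerpt of a sounding.
cdl = to_cdl(
    "sounding",
    "seconds since 2015-06-01 03:00:00",
    [0, 1, 2],
    {
        "air_temperature": (
            {"standard_name": "air_temperature", "units": "K"},
            [297.25, 297.15, 297.05],
        ),
        "air_pressure": (
            {"standard_name": "air_pressure", "units": "Pa"},
            [96520.0, 96460.0, 96400.0],
        ),
    },
    {"Conventions": "CF-1.6", "title": "Hypothetical rawinsonde sample"},
)
print(cdl)
```

Saved as `sounding.cdl`, text like this can be compiled into a binary netCDF file with the netCDF `ncgen` utility (for example, `ncgen -o sounding.nc sounding.cdl`). Even this small example requires knowing the CF attribute names and units in advance, which is the burden described above.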

Now, researchers can simply use Rosetta's wizard-based interface to convert their data into CF-compliant netCDF files. This process takes no more effort than it takes to import and properly format a text file in Excel.

Rosetta currently offers users two workflows: conversion of a well-known text file format, and conversion of a custom text file format. If the text file type is well known to Rosetta, such as the format into which EOL converted Millersville's data, Rosetta will automatically convert the data into a CF-compliant netCDF format.

If the format is not well known, Rosetta will guide users through a step-by-step process of documenting the dataset so that a CF-compliant netCDF file can be generated.

Rosetta also lets the user enter metadata for the specific dataset being transformed; this information is attached to the dataset and stored in the netCDF file. Additionally, Rosetta asks how researchers would like to format their data files: for example, what they would like their headers to be, or how they would like the numbers to be delimited.

During this wizard-driven process, Rosetta will save the user-provided documentation of the data format and return it to users as a template file. When a user returns to Rosetta with new data in the same format (for example, after a site visit where new data are downloaded from an existing station), Rosetta will accept the template file and re-populate the wizard interface. Rosetta also allows the user to make any appropriate corrections to the metadata, and will automatically apply these changes to the new data in a matter of seconds.
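The template idea can be sketched in a few lines of Python. The structure below is purely hypothetical and is not Rosetta's actual template format: it simply records, as JSON, the answers a user might give once (header length, delimiter, column names and units, station metadata) and reuses them on a later file with the same layout.

```python
import csv
import json

# Hypothetical template: format documentation captured once.
template = {
    "header_lines": 2,
    "delimiter": ";",
    "columns": [
        {"name": "time", "units": "s"},
        {"name": "air_temperature", "units": "degC"},
    ],
    "global_metadata": {"station": "EXAMPLE-1"},
}

# What would be handed back to the user as a template file.
saved_template = json.dumps(template, indent=2)


def apply_template(template_json, text):
    """Parse a text file using a previously saved format description."""
    tpl = json.loads(template_json)
    data_lines = text.splitlines()[tpl["header_lines"]:]
    reader = csv.reader(data_lines, delimiter=tpl["delimiter"])
    names = [col["name"] for col in tpl["columns"]]
    records = [dict(zip(names, map(float, row))) for row in reader if row]
    return {
        "metadata": tpl["global_metadata"],
        "units": {col["name"]: col["units"] for col in tpl["columns"]},
        "records": records,
    }


# A later download from the same (hypothetical) station, same layout.
new_data = "Station log\ntime;temp\n0;21.5\n1;21.4\n"
result = apply_template(saved_template, new_data)
print(result["records"])
```

Because the format description lives in the template rather than in the user's memory, correcting a unit or a station name means editing one field and re-running the conversion.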

Once researchers have converted their data into CF-compliant netCDF files and uploaded them to a data server, the data become accessible to tools such as Python programs or Unidata's Integrated Data Viewer (IDV). These tools in turn make it more efficient for researchers to analyze their data.

Essentially, Rosetta makes standardization of data easy. Rosetta's goal is to put the power of transforming data into the hands of data collectors themselves, which increases the availability and the usability of their data for the scientific community and beyond.

Sean is helping not only Millersville University but also many other universities and researchers, both inside and outside of UCAR, make their data more accessible and effective.

Rosetta is still under development. If you have data that would benefit from this transformation tool, please contact Sean Arms. For more information on Rosetta, see the Rosetta page at Unidata or check out a demo of Rosetta running on the ACADIS gateway on Unidata's YouTube site.

A beta-test version of Rosetta is currently available for users to try, and can be accessed here.


News and information from the Unidata Program Center