The Big Data Project (BDP) is an initiative undertaken by the National Oceanic and Atmospheric Administration (NOAA) to increase public availability of large volumes of environmental data collected and generated by the agency. As part of the Big Data Project, Unidata is working in collaboration with Amazon Web Services (AWS) on a demonstration project to provide access to more than twenty years of archived NEXRAD Level II radar data (augmented continuously with new, real-time data) stored in Amazon's Simple Storage Service (S3) environment. In addition to assisting AWS with ingesting new data flowing from the NEXRAD sites, Unidata Program Center staff have set up a THREDDS Data Server in the AWS environment to provide services allowing community access to the stored data.
About the Big Data Project
According to NOAA's BDP web page, “the Big Data Project is an innovative approach to publishing NOAA's vast data resources and positioning them near cost-efficient high performance computing, analytic, and storage services provided by the private sector.” In practice, this means that NOAA is making selected data assets available for five “Infrastructure as a Service” (IaaS) providers to upload to their cloud systems if they choose: Amazon Web Services (AWS), Google, IBM, Microsoft, and the Open Cloud Consortium. NOAA will continue to provide public access to the data via its traditional mechanisms as well.
What Data are Available
The project data collection consists of NEXRAD Level II radar data collected between 1991 and 2015, stored at NOAA's National Centers for Environmental Information (formerly the National Climatic Data Center). The data set consists of more than 250 TB of compressed data (1 petabyte uncompressed), approximately half of which was stored on magnetic tape. The complete archive is now available on AWS; transfers to some of the other IaaS providers are still in progress.
In addition to the archive data, new Level II data are being added to the collection in near real time. NEXRAD Level II scans are performed continuously at 160 radar sites in North America. At each radar site, as each “chunk” (100 radial degrees, 1 tilt) of a scan is completed, the data are distributed via Unidata's Local Data Manager (LDM) technology to subscribing sites. As part of this project, the individual chunks are delivered to AWS and stored temporarily in an S3 bucket, awaiting the remaining chunks that make up the full 3-dimensional volume scan. Once all of the chunks that make up one scan are determined to be present, the chunks are combined into an aggregate volume dataset and stored permanently in the collection S3 bucket.
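The hold-then-combine step described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the code running in the project: the `Chunk` fields (`volume_id`, `sequence`, `is_last`) and the `VolumeAssembler` class are hypothetical names, and plain byte concatenation stands in for whatever combination step the real pipeline performs.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Chunk:
    volume_id: str   # hypothetical: identifier shared by all chunks of one scan
    sequence: int    # hypothetical: 1-based position of the chunk in the scan
    is_last: bool    # hypothetical: marks the final chunk of the scan
    payload: bytes   # raw chunk contents


@dataclass
class VolumeAssembler:
    """Holds chunks per volume until every chunk of that volume has arrived."""

    pending: Dict[str, Dict[int, Chunk]] = field(default_factory=dict)

    def add(self, chunk: Chunk) -> Optional[bytes]:
        """Store a chunk; return the combined volume once it is complete."""
        chunks = self.pending.setdefault(chunk.volume_id, {})
        chunks[chunk.sequence] = chunk

        # The total chunk count is unknown until the final chunk shows up.
        last = [c.sequence for c in chunks.values() if c.is_last]
        if not last:
            return None

        # Chunks may arrive out of order, so check that none are missing.
        expected = set(range(1, last[0] + 1))
        if not expected <= set(chunks):
            return None

        # Complete: drop the temporary copies and return the aggregate.
        del self.pending[chunk.volume_id]
        return b"".join(chunks[i].payload for i in sorted(expected))
```

Calling `add()` returns `None` while a scan is still incomplete and returns the combined bytes on the call that supplies the last missing chunk, at which point the temporary entries for that volume are discarded.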
Accessing the Data via TDS
Members of Unidata's university community can access the
collection via this THREDDS Data Server:
(To connect using the IDV, substitute
when entering the URL in the Data Chooser.)
We encourage community members to experiment with accessing
the collection via the TDS. Note, however, that because this is
a demonstration project, we cannot guarantee long-term access
to the server. Similarly, because Unidata has limited resources
available for this demonstration, access to this particular
TDS is restricted to connections from
Accessing the Data in the Amazon S3 Environment
For those comfortable with the AWS environment, access to the
collection S3 bucket is unrestricted. If you have an appropriate
client, you can connect to the S3 bucket using this URL:
Inside the S3 bucket, data are stored in the format described in this document.
Those who can create an AWS EC2 instance in the US East AWS zone can mount the archive S3 bucket directly as described in the Amazon EC2 documentation for S3.
Additionally, those who are interested in the fastest access
to the chunked data before it is aggregated into a 3D volume
scan can connect to this URL:
or mount the temporary S3 bucket directly as described in the Amazon EC2 documentation for S3. Inside the S3 bucket, data are stored in the format described in this document. Note that the chunked radar data persist in this S3 bucket for a maximum of 24 hours before being scrubbed.
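For readers who would rather query a bucket over plain HTTPS than mount it, the following is a minimal sketch that lists object keys through the S3 REST interface using only the Python standard library. It assumes the bucket permits anonymous listing, which seems consistent with the unrestricted access described above but is not guaranteed here. The helper names (`build_list_url`, `parse_keys`, `list_keys`) are illustrative, and the bucket name and key prefix are left as arguments for you to supply.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import List, Union

# XML namespace used by S3 bucket-listing responses.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def build_list_url(bucket: str, prefix: str = "", max_keys: int = 100) -> str:
    """Build an S3 ListObjectsV2 URL for a publicly readable bucket."""
    query = urllib.parse.urlencode(
        {"list-type": "2", "prefix": prefix, "max-keys": str(max_keys)}
    )
    return f"https://{bucket}.s3.amazonaws.com/?{query}"


def parse_keys(xml_body: Union[str, bytes]) -> List[str]:
    """Extract the object keys from a ListObjectsV2 XML response."""
    root = ET.fromstring(xml_body)
    return [el.text for el in root.iter(f"{S3_NS}Key") if el.text]


def list_keys(bucket: str, prefix: str = "", max_keys: int = 100) -> List[str]:
    """Fetch one page of keys anonymously (requires network access)."""
    url = build_list_url(bucket, prefix, max_keys)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_keys(resp.read())
```

A call such as `list_keys(bucket_name, prefix=some_prefix)` returns a single page of results; larger listings would need to follow the continuation token in the response, which this sketch omits. For the temporary chunk bucket, remember that objects older than about 24 hours will no longer be listed.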
Unidata community members who run into issues accessing the AWS NEXRAD archive are encouraged to contact Unidata support for assistance. Additional details regarding this AWS Public Data Set, including links to several tutorials on accessing the data, are available in this post on the Amazon Web Services blog.
Access using Python
Unidata developer Ryan May has created a Jupyter (formerly IPython) notebook that demonstrates how to access the THREDDS Data Server (TDS) instance serving archived NEXRAD Level II data hosted on Amazon S3. See Using Python to Access NCEI Archived NEXRAD Level 2 Data for details.