The following article by Lucas Sterzinger describes a project he did as a senior after taking Dr. Gretchen Mullendore's Fall 2016 Numerical Methods for Meteorologists course at the University of North Dakota. His investigation into the economics of cloud computing arose as part of Dr. Mullendore's 2016 Unidata Community Equipment Award project, which looked at distributed data solutions for the Big Weather Web.
For my capstone research project at the University of North Dakota (UND), I investigated how cloud computing services could be used to run weather models, specifically for small businesses. However, many of the overarching conclusions of the study also have applications for universities and as such, I was invited to write this article.
What exactly is “cloud computing,” and how can it benefit the atmospheric science community? In current language (and this has changed quite a bit over the past few years), cloud computing refers to services that offer quickly deployable and scalable computing resources for a fraction of the cost of buying and maintaining the server yourself (or… maybe not. More on this later). The goal of this project was to a) evaluate whether cloud computing could be a cost-effective way to run weather models and b) create a 100% cloud-based automated workflow for a real-time numerical weather prediction system.
As I soon found out, the best way to approach this project was to do step “b” before step “a”. That is, create the workflow and see how much it costs to run. The first step was to find a good hosting provider for the servers. At the time, I looked at Amazon Web Services (AWS), Microsoft Azure, and Google Cloud. I decided on Amazon Web Services after applying for, and receiving, a $1,000 resource allocation grant through their AWS Cloud Credits for Research program. AWS is a very popular cloud hosting provider currently, with many scientific organizations using the service. (See, for example, these NOAA and NASA datasets hosted in AWS).
Once I chose AWS as a provider, I needed to compare it to something that was currently running a weather model for forecasting purposes. As it turns out, the University of North Dakota Department of Atmospheric Sciences has a high performance compute server called WOPR (named for the supercomputer from the classic movie WarGames) that’s used for running the Weather Research and Forecasting (WRF) model for research purposes, as well as giving up-to-date forecast charts for the North Dakota Atmospheric Resource Board (NDARB). The hardware specifications for UND's WOPR are given in Table 1.
So my goal at this point was to, as best I could, replicate what WOPR was being used for in AWS. Unfortunately, at the time of this project, AWS did not have a 24 core instance (this could change); the closest comparable instances were the m4.4xlarge (16 cores and 64 GB RAM) and m4.10xlarge (40 cores and 160 GB RAM). Since the 16 core option was closer to WOPR’s 24 than 40 is (and because it’s cheaper and my resources were limited), I decided to use that instance.
Next it was time to set up a real-time workflow. This was the most complicated part of the project. It’s not that working with cloud providers like AWS is fundamentally more difficult than with physical hardware, but having to navigate their ecosystem can be a challenge. Learning the different types of compute and storage resources available, and which ones are better suited to certain tasks, consumed a large portion of my time. Amazon provides terrific documentation on all of their products, but it can be extremely frustrating to navigate. I ended up searching for my questions on Google and clicking on whatever AWS support page showed up, rather than trying to find the correct article through their main support site. There wasn’t any need for me to contact their support directly, but I have heard very good things about their support system.
I set up an m4.4xlarge instance on AWS (16 processor cores and 64 GB of RAM) running Ubuntu Server 16.04. I installed WRF and its dependencies, and wrote scripts that would pull the latest NAM data (downloaded via the NOMADS FTP server) for initial and boundary conditions, run the preprocessor, run the simulation, and visualize the data. Once these scripts were complete, I created a job with cron to run them at fixed times throughout the day. The images and output data would be pushed to Amazon’s Simple Storage Service (S3), which is a low-cost storage bucket run through AWS that can be managed from the command line. In all, there was not much to change in the scripts between running on WOPR or AWS, apart from having to work around a difference in operating system and drive location. Scripts used for the AWS portion of this project can be found in this GitHub repository.
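One detail a workflow like this has to settle is which NAM cycle counts as "latest" when cron fires. The following is a minimal Python sketch of that step and is not taken from the project's repository. The function name `latest_nam_cycle` and the 3-hour availability delay are assumptions made for illustration. The four daily initialization times (00, 06, 12, and 18 UTC) are the standard NAM schedule.

```python
from datetime import datetime, timedelta, timezone

# NAM is initialized four times a day, at 00, 06, 12 and 18 UTC.
NAM_CYCLE_HOURS = (0, 6, 12, 18)


def latest_nam_cycle(now, availability_delay_hours=3):
    """Return the init time of the newest NAM cycle assumed to be downloadable.

    `now` must be a timezone-aware datetime.
    `availability_delay_hours` is a hypothetical lag between a cycle's
    initialization time and its files appearing on the server.
    """
    usable = now.astimezone(timezone.utc) - timedelta(hours=availability_delay_hours)
    cycle_hour = max(h for h in NAM_CYCLE_HOURS if h <= usable.hour)
    return usable.replace(hour=cycle_hour, minute=0, second=0, microsecond=0)


if __name__ == "__main__":
    cycle = latest_nam_cycle(datetime.now(timezone.utc))
    # Date and hour strings that a download script could use to pick files.
    print(cycle.strftime("%Y%m%d"), cycle.strftime("%H"))
```

Subtracting the delay before choosing the hour also handles the case just after midnight UTC, where the newest usable cycle is the previous day's 18 UTC run.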
I also created a t2.micro instance on AWS (the cheapest one available) to act as a web server. A t2.micro instance is covered under the AWS free tier, which is included for the first year of any account. The t2.micro instance hosted a website to show the latest generated images and served as a controller for the larger, more expensive server. Why did I need a controller? It’s simple: Amazon only bills for compute servers that are actually online; servers that are shut down are billed for storage only. So by shutting down the compute server when it was running idle, I was able to drastically decrease the amount billed to my account. The t2.micro server was not included in the price comparison below because a) most users would have at least a desktop computer that’s online 24/7 that could run the AWS API commands and b) most users would have some sort of web hosting set up already (the t2.micro instance is probably too small to handle any sort of real traffic in any case).
At the time of this project, the m4.4xlarge instance cost $0.796/hour to run. Assuming the model run took 3 hours, Table 2 shows the cost of running the instance for 2, 4, or 8 runs/day. Note that 8 runs/day means the server is running 24/7.
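The arithmetic behind these figures is simple enough to sketch. The Python snippet below multiplies the hourly rate quoted above by the assumed 3-hour run length. The function name `compute_cost` and the use of a 365-day year are illustrative assumptions, so the numbers may differ slightly from those in the original table.

```python
HOURLY_RATE_USD = 0.796  # m4.4xlarge price per hour quoted in the article
RUN_HOURS = 3            # assumed wall-clock length of one model run


def compute_cost(runs_per_day, days=1):
    """Compute-only cost in USD for running the instance `runs_per_day` times a day."""
    hours = runs_per_day * RUN_HOURS * days
    return round(hours * HOURLY_RATE_USD, 2)


if __name__ == "__main__":
    for runs in (2, 4, 8):
        print(f"{runs} runs/day: ${compute_cost(runs):.2f} per day, "
              f"${compute_cost(runs, days=365):.2f} per year")
```

Under these assumptions, four runs a day comes to about $9.55 per day, or roughly $3,486 per year, for compute alone. Storage is billed separately.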
To let the web server control the compute server, I used the AWS Command Line Interface (CLI). This allows any function that can be accomplished in the AWS console to be done from the command line, which can in turn be scripted. By setting up a simple cron job, I was able to command the compute instance to power on 5 minutes before the model was supposed to run. Then, after the simulation was complete, the server would shut down until the next call to power on.
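As a rough illustration of the controller idea, the Python sketch below wraps the AWS CLI's `aws ec2 start-instances` and `aws ec2 stop-instances` commands. It is not the author's script. The function names and the instance ID are hypothetical placeholders, and actually running the command assumes the AWS CLI is installed and configured with credentials on the controller.

```python
import subprocess


def ec2_command(action, instance_id):
    """Build the AWS CLI call that starts or stops one EC2 instance."""
    subcommands = {"start": "start-instances", "stop": "stop-instances"}
    if action not in subcommands:
        raise ValueError(f"unsupported action: {action!r}")
    return ["aws", "ec2", subcommands[action], "--instance-ids", instance_id]


def set_instance_state(action, instance_id):
    """Run the command; needs an installed and configured AWS CLI."""
    subprocess.run(ec2_command(action, instance_id), check=True)


if __name__ == "__main__":
    # Placeholder instance ID. A cron entry on the controller could call
    # set_instance_state("start", ...) a few minutes before each model run.
    print(" ".join(ec2_command("start", "i-0123456789abcdef0")))
```

Keeping the command construction separate from its execution makes the logic easy to check without touching a real AWS account.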
But what does this all cost? After all, that was the point of this project. As it turns out, the answer is complicated. To be able to make a good comparison, a few assumptions must be made. First, I’m going to assume that a physical server like WOPR would be replaced approximately once every three years. Since it cost approximately $15,000 to purchase, the yearly cost is roughly $5,000. Second, AWS has different storage options available, mainly Elastic Block Store (EBS), which consists of SSDs mounted to the server filesystem, and the previously mentioned S3, which allows for very rapid upload/download functions but can’t be mounted to the filesystem. WOPR has 14 TB of storage, but most of that doesn't need to be accessed frequently, so my price comparison assumes that 1 TB will be used on AWS EBS SSD drives, with the other 13 TB using the less expensive S3 for data storage. I am also assuming that the AWS server will be running for four 3-hour runs per day. The price comparison is as follows:
| |AWS Price/Month|AWS Price/Year|WOPR/Year (prorated, includes all hardware costs)|
|1 TB EBS|$100|$1,200.00| |
|13 TB S3|$299|$3,588.00| |
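The yearly totals can be recomputed from the assumptions stated above. The Python sketch below does so using only figures quoted in this article: the $15,000 purchase price spread over three years, the two storage rows, and four 3-hour runs per day at $0.796/hour. The function name `yearly_costs` and the 365-day year are assumptions, so the result may not match the totals of the original table exactly.

```python
WOPR_PURCHASE_USD = 15_000
WOPR_LIFETIME_YEARS = 3
EBS_PER_MONTH_USD = 100      # 1 TB EBS row of the table
S3_PER_MONTH_USD = 299       # 13 TB S3 row of the table
HOURLY_RATE_USD = 0.796      # m4.4xlarge price per hour
COMPUTE_HOURS_PER_DAY = 4 * 3  # four 3-hour runs per day


def yearly_costs():
    """Return yearly costs in USD under the assumptions listed above."""
    wopr = WOPR_PURCHASE_USD / WOPR_LIFETIME_YEARS
    storage = 12 * (EBS_PER_MONTH_USD + S3_PER_MONTH_USD)
    compute = round(365 * COMPUTE_HOURS_PER_DAY * HOURLY_RATE_USD, 2)
    return {
        "wopr": wopr,
        "aws_storage": storage,
        "aws_compute": compute,
        "aws_total": round(storage + compute, 2),
    }


if __name__ == "__main__":
    for name, cost in yearly_costs().items():
        print(f"{name}: ${cost:,.2f}")
```

With these inputs the AWS side comes to about $8,274 per year against roughly $5,000 for WOPR. The sketch leaves out repairs, staff time, overhead, and data transfer.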
Now while it seems like the physical WOPR server is clearly the better choice over the more expensive AWS option, there are several very important caveats to consider. First of all, you don’t need a dedicated IT team to manage an AWS instance. While this doesn’t matter much for large universities who have the infrastructure and manpower to manage such resources, it does matter to small businesses (the original subject of this project) or small universities and colleges with limited IT resources. Second, with a physical server, you also have to pay for repairs when things inevitably break. Hard drives fail, computers overheat, and all that costs money to replace. Finally, educational institutions might have to factor in different overhead costs for purchased hardware and cloud computing services.
In AWS, not only is everything fully managed, but multiple layers of redundancy are built into the system that would be extremely expensive for a university to implement. On AWS, prices are constantly dropping as more powerful resources become more affordable. If the price you’re willing to pay each month is the same, you essentially receive “free” upgrades. In addition, if you’re careful with uptime and use the best resources for the job, costs can be cut dramatically.
However, the biggest downside with AWS is that it operates on “shared” resources. In essence, Amazon rents out more resources than it can actually supply, banking that some servers will be running idle and can relinquish some of their resources at any given moment. If suddenly every server on the shared machine begins running at 100%, everyone on that machine will be slowed down. If 100% guaranteed resources are needed, Amazon sells guaranteed instances — but they are much more expensive.
Looping back to the original question of this project: Is hosting servers in the cloud a reasonable alternative to buying them physically? Maybe. It really depends on what you’re planning to do and what resources you have on hand already. I would highly recommend applying for a research grant on AWS, or opening an AWS educator account that gives some free credits to educators and students. Not all projects can benefit from cloud-hosted servers, but as the cost decreases and the performance increases, it will become a more viable option.
Lucas Sterzinger is a first-year doctoral student in the Atmospheric Science Graduate Group at the University of California, Davis. He received his Bachelor of Science in Atmospheric Sciences from the University of North Dakota in 2017. He can be reached at firstname.lastname@example.org.
> However, the biggest downside with AWS is that it operates on “shared” resources. In essence, Amazon rents out more resources than it can actually supply, banking that some servers will be running idle and can relinquish some of their resources at any given moment.
This is not my understanding of shared instances on AWS or other cloud computing providers. "Shared" refers to multiple tenancy on a single piece of hardware, not over-allocation. Dedicated instances are intended for compliance and regulatory purposes, e.g., if you are working with sensitive information and don't trust the compartmentalization provided by virtualization. This is unlikely to be the case for weather forecasting.
You should also consider sustained/committed use discounts and pre-emptible instance pricing. For example, if you are OK with occasional pre-emptions (e.g., if your job uses check-pointing), Google Cloud charges about 1/3 the price for pre-emptible vs. standard VMs.
Disclaimer: I work for Google (not on Cloud).
Posted by Stephan Hoyer on November 07, 2017 at 08:16 PM MST