Running in the “cloud” is not just a buzzword; it can lead to real financial savings for your company and happier devops. But like all things, you have to do it correctly.
In this blog post, we will discuss the mistakes we made when we started, how we’ve improved our process, and how Packer helps us with that.
What We Did Wrong
Since the beginning of our company, ThreatSim has been hosted in Amazon Web Services. We weren’t taking advantage of the full suite of services, though - we really only used the Elastic Compute Cloud (EC2) to run virtual servers and the Simple Storage Service (S3) to host our static assets.
We liked the scalability and the reliability of those services, but always had the feeling that we weren’t doing things the “Cloud Way”. For instance, we had long running servers and used tools like Capistrano to deploy our applications and Ansible to manage our infrastructure. This “worked” and kept us going for the first few years of our existence.
But something about it didn’t feel right. Somewhere, Werner Vogels was silently judging us. We were ignoring the Elastic in Elastic Compute Cloud.
Put simply, we were in the Amazon Cloud but we acted like we had a data center with physical machines. We were using these EC2 instances as if they were metal servers, which they weren’t.
Other concerns that we had included:
- We had no high availability: we ran only one instance of each type of server.
- Updating the system packages required scheduled maintenance windows where we planned the least impactful upgrade path for our system.
- We wasted money by not being able to scale elastically when we saw increased traffic/load. We ran extra servers just to be safe.
- When Amazon had an outage in our availability zone, we went down completely with no real disaster recovery scenario.
- We were slowly but surely getting out of date with the latest versions of Ubuntu. As a security company, that was absolutely untenable.
So, we sat down and discussed how we were going to become highly available, scalable, and easily upgradeable, and make use of the best that AWS had to offer. Initially, we wanted to utilize our existing investment in tools like Capistrano and Ansible, but we kept running into serious constraints with them.
A Ship Arrives
We have been long-term users of Codeship for our continuous integration and build/test pipeline. They offer a great service at a reasonable price with amazing support. We also are avid subscribers to their technical blog.
Fortuitously for us, at the same time we were discussing solutions to our “problems”, they posted a really interesting series of articles on immutable infrastructure (which admittedly stood on the shoulders of articles by Chad Fowler, among others).
Go over and read their blogs and then come back here. We’ll wait.
After discussing immutable infrastructure internally, we decided that we wanted to take the plunge and move our application to use AWS properly (read: immutably).
Early on, we separated our application by function so we had already done the hardest part conceptually. For example, we had web servers for our different web applications and worker servers for our different job related applications. We had defined roles for our servers and kept all functionality separated.
Since we were moving to immutable components, we needed to have a central place to store state. We made our application layers more rigid so that state stays where it is supposed to. We no longer saved anything outside of logs to our filesystem (and all logs were pushed to a remote log aggregation service).
> The main advantage when it comes to state in immutable infrastructure is that it is siloed. The boundaries between layers storing state and the layers that are ephemeral are clearly drawn and no leakage can possibly happen between those layers. There simply is no way to mix state into different components when you can’t expect them to be up and running the next minute.
>
> — Codeship
Since we have server types that are somewhat similar (such as multiple types of web servers), we decided to create a base Packer JSON template and build specialized servers from that AMI. Our “base_ami” template starts from the latest Ubuntu Cloud Image and configures users and packages like build-essential, curl, Ruby, etc. that we need on every instance.
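A minimal sketch of what such a base template can look like (the region, credentials variables, AMI name, and script paths here are illustrative assumptions, not our exact template):

```json
{
  "variables": {
    "aws_access_key": "",
    "aws_secret_key": ""
  },
  "builders": [{
    "type": "amazon-ebs",
    "access_key": "{{user `aws_access_key`}}",
    "secret_key": "{{user `aws_secret_key`}}",
    "region": "us-east-1",
    "source_ami": "ami-XXXXXXXX",
    "instance_type": "c4.4xlarge",
    "ssh_username": "ubuntu",
    "ami_name": "base_ami {{timestamp}}"
  }],
  "provisioners": [{
    "type": "shell",
    "scripts": [
      "scripts/install_packages.sh",
      "scripts/install_aws_tools.sh"
    ]
  }]
}
```

The `source_ami` is the id of the current Ubuntu Cloud Image for your region, and the credentials come in via a var file at build time.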
We make heavy use of the shell provisioners in Packer. To make management of our shell scripts easier, we broke the functionality into multiple scripts. Also, we realized after some time that running the build on the largest compute instance sped the process up dramatically and lowered overall costs. Just because you build on a c4.4xlarge doesn’t mean you can’t run on a smaller instance in production.
Our install_packages.sh looks something like this:
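(The package list below is an illustrative sketch, not our exact manifest.)

```bash
#!/usr/bin/env bash
# install_packages.sh -- run by Packer's shell provisioner on the base AMI.
set -euo pipefail

# Avoid interactive prompts from dpkg during the build.
export DEBIAN_FRONTEND=noninteractive

# Bring the Ubuntu Cloud Image fully up to date first.
sudo apt-get update
sudo apt-get -y dist-upgrade

# Baseline tooling we want on every instance, regardless of role.
sudo apt-get -y install \
  build-essential \
  curl \
  git \
  ntp \
  ruby \
  ruby-dev
```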
And our install_aws_tools.sh looks like this:
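(Again a sketch; installing the AWS CLI via pip is one common approach, and the exact method in our script may differ.)

```bash
#!/usr/bin/env bash
# install_aws_tools.sh -- puts the AWS command line tools on the image.
set -euo pipefail

export DEBIAN_FRONTEND=noninteractive

# Install pip, then the AWS CLI from PyPI.
sudo apt-get -y install python-pip
sudo pip install awscli
```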
We build that Packer template:

```shell
packer build --var-file=production.json base_ami.json
```

When it completes, we are given the id of the built AMI, which we will use in the next stage.
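If you'd rather capture that id in a script than copy it by hand, Packer's `-machine-readable` output can be parsed. A small sketch (the `awk` field positions follow the machine-readable artifact line format, `timestamp,builder,artifact,0,id,region:ami-id`; the wiring shown in the comment is hypothetical):

```bash
#!/usr/bin/env bash
# Pull the resulting AMI id out of `packer build -machine-readable` output.
# An artifact line looks like:
#   1440279449,amazon-ebs,artifact,0,id,us-east-1:ami-0abc1234

parse_ami_id() {
  awk -F, '$3 == "artifact" && $5 == "id" { split($6, a, ":"); print a[2] }'
}

# Hypothetical usage:
#   AMI_ID=$(packer build -machine-readable --var-file=production.json base_ami.json | parse_ami_id)
```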
After we build that AMI using Packer, we pass the id from the output to the next templates that are more specialized. For example, we have a template that is used as the base for all of our web servers. We want our web servers to have the same version of nginx installed and similar base nginx configurations for security purposes.
We use the AMI id from our base AMI in place of the Ubuntu Cloud AMI id.
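In the specialized template, that swap looks something like this (an illustrative excerpt; the variable name, region, and script path are assumptions):

```json
{
  "variables": {
    "base_ami_id": ""
  },
  "builders": [{
    "type": "amazon-ebs",
    "region": "us-east-1",
    "source_ami": "{{user `base_ami_id`}}",
    "instance_type": "c4.4xlarge",
    "ssh_username": "ubuntu",
    "ami_name": "web_base_ami {{timestamp}}"
  }],
  "provisioners": [{
    "type": "shell",
    "scripts": ["scripts/install_nginx.sh"]
  }]
}
```

Here `base_ami_id` is the id printed by the previous build, passed in with `-var` or a var file.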
When we build that, we are given the id for an AMI that has all of our base packages and nginx configured and ready to serve. From this AMI, we will build our custom servers for our web applications.
In future posts, we’ll discuss how we get our servers to set themselves up with configuration details for their specific roles. Thanks for taking the time to read this and we hope it helps.