It’s a snowy day in Denver. I can almost feel the light pitter-patter of the snow on my head. A steaming hot cup of coffee would be nice. How about coffee as a service, like a barista service? Imagine you send a piece of JSON in, and out comes a freshly brewed “Double Ristretto Venti Half-Soy Nonfat Decaf Organic Chocolate Brownie Iced Vanilla Double-Shot Gingerbread Frappuccino Extra Hot With Foam Whipped” or my regular, a latte. Infrastructure as Code works in a similar way. Reimagine the barista service, but this time you put your configuration into a simple black box, and out comes an array of connected and configured servers, ready for you!

Amazon Web Services (AWS) provides infrastructure in which everything can be controlled via an API. You interact with AWS by making calls to its REST API over HTTPS: you can start a server, create storage, or start a Hadoop cluster with API calls. Calling the API directly with plain HTTPS requests is inconvenient, so in practice it is more convenient to use the command-line interface (CLI) or one of the SDKs.
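
To give a feel for the SDK route, here is a minimal sketch in Python. The helper name `start_server` and its default instance type are hypothetical. The sketch assumes it is handed a client object that exposes `run_instances` the way the boto3 EC2 client does, so it has no dependency of its own and is not tied to a real account.

```python
def start_server(ec2_client, image_id, instance_type="t2.micro"):
    """Start one virtual server and return its instance ID.

    `ec2_client` is assumed to behave like boto3.client("ec2");
    start_server itself is a hypothetical helper, not part of any SDK.
    """
    response = ec2_client.run_instances(
        ImageId=image_id,
        InstanceType=instance_type,
        MinCount=1,  # launch exactly one instance
        MaxCount=1,
    )
    return response["Instances"][0]["InstanceId"]
```

With boto3 installed and credentials configured, passing `boto3.client("ec2")` and an AMI ID of your choice should be enough to launch a single instance.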

Infrastructure as Code (IaC) describes the idea of using a high-level programming language to control IT systems. If our infrastructure can be treated as code, we can apply the same techniques to infrastructure code that we do to our application code. Infrastructure as Code approaches have become increasingly widespread with the adoption of cloud computing and Infrastructure as a Service (IaaS). IaC supports IaaS, but should not be confused with it.

Suppose we want to create an infrastructure that consists of:

  • Load balancer (LB)
  • Virtual servers
  • Database (DB)
  • DNS entry
  • Content delivery network (CDN)
  • Bucket for static files

Here is a JSON configuration that describes the infrastructure:

{
  "region": "us-east-1",

  "resources": [{
    "type": "loadbalancer",
    "id": "LB",
    "config": {
      "server": {
        "cpu": 2,
        "ram": 4,
        "os": "ubuntu",
        "waitFor": "$DB"
      },
      "servers": 2
    }
  }, {
    "type": "cdn",
    "id": "CDN",
    "config": {
      "defaultSource": "$LB",
      "sources": [{
        "path": "/static/*",
        "source": "$BUCKET"
      }]
    }
  }, {
    "type": "database",
    "id": "DB",
    "config": {
      "password": "***",
      "engine": "MySQL"
    }
  }, {
    "type": "dns",
    "config": {
      "from": "www.mydomain.com",
      "to": "$CDN"
    }
  }, {
    "type": "bucket",
    "id": "BUCKET"
  }]
}

The question is: how can this JSON file be turned into AWS API calls?

  • Parse the JSON input.
  • Create a dependency graph.
  • Traverse the dependency graph to produce a sequence of commands.
  • Translate the commands into AWS API calls.
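
The first two steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it treats every string of the form `$ID` inside a resource's `config` as a dependency on the resource with that `id`, and it invents a fallback name for resources without an `id` (the DNS entry in the example). The function names and the condensed `EXAMPLE` document are hypothetical.

```python
import json

# Condensed copy of the example configuration from the text.
EXAMPLE = """
{"region": "us-east-1", "resources": [
  {"type": "loadbalancer", "id": "LB",
   "config": {"server": {"os": "ubuntu", "waitFor": "$DB"}, "servers": 2}},
  {"type": "cdn", "id": "CDN",
   "config": {"defaultSource": "$LB",
              "sources": [{"path": "/static/*", "source": "$BUCKET"}]}},
  {"type": "database", "id": "DB", "config": {"engine": "MySQL"}},
  {"type": "dns", "config": {"from": "www.mydomain.com", "to": "$CDN"}},
  {"type": "bucket", "id": "BUCKET"}
]}
"""


def find_references(value):
    """Recursively collect the IDs referenced as "$ID" strings in a config value."""
    if isinstance(value, str):
        return {value[1:]} if value.startswith("$") and len(value) > 1 else set()
    if isinstance(value, dict):
        value = list(value.values())
    if isinstance(value, list):
        refs = set()
        for item in value:
            refs |= find_references(item)
        return refs
    return set()  # numbers, booleans, and null carry no references


def build_dependency_graph(document):
    """Map each resource ID to the set of resource IDs it depends on."""
    resources = json.loads(document)["resources"]
    graph = {}
    for index, resource in enumerate(resources):
        # Assumption: resources without an "id" get a name from type + position.
        resource_id = resource.get("id", f"{resource['type']}-{index}")
        graph[resource_id] = find_references(resource.get("config", {}))
    unknown = set().union(*graph.values()) - set(graph)
    if unknown:
        raise ValueError(f"unknown resource references: {sorted(unknown)}")
    return graph


if __name__ == "__main__":
    for resource_id, needs in build_dependency_graph(EXAMPLE).items():
        print(resource_id, "depends on", sorted(needs))
```

Rejecting references to unknown IDs up front keeps typos in the configuration from surfacing later as half-created infrastructure.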

The dependency graph is turned into a linear flow of commands written in a pseudo language. The pseudo language represents the steps that are needed to create all the resources in the correct order. The nodes at the bottom of the graph have no dependencies and are therefore easy to create, so they come first.
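
Turning a graph into a linear flow is a topological sort. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (3.9+) on a hand-written graph for the example configuration. The `create ...` strings stand in for the pseudo language, whose real syntax is not shown here, and the label `DNS` is made up because the DNS entry has no `id` in the JSON.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Dependency graph for the example: resource -> resources it needs first.
GRAPH = {
    "LB": {"DB"},
    "CDN": {"LB", "BUCKET"},
    "DB": set(),
    "DNS": {"CDN"},  # hypothetical label for the DNS entry
    "BUCKET": set(),
}


def linearize(graph):
    """Return pseudo commands in an order that creates dependencies first."""
    # Iterating static_order() raises graphlib.CycleError if the graph has a cycle.
    order = TopologicalSorter(graph).static_order()
    return [f"create {resource}" for resource in order]


if __name__ == "__main__":
    for command in linearize(GRAPH):
        print(command)
```

Several orders are valid for this graph (the database and the bucket can swap places, for example), so the exact sequence printed is an implementation detail. What is guaranteed is that each resource appears after everything it depends on.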

Here is a part of the shell script generated by walking the dependency graph:

#!/bin/bash -e
# Look up the ID of the Amazon Linux AMI.
AMIID=$(aws ec2 describe-images --filters \
  "Name=description,Values=Amazon Linux AMI 2015.03.? x86_64 HVM GP2" \
  --query "Images[0].ImageId" --output text)
# Find the default VPC and its first subnet.
VPCID=$(aws ec2 describe-vpcs --filters "Name=isDefault,Values=true" \
  --query "Vpcs[0].VpcId" --output text)
SUBNETID=$(aws ec2 describe-subnets --filters "Name=vpc-id,Values=$VPCID" \
  --query "Subnets[0].SubnetId" --output text)
# Create a security group that allows inbound SSH from anywhere.
SGID=$(aws ec2 create-security-group --group-name mysecuritygroup \
  --description "My security group" --vpc-id "$VPCID" \
  --query "GroupId" --output text)
aws ec2 authorize-security-group-ingress --group-id "$SGID" \
  --protocol tcp --port 22 --cidr 0.0.0.0/0
# Launch the instance and wait until it is running.
INSTANCEID=$(aws ec2 run-instances --image-id "$AMIID" --key-name mykey \
  --instance-type t2.micro --security-group-ids "$SGID" \
  --subnet-id "$SUBNETID" --query "Instances[0].InstanceId" --output text)
echo "waiting for $INSTANCEID ..."
aws ec2 wait instance-running --instance-ids "$INSTANCEID"
# Fetch the public DNS name of the running instance.
PUBLICNAME=$(aws ec2 describe-instances --instance-ids "$INSTANCEID" \
  --query "Reservations[0].Instances[0].PublicDnsName" --output text)
...

This generated shell script makes the AWS API calls that create our infrastructure. Automating the infrastructure also strengthens the automation of the deployment pipeline. The script doubles as the most accurate documentation of the infrastructure, and it can be run again and again to reproduce it.

So, are we done with the code for infrastructure automation? Conceptually we are, but this approach of manually coding every infrastructure detail is laden with issues. For instance:

  • The script above is lengthy and complex, and it still doesn’t have any exception handling.
  • It doesn’t take advantage of concurrent processing. For example, it could create the bucket while the DB is being provisioned, since neither depends on the other.
  • Everything is hard-coded, so to change the number of instances, we would have to understand the entire script.
  • The code is not modular and doesn’t lend itself to testing. Yes, we need to test that too.
  • There is no support for incremental infrastructure changes.

and the list goes on, but we are not out of luck here!
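
To illustrate the concurrency point, the same dependency graph can be grouped into waves, where everything within a wave can be created in parallel. This is a sketch under stated assumptions rather than how any particular tool works: the graph is hand-written from the example configuration, `DNS` is a made-up label for the DNS entry, and `creation_waves` is a hypothetical helper built on `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Dependency graph for the example: resource -> resources it needs first.
GRAPH = {
    "LB": {"DB"},
    "CDN": {"LB", "BUCKET"},
    "DB": set(),
    "DNS": {"CDN"},  # hypothetical label for the DNS entry
    "BUCKET": set(),
}


def creation_waves(graph):
    """Group resources into waves; each wave only needs the waves before it."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorter.get_ready()  # resources whose dependencies are all done
        waves.append(sorted(ready))
        sorter.done(*ready)  # mark the whole wave as created
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(creation_waves(GRAPH), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

Under the dependencies in the example configuration, the bucket and the database land in the first wave together, followed by the load balancer, the CDN, and the DNS entry one after another.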

There are many products out there that take the right half of the picture above and make it available as a service: Chef, Puppet, SaltStack, Ansible, and the relatively modern Terraform and CloudFormation, to name a few. Your mileage may vary according to your needs. Care to learn more about how we do it? See “How We Use Packer In Amazon’s Cloud”.