Archive for May, 2017

May 7: Automating AWS infrastructure with Terraform

When you start using cloud hosting solutions like Amazon Web Services, Microsoft Azure or Rackspace Cloud, it doesn’t take long to feel overwhelmed by the sheer choice and abundance of features these platforms offer. Even worse, the initial setup of your applications or Web sites on a cloud platform can be very cumbersome; it involves a lot of clicking, configuring and discovering how the different parts fit together.

With tools like Terraform, building your infrastructure becomes a whole lot easier and more manageable. Terraform lets system administrators sit down and script their whole infrastructure stack, connecting the different parts together much like assigning variables in a programming language. The difference is that instead of assigning a value to a variable, you might be assigning a list of servers as a load balancer’s backend hosts, for example.

In this post I’ll walk you through a configuration example of how to set up a complete load balanced infrastructure with Terraform, and at the end you can download all the files and modify them to suit your own needs. I’ll also talk a little about where you can go from here if you want to take Terraform further.

Getting up and running

To start using Terraform, you’ll need to install it. It’s distributed as a single binary for most platforms, so download the zip file for your platform, unzip it and place the terraform binary somewhere in your PATH, like /usr/local/bin. Terraform runs entirely on the command line, so you’ll need a little experience executing commands in the terminal.

Variables

A core part of Terraform is the variables file, variables.tf. Terraform automatically loads every .tf file in the working folder, so there’s no need to include it anywhere; the file name itself is just a convention. It’s a place where you can define the hard dependencies for your setup, and in this case we have two:

  1. a path to an SSH public key file,
  2. the name of the AWS region we wish to create our servers in.

Both of these variables have defaults, so Terraform won’t ask you to define them when running the planning step which we’ll get to in a minute.
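
If you want different values without editing variables.tf, Terraform also reads a file called terraform.tfvars from the working folder automatically, and its values override the defaults (a single value can also be overridden on the command line with -var). A minimal sketch, assuming you keep your key somewhere other than the default; the path below is a hypothetical example. Keep in mind that the AMI used later in this post is specific to eu-central-1, so changing the region needs more than a one-line override.

```hcl
# terraform.tfvars
# Terraform reads this file automatically from the working folder, and any
# value set here overrides the matching default in variables.tf.
# The key path below is a hypothetical example, not a requirement.
public_key_path = "~/.ssh/aws_deploy.pub"
```
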

Create a folder somewhere on your hard drive, create a new file called variables.tf in it, and add the following:

variable "public_key_path" {
  description = "Enter the path to the SSH Public Key to add to AWS."
  default     = "~/.ssh/id_rsa.pub"
}

variable "aws_region" {
  description = "AWS region to launch servers."
  default     = "eu-central-1"
}
variables.tf

Main file

By convention, Terraform’s main entry point is a file called main.tf, which you’ll need to create. Add the following three lines:

provider "aws" {
  region = "${var.aws_region}"
}

This clause defines the provider. Terraform comes bundled with support for a number of providers, like Amazon Web Services, which we’re using in this example. One of the things you can configure is the default region, and we’re getting that from the variables file we just created; since Terraform loads every .tf file in the folder, var.aws_region is available here without any explicit include. You can also configure AWS credentials in other ways, like explicitly adding an AWS Access Key and Secret Key, but in this example we’ll pass those as environment variables. We’ll get to those later.
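
For comparison, the explicit-credentials alternative could look roughly like the sketch below, which would replace the provider block above rather than sit next to it. The variable names aws_access_key and aws_secret_key are hypothetical; declaring them without defaults means Terraform prompts for them (or takes them via -var), so the secrets don’t have to live in a file you commit.

```hcl
# Hypothetical credential variables, deliberately declared without defaults
# so the secrets never need to be written into version-controlled files.
variable "aws_access_key" {}
variable "aws_secret_key" {}

provider "aws" {
  region     = "${var.aws_region}"
  access_key = "${var.aws_access_key}"
  secret_key = "${var.aws_secret_key}"
}
```
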

Network

Next we’ll start adding some actual infrastructure. In Terraform parlance, each piece of infrastructure is called a resource:

resource "aws_vpc" "vpc_main" {
  cidr_block = "10.0.0.0/16"
  
  enable_dns_support = true
  enable_dns_hostnames = true
  
  tags {
    Name = "Main VPC"
  }
}

resource "aws_internet_gateway" "default" {
  vpc_id = "${aws_vpc.vpc_main.id}"
}

resource "aws_route" "internet_access" {
  route_table_id          = "${aws_vpc.vpc_main.main_route_table_id}"
  destination_cidr_block  = "0.0.0.0/0"
  gateway_id              = "${aws_internet_gateway.default.id}"
}

# Create a public subnet to launch our load balancers
resource "aws_subnet" "public" {
  vpc_id                  = "${aws_vpc.vpc_main.id}"
  cidr_block              = "10.0.1.0/24" # 10.0.1.0 - 10.0.1.255 (256)
  map_public_ip_on_launch = true
}

# Create a private subnet to launch our backend instances
resource "aws_subnet" "private" {
  vpc_id                  = "${aws_vpc.vpc_main.id}"
  cidr_block              = "10.0.16.0/20" # 10.0.16.0 - 10.0.31.255 (4096)
  map_public_ip_on_launch = true
}
Network setup

To contain our setup, an AWS Virtual Private Cloud is created and configured with an internal IP range, DNS support and a name. Right after the resource keyword comes aws_vpc, the type of resource we’re creating. After that comes the identifier, vpc_main, which is how we’ll refer to it later.

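We’re also creating an internet gateway, a route to it, and two subnets: a public one for internet-facing services like the load balancers, and a private one for services that don’t need incoming network access from the internet. Note that in this simple setup both subnets use the VPC’s main route table, which routes through the internet gateway, and both assign public IPs on launch. That’s what lets Terraform reach the backend instances over SSH for provisioning, and it means access to them is controlled by security groups rather than by routing.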

As you can see, different parts are neatly interlinked by referencing them like variables.
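
To make the referencing syntax concrete, here’s an annotated sketch. The vpc_id output is hypothetical and only there to illustrate the pattern. Terraform also uses references like these to work out the order in which to create things, which is why the internet gateway is only created once the VPC exists.

```hcl
# References follow the pattern "${TYPE.NAME.ATTRIBUTE}":
#   TYPE      = aws_vpc   (the resource type)
#   NAME      = vpc_main  (the identifier we chose)
#   ATTRIBUTE = id        (filled in by AWS once the VPC has been created)
# Hypothetical output, included only to illustrate the syntax:
output "vpc_id" {
  value = "${aws_vpc.vpc_main.id}"
}
```
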

Trying it out

At this point, we can start testing our setup. You’ll have two files in a folder, variables.tf and main.tf with the content that was just listed. Now it’s time to actually create it in AWS.

To start, set your AWS access keys as environment variables by typing the following two lines in your terminal (export makes them visible to the terraform process):

export AWS_ACCESS_KEY_ID="AKIA..."
export AWS_SECRET_ACCESS_KEY="Your secret key"

Next, we’ll create the Terraform plan file. Using your AWS credentials, Terraform checks the status of the different resources you’ve defined, like the VPC and the gateway. Since it’s the first time you’re running it, Terraform will mark everything for creation in the resulting plan file. Running the plan command on its own won’t touch or create anything in AWS.

terraform plan -out=terraform.plan

You’ll see an overview of the resources to be created, and with the -out=terraform.plan argument, the plan is saved to a file, ready for execution with apply:

terraform apply terraform.plan

Executing this command makes Terraform start calling the AWS API to create the resources, and you’ll see the results as they run. If any step fails (for example, because something conflicts with a resource that already exists in your account), Terraform reports the error and stops.

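After running apply, you’ll also see a new file in your project folder: terraform.tfstate, a state file that maps the resources in your configuration to the actual ones on Amazon. Terraform depends on it to know what it manages, so you should commit it to git if you version control your Terraform project; keep in mind that it stores resource attributes in plain text, which may include sensitive values.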
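
If you’d rather not keep the state file in git, Terraform 0.9 and later can store it remotely instead, for example in an S3 bucket. A minimal sketch, assuming a bucket you’ve already created; the bucket name and key below are hypothetical. After adding it, running terraform init sets up the backend and offers to copy your existing local state into it.

```hcl
# Remote state in S3. Bucket and key are hypothetical placeholders; the
# bucket must exist before running terraform init. Backend settings cannot
# use ${...} interpolation, so the region is written out literally.
terraform {
  backend "s3" {
    bucket = "my-terraform-state-bucket"
    key    = "aws-infrastructure/terraform.tfstate"
    region = "eu-central-1"
  }
}
```
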

So now Terraform knows that your resources were created on Amazon. They were created through the AWS API, and the IDs of the different resources are saved in the tfstate file, so running terraform plan again will show no changes: there’s nothing new to create.

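If you change your main.tf file, like changing the VPC’s CIDR block to 192.168.0.0/24 instead of 10.0.0.0/16, Terraform will figure out the changes necessary to update the resources. Some changes, like this one, can’t be made in place, so the resource (and its dependents) will be destroyed and re-created. In this particular case you’d also need to move the subnets into the new range, since they must fit inside the VPC’s CIDR block.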
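
For something as central as the VPC, you may want Terraform to refuse outright to destroy it. A sketch using the prevent_destroy lifecycle setting on the same VPC resource: with it in place, any plan that would destroy the VPC (including the destroy-and-recreate triggered by a CIDR change) fails with an error, and so does terraform destroy until you remove the setting again.

```hcl
resource "aws_vpc" "vpc_main" {
  cidr_block = "10.0.0.0/16"

  enable_dns_support   = true
  enable_dns_hostnames = true

  tags {
    Name = "Main VPC"
  }

  # Any plan that would destroy this VPC now fails with an error instead,
  # including a replacement caused by changing cidr_block.
  lifecycle {
    prevent_destroy = true
  }
}
```
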

More resources

Having learnt a little about how Terraform works, let’s go ahead and add some more things to our project.

We’ll add two security groups: one to limit network access to our servers, and one to open up access to the public load balancers we’ll create with the AWS ELB service.

# A security group for the ELB so it is accessible via the web
resource "aws_security_group" "elb" {
  name        = "sec_group_elb"
  description = "Security group for public facing ELBs"
  vpc_id      = "${aws_vpc.vpc_main.id}"

  # HTTP access from anywhere
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
  
  # HTTPS access from anywhere
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # Outbound internet access
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# Our default security group to access the instances over SSH and HTTP
resource "aws_security_group" "default" {
  name        = "sec_group_private"
  description = "Security group for backend servers and private ELBs"
  vpc_id      = "${aws_vpc.vpc_main.id}"

  # SSH access from anywhere
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # HTTP access from the VPC
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["10.0.0.0/16"]
  }
  
  # Allow all from private subnet
  ingress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["${aws_subnet.private.cidr_block}"]
  }

  # Outbound internet access
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

Our elb security group is only reachable on ports 80 and 443 (HTTP and HTTPS), while the default one only has public access on port 22 (SSH). The default group also allows access on port 80 from the whole VPC (including the public facing load balancers), as well as full access from other servers in the private subnet. Both allow all outgoing traffic.
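
If you want port 80 on the backend servers to be reachable only through the load balancers rather than from anywhere in the VPC, an ingress rule can name a source security group instead of a CIDR range. A sketch of a possible replacement for the "HTTP access from the VPC" block inside the default security group; whether that’s too strict depends on what else in your VPC talks to those servers directly.

```hcl
  # Inside resource "aws_security_group" "default": admit HTTP only from
  # members of the ELB security group, instead of from all of 10.0.0.0/16.
  ingress {
    from_port       = 80
    to_port         = 80
    protocol        = "tcp"
    security_groups = ["${aws_security_group.elb.id}"]
  }
```
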

After the security groups, we need to define a public key, which will be placed on the instances we create later. Here, we use the pre-defined variable to specify its path on the local filesystem.

resource "aws_key_pair" "auth" {
  key_name   = "default"
  public_key = "${file(var.public_key_path)}"
}

Modules

You probably thought that there was a lot of duplicate code in those two security groups, and you’re right. To combat that, Terraform provides custom modules, which is basically like including files.

Since we need to configure quite a few things in our EC2 instances, and those things are almost always the same across them, we’ll create a module for our instances. To do that, create a new folder called instance inside your project folder.

In the instance folder, create three new files, variables.tf, main.tf and output.tf:

variable "private_key_path" {
  description = "Enter the path to the SSH Private Key to run provisioner."
  default = "~/.ssh/id_rsa"
}

variable "aws_amis" {
  default = {
    eu-central-1 = "ami-060cde69"
  }
}

variable "disk_size" {
  default = 8
}

variable "count" {
  default = 1
}

variable "group_name" {
  description = "Group name becomes the base of the hostname of the instance"
}

variable "aws_region" {
  description = "AWS region to launch servers."
  default     = "eu-central-1"
}

variable "instance_type" {
  description = "EC2 instance type to launch."
  default     = "t2.small"
}

variable "subnet_id" {
  description = "ID of the AWS VPC subnet to use"
}

variable "key_pair_id" {
  description = "ID of the keypair to use for SSH"
}

variable "security_group_id" {
  description = "ID of the VPC security group to use for network"
}
instance/variables.tf
resource "aws_instance" "instance" {
  count = "${var.count}"

  instance_type          = "${var.instance_type}"
  ami                    = "${lookup(var.aws_amis, var.aws_region)}"
  key_name               = "${var.key_pair_id}"
  vpc_security_group_ids = ["${var.security_group_id}"]
  subnet_id              = "${var.subnet_id}"
  
  root_block_device {
    volume_size = "${var.disk_size}"
  }
  
  tags {
    Name  = "${format("%s%02d", var.group_name, count.index + 1)}" # e.g. "api02"
    Group = "${var.group_name}"
  }
  
  lifecycle {
    create_before_destroy = true
  }
  
  # Provisioning
  
  connection {
    user = "ubuntu"
    private_key = "${file(var.private_key_path)}"
  }

  provisioner "remote-exec" {
    inline = [
      "sudo apt-get -y update",
    ]
  }
}
instance/main.tf
# Used for configuring ELBs.
output "instance_ids" {
    value = ["${aws_instance.instance.*.id}"]
}
instance/output.tf

In the variables file, we have a few things worth mentioning:

  • a default path to the private key matching the public key; we’ll need the private key to connect via SSH and run the provisioner,
  • we define a list of AMIs, or more specifically a map from region to AMI ID. Since we’re only focusing on Amazon’s EU Central 1 region here, we’ve only defined an AMI for that region (it’s Ubuntu 16.04 LTS). You’ll need to browse Amazon’s AMI catalogue if you use another region or want to use another operating system,
  • some defaults are defined, like the instance count, disk size, etc. These can be overridden when invoking the module,
  • some variables don’t have defaults. Weirdly, Terraform doesn’t let modules automatically inherit variables from the main project, which is why I’ve chosen to place the private key path here; otherwise I’d have to pass the main Terraform variable to every module.
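
Supporting another region means adding an entry to the AMI map. A sketch, where eu-west-1 and "ami-xxxxxxxx" are placeholders for a region of your choice and a real AMI ID you’ve looked up for it. The lookup in instance/main.tf uses the module’s own aws_region variable, and in this version of Terraform it errors if the key is missing, so the region you pass in must have an entry here.

```hcl
variable "aws_amis" {
  default = {
    eu-central-1 = "ami-060cde69"
    # Hypothetical second entry: replace the placeholder with a real AMI ID
    # for your region before using it.
    eu-west-1    = "ami-xxxxxxxx"
  }
}
```
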

The output file allows the module to export some properties; you have to explicitly define outputs for everything you want to reference later. The only thing I need to reference is the instance IDs (for use in the ELBs), so that’s the only output.

Using the tags block, we can add some info to our instances. I’m using one of Terraform’s built-in functions, format, to generate a friendly hostname based on the group name and a 1-indexed, zero-padded number. The provisioner clause, on the other hand, is a little bare: in practice, one would typically reference a Chef cookbook or an Ansible playbook, or just run some commands to set up your environment and bootstrap your application.
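
As one example of handing off to Ansible, a local-exec provisioner can run ansible-playbook from your own machine against the new instance. This is a sketch only: playbook.yml is a hypothetical playbook in the project folder, and it assumes Ansible is installed locally. Placing it after the existing remote-exec provisioner means it only runs once SSH is reachable, and host key checking is switched off for this one command so the first connection doesn’t stop at a prompt.

```hcl
  # Inside resource "aws_instance" "instance", after the remote-exec
  # provisioner. playbook.yml is hypothetical. The trailing comma in -i
  # makes Ansible treat the IP as an inline inventory rather than a file.
  provisioner "local-exec" {
    command = "ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -u ubuntu --private-key ${var.private_key_path} -i '${self.public_ip},' playbook.yml"
  }
```
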

Back in your main Terraform file, main.tf, you can now start referencing your AWS EC2 Instance module:

module "backend_api" {
    source                 = "./instance"
    subnet_id              = "${aws_subnet.private.id}"
    key_pair_id            = "${aws_key_pair.auth.id}"
    security_group_id      = "${aws_security_group.default.id}"
    
    count                  = 2
    group_name             = "api"
}

module "backend_worker" {
    source                 = "./instance"
    subnet_id              = "${aws_subnet.private.id}"
    key_pair_id            = "${aws_key_pair.auth.id}"
    security_group_id      = "${aws_security_group.default.id}"
    
    count                  = 2
    group_name             = "worker"
    instance_type          = "t2.medium"
}

module "frontend" {
    source                 = "./instance"
    subnet_id              = "${aws_subnet.private.id}"
    key_pair_id            = "${aws_key_pair.auth.id}"
    security_group_id      = "${aws_security_group.default.id}"
    
    count                  = 2
    group_name             = "frontend"
}

module "db_mysql" {
    source                 = "./instance"
    subnet_id              = "${aws_subnet.private.id}"
    key_pair_id            = "${aws_key_pair.auth.id}"
    security_group_id      = "${aws_security_group.default.id}"
    
    count                  = 3
    disk_size              = 30
    group_name             = "mysql"
    instance_type          = "t2.medium"
}

Instead of resource, modules are referenced using the module clause. Every module needs a source argument, which points to the directory containing the module’s files (here, ./instance).

Again, since modules can’t automatically inherit or reference parent resources, we’ll have to explicitly pass the subnet, key pair and security groups to the module.
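
The same applies to the region. The module has its own aws_region variable defaulting to eu-central-1, so if you change the region in the main variables file, the module’s AMI lookup won’t follow unless you pass it down. A sketch of the backend_api module with that one extra line; the other module blocks would get the same line.

```hcl
module "backend_api" {
    source                 = "./instance"
    subnet_id              = "${aws_subnet.private.id}"
    key_pair_id            = "${aws_key_pair.auth.id}"
    security_group_id      = "${aws_security_group.default.id}"
    # Pass the main project's region down, so the module's AMI lookup uses
    # it instead of the module's own eu-central-1 default.
    aws_region             = "${var.aws_region}"

    count                  = 2
    group_name             = "api"
}
```
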

This example consists of 9 instances:

  • 2x backend API servers,
  • 2x backend workers,
  • 2x frontend servers,
  • 3x MySQL servers.

Load balancers

To finish our Terraform file, we add the remaining component: load balancers.

# Public Backend ELB
resource "aws_elb" "backend" {
  name = "elb-public-backend"

  subnets         = ["${aws_subnet.public.id}", "${aws_subnet.private.id}"]
  security_groups = ["${aws_security_group.elb.id}"]
  instances       = ["${module.backend_api.instance_ids}"]

  listener {
    instance_port     = 80
    instance_protocol = "http"
    lb_port           = 80
    lb_protocol       = "http"
  }
  
  health_check {
    healthy_threshold   = 2
    unhealthy_threshold = 2
    timeout             = 3
    target              = "HTTP:80/healthcheck.php"
    interval            = 30
  }
}

# Public Frontend ELB
resource "aws_elb" "frontend" {
  name = "elb-public-frontend"

  subnets         = ["${aws_subnet.public.id}", "${aws_subnet.private.id}"]
  security_groups = ["${aws_security_group.elb.id}"]
  instances       = ["${module.frontend.instance_ids}"]

  listener {
    instance_port     = 80
    instance_protocol = "http"
    lb_port           = 80
    lb_protocol       = "http"
  }
  
  health_check {
    healthy_threshold   = 2
    unhealthy_threshold = 2
    timeout             = 3
    target              = "HTTP:80/healthcheck.php"
    interval            = 30
  }
}

# Private ELB for MySQL cluster
resource "aws_elb" "db_mysql" {
  name = "elb-private-galera"

  subnets         = ["${aws_subnet.private.id}"]
  security_groups = ["${aws_security_group.default.id}"]
  instances       = ["${module.db_mysql.instance_ids}"]
  internal        = true

  listener {
    instance_port     = 3306
    instance_protocol = "tcp"
    lb_port           = 3306
    lb_protocol       = "tcp"
  }
  
  health_check {
    healthy_threshold   = 2
    unhealthy_threshold = 2
    timeout             = 3
    target              = "HTTP:9222/" # Galera Clustercheck listens on HTTP/9222
    interval            = 30
  }
}

The load balancers provide the entry points for our application. One thing to note here is how the instances are referenced[Footnote 1].
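
The elb security group already opens port 443, but the public ELBs above only listen on port 80. To serve HTTPS, you could add a second listener that terminates TLS on the load balancer and forwards plain HTTP to the instances. A sketch, where ssl_certificate_arn is a hypothetical variable you’d declare and point at a certificate you’ve uploaded to IAM or issued through ACM.

```hcl
  # Inside resource "aws_elb" "frontend" (or "backend"): accept HTTPS on the
  # load balancer and forward plain HTTP to port 80 on the instances.
  # var.ssl_certificate_arn is hypothetical and must be declared separately.
  listener {
    instance_port      = 80
    instance_protocol  = "http"
    lb_port            = 443
    lb_protocol        = "https"
    ssl_certificate_id = "${var.ssl_certificate_arn}"
  }
```
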

Main output file

To put a cherry on top, we’ll create an output file for our main project, output.tf. Like every other .tf file in the folder, Terraform will pick it up automatically.

# Public Load Balancers

output "api_address" {
  value = "${aws_elb.backend.dns_name}"
}

output "frontend_address" {
  value = "${aws_elb.frontend.dns_name}"
}

# Private Load Balancers

output "galera_address" {
  value = "${aws_elb.db_mysql.dns_name}"
}
output.tf

This will display the hostnames of our ELBs in a friendly format after running terraform apply, which is handy for copying into a configuration file or your browser.

You can now run terraform plan again like before, but since you’re using modules, you’ll have to run terraform get first so Terraform fetches them.

The plan will show the remaining infrastructure to be created, which terraform apply then creates.

You can clone, fork or download the full project over on GitHub.

Next steps

Where can you go from here? I have a couple of ideas:

  • Move your DNS to Amazon Route53 and automate your DNS entries with the outputs from the ELBs.
  • In addition to Route53, see what other AWS services you can provision using Terraform, like S3 buckets, autoscaling groups, AMIs, IAM groups/policies…
  • Use modules further to simplify your main file, for example by grouping related resources in one module. You could keep your whole network setup in a single module to make the base main.tf file more concise.
  • Integrate with provisioning software like Ansible, using its EC2 dynamic inventory to easily provision new instances.
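
As a starting point for the Route53 idea, an alias record can point a hostname straight at one of the ELBs, using the ELB’s dns_name and zone_id attributes. A sketch, assuming a hosted zone you already manage in Route53; route53_zone_id and frontend_hostname are hypothetical variables, and www.example.com is a placeholder.

```hcl
# Hypothetical variables: the ID of an existing Route53 hosted zone, and the
# record name to create in it.
variable "route53_zone_id" {}

variable "frontend_hostname" {
  default = "www.example.com"
}

# Alias A record pointing the hostname at the public frontend ELB
resource "aws_route53_record" "frontend" {
  zone_id = "${var.route53_zone_id}"
  name    = "${var.frontend_hostname}"
  type    = "A"

  alias {
    name                   = "${aws_elb.frontend.dns_name}"
    zone_id                = "${aws_elb.frontend.zone_id}"
    evaluate_target_health = true
  }
}
```
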

Footnotes

  1. Yes, the instance IDs are inside a string, which is how all resources and modules are referenced, even though they are technically lists and (in my opinion) shouldn’t be wrapped in a string. But that’s how it is.