Auto Scaling Group (stateful)

Run an Auto Scaling Group for stateful apps. Supports zero-downtime, rolling deployment, auto healing, IAM Roles, EBS Volumes, and ENIs.

Server Group Module

This module allows you to run a fixed-size cluster of servers that can:

  1. Attach EBS Volumes to each server.
  2. Attach Elastic Network Interfaces (ENIs) to each server.
  3. Do a zero-downtime, rolling deployment, where each server is shut down, its EBS Volume and/or ENI detached, and a new server is brought up that reattaches that EBS Volume and/or ENI.
  4. Integrate with an Application Load Balancer (ALB) or Elastic Load Balancer (ELB) for routing and health checks.
  5. Automatically replace failed servers.

The main use case for this module is to run data stores such as MongoDB and ZooKeeper. See the Background section to understand how this module works and in what use cases you should use it instead of an Auto Scaling Group (ASG).

Quick start

Check out the server-group examples for sample code that demonstrates how to use this module.

How do you use this module?

To use this module, you need to do the following:

  1. Add the module to your Terraform code
  2. Optionally integrate a load balancer for health checks
  3. Optionally create ENIs and EBS Volumes for each server
  4. If you created ENIs, optionally create DNS records
  5. Attach an ENI and EBS Volume during boot

Add the module to your Terraform code

As with all Terraform modules, you include this one in your code using the module keyword and pointing the source URL at this repo:

module "servers" {
  source = "git::git@github.com:gruntwork-io/terraform-aws-asg.git//modules/server-group?ref=v0.3.1"

  name          = "my-server-group"
  size          = 3
  instance_type = "t3.micro"
  ami_id        = "ami-abcd1234"

  aws_region = "us-east-1"
  vpc_id     = "vpc-abcd12345"
  subnet_ids = ["subnet-abcd1111", "subnet-abcd2222", "subnet-abcd3333"]
}

The code above will spin up 3 t3.micro servers in the specified VPC and subnets. Any server that fails EC2 status checks will be automatically replaced. Any time you update any of the parameters and run terraform apply, it will kick off a zero-downtime rolling deployment (see How does rolling deployment work? for details).

Optionally integrate a load balancer for health checks

By default, the server-group module uses EC2 status checks to determine server health. These are used both during a rolling deployment (i.e., only replace the next server when the previous server is healthy) and for auto-recovery (i.e., replace any server that has failed). While EC2 status checks are good enough to detect when an EC2 Instance has completely died or is malfunctioning, they do NOT tell you whether the code running on that EC2 Instance is actually working (e.g., whether your database or application is actually running and capable of serving traffic).

Therefore, we strongly recommend associating a load balancer with your server-group. The load balancer can perform health checks on your application code by actually making HTTP or TCP requests to the application, which is a far more robust way to tell if the server is healthy.

Here is how to associate an ELB with your server-group and use it for health checks:

module "servers" {
  source = "git::git@github.com:gruntwork-io/terraform-aws-asg.git//modules/server-group?ref=v0.3.1"
  
  # (other params omitted)
    
  health_check_type = "ELB"
  elb_names         = ["${aws_elb.my_elb.name}"]
}
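
The ELB example above references an aws_elb resource named my_elb. For reference, a minimal sketch of such an ELB (the subnets, ports, and health check target below are purely illustrative) might look like:

resource "aws_elb" "my_elb" {
  name    = "my-server-group-elb"
  subnets = ["subnet-abcd1111", "subnet-abcd2222", "subnet-abcd3333"]

  # Forward HTTP traffic on port 80 to port 8080 on the servers (illustrative ports)
  listener {
    lb_port           = 80
    lb_protocol       = "http"
    instance_port     = 8080
    instance_protocol = "http"
  }

  # Health check the application itself, not just the EC2 status checks
  health_check {
    target              = "HTTP:8080/health"
    interval            = 15
    timeout             = 5
    healthy_threshold   = 2
    unhealthy_threshold = 2
  }
}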

And here is how to associate an ALB with your server-group module and use it for health checks:

module "servers" {
  source = "git::git@github.com:gruntwork-io/terraform-aws-asg.git//modules/server-group?ref=v0.3.1"
  
  # (other params omitted)
    
  health_check_type     = "ELB"
  alb_target_group_arns = ["${aws_alb_target_group.my_target_group.arn}"]
}

Note: The health_check_type value above is not a typo; it should be ELB in both cases, even when using an ALB.
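
The ALB example above references an aws_alb_target_group resource named my_target_group. For reference, a minimal sketch of such a target group (the port, protocol, and health check path are purely illustrative) might look like:

resource "aws_alb_target_group" "my_target_group" {
  name     = "my-server-group-tg"
  port     = 8080
  protocol = "HTTP"
  vpc_id   = "vpc-abcd12345"

  # Health check the application itself, not just the EC2 status checks
  health_check {
    path                = "/health"
    protocol            = "HTTP"
    interval            = 15
    healthy_threshold   = 2
    unhealthy_threshold = 2
  }
}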

Once you've associated a load balancer with your server-group, servers will automatically register with the load balancer as they deploy and deregister as they undeploy, and the module will use the load balancer's health checks to determine when a server is healthy or needs to be replaced.

Optionally create ENIs and EBS Volumes for each server

By default, the server-group module does not create any ENIs or EBS Volumes. If you would like to create ENIs, set the num_enis parameter to the number of ENIs you want per server:

module "servers" {
  source = "git::git@github.com:gruntwork-io/terraform-aws-asg.git//modules/server-group?ref=v0.3.1"
  
  # (other params omitted)
    
  num_enis = 1
}

If you would like to create EBS Volumes, set the ebs_volumes parameter to a list of volumes to create for each server:

module "servers" {
  source = "git::git@github.com:gruntwork-io/terraform-aws-asg.git//modules/server-group?ref=v0.3.1"
  
  # (other params omitted)
    
  ebs_volumes = [{
    type      = "gp2"
    size      = 100
    encrypted = false
  },{
    type      = "standard"
    size      = 500
    encrypted = true
  },{
    type      = "io1"
    size      = 500
    iops      = 2000
    encrypted = true  
  }]
}

Note: When using an io1 disk type, the iops parameter must be specified.

Each ENI and server pair will get a matching eni-xxx tag (e.g., eni-0, eni-1, etc.). Each EBS Volume and server pair will get a matching ebs-volume-xxx tag (e.g., ebs-volume-0, ebs-volume-1, etc.). You will need to attach these ENIs and Volumes while your server is booting, as described in Attach an ENI and EBS Volume during boot below.

If you created ENIs, optionally create DNS records

You may wish to have a DNS record associated with each ENI. This has the special advantage that even if a server is replaced, the new server will attach the existing ENI and retain the same IP address. This means that the DNS record will remain valid as long as the Server Group size does not shrink.

If you would like to create DNS records, set the route53_hosted_zone_id parameter to the Route53 Hosted Zone where DNS records should be created and the dns_name_common_portion parameter to the common portion of the DNS name to be shared by each server in the Server Group. For example:

module "servers" {
  source = "git::git@github.com:gruntwork-io/terraform-aws-asg.git//modules/server-group?ref=v0.3.1"
  
  # (other params omitted)
    
  size                    = 3
  route53_hosted_zone_id  = "<obtain-this-from-another-terraform-module>"
  dns_name_common_portion = "kafka.internal"
}

will create the following DNS records that point to each ENI:

0.kafka.internal
1.kafka.internal
2.kafka.internal

Attach an ENI and EBS Volume during boot

While the server-group module can create ENIs and EBS Volumes for you, you have to attach them to your servers yourself. The easiest way to do that is to use the attach-eni and persistent-ebs-volume modules from terraform-aws-server.

Here's how it works:

  1. Install the attach-eni and/or persistent-ebs-volume modules in the AMI that gets deployed in your server-group. The easiest way to do this is to use the Gruntwork Installer in a Packer template:

    gruntwork-install --module-name 'persistent-ebs-volume' --repo 'https://github.com/gruntwork-io/terraform-aws-server' --tag 'v0.8.0'
    gruntwork-install --module-name 'attach-eni' --repo 'https://github.com/gruntwork-io/terraform-aws-server' --tag 'v0.8.0'
    
  2. Run the attach-eni and/or mount-ebs-volume scripts while each server is booting, typically as part of User Data. You can use the --eni-with-same-tag and --volume-with-same-tag parameters of the scripts, respectively, to automatically mount the ENIs and/or EBS Volumes with the same eni-xxx and ebs-volume-xxx tags as the server.
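
For example, assuming the server-group module exposes a variable for passing User Data to its servers (check vars.tf for the exact name; user_data below is a hypothetical placeholder), a sketch of a boot script that attaches the tagged ENI and EBS Volume might look like this. The device name, mount point, and any script flags other than --eni-with-same-tag and --volume-with-same-tag are illustrative; consult the attach-eni and mount-ebs-volume documentation in terraform-aws-server for the exact options.

module "servers" {
  source = "git::git@github.com:gruntwork-io/terraform-aws-asg.git//modules/server-group?ref=v0.3.1"

  # (other params omitted)

  # Hypothetical variable name for illustration; see vars.tf for how this module accepts User Data
  user_data = <<-EOF
              #!/bin/bash
              # Attach the ENI that shares this server's eni-xxx tag
              attach-eni --eni-with-same-tag

              # Attach and mount the EBS Volume that shares this server's ebs-volume-xxx tag
              # (device name and mount point are illustrative)
              mount-ebs-volume --volume-with-same-tag --device-name /dev/xvdf --mount-point /data
              EOF
}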

Optionally Order the Deployment of Other Terraform Resources

There are times when you may wish to block a Terraform resource from being created until the resources deployed by this module are finished. For example, when you deploy a Kafka cluster, you also need to deploy a ZooKeeper cluster, and the Kafka cluster cannot boot until the ZooKeeper cluster is fully booted. To avoid messy log entries from Kafka failing while ZooKeeper boots, you can delay the creation of the Kafka cluster until the ZooKeeper cluster has finished booting.

Unfortunately, as of June 7, 2018, Terraform does not support module dependencies, so we have to approximate them by making clever use of module outputs and inputs (variables).

Here's how to use the ordering feature of the server-group module:

  1. Suppose you have two Terraform modules: Module A and Module B, both of which are instances of this server-group module. You want Module B to be created after Module A.

  2. The following code will achieve the desired ordering:

module "a" {
  # Be sure to update to the latest version of this module
  source = "git::git@github.com:gruntwork-io/terraform-aws-asg.git//modules/server-group?ref=v1.0.8"

  ...
}

module "b" {
  source = "git::git@github.com:gruntwork-io/terraform-aws-asg.git//modules/server-group?ref=v1.0.8"

  wait_for = "${module.a.rolling_deployment_done}"
  ...
}

Make sure that you specifically use the rolling_deployment_done output value of Module A, not just any arbitrary output value.

With this pattern, Module A will now fully deploy, and only then will Module B create its Launch Configuration and Auto Scaling Group and begin the rolling deploy.

Background

  1. Why not an Auto Scaling Group?
  2. How does this module work?
  3. How does rolling deployment work?

Why not an Auto Scaling Group?

The first question you may ask is: how is this different from an Auto Scaling Group (ASG)? While an ASG does allow you to run a cluster of servers, automatically replace failed servers, and do zero-downtime deployment (see the asg-rolling-deploy module), attaching ENIs and EBS Volumes to servers in an ASG is very tricky:

  1. Using ENIs and EBS Volumes with ASGs is not natively supported by Terraform. The aws_network_interface_attachment and aws_volume_attachment resources only work with individual EC2 Instances and not ASGs. Therefore, you typically create a pool of ENIs and EBS Volumes in Terraform, and your servers, while booting, use the AWS CLI to attach those ENIs and EBS Volumes.

  2. Attaching ENIs and EBS Volumes from a pool requires that each server has a way to uniquely pick which ENI or EBS Volume belongs to it. Picking at random and retrying can be slow and error prone.

  3. With EBS Volumes, attaching them from an ASG is particularly problematic, as you can only attach an EBS Volume in the same Availability Zone (AZ) as the server. If you have, for example, three AZs and five servers, it's entirely possible that the ASG will launch a server in an AZ that does not have any EBS Volumes available.

The goal of this module is to give you a way to run a cluster of servers where attaching ENIs and EBS Volumes is easy.

How does this module work?

The solution used in this module is to:

  1. Create one ASG for each server. So if you create a cluster with five servers, you'll end up with five ASGs. Using ASGs gives us the ability to automatically integrate with an ALB or ELB and to replace failed servers.
  2. Each ASG is assigned to exactly one subnet, and therefore, one AZ.
  3. Create ENIs and EBS Volumes for each server, in the same AZ as that server's ASG. This ensures a server will never launch in an AZ that doesn't have an EBS Volume.
  4. Each server and ENI pair and each server and EBS Volume pair get matching tags, so each server can always uniquely identify the ENIs and EBS Volumes that belong to it.
  5. Zero-downtime deployment is done using a Python script in a local-exec provisioner. See How does rolling deployment work? for more details.

How does rolling deployment work?

The server-group module will perform a zero-downtime, rolling deployment every time you make a change to the code and run terraform apply. This deployment process is implemented in a Python script called rolling_deployment.py which runs in a local-exec provisioner.

Here is how it works:

  1. The rolling deployment process
  2. Deployment configuration options

The rolling deployment process

The rolling deployment process works as follows:

  1. Wait for the server-group to be healthy before starting the deployment. That means the server-group has the expected number of servers up and running and passing health checks.

  2. Pick one of the ASGs in the server-group and set its size to 0. This will terminate the Instance in that ASG, respecting any connection draining settings you may have set up. It will also detach any ENI or EBS Volume.

  3. Once the instance has terminated, set the ASG size back to 1. This will launch a new Instance with your new code and reattach its ENI or EBS Volume.

  4. Wait for the new Instance to pass health checks.

  5. Once the Instance is healthy, repeat the process with the next ASG, until all ASGs have been redeployed.

Deployment configuration options

You can customize the way the rolling deployment works by specifying the following parameters to the server-group module in your Terraform code:

  • script_log_level: Specify the logging level the script should use. Default is INFO. To debug issues, you may want to turn it up to DEBUG. To quiet the script down, you may want to turn it down to ERROR.

  • deployment_batch_size: How many servers to redeploy at a time. The default is 1. If you have a lot of servers to redeploy, you may want to increase this number to do the deployment in larger batches. However, make sure that taking down a batch of this size does not cause an unintended outage for your service!

  • skip_health_check: If set to true, the rolling deployment process will not wait for the server-group to be healthy before starting the deployment. This is useful if your server-group is already experiencing some sort of downtime or problem and you want to force a deployment as a way to fix it.

  • skip_rolling_deploy: If set to true, skip the rolling deployment process entirely. That means your Terraform changes will be applied to the launch configuration underneath the ASGs, but no new code will be deployed until something triggers the ASG to launch new instances. This is primarily useful if the rolling deployment script turns out to have some sort of bug in it.
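
For example, a sketch that sets all four parameters explicitly (the values are purely illustrative; the defaults are described above) might look like:

module "servers" {
  source = "git::git@github.com:gruntwork-io/terraform-aws-asg.git//modules/server-group?ref=v0.3.1"

  # (other params omitted)

  script_log_level      = "DEBUG" # default is INFO
  deployment_batch_size = 2       # default is 1; redeploy two servers at a time
  skip_health_check     = false
  skip_rolling_deploy   = false
}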
