Gruntwork Newsletter, November 2018

Yevgeniy Brikman
Co-Founder
Published October 10, 2018

Once a month, we send out a newsletter to all Gruntwork customers that describes all the updates we’ve made in the last month, news in the DevOps industry, and important security updates. Note that many of the links below go to private repos in the Gruntwork Infrastructure as Code Library and Reference Architecture that are only accessible to customers.

Hello Grunts,

In the last month, we added production-ready usage patterns for ELK, updated our ECS modules to run deployment checks, added support for Aurora Serverless, made a large number of updates to Terratest, including a log parser that makes it easier to debug failing tests, and fixed numerous bugs.

As always, if you have any questions or need help, email us at support@gruntwork.io!

Gruntwork Updates

ELK Usage Pattern

Motivation: Last August, we released a set of modules for running the ELK Stack (Elasticsearch, Logstash, Kibana) on top of AWS: package-elk. Since then, we’ve gotten requests from customers asking about the best ways to use those modules, both in pre-prod and in production environments.

Solution: We’ve updated our Acme Reference Architecture with two examples:

  • Multi-cluster ELK deployment: This code shows how to run all the different components of the ELK stack (Elasticsearch master nodes, Elasticsearch data nodes, Logstash nodes, Kibana nodes) in separate clusters (that is, separate Auto Scaling Groups). This is the recommended deployment for production, as it lets you scale each of these clusters separately.
  • Single-cluster ELK deployment: This code shows how to run all the different components of the ELK stack in a single cluster (a single Auto Scaling Group). This is the recommended deployment for pre-prod environments, as it allows you to keep the number of instances small and save money.

What to do about it: Use the code above to deploy package-elk in your own AWS accounts!

ECS service deployment checks

Motivation: When using Terraform to deploy Docker containers to ECS, the built-in aws_ecs_service resource will report “success” as soon as the container has been scheduled for deployment. However, this doesn’t check whether ECS actually managed to deploy your service! So terraform apply completes successfully, and you don’t get any errors, but in reality, the container may fail to deploy due to a bug or the ECS cluster being out of resources.

Solution: All of the ecs-service modules in module-ecs will now run a separate binary as part of the deployment to verify the container is actually running before completing apply. This binary will wait for up to 10 minutes (configurable via the deployment_check_timeout_seconds input parameter) before timing out the check. Upon a check failure, the binary will output the last 5 events on the ECS service, helping you debug potential deployment failures during terraform apply. In addition, if you set up an ALB or NLB with the service, the binary will check the ALB/NLB to verify the container is passing health checks.

What to do about it: The binary will automatically be triggered on each deploy once you update to module-ecs, v0.10.0. It requires a working Python install (versions 2.7, 3.5, 3.6, and 3.7 are supported). If you do not have a working Python install, you can get the old behavior by setting the enable_ecs_deployment_check module input to false.
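
For reference, here’s a minimal sketch of how these inputs look on an ecs-service module call. This is not the full module API: the other required parameters are elided, and the values shown are hypothetical.

module "ecs_service" {
  source = "git::git@github.com:gruntwork-io/module-ecs.git//modules/ecs-service?ref=v0.10.0"

  # ... the module's other required parameters go here ...

  # Wait up to 5 minutes for the deployment check instead of the default 10
  deployment_check_timeout_seconds = 300

  # Set to false to skip the check entirely (e.g., if Python is unavailable)
  # enable_ecs_deployment_check = false
}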

Aurora Serverless (plus deletion protection)

Motivation: Customers have been asking us for the ability to use Aurora Serverless, an on-demand relational database (MySQL-compatible) that will start up, shut down, and scale on demand, without you having to provision or manage servers in advance. This is especially useful in pre-prod environments and for sporadically used apps, where you want the database to power down when not in use, so you don’t have to pay for it.

Solution: Our aurora module now supports Aurora Serverless! Just set the engine_mode parameter to “serverless” and you’re good to go! You can also configure scaling settings using the new scaling_configuration_xxx parameters and enable deletion protection using the deletion_protection parameter.

What to do about it: Update to module-data-storage, v0.7.1 and give Aurora Serverless a try!
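
Here’s a hedged sketch of those settings. Because the text above only names the scaling parameters generically as scaling_configuration_xxx, the exact names below are assumptions mirroring the scaling_configuration block of the underlying aws_rds_cluster resource; the module’s other required inputs are elided.

module "aurora_serverless" {
  source = "git::git@github.com:gruntwork-io/module-data-storage.git//modules/aurora?ref=v0.7.1"

  # ... the module's other required parameters go here ...

  engine_mode = "serverless"

  # Assumed parameter names, mirroring aws_rds_cluster's scaling_configuration
  scaling_configuration_min_capacity             = 2
  scaling_configuration_max_capacity             = 16
  scaling_configuration_auto_pause               = true
  scaling_configuration_seconds_until_auto_pause = 300

  # Protect the cluster from accidental deletion
  deletion_protection = true
}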

Terratest log parser helper binary for breaking out test logs

Motivation: Infrastructure tests can be slow. Therefore, you typically (a) log every action the test code takes, so that you can debug issues purely from the logs, without having to re-run the slow tests, and (b) run as many tests in parallel as you can. However, when you do this, all the logs get interleaved due to the concurrent nature of test execution, which makes it difficult to work out what is going on when a test fails.

Solution: Terratest now ships with a log parser binary that can break out what is happening in each automated test written in Go. To use the binary, first capture your test logs in a file and then feed that file to the log parser. Here’s an example:

# Run your Go tests and send the output to a file
go test | tee test-logs.txt
# Pass the file through the log parser
terratest_log_parser --testlog test-logs.txt --outputdir /tmp/logs

The command will then break out the interleaved log entries by test, writing each test’s logs to its own file in the specified directory, which makes the output much easier to browse in a CI system such as CircleCI.

In addition, the log parser emits a JUnit XML report that CI engines can use for additional insights, such as making it much easier to see which test failed.

What to do about it: Install the helper binary using gruntwork-installer and take a look at the README for a walkthrough of how to use the command. There is no need to upgrade your test code to Terratest v0.13.8, as the binary does not depend on any updates to the tests themselves.

RDS snapshot Lambda function refactor

Motivation: We have several Lambda functions in module-data-storage that make it easy to automatically back up databases running in RDS to another AWS account on a scheduled basis. We wrote these Lambda functions a while ago: they were built on our old Lambda code, produced deprecation warnings, and did not allow resource names to be customized, so it was possible to end up with a name that exceeded the maximum length allowed by AWS.

Solution: We’ve updated and refactored all these Lambda functions! The lambda-cleanup-snapshots, lambda-copy-shared-snapshot, lambda-create-snapshot, and lambda-share-snapshot modules now all use package-lambda under the hood (instead of the older lambda code that used to live in module-ci) and expose optional lambda_namespace and schedule_namespace parameters that you can use to completely customize all the names of resources created by these modules.

What to do about it: Update to module-data-storage, v0.7.0.
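
As an illustration, here’s a hedged sketch of the new namespace parameters on one of these modules (the other required inputs are elided and the values are hypothetical):

module "create_snapshot" {
  source = "git::git@github.com:gruntwork-io/module-data-storage.git//modules/lambda-create-snapshot?ref=v0.7.0"

  # ... the module's other required parameters go here ...

  # Optional: fully customize the names of the resources this module creates,
  # keeping them well under AWS name length limits
  lambda_namespace   = "acme-create-snapshot"
  schedule_namespace = "acme-create-snapshot-schedule"
}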

Terratest updates

  • Terratest, v0.13.1: Add a new terraform.OutputList function for reading and parsing lists returned by terraform output.
  • Terratest, v0.13.2: Add new methods for looking up info about AMIs, including GetAmiPubliclyAccessible and GetAccountsWithLaunchPermissionsForAmi.
  • Terratest, v0.13.3: Add a collection of functions that support SSHing into a GCP Compute Instance.
  • Terratest, v0.13.4: Add new methods for looking up info about GCP Instance Groups, including GetInstances and GetPublicIps.
  • Terratest, v0.13.6: Add OutputMap / OutputMapE functions to read and parse maps from terraform output.
  • Terratest, v0.13.7: In GCP, the GetRandomZone() function now accepts an argument for forbiddenRegions.
  • Terratest, v0.13.8: Add the terratest_log_parser described earlier in this blog post.
  • Terratest, v0.13.9: Add a new convenience method for building multiple Packer templates concurrently. Prior to this release, if you needed to build several AMIs in your tests, you had to write the parallelization code yourself or run the Packer builds one after another. See the release page for more details.
  • Terratest, v0.13.10: Update the README and help text for terratest_log_parser.
  • Terratest, v0.13.11: Remove a broken symlink from the test fixtures, fixing an error when the Terratest directory is copied by Packer.
  • Terratest, v0.13.12: Fix two issues that affected the ability to scp files from the remote servers that you are testing. See full details here.

Open source updates

  • fetch, v0.3.2: Update the github.com/codegangsta/cli dependency to its new name, gopkg.in/urfave/cli.v1. There should be no change in behavior.
  • Terragrunt, v0.17.1: Fix a bug where prompts on stdin were not showing up correctly.
  • cloud-nuke, v0.1.4: cloud-nuke will now delete ECS services and tasks.
  • bash-commons, v0.0.7: A new function file_fill_template was added to allow replacing specific template strings in a file with actual values.
  • terraform-aws-consul, v0.4.1: You can now configure a service-linked role for the ASG used in the consul-cluster module via the new (optional) service_linked_role_arn parameter (see the sketch after this list).
  • terraform-gcp-consul, v0.3.0, v0.3.1: Use different GCP projects for launching your cluster, fetching your compute image, and referencing your network resources.
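
Here’s the sketch referenced above. The aws_iam_service_linked_role resource is standard Terraform; the consul-cluster module call is abbreviated, with its other required inputs elided:

# Create the service-linked role for Auto Scaling
resource "aws_iam_service_linked_role" "autoscaling" {
  aws_service_name = "autoscaling.amazonaws.com"
}

module "consul" {
  source = "git::git@github.com:hashicorp/terraform-aws-consul.git//modules/consul-cluster?ref=v0.4.1"

  # ... the module's other required parameters go here ...

  service_linked_role_arn = "${aws_iam_service_linked_role.autoscaling.arn}"
}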

Other updates

  • package-lambda, v0.3.0: The scheduled-lambda-job module now namespaces all of its resources with the format "${var.lambda_function_name}-scheduled" instead of "${var.lambda_function_name}-scheduled-lambda-job". This makes names shorter and less likely to exceed AWS name length limits. If you wish to override the namespacing behavior, you can now set a new input variable called namespace (see the sketch after this list).
  • module-aws-monitoring, v0.10.0: Fix the alarm name used by the asg-disk-alarms module to include the file system and mount path. This ensures that if you create multiple alarms for multiple disks on the same auto scaling groups, they each get a unique name, rather than overwriting each other.
  • module-asg, v0.6.18: Fix the asg-rolling-deploy module so that the script it uses works with either Python 2 or Python 3.
  • module-asg, v0.6.19: You can now launch two or more server-group modules in sequential order instead of only in parallel. This is useful when creating a collection of clusters where Cluster A may depend on Cluster B.
  • package-zookeeper, v0.4.8: Update to module-asg version v0.6.18 so that the rolling deploy script works with either Python 2 or 3. Upgrade Oracle JDK installer to version 8u192-b12.
  • package-zookeeper, v0.4.9: Update zookeeper_servers to use the latest module-asg and expose rolling_deployment_done as an output so that other modules can be launched after this module deploys.
  • module-ci, v0.13.3: Update the git-add-commit-push script to check there are files staged for commit before trying to commit.
  • package-kafka, 0.4.2: The run-kafka script now exposes params to configure SSL protocols and ciphers, SASL authentication, ACLs, ZooKeeper chroot, and JMX. The Kafka Connect, Schema Registry, and REST Proxy modules now allow you to configure a keystore for validating SSL connections.
  • module-ecs, v0.9.0: The ecs-service-with-discovery module will now create an IAM role for the ECS task that can be extended with custom policies, similar to the ecs-service module. Note: this is a backwards incompatible change. Refer to the release notes for more information.
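
And here’s the sketch referenced in the package-lambda note above (other required inputs elided, values hypothetical):

module "scheduled_job" {
  source = "git::git@github.com:gruntwork-io/package-lambda.git//modules/scheduled-lambda-job?ref=v0.3.0"

  # ... the module's other required parameters go here ...

  # Override the default "${var.lambda_function_name}-scheduled" namespacing
  namespace = "my-short-namespace"
}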

DevOps News

AWS Lambda functions can now run for up to 15 minutes!

What happened: AWS has announced that Lambda functions can now run for up to 15 minutes!

Why it matters: Lambda functions used to be limited to a max runtime of 5 minutes. This made them useful for short, one-off tasks, but any workload that took longer than that would have to be executed elsewhere (e.g., in an ECS Cluster). The time limit has now been increased to 15 minutes, which means you can use Lambda functions for an even larger variety of use cases.

What to do about it: This feature is available everywhere immediately. Simply set the timeout on your Lambda function to up to 15 minutes (900 seconds), and you’ll be good to go!
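
In Terraform, this is a one-line change on the aws_lambda_function resource. Everything in this sketch other than the timeout is a placeholder:

resource "aws_lambda_function" "long_task" {
  # All values below are placeholders except the timeout
  function_name = "long-running-task"
  role          = "arn:aws:iam::123456789012:role/lambda-role"
  handler       = "index.handler"
  runtime       = "python3.6"
  filename      = "lambda.zip"

  # 15 minutes (900 seconds), up from the old 300-second maximum
  timeout = 900
}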

Terraform Enterprise changes

What happened: At HashiConf 2018, the HashiCorp team announced some major changes with Terraform Enterprise:

  • Three tiers: a free tier for individuals and small teams, a business tier for small to medium companies, and an enterprise tier for large companies.
  • Remote state: Remote Terraform state management within Terraform Enterprise is now free for teams of all sizes.
  • Remote plan and apply: For enterprise customers, Terraform can now run your plan and apply commands remotely, in the Terraform Enterprise SaaS product (rather than on your own computer), while still streaming logs and data back to your own computer.
  • Atlantis: The Atlantis team is joining HashiCorp and will be integrating their work into Terraform Enterprise.

Why it matters: It looks like the HashiCorp team is making a push for all Terraform users to move to Terraform Enterprise (at one of the three tiers) to simplify collaboration and team workflows.

What to do about it: For now, the only thing you can do is to sign up for a waitlist for the free remote state management functionality. However, keep your eye on this space, as new functionality will likely be rolling out soon.

Amazon ECS-optimized Amazon Linux 2 AMI now available

What happened: Amazon’s ECS-optimized AMI now supports Amazon Linux 2.

Why it matters: Amazon Linux 2 includes systemd, newer versions of the Linux kernel, C library, compiler, and tools, and access to more/newer software packages. It’s also the version of Amazon Linux that will get long-term support (at least through 2023).

What to do about it: If you are using ECS, you may wish to update your Packer templates to use the new ECS-optimized AMIs. You can find the AMI name pattern to put into the source_ami_filter param in your Packer template, as well as the latest AMI IDs, on this page. Note that you need to use the following versions of Gruntwork modules to get Amazon Linux 2 support:

  • bash-commons: at least v0.0.3 (latest is v0.0.7)
  • module-ecs: Any version should work
  • module-aws-monitoring: at least v0.9.2 (latest is v0.10.0)
  • gruntkms: at least v0.0.7 (latest is v0.0.7)
  • module-security: at least v0.9.0 (latest is v0.15.3). Note: to install fail2ban on the Amazon ECS-optimized Amazon Linux 2 AMI, you’ll need to install yum-utils first (e.g., add sudo yum install -y yum-utils earlier in your Packer template).

Once you’ve built a new AMI, follow this guide to roll it out across your ECS cluster.
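
If you want to look up the new AMI dynamically, here’s the equivalent lookup expressed as a Terraform data source instead of a Packer source_ami_filter. The name pattern below is an assumption based on AWS’s published naming; confirm it against the page linked above:

# Look up the latest ECS-optimized Amazon Linux 2 AMI
data "aws_ami" "ecs_amazon_linux_2" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-ecs-hvm-*-x86_64-ebs"]
  }
}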