Gruntwork Newsletter, June 2019
Once a month, we send out a newsletter to all Gruntwork customers that describes all the updates we’ve made in the last month, news in the DevOps industry, and important security updates. Note that many of the links below go to private repos in the Gruntwork Infrastructure as Code Library and Reference Architecture that are only accessible to customers.
Hello Grunts,
In the last month, we integrated Kubernetes into the Gruntwork Reference Architecture, wrote a new blog post series on how to build an end-to-end production-grade architecture on AWS, updated Terratest, Terragrunt, and many of our modules to work with Terraform 0.12 (which is now officially out!), updated the Reference Architecture to use OpenJDK, fixed a security bug in the SQS module, and much more.
As always, if you have any questions or need help, email us at support@gruntwork.io!
Gruntwork Updates
Kubernetes in the Gruntwork Reference Architecture

Motivation: For the last few years, the Gruntwork Reference Architecture has supported Auto Scaling Groups (ASGs) and EC2 Container Service (ECS) as the primary ways to run workloads. As Kubernetes has grown in popularity, we got steadily more and more requests to add support for it as a first-class offering. Today, we’re excited to announce that we can now offer Kubernetes as a new option for running workloads in the Gruntwork Reference Architecture!
Solution: We’ve created a number of production-grade modules for running Kubernetes on AWS and integrated them into the Reference Architecture (including hooking them into monitoring, alerting, networking, CI/CD, and so on). Under the hood, we run the Kubernetes control plane on top of Amazon’s Elastic Kubernetes Service (EKS), so it’s a fully managed service. On top, we run Helm and Tiller to make it easy to deploy and manage workloads in your Kubernetes cluster. And in between, we’ve spent a lot of time configuring everything for high availability (see our Zero Downtime Server Updates For Your Kubernetes Cluster blog post series), scalability, security (including TLS auth, namespaces, and strict RBAC controls), and testing (see our blog post, Automated Testing for Kubernetes and Helm Charts using Terratest).
If you’re a Gruntwork customer, you can see an example of what the Reference Architecture integration looks like in our Acme Company examples below (and if you’re not a customer, sign up now to get access!):
- eks-cluster: A module to manage the EKS cluster with its workers.
- eks-core-services: A module to deploy and manage core administrative services on your EKS cluster.
- k8s-namespace-with-tiller: A module to provision a new Kubernetes Namespace with a deployed Tiller (Helm Server) so that you can use helm to install services into that Namespace.
- k8s-service: A module for deploying a dockerized app onto Kubernetes using Helm.
- k8s-tiller: A module for managing Tiller (Helm Server).
See also the corresponding changes in the infrastructure-live repository for how these modules are deployed.
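To make this more concrete, here is a minimal sketch of what wiring a service into the Kubernetes Reference Architecture might look like from your infrastructure-live code. Note that the module source and every input name below (service_name, image, namespace, replicas) are illustrative assumptions rather than the module's actual interface, so check the Acme Company examples above for the real thing:

```hcl
# Hypothetical sketch only: the source ref and all input names are assumptions
# for illustration. See the Acme Company examples for the real interface.
module "my_service" {
  source = "git::git@github.com:acme/infrastructure-modules.git//services/k8s-service?ref=v0.1.0"

  service_name = "my-app"                                                # name for the Deployment and Service
  image        = "111122223333.dkr.ecr.us-east-1.amazonaws.com/my-app:v1" # Docker image to deploy
  namespace    = "applications"                                           # provisioned via k8s-namespace-with-tiller
  replicas     = 3                                                        # multiple replicas for high availability
}
```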
What to do about it: We can deploy a Kubernetes-based Reference Architecture for you in about one day as part of the Gruntwork Subscription. Alternatively, if you’re already a subscriber, check out the links in the previous section to learn how to deploy Kubernetes into your existing infrastructure-modules and infrastructure-live repos. Let us know how it works for you, and if you have any comments or questions, contact us at support@gruntwork.io!
AWS Reference Architecture Overview Blog Post Series

Motivation: Many people have asked us about the details of what it takes to go to production on AWS. We’ve captured these details in the Gruntwork Reference Architecture, but haven’t done a great job of explaining what those details include. Potential customers wanted to know the specific components of the architecture and how they were set up before purchasing. Existing customers wanted to know about some of the design choices we made.
Solution: We wrote a new blog post series, How to Build an End to End Production-Grade Architecture on AWS! This series builds up to the Reference Architecture from the perspective of the various concerns that need to be addressed when going to production on AWS. This includes both an overview of which infrastructure components to choose (e.g., Kubernetes, VPCs, KMS, Jenkins), as well as why those choices make sense.
What to do about it: Click on the links below and start reading!
- Part 1: Network Configuration, Kubernetes, Microservices, and Load Balancing
- Part 2: CI/CD, Multiple Accounts, Secrets Management, CDN, VPN, and Monitoring
- Part 3: Bootstrap Your Production-Grade Infrastructure in a Day
Terraform 0.12 updates
Motivation: Terraform 0.12 final is now out (see the DevOps News section below), so we’ve been hard at work updating all of our modules and tooling to work with it.
Solution: Here are the latest updates:
Terragrunt v0.19.0 and above now supports Terraform 0.12! As a bonus, we’re now using HCL2 syntax with Terragrunt, which (a) makes your code cleaner and (b) allows you to use built-in Terragrunt functions everywhere in your Terragrunt configuration! Make sure to read the migration guide for upgrade instructions. Also, check out Terragrunt: how to keep your Terraform code DRY and maintainable for an overview of how to use Terragrunt in 2019.
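To give a feel for the new HCL2 syntax, here is a minimal terragrunt.hcl sketch; the module source URL and the inputs are placeholders for your own code. Note how find_in_parent_folders() is called directly, with no string interpolation wrapper:

```hcl
# terragrunt.hcl: replaces the old terraform.tfvars-based configuration
include {
  # Built-in functions can now be used anywhere, without "${...}" wrappers
  path = find_in_parent_folders()
}

terraform {
  # Placeholder source URL: point this at your own module
  source = "git::git@github.com:acme/infrastructure-modules.git//vpc?ref=v0.1.0"
}

inputs = {
  aws_region = "us-east-1"
  vpc_name   = "example-vpc"
}
```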
Terratest v0.15.8 and above now supports Terraform 0.12. See below for more info on Terratest updates.
Infrastructure as Code Library: we’ve updated a number of modules in the Infrastructure as Code Library (see the module version compatibility chart), but we still have quite a few more to go. Note that these are backwards incompatible releases, so the latest versions of our modules will no longer support Terraform 0.11.
What to do about it: Since we are still in the process of upgrading all of our modules to work with Terraform 0.12, and since the upgrade process is backwards incompatible, for the time being, we recommend that you continue to use Terraform 0.11.x. Once everything is ready to go with Terraform 0.12.x, we’ll send out full upgrade instructions. We know you’re excited to upgrade, so we’re making every effort to have everything ready by the end of June, but take that as a good faith estimate, and be aware of the usual caveats about DevOps time estimates and yak shaving!
Important SQS Security Fix
Motivation: We discovered that our sqs module had a very unsafe default configuration that allowed unauthenticated incoming requests from any IP.
Solution: We’ve updated the sqs module so that IP-based access is now disabled completely by default. Unless you intend to allow unauthenticated IP-based access, we strongly recommend updating to this new version. If you do need to allow IP-based access, set apply_ip_queue_policy to true and specify the IPs that should be able to access the queue via allowed_cidr_blocks.
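If you do have a legitimate need for IP-based access, the configuration looks roughly like this sketch; apply_ip_queue_policy and allowed_cidr_blocks are the real input variables described above, while the module source path and the name input are illustrative assumptions:

```hcl
module "sqs" {
  # Illustrative source path: pin to v0.2.0 or above of package-messaging
  source = "git::git@github.com:gruntwork-io/package-messaging.git//modules/sqs?ref=v0.2.0"

  name = "my-queue"  # illustrative input

  # IP-based access is now disabled by default. Only enable it if you really
  # need unauthenticated access, and scope it to specific CIDR blocks:
  apply_ip_queue_policy = true
  allowed_cidr_blocks   = ["10.0.0.0/16"]
}
```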
What to do about it: We strongly recommend updating to package-messaging, v0.2.0 ASAP.
Reference Architecture Open JDK Fix
Motivation: Last month we discovered that Oracle changed their policies to require authentication for all downloads of their JDK, which broke our install-oracle-jdk module. As a solution, we introduced an install-open-jdk module and updated all our Java-based infrastructure packages (Kafka, ZooKeeper, ELK) to use it. However, customers were asking how to apply these changes to their Reference Architectures.
Solution: This month, we updated our Reference Architecture examples to point to the install-open-jdk module everywhere they were referencing install-oracle-jdk. If you use Kafka, ZooKeeper, or ELK, you will want to apply the same update to your Packer templates.
What to do about it: Check out this commit for an example of the locations you will need to update.
Reference Architecture Script local readonly Bug Fix
Motivation: In our bash scripts for the Reference Architecture, we had been using local readonly to mark variables as locally scoped and immutable. However, this does not actually do what you would think: local treats readonly as just another variable name to declare, so local readonly foo=bar declares two local variables, readonly and foo, and neither of them is read-only.
Solution: We updated all our bash scripts in the Reference Architecture to replace the usage of local readonly with local -r. We also took care to mark read-only arrays using local -r -a.
What to do about it: Check out this commit for an example of the locations you will need to update.
Terratest updates
Motivation: We needed to make a number of Terratest updates, including supporting our ongoing work to upgrade to Terraform 0.12, improving GCP support, and adding features to work around flaky tests.
Solution: We’ve made the following updates:
- terratest, v0.15.6: You can now specify -var-file in the packer module to use JSON files as variable input. Check out our example usage.
- terratest, v0.15.7: Added support for deleting SSH Public Keys attached to a Google user identity.
- terratest, v0.15.8: Fixed a bug where GCP credentials authentication could sometimes transiently fail; we handle this by introducing a retry loop. Also fixed a regression introduced in v0.15.2 that broke the handling of lists and maps in the vars for Terraform.
- terratest, v0.15.9: Improved the resiliency of the GCP methods for obtaining an OAuth2 token by adding retries, which helps work around intermittent “TLS handshake timeout” errors. Also fixed a bug in how Terratest was setting -backend-config parameters during terraform init: we were using a space as a separator, but Terraform requires an equals sign.
- terratest, v0.15.10: Improved the stability of the Terratest CI build.
- terratest, v0.15.11: Added GetEc2InstanceIdsByFilters, which provides an interface for retrieving EC2 instances by defining filters as a map. This release also introduced functionality for testing DynamoDB.
- terratest, v0.15.12: Added support for testing terragrunt. Check out the release notes for more info.
- terratest, v0.15.13: Fixed the terraform.OutputList and terraform.OutputMap methods to work with Terraform 0.12.
- terratest, v0.16.0: Added a new DoWithRetryableErrors method that takes in a map of retryable errors and an action to execute; if the action returns an error, it is retried whenever the error or the action's stdout/stderr matches one of the retryable errors. Updated the terraform code to use this DoWithRetryableErrors method under the hood for retries. Also added support for retryable errors for Packer builds via the new RetryableErrors, MaxRetries, and TimeBetweenRetries settings in packer.Options.
- terratest, v0.16.1: NewAuthenticatedSession in modules/aws now supports returning credentials obtained by assuming a role. This can be done by setting the environment variable TERRATEST_IAM_ROLE to the ARN of the IAM role that should be assumed. When this env var is not set, it reverts to the old behavior of looking up credentials from the default location.
- terratest, v0.17.0: InitAndPlan and InitAndPlanE now return the text output from stdout and stderr instead of the exit code as an integer. The original versions that returned the exit code have been renamed to InitAndPlanWithExitCode and InitAndPlanWithExitCodeE. As part of this, we introduced Plan and PlanE functions, which can be used to just run terraform plan; they return the stdout and stderr output.
Open source updates
- terraform-google-gke, v0.2.0: The logging_service and monitoring_service defaults were changed to use Stackdriver Kubernetes Engine Monitoring instead of the legacy Stackdriver support (see the sketch after this list).
- terraform-google-gke, v0.1.2: Enabled parallel tests and updated the examples to allow custom kubectl config paths.
- terraform-google-network, v0.2.1: Fixed an issue where a data provider was referencing the public network instead of the private one.
- terraform-google-load-balancer, v0.1.2: Introduced a new internal-load-balancer module that can be used to create Internal TCP/UDP Load Balancers using internal forwarding rules.
- Terragrunt, v0.19.0: Terragrunt now supports Terraform 0.12! Please see the migration guide for upgrade instructions.
- kubergrunt, v0.4.0 and v0.5.0: Introduced the helm revoke command, which removes access to Tiller from the specified RBAC entities.
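As a sketch of what the terraform-google-gke change means in practice, the new defaults and their legacy equivalents look like the following; the module path and ref are illustrative assumptions, so check the repo for the real interface (the service values themselves are the standard GKE API values):

```hcl
module "gke_cluster" {
  # Illustrative source path; see the terraform-google-gke repo for the real one
  source = "git::git@github.com:gruntwork-io/terraform-google-gke.git//modules/gke-cluster?ref=v0.2.0"

  # New defaults as of v0.2.0 (Stackdriver Kubernetes Engine Monitoring):
  logging_service    = "logging.googleapis.com/kubernetes"
  monitoring_service = "monitoring.googleapis.com/kubernetes"

  # To keep the legacy Stackdriver integrations instead, set:
  # logging_service    = "logging.googleapis.com"
  # monitoring_service = "monitoring.googleapis.com"
}
```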
Other updates
- module-aws-monitoring, v0.12.4: This release adds conditional support for the logs/load-balancer-access-logs module. You can now set create_resources = false on the module call to avoid creating the S3 bucket (see the sketch after this list).
- module-aws-monitoring, v0.12.5: This release fixes the principal on the logs/load-balancer-access-logs module’s policy so that NLBs can write to the S3 bucket.
- module-aws-monitoring, v0.12.6: Fixed the period setting for the SQS alarm to use a minimum of 5 minutes rather than 1 minute: SQS metrics are only collected once every 5 minutes, so trying to alert more often doesn't work.
- module-security, v0.16.3: You can now tell the iam-groups module not to create the "access-all" group by setting the new input variable should_create_iam_group_cross_account_access_all to false. This can help work around an AWS limitation where we exceed the max IAM policy length.
- module-security, v0.16.4: You can now configure an optional SNS delivery notification topic for the cloudtrail module using a new sns_delivery_topic input variable.
- package-elk, v0.2.9: Switched the elasticsearch-cluster-backup and elasticsearch-cluster-restore modules over to using Node 8.10 as the runtime, as 6.10 has been deprecated. The runtime is now also configurable via the lambda_runtime input variable.
- module-ecs, v0.13.4: All the ECS service modules now allow you to optionally specify a custom prefix to use for the IAM execution role. The default, as before, is to use the service name.
- module-server, v0.6.2: The attach-eni script is now compatible with Ubuntu 18.04.
- module-vpc, v0.5.8: var.custom_tags now propagate to EIP resources created in the VPCs.
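Here is a rough sketch of the new create_resources flag from module-aws-monitoring v0.12.4; the module path comes from the release notes, but the ref and the other inputs shown are illustrative assumptions:

```hcl
module "lb_access_logs" {
  # Module path per the release notes; the ref and other inputs are illustrative
  source = "git::git@github.com:gruntwork-io/module-aws-monitoring.git//modules/logs/load-balancer-access-logs?ref=v0.12.6"

  s3_bucket_name = "acme-alb-access-logs"  # illustrative input

  # New in v0.12.4: set to false to skip creating the S3 bucket entirely
  create_resources = false
}
```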
DevOps News
Terraform 0.12 (and 0.12.1) is out!
What happened: HashiCorp has released Terraform 0.12 final. They also followed up shortly after with 0.12.1, which fixes some important bugs.
Why it matters: Terraform 0.12 brings with it a number of powerful new features, but will also require a significant upgrade.
What to do about it: See the “Terraform 0.12 update” section above.
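As a quick taste of those new features: Terraform 0.12 makes expressions first class, so references no longer need to be wrapped in interpolation strings, and rich types and for-expressions are now supported. The resource and variables below are illustrative:

```hcl
resource "aws_instance" "example" {
  # Terraform 0.11 required: ami = "${var.ami_id}"
  # Terraform 0.12 allows first-class expressions:
  ami           = var.ami_id
  instance_type = var.instance_type
  subnet_id     = aws_subnet.example.id

  # New for-expressions let you transform maps and lists inline:
  tags = { for k, v in var.custom_tags : k => upper(v) }
}
```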
Amazon MSK is now Generally Available
What happened: Amazon’s managed Kafka service, MSK, is now generally available in all AWS accounts.
Why it matters: Before, MSK was only available in “preview mode” to select accounts. The service is now a bit more mature and available everywhere as a managed way to run Apache Kafka (and Apache ZooKeeper).
What to do about it: Give MSK a shot and let us know what you think! We do not have a dedicated module for it, but you can try out the aws_msk_cluster resource to deploy it yourself.
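If you want to experiment, here is a minimal aws_msk_cluster sketch; the subnet and security group IDs are placeholders for your own network:

```hcl
resource "aws_msk_cluster" "example" {
  cluster_name           = "example-kafka"
  kafka_version          = "2.1.0"
  number_of_broker_nodes = 3  # spread across the client subnets below

  broker_node_group_info {
    instance_type   = "kafka.m5.large"
    ebs_volume_size = 100  # GiB of broker storage
    client_subnets  = ["subnet-aaa", "subnet-bbb", "subnet-ccc"]  # placeholders
    security_groups = ["sg-0123456789abcdef0"]                    # placeholder
  }
}
```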
ECS now supports increased ENI limits with awsvpc networking mode
What happened: AWS has added support for trunking, which allows certain instance types to have a higher ENI limit for ECS Tasks in awsvpc networking mode.
Why it matters: When using awsvpc networking mode, each ECS Task gets its own IP address by way of an Elastic Network Interface (ENI). Under the hood, each ECS Task runs on an EC2 Instance, and those instances typically had very low limits on how many ENIs you could attach (e.g., only 1–2 until you got to really large instance types). That meant you would often run out of ENIs long before you ran out of CPU or memory resources. Now, if you enable the new awsvpcTrunking setting, certain instance types will allow you to attach 3–8x as many ENIs as before, allowing you to make much better use of your CPU and memory resources.
What to do about it: Check out the announcement blog post for instructions.
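For Terraform users: newer versions of the AWS provider expose this account setting via the aws_ecs_account_setting_default resource. The sketch below assumes a provider version recent enough to include that resource, so treat the announcement post and the provider docs as authoritative:

```hcl
# Opt the current account/IAM identity in to ENI trunking for ECS.
# Requires a sufficiently recent AWS provider release.
resource "aws_ecs_account_setting_default" "eni_trunking" {
  name  = "awsvpcTrunking"
  value = "enabled"
}
```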
AWS Lambda adds support for Node.js v10, deprecates Node.js v6
What happened: AWS Lambda now allows you to use Node.js v10 as a runtime, while the older Node.js v6 runtime is now deprecated.
Why it matters: If you were using Node.js v6, you need to update immediately, as it will stop working soon. Node.js v10 includes a number of performance improvements and is generally a safe upgrade.
What to do about it: If you’re using package-lambda to manage your Lambda functions, update your runtime parameter to nodejs10.x.
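Whether you set the runtime through package-lambda or directly on the resource, the change is a one-line switch. The sketch below uses the plain aws_lambda_function resource with illustrative attributes:

```hcl
resource "aws_lambda_function" "example" {
  function_name = "example"                # illustrative
  handler       = "index.handler"          # illustrative
  filename      = "lambda.zip"             # illustrative deployment package
  role          = aws_iam_role.lambda.arn  # placeholder IAM role

  runtime = "nodejs10.x"  # was "nodejs6.10", which is now deprecated
}
```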
Security Updates
Below is a list of critical security updates that may impact your services. We notify Gruntwork customers of these vulnerabilities as soon as we know of them via the Gruntwork Security Alerts mailing list. It is up to you to scan this list and decide which of these apply and what to do about them, but most of these are severe vulnerabilities, and we recommend patching them ASAP.
Multiple CPU Vulnerabilities
- On May 14th, 2019, multiple teams of security researchers around the world independently discovered various CPU vulnerabilities:
- The RIDL and Fallout speculative execution attacks allow attackers to leak confidential data across arbitrary security boundaries on a victim system, for instance compromising data held in the cloud or leaking your information to malicious websites.
- The ZombieLoad attack uncovers a novel Meltdown-type effect in the processor’s previously unexplored fill-buffer logic.
- Store-To-Leak Forwarding exploits CPU optimizations introduced by the store buffer to break address randomization, monitor the operating system, or leak data when combined with Spectre gadgets.