Gruntwork Newsletter, May 2019

Yevgeniy Brikman
Co-Founder
Published May 13, 2019

Once a month, we send out a newsletter to all Gruntwork customers that describes all the updates we’ve made in the last month, news in the DevOps industry, and important security updates. Note that many of the links below go to private repos in the Gruntwork Infrastructure as Code Library and Reference Architecture that are only accessible to customers.

Hello Grunts,

In the last month, we’ve made a major upgrade to the Infrastructure as Code Library: in partnership with Google, we’ve added a collection of open source, production-grade, reusable modules for deploying your infrastructure on Google Cloud Platform (GCP)! We also launched a new documentation website, replaced our OracleJDK module with an OpenJDK module, added a module to automatically issue and validate TLS certs, made major updates to our Kubernetes/EKS code (including support for private endpoints, log shipping, ingress controllers, external DNS, etc), and fixed a number of critical bugs.

As always, if you have any questions or need help, email us at support@gruntwork.io!

Gruntwork Updates

Gruntwork for Google Cloud Platform (GCP)!

Motivation: Up until recently, we had been primarily focused on AWS, but this month, we’re excited to announce, in partnership with Google, that we’ve added first-class support for Google Cloud Platform (GCP)! And best of all, thanks to this partnership, all of our GCP modules are open source!

Solution: We worked directly with Google engineers to develop a set of reusable, production-grade infrastructure modules, including:

We also now offer commercial support for both AWS and GCP. Check out our announcement blog post for the details.

What to do about it: To get started with these modules, check out our post on the Google Cloud Blog, Deploying a production-grade Helm release on GKE with Terraform. This blog post will walk you through setting up a Kubernetes cluster, configuring Helm, and using Helm to deploy a web service on Google Cloud in minutes. You can even try out the code samples from that blog post directly in your browser, without having to install anything or write a line of code, using Google Cloud Shell!
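
To give a taste of what the Terraform code looks like, here's a minimal sketch of spinning up a GKE cluster with one of the open source GCP modules. The module path, version ref, and input names below are illustrative assumptions; see the terraform-google-gke repo for the actual interface.

# A minimal sketch of deploying a GKE cluster with the open source GCP modules.
# The module path, version ref, and input names are illustrative assumptions; see
# the terraform-google-gke repo for the actual interface.
module "gke_cluster" {
  source = "git::git@github.com:gruntwork-io/terraform-google-gke.git//modules/gke-cluster?ref=v0.1.0"

  name     = "example-cluster"
  project  = "my-gcp-project"
  location = "europe-west1"

  # The VPC network and subnetwork to deploy the cluster into
  network    = "default"
  subnetwork = "default"
}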

New documentation website (docs.gruntwork.io)

Motivation: DevOps is hard. There seem to be 1,001 little details to get right, and you never have the time to learn them all.

Solution: We’ve launched a new Gruntwork Docs site that helps you get up and running even faster! You can already find guides for Deploying a Dockerized App on GCP/GKE and Deploying a Production Grade EKS cluster.

What to do about it: Head to the Gruntwork Docs site at docs.gruntwork.io. We’ll be adding much more content in the future, so let us know in the comments and via support what DevOps issues you’re struggling with, and we’ll do our best to write up guides to answer your questions.

New OpenJDK installer module

Motivation: We discovered that Oracle has changed their policies to require authentication for all downloads of their JDK. This broke our install-oracle-jdk module and, in turn, impacted all of our Java-based infrastructure packages: Kafka, Zookeeper, and ELK.

Solution: We created a new install-open-jdk module that installs OpenJDK instead of Oracle’s JDK and is designed as a drop-in replacement for install-oracle-jdk. In the past, the Oracle JDK was the best option, as OpenJDK was missing many features, had worse performance, and didn’t offer commercial support. In recent years, however, the differences between the two JDKs in terms of features and performance have become negligible, and Oracle no longer allows you to use their JDK for free (a paid license is required for production usage!). Therefore, most teams are now better off going with OpenJDK, which you can install using this module. Note that if you need commercial support for the JDK, you may wish to use Azul or AdoptOpenJDK instead. We’re updating all of our own Java-based infrastructure packages to use this new module.

What to do about it: The new OpenJDK installer module is available as part of Zookeeper’s v0.5.4 release. Check it out and use it instead of install-oracle-jdk, which we will be deprecating and removing shortly.

acm-tls-certificate: new module to issue & validate TLS certificates

Motivation: AWS Certificate Manager (ACM) makes it easy to issue free, auto-renewing TLS certificates. So far, we’ve mostly been creating these certificates manually via the AWS Console, but we’ve always wanted to manage them as code.

Solution: We’ve created a new Terraform module called acm-tls-certificate that can issue and automatically validate TLS certificates in ACM! Usage couldn’t be simpler:

# Create a TLS certificate for example.your-domain.com
module "cert" {
  source = "git::git@github.com:gruntwork-io/module-load-balancer.git//modules/acm-tls-certificate?ref=v0.13.2"

  domain_name    = "example.your-domain.com"
  hosted_zone_id = "ZABCDEF12345"
}

You pass in the domain name you want to use and the ID of the Route 53 Hosted Zone for that domain, and you get a free, auto-renewing TLS certificate that you can use with ELBs, CloudFront, API Gateway, etc! For example, here’s how you can use this certificate with an Application Load Balancer (ALB):

# Create a TLS certificate for example.your-domain.com
module "cert" {
  source = "git::git@github.com:gruntwork-io/module-load-balancer.git//modules/acm-tls-certificate?ref=v0.13.2"

  domain_name    = "example.your-domain.com"
  hosted_zone_id = "ZABCDEF12345"
}

# Attach the TLS certificate to an ALB
module "alb" {
  source = "git::git@github.com:gruntwork-io/module-load-balancer.git//modules/alb?ref=v0.13.2"

  alb_name = "example-alb"

  https_listener_ports_and_acm_ssl_certs = [
    {
      port            = 443
      tls_domain_name = "${module.cert.certificate_domain_name}"
    },
  ]

  # ... other params omitted ...
}

And now your load balancer is using the TLS certificate on its listener for port 443!

What to do about it: The acm-tls-certificate module is available in module-load-balancer, v0.13.2. Check out this example for fully-working sample code.

EKS Updates

Motivation: Since December of last year, we have been busy building a production-grade IaC module for EKS that makes it 10x easier to deploy and manage EKS. How production grade your infrastructure is depends on how many items of our Production Grade Checklist it covers. This month, we shipped multiple new modules that enhance the security and monitoring capabilities of EKS clusters deployed with our modules.

Solution: Over the last month, we enhanced our EKS modules with the following updates:

  • We now support EKS private endpoints for clusters launched using the eks-cluster-control-plane module. Check out the module docs for more info. (v0.2.3)
  • We now support accessing the authentication tokens directly in Terraform code (via the kubernetes provider and kubergrunt), as opposed to requiring kubectl to be set up to access the cluster. (v0.3.0)
  • We enhanced support for managing multiple worker groups: we added support for taints, tolerations, and affinity rules for any infrastructure deployed using helm (such as the fluentd-cloudwatch module), and introduced a module to create reciprocating security group rules (the eks-cluster-workers-cross-access module). (v0.3.1)
  • We now support EKS control plane log shipping via the enabled_cluster_log_types variable. You can read more about this feature in the official AWS documentation. (v0.4.0)
  • We added support for deploying the AWS ALB Ingress Controller in the modules eks-alb-ingress-controller and eks-alb-ingress-controller-iam-policy, which allows you to map Ingress resources to AWS ALBs. See the module documentation for more information. (v0.5.0)
  • We added support for deploying the external-dns application in the modules eks-k8s-external-dns and eks-k8s-external-dns-iam-policy, which allows you to map Ingress resource host paths to Route 53 domain records so that hostname routes are automatically configured to hit the Ingress endpoints. See the module documentation for more information. (v0.5.1, v0.5.2, v0.5.3)
  • We added support for linking ELBs to the worker ASG. (v0.5.4)

What to do about it: Upgrade to the latest version of terraform-aws-eks (v0.5.4) to start taking advantage of all the new features!
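
For example, enabling the new control plane log shipping is a single variable on the eks-cluster-control-plane module. The enabled_cluster_log_types variable comes from the release notes above; the remaining inputs are illustrative assumptions, so check the module docs for the exact interface.

# Sketch of enabling EKS control plane log shipping. Only enabled_cluster_log_types
# comes from the release notes above; the other inputs are illustrative assumptions.
module "eks_cluster" {
  source = "git::git@github.com:gruntwork-io/terraform-aws-eks.git//modules/eks-cluster-control-plane?ref=v0.5.4"

  cluster_name = "example-eks-cluster"
  vpc_id       = "vpc-abcd1234"

  # Ship the API server, audit, and authenticator logs to CloudWatch Logs
  enabled_cluster_log_types = ["api", "audit", "authenticator"]

  # ... other params omitted ...
}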

vpc-dns-forwarder: New module to create Route 53 Resolver Endpoints

Motivation: In Ben Whaley’s VPC reference architecture, it is common to set up a management VPC that acts as a gateway to other application VPCs. In this setup, operators typically VPN into the management VPC and access the other VPCs in your infrastructure over a VPC peering connection. One challenge with this setup is that domain names in Route 53 private hosted zones are not resolvable from the peering VPC.

Solution: To allow DNS lookups of private hosted zones over a peering connection, we can use Route 53 Resolvers to forward DNS queries for specific endpoints to the application VPCs. We created two new modules in module-vpc to support this use case: vpc-dns-forwarder and vpc-dns-forwarder-rules.

What to do about it: The vpc-dns-forwarder and vpc-dns-forwarder-rules modules are available in module-vpc, v0.5.7. Take a look at the updated vpc-peering example for fully working sample code.
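
To give a feel for how the two modules fit together, here's a hypothetical sketch of forwarding DNS queries from a mgmt VPC to an app VPC. All of the input names below are illustrative assumptions; the vpc-peering example linked above shows the real interface.

# Hypothetical sketch of forwarding DNS queries for a private hosted zone from a
# mgmt VPC to an app VPC. All input names are illustrative assumptions; see the
# vpc-peering example in module-vpc, v0.5.7 for the real interface.
module "dns_forwarder" {
  source = "git::git@github.com:gruntwork-io/module-vpc.git//modules/vpc-dns-forwarder?ref=v0.5.7"

  # The VPC that originates the DNS queries (mgmt) and the VPC that hosts the
  # private hosted zone (app)
  origin_vpc_id      = "vpc-11111111"
  destination_vpc_id = "vpc-22222222"

  # ... other params omitted ...
}

module "dns_forwarder_rules" {
  source = "git::git@github.com:gruntwork-io/module-vpc.git//modules/vpc-dns-forwarder-rules?ref=v0.5.7"

  # Forward lookups for the app VPC's private hosted zone domain to the resolver endpoints
  domain_name = "example.internal"

  # ... other params omitted ...
}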

Fixes for the server-group health-checker

Motivation: The Gruntwork server-group module includes a script called rolling_deployment.py, which can be used to hook up a load balancer to perform health checks on the server-group. The script relied on an API call that recently started throwing an exception we were not handling, so the unhandled exception could cause a deployment of the server-group to fail erroneously.

Solution: We updated the rolling_deployment script to properly handle the exception. See this PR for more details.

What to do about it: Update to module-asg v0.6.26 to pick up the fix.

Fixes for the ECS zero-downtime rollout script

Motivation: The Gruntwork ecs-cluster module includes a script called roll-out-ecs-cluster-update.py which can be used to roll out updates (e.g., a new AMI or instance type) to the Auto Scaling Group that underlies the ECS cluster. This script should work without downtime, but recently, one of our customers ran it, and when the script finished, it had left the cluster with some of the instances updated to the new AMI, but some still running the old AMI, and the old ones were stuck in DRAINING state. Clearly, something was wrong!

Solution: It looks like AWS made backwards-incompatible changes to the default termination policy for Auto Scaling Groups. Since roll-out-ecs-cluster-update.py depended on the behavior of that termination policy as part of its roll-out procedure, this change broke the script. To address this, we’ve updated the ecs-cluster module to expose the termination policy via a new termination_policies input variable and set the default to OldestInstance (instead of Default), which fixes the roll-out behavior.

What to do about it: Update to module-ecs, v0.13.0 to pick up the fix. Update, 05.09.19: it’s possible this does not fix the issue fully. See #134 for ongoing investigation.
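
If you need to control the termination policy yourself, it's now a one-line change on the ecs-cluster module. The termination_policies variable name comes from the release above; the other inputs (and the exact variable type) are illustrative assumptions.

# Sketch of setting the new termination policy variable on the ecs-cluster module.
# termination_policies comes from module-ecs, v0.13.0; the other inputs and the
# exact variable type are illustrative assumptions.
module "ecs_cluster" {
  source = "git::git@github.com:gruntwork-io/module-ecs.git//modules/ecs-cluster?ref=v0.13.0"

  cluster_name = "example-ecs-cluster"

  # OldestInstance is now the default; set this explicitly only if you need
  # different roll-out behavior
  termination_policies = ["OldestInstance"]

  # ... other params omitted ...
}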

Reference Architecture Mgmt VPC CIDR Block fix

Motivation: One of our customers was connected to VPN servers in two different accounts (stage and prod) and noticed connectivity wasn’t working quite right. It turns out the Gruntwork Reference Architecture was using conflicting CIDR blocks for the “mgmt VPCs” (where the VPN servers run) in those accounts.

Solution: We’ve updated the Reference Architecture to use different CIDR blocks for the mgmt VPCs in each account. The app VPCs were already using different CIDR blocks.

What to do about it: If you wish to connect to multiple VPN servers at once, or you need to peer the various mgmt VPCs together for some reason, you’ll want to ensure each one has a different CIDR block. The code change is easy: see this commit for an example. However, VPC CIDR blocks are considered immutable in AWS, so to roll this change out, you’ll need to undeploy everything in that mgmt VPC, undeploy the VPC, deploy the VPC with the new CIDR block, and then deploy everything back into the VPC.
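
Conceptually, the code side of the fix is just giving each account's mgmt VPC a distinct, non-overlapping CIDR block, along the lines of the sketch below. The module path, version, and variable names here are illustrative assumptions; the linked commit shows the actual change.

# Sketch of giving each account's mgmt VPC a non-overlapping CIDR block. The module
# path, version, and variable names are illustrative assumptions; see the linked
# commit for the actual change.

# In the stage account (e.g., stage/networking/vpc-mgmt):
module "mgmt_vpc" {
  source     = "git::git@github.com:gruntwork-io/module-vpc.git//modules/vpc-mgmt?ref=v0.5.7"
  vpc_name   = "mgmt-stage"
  cidr_block = "172.31.80.0/20"
}

# In the prod account (e.g., prod/networking/vpc-mgmt):
module "mgmt_vpc" {
  source     = "git::git@github.com:gruntwork-io/module-vpc.git//modules/vpc-mgmt?ref=v0.5.7"
  vpc_name   = "mgmt-prod"
  cidr_block = "172.31.96.0/20"
}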

Open source updates

  • health-checker, v0.0.5: Added a single-flight mode that prevents long-running health checks from piling up.
  • terratest, v0.14.6: This release introduces the SetStrValues argument for helm.Options, which corresponds to the --set-string argument. This can be used to force certain values to cast to a string as opposed to another data type.
  • terratest, v0.15.0: The GetAccountId and GetAccountIdE methods now use STS GetCallerIdentity instead of IAM GetUser under the hood, so they should now work whether you're an IAM User, IAM Role, or other AWS authentication method while running Terratest.
  • terratest, v0.15.1: This release extends AWS ECS support with GetEcsService and GetEcsTaskDefinition, which can be used to retrieve ECS Service and ECS Task Definition objects respectively. Check out the new Terraform example and corresponding test to see it in action.
  • terratest, v0.15.2: This release adds support for Terraform 0.12 by stripping surrounding quotes from values passed to the -var command line option.
  • terratest, v0.15.3: This release introduces support for AWS SSM, providing functions to access parameters: GetParameter, GetParameterE, PutParameter, PutParameterE.
  • terratest, v0.15.4: This release adds support for checking S3 Bucket Versioning configuration: PutS3BucketVersioning, PutS3BucketVersioningE, GetS3BucketVersioning, GetS3BucketVersioningE, AssertS3BucketVersioningExists, AssertS3BucketVersioningExistsE.
  • terratest, v0.15.5: This release introduces support for S3 Bucket Policy assertions and access functions: PutS3BucketPolicy, PutS3BucketPolicyE, GetS3BucketPolicy, GetS3BucketPolicyE, AssertS3BucketPolicyExists, AssertS3BucketPolicyExistsE.
  • fetch, v0.3.5: GitHub Enterprise users can now download assets.
  • Terragrunt, v0.18.4: You can now set skip = true in your Terragrunt configuration to tell Terragrunt to skip processing a terraform.tfvars file. This can be used to temporarily protect modules from changes or to skip over terraform.tfvars files that don't define infrastructure by themselves (see the sketch after this list).
  • Terragrunt, v0.18.5: Added a new terragrunt-info command you can run to get a JSON dump of Terragrunt settings, including the config path, download dir, working dir, IAM role, etc.
  • terraform-aws-consul, v0.6.1: Fix a bug where we were not registering Consul properly in systemd, so it would not automatically start after a reboot.
  • terraform-aws-vault, v0.12.1: You can now tell the run-vault script to run Vault in agent mode rather than server mode by passing the --agent argument, along with a set of new --agent-xxx configs (e.g., --agent-vault-address, --agent-vault-port, etc). The Vault agent is a client daemon that provides auto auth and caching features.
  • terraform-aws-vault, v0.12.2: Fix a bug where we were not registering Vault properly in systemd, so it would not automatically start after a reboot.
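
Here's what the skip flag from Terragrunt v0.18.4 looks like in practice. This is a minimal sketch based on the release note above; in the 0.18.x series, Terragrunt settings live in a terragrunt = { ... } block inside terraform.tfvars, so check the Terragrunt docs for the exact syntax in your version.

# Minimal sketch of skipping a module with Terragrunt v0.18.4+, based on the release
# note above. In terraform.tfvars:
terragrunt = {
  # Tell Terragrunt not to process this terraform.tfvars file, e.g., to temporarily
  # protect this module from changes
  skip = true
}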

Other updates

  • kubergrunt, v0.3.7: This release introduces the k8s wait-for-ingress subcommand, which can be used to wait until an Ingress resource has an endpoint associated with it.
  • kubergrunt, v0.3.8: This release updates the tls gen command to use the new way of authenticating to Kubernetes (specifically, passing in server and token info directly) and to use JSON to configure the TLS subject. It also introduces a new command, helm wait-for-tiller, which can be used to wait for a Tiller deployment to roll out Pods and have at least one Pod that can be pinged. This enables chaining calls to helm after Tiller is deployed when using a different Tiller deployment process that doesn't rely on the helm client (e.g., creating the Deployment resources manually).
  • kubergrunt, v0.3.9: This release updates kubergrunt helm configure with a new option --as-tf-data, which enables you to call it in an external data source. Passing this flag will cause the command to output the configured helm home directory in the output json on stdout at the end of the command.
  • terraform-kubernetes-helm, v0.3.0: This release introduces a new module, k8s-tiller, which can be used to manage Tiller deployments using Terraform. The difference from the kubergrunt approach is that this supports using Terraform to apply updates to the Tiller Deployment resource. E.g., you can now upgrade Tiller using Terraform, or update the number of Tiller Pod replicas to deploy. Note that this still assumes the use of kubergrunt to manage the TLS certificates. A rough sketch appears after this list.
  • terraform-kubernetes-helm, v0.3.1: k8s-namespace and k8s-namespace-roles modules now support conditionally creating the namespace and roles via the create_resources input variable.
  • package-terraform-utilities, v0.0.8: This release introduces a new module list-remove which can be used to remove items from a terraform list. See the module docs for more info.
  • module-ci, v0.13.13: You can now set the redirect_http_to_https variable to true on the jenkins-server module to automatically redirect all HTTP requests to HTTPS.
  • module-load-balancer, v0.13.3: This release fixes an issue with multiple duplicate ACM certs — e.g. you’re rotating to a new cert and still have systems using the old cert — where previously it errored out if multiple ACM certs matched the domain. Instead, we will now pick the newer one.
  • module-ecs, v0.13.1: This release adds and exposes a task execution IAM role so that ECS tasks can pull private images from ECR and read secrets from AWS Secrets Manager.
  • module-ecs, v0.13.2: This release fixes a bug where the fargate_without_lb resource incorrectly set a health_check_grace_period_seconds. From the Terraform documentation, "Health check grace period is only valid for services configured to use load balancers".
  • module-ecs, v0.13.3: You can now set a custom name prefix for the IAM roles created by the ecs-service module using the new task_execution_name_prefix input variable. The default is var.service_name, as before.
  • module-security, v0.16.2: This release fixes #89, where fail2ban was not working correctly on non-Ubuntu instances. Specifically, fail2ban had a bug that prevented it from correctly banning brute force SSH attempts on CentOS and Amazon Linux 1 platforms. Check out the release notes for more details.
  • module-aws-monitoring, v0.12.3: You can now (a) set tags on all the alarms modules via a new tags input variable and (b) configure the snapshot period and snapshot evaluation period for the elasticsearch-alarms module using the new snapshot_period and snapshot_evaluation_period input variables, respectively.
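
As a rough illustration of the new k8s-tiller module mentioned above: every input name below is an assumption made for illustration, so see the terraform-kubernetes-helm, v0.3.0 module docs for the real interface.

# Rough illustration of managing a Tiller deployment with the new k8s-tiller module.
# Every input below is an illustrative assumption; see the terraform-kubernetes-helm,
# v0.3.0 module docs for the real interface.
module "tiller" {
  source = "git::git@github.com:gruntwork-io/terraform-kubernetes-helm.git//modules/k8s-tiller?ref=v0.3.0"

  namespace                   = "tiller-world"
  tiller_service_account_name = "tiller"

  # Because Tiller is managed by Terraform, upgrades and replica count changes are
  # just a plan/apply away
  tiller_image_version = "v2.14.0"
  tiller_replicas      = 1
}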

DevOps News

Terraform 0.12 rc1

What happened: HashiCorp has released Terraform 0.12, release candidate 1 (rc1).

Why it matters: The final release of Terraform 0.12 draws closer and closer! Terraform 0.12 brings with it a number of powerful new features, but will also require a significant upgrade. We’ve already started updating our modules with support for 0.12, including updating Terratest to work with 0.11 and 0.12.

What to do about it: For now, continue to sit tight, and await 0.12 final, as well as our word that all of our modules have been updated. We’ll send upgrade instructions when everything is ready!

AWS S3 will no longer support path-style URLs

What happened: AWS has announced, rather quietly, that path-style S3 URLs will no longer be supported after September 30th, 2020. Update: AWS just released a new blog post that says path-style URLs will only be deprecated for new S3 buckets created after September 30th, 2020.

Why it matters: In the past, for an S3 bucket called my-bucket, you could build S3 URLs in one of two formats:

  1. Path-style URLs: s3.amazonaws.com/my-bucket/image.jpg
  2. Virtual-host style URLs: my-bucket.s3.amazonaws.com/image.jpg

The former supported both HTTP and HTTPS, whereas the latter used to only support HTTP. Now, both support HTTPS, but path-style URLs will no longer be supported after September 30th, 2020. Update: AWS just released a new blog post clarifying that path-style URLs will continue to work for S3 buckets created before September 30th, 2020, but will not be available for buckets created after that date.

What to do about it: If you’re using path-style S3 URLs, update your apps to use virtual-host style URLs instead. Note that if your bucket name contains dots, virtual-host style URLs will NOT work, so you’ll have to migrate to a new S3 bucket!

Security Updates

Below is a list of critical security updates that may impact your services. We notify Gruntwork customers of these vulnerabilities as soon as we know of them via the Gruntwork Security Alerts mailing list. It is up to you to scan this list and decide which of these apply and what to do about them, but most of these are severe vulnerabilities, and we recommend patching them ASAP.

Docker Hub Data Breach

  • On Thursday, April 25th, 2019, it was discovered that Docker Hub had been breached, exposing the data of approximately 190,000 users. The exposed data includes Docker Hub usernames, hashed passwords, and GitHub and BitBucket OAuth access tokens. If you are an affected user, you should have received an email from Docker Hub. Even if you weren’t directly affected, you may still need to take action to protect your organization, as a compromise of any employee at your company may give an attacker access to ALL of your code! The most sensitive pieces of information leaked are the GitHub and BitBucket access tokens, which typically grant read/write access to all repos in the org. These tokens are granted to Docker Hub for users who use the autobuild feature. The tokens of affected users have already been revoked by Docker, but if you have not received an email from Docker Hub notifying you of the revocation, we recommend revoking the tokens in GitHub or BitBucket yourself. Additionally, we recommend auditing your security logs to see if any unexpected actions have taken place. You can view security actions on your GitHub or BitBucket accounts to verify whether any unexpected access has occurred (see this article for GitHub and this article for BitBucket). Now would also be a good time to review your organization’s OAuth App access and consider enabling access restrictions on your org. We notified the Security Alerts mailing list about this vulnerability on May 1st, 2019.