Gruntwork Newsletter, February 2018

Josh Padnick
Co-founder and CEO
Published December 7, 2017

Once a month, we send out a newsletter to all Gruntwork customers that describes all the updates we’ve made in the last month, news in the DevOps industry, and important security updates. Note that many of the links below go to private repos in the Gruntwork Infrastructure as Code Library and Reference Architecture that are only accessible to customers.

Hello Grunts,

We haven’t written a newsletter since last December, so we wish you a belated happy new year! In the last couple of months, we updated all of our modules to work with Terraform 0.11, released a new health-checker module, and made lots of important bug fixes. Make sure to check out the Security Updates section for information about Spectre and Meltdown, as they are two of the most severe security vulnerabilities in recent history.

As always, if you have any questions or need help, email us at support@gruntwork.io!

Gruntwork Updates

Terraform 0.11 updates

Motivation: Terraform 0.11 came out, but it wasn’t possible to upgrade to it because it included a backwards incompatible change that broke many modules.

Solution: We’ve gone through all the modules in our Infrastructure as Code Library and updated them to work with Terraform 0.11.

What to do: It should now be safe to upgrade to Terraform 0.11. You’ll need to:

  1. Update the required_version setting (if you’re using it) in all the main.tf files in your infrastructure-modules repo.
  2. If you’re using modules from any of the Gruntwork repos in the list below, update their versions as specified in the list, as they all had to be updated to work with Terraform 0.11. Make sure to check the Releases page for each module to find the latest version number and see if there are any backwards-incompatible changes:

  • gruntwork-io/module-aws-monitoring, v0.9.0
  • gruntwork-io/module-data-storage, v0.5.0
  • gruntwork-io/module-ecs, v0.6.1
  • gruntwork-io/module-security, v0.7.0
  • gruntwork-io/module-server, v0.3.0
  • gruntwork-io/module-vpc, v0.4.0
  • gruntwork-io/package-lambda, v0.2.0
  • gruntwork-io/package-messaging, v0.1.0

  3. Install the latest version of Terraform (0.11.3 as of this writing) and Terragrunt (v0.14.0 as of this writing) on all dev computers and CI servers.
  4. Run terragrunt plan on each of your modules and keep your eyes open for warnings like this: Warning: must use splat syntax to access xxx.yyy attribute “zzz”, because it has “count” set. This is the backwards incompatibility in Terraform 0.11 rearing its ugly head. We fixed it in all of our modules, but you’ll need to fix it in your own code in the infrastructure-modules repo. The fix is to find the offending xxx.yyy resource that is using a count parameter and to update any references to it from xxx.yyy.zzz to element(concat(xxx.yyy.*.zzz, list("")), 0). For example, if you had an aws_instance called foo with a count parameter and you wanted to access the public_ip attribute, instead of looking it up by doing aws_instance.foo.public_ip, you’d have to do element(concat(aws_instance.foo.*.public_ip, list("")), 0) (see Terraform 0.11 upgrade guide for more info). Yes, it’s ugly.
  5. If you have CI scripts that call terraform apply or terragrunt apply, add the -auto-approve flag to them. By default, the apply command in Terraform 0.11 is interactive, so if you don’t add this flag, your CI build will hang!

Check out this commit (you must have access to the Acme sample Reference Architecture) for an example of the changes you’ll have to make.
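
If you have many references to fix, the splat rewrite described above is mechanical enough to script. The following is only a rough sketch in Python, not a Gruntwork or Terraform tool: splat_safe_ref and rewrite_refs are hypothetical helper names, and the regex assumes simple references of the form type.name.attribute. Review any output by hand before committing it.

```python
import re


def splat_safe_ref(resource_type, name, attribute):
    """Build the Terraform 0.11-safe expression for one attribute of a
    resource that sets count (hypothetical helper for illustration)."""
    return 'element(concat({0}.{1}.*.{2}, list("")), 0)'.format(
        resource_type, name, attribute
    )


def rewrite_refs(source, resource_type, name):
    """Rewrite direct references such as aws_instance.foo.public_ip.

    References that already use splat syntax (aws_instance.foo.*.public_ip)
    are left alone, because the pattern requires an identifier right after
    the resource name.
    """
    pattern = re.compile(
        r"(?<![\w.])"
        + re.escape(resource_type)
        + r"\."
        + re.escape(name)
        + r"\.([A-Za-z_]\w*)"
    )
    return pattern.sub(
        lambda match: splat_safe_ref(resource_type, name, match.group(1)),
        source,
    )
```

For example, rewrite_refs('value = "${aws_instance.foo.public_ip}"', "aws_instance", "foo") produces the element(concat(...)) form shown above, and running it a second time on its own output changes nothing.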

health-checker

Motivation: While setting up the Confluent tools, we needed the ability to have a single health check report the uptime of multiple separate services.

Solution: We wrote a new open source tool, health-checker, that exposes an HTTP listener that responds with a 200 OK if a TCP connection can be successfully opened to one or more configurable ports.

What to do about it: If you need such a health-checker, consider downloading one of the binaries!
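
To make the idea concrete, here is a minimal Python sketch of the same technique: an HTTP listener that answers 200 OK only when a TCP connection can be opened to every configured port. It is not the health-checker source code. The function names are hypothetical, the sketch assumes that all ports must be reachable, and the 503 returned on failure is this sketch’s choice rather than a statement about what health-checker itself returns.

```python
import socket
from http.server import BaseHTTPRequestHandler, HTTPServer


def all_ports_open(host, ports, timeout=1.0):
    """Return True only if a TCP connection can be opened to every port."""
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                pass  # Connection succeeded; close it immediately.
        except OSError:
            return False
    return True


def make_handler(host, ports):
    """Build a request handler class bound to the given host and ports."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # 200 when every port accepts a connection. The 503 for the
            # failure case is an assumption made for this sketch.
            status = 200 if all_ports_open(host, ports) else 503
            self.send_response(status)
            self.end_headers()

    return HealthHandler


def make_server(listen_port, host, ports):
    """Create (but do not start) the HTTP listener."""
    return HTTPServer(("0.0.0.0", listen_port), make_handler(host, ports))
```

Calling make_server(5500, "127.0.0.1", [2181, 9092]).serve_forever() would expose a single health check covering two local services; the port numbers here are purely illustrative.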

logrotate fixes

Motivation: The syslog module in module-aws-monitoring configures logrotate on all servers to automatically rotate and clean up old syslog files so they don’t take up too much disk space. Unfortunately, the configuration had an issue where a process could maintain a file handle to the old log file and continue writing to it, even after rotation, allowing that file to grow indefinitely and eat up lots of disk space.

Solution: We’ve fixed the logrotate config in module-aws-monitoring, v0.8.0 [Update: actually, please use module-aws-monitoring, v0.8.1 due to a minor bug fix] using the copytruncate and maxsize settings.

What to do about it: Update your Packer templates to use v0.8.1 of the syslog module and redeploy your servers to pick up the fix!
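
If you are curious why copytruncate helps, the following Python sketch imitates the strategy on a Unix-like system. It is an illustration only, not the logrotate implementation or the config shipped in module-aws-monitoring, and copytruncate, needs_rotation, and demo are hypothetical names. The key point is that the live file is copied and then truncated in place, so a process that keeps its file handle open carries on writing to the (now small) live file instead of to the rotated copy.

```python
import os
import shutil
import tempfile


def copytruncate(log_path, rotated_path):
    """Rotate a log by copying it, then truncating the original in place.

    Lines written between the copy and the truncate can be lost, which is
    the known trade-off of this strategy.
    """
    shutil.copyfile(log_path, rotated_path)
    os.truncate(log_path, 0)


def needs_rotation(log_path, max_size_bytes):
    """Mimic the idea behind maxsize: rotate once the file grows too big."""
    return os.path.getsize(log_path) > max_size_bytes


def demo():
    """Show that a writer with an open handle follows the truncated file."""
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "syslog")
        rotated_path = log_path + ".1"
        with open(log_path, "a") as writer:  # append mode, like most loggers
            writer.write("old line\n")
            writer.flush()
            copytruncate(log_path, rotated_path)
            writer.write("new line\n")  # same handle, still the live file
            writer.flush()
        with open(rotated_path) as f:
            rotated = f.read()
        with open(log_path) as f:
            live = f.read()
    return rotated, live
```

In demo, the old line ends up only in the rotated copy and the new line only in the live file, even though the writer never reopened its handle.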

Oracle JDK fixes

Motivation: The install-oracle-jdk module was no longer working, so Packer builds that were trying to install JDK 8 (e.g., for ZooKeeper or Kafka) were failing.

Solution: It seems that Oracle deletes old versions of the JDK when it releases new ones, so the URLs used to install the JDK fail. Moreover, new versions of the JDK require you to specify the new checksum, which there’s obviously no way of getting ahead of time. For now, we’ve released v0.3.0 of package-zookeeper with a patched install-oracle-jdk module, but this is likely to break again in the future, so we may have to find a different way to manage JDK installs.

What to do about it: Update your Packer templates to use v0.3.0 of package-zookeeper.

CloudTrail / KMS changes

What happened: Previously, you could leave the user and admin lists empty in our CloudTrail module. It seems that AWS has changed its validation logic, and all KMS keys, including the ones used to encrypt CloudTrail logs, must now have at least one user and one admin associated with them.

Why it matters: If kms_key_administrator_iam_arn or kms_key_user_iam_arns are empty (as they were, by default, in all accounts except the security account of the multi-account Reference Architecture), the next time you run apply in a cloudtrail folder in infrastructure-live, you’ll get a validation error.

What to do about it: Specify the ARN of (trusted!) users in your security account for the kms_key_administrator_iam_arn and kms_key_user_iam_arns settings.

Terragrunt fixes

  • terragrunt, v0.13.24: The check for a backend { ... } block now also checks .tf.json files.
  • terragrunt, v0.13.25: Terragrunt will now properly read in state files from local backends.
  • terragrunt, v0.14.0: The apply-all command now automatically sets the -auto-approve parameter so apply happens non-interactively with Terraform 0.11.

Other fixes

  • module-asg, v0.6.4: The server-group module now supports the option of adding DNS records to each ENI.
  • module-asg, v0.6.5: The server-group module now allows users to specify their own list of names to be used when creating DNS records, as well as to associate an Elastic IP address with each ENI.
  • module-aws-monitoring, v0.8.1 and v0.8.2: The ecs-cluster-alarms and ecs-service-alarms modules now expose new input variables you can use to configure what the alarms should do if no data is being emitted (default is missing).
  • module-data-storage, v0.4.1: Fix the default param group name for SQL server, which uses a different format than all the other DBs.
  • module-data-storage, v0.5.1: The aurora module now exposes a db_cluster_parameter_group_name parameter you can use to set a custom parameter group name.
  • module-ecs, v0.6.1: The roll-out-ecs-cluster-update.py script will now display better error messages if it can't find your ECS cluster for some reason (e.g., you specified the wrong region).
  • package-static-assets, v0.2.0: The cloudfront module now enables gzip compression by default.
  • package-openvpn, v0.5.1: The root volume size and type of the openvpn-server module are now configurable.
  • module-vpc, v0.4.1: You can now set different tags for each of the different types of subnets (public subnets, private app subnets, etc).
  • terraform-aws-vault, v0.1.0: Remove the S3 backend and use Consul for both storage and high availability.
  • terraform-aws-vault, v0.1.1: Fix permissions and symlinks to work on CentOS.
  • terraform-aws-consul, v0.1.1: Fix symlink issues on CentOS.
  • terraform-aws-nomad, v0.1.0: Update versions of Nomad and Consul.

DevOps News

AWS-Native Service Discovery with Route53

What happened: AWS announced an Auto Naming API for Service Name Management and Discovery with Route53.

Why it matters: Previously, the only way for Microservice A to connect to Microservice B in an “AWS-native” way was to go through an ALB, which added an extra network hop and made it difficult to run a single microservice with both public and private access endpoints.

With this release, AWS allows a microservice to register itself with Route53 on boot, and Route53 will confirm those addresses with either Route53 globally distributed health checks (for publicly accessible services) or API queries to your ALB or NLB to determine the health check status of your microservice instance.

That means that it’s no longer necessary to run a service like Consul just to gather an up-to-date list of microservice endpoints.

What to do about it: Feel free to play around with this on your own. We plan to add support to our ECS Docker Cluster package (and, later, to Fargate and EKS Docker Cluster packages) so that you can start using this out of the box!

AWS is rolling out Spectre/Meltdown fixes

What happened: After the Spectre and Meltdown vulnerabilities were announced, AWS has periodically been rolling out fixes.

Why it matters: Since these are hardware vulnerabilities in the CPU itself, the fixes often result in performance degradation. The impact is not entirely clear, but if in the last month you saw a sudden drop in CPU performance, despite absolutely no change on your part, it’s possible that this is a fix AWS has rolled out to protect you. See this post for an example.

What to do about it: There’s nothing you can do about the performance impact. However, you should update the kernel on all of your servers, and install updates on all developer computers, to make sure these vulnerabilities don’t affect you.

AWS Auto Scaling

What happened: AWS has launched a new, unified way to manage Auto Scaling for EC2 instances, ECS, Aurora, DynamoDB, and other resources.

Why it matters: This gives you a more intuitive and centralized way to manage auto scaling for everything in your AWS account.

What to do about it: It does not look like Terraform supports this yet, so for now, you have to manually use the AWS Auto Scaling console.

Kubernetes now available in Docker for Mac

What happened: The latest version of the Docker app for Mac allows you to run Kubernetes locally!

Why it matters: If you’re planning on using Kubernetes in the cloud (e.g., when Amazon’s managed Kubernetes, EKS, hits general availability), being able to run it locally will make local development and testing easier.

What to do about it: Follow these docs to run a Kubernetes cluster locally.

AWS Lambda now supports Go

What happened: AWS has announced that it now officially supports writing Lambda functions in Go.

Why it matters: You no longer need hacks to run Go code using Lambda.

What to do about it: Deploy your Lambda functions as always (e.g., using package-lambda), but specify go1.x as your runtime.

Terraform plugin for Ansible

What happened: Ansible has released a Terraform plugin.

Why it matters: If you’re using both Ansible and Terraform, this allows you to manage everything from Ansible.

What to do about it: Check out the Terraform plugin docs.

Publish RDS logs to CloudWatch

What happened: AWS now allows you to publish MySQL and MariaDB logs from RDS to CloudWatch Logs.

Why it matters: You can now see the general log, slow query log, audit log, and error log for your database directly in CloudWatch Logs, which will make it easier to debug and troubleshoot.

What to do about it: This feature is not yet supported in Terraform (see issue #3056), so if you’d like to use it, you’ll have to enable it manually in the AWS console.

Security Updates

Below is a list of critical security updates that may impact your services. We notify Gruntwork customers of these vulnerabilities as soon as we know of them via the Gruntwork Security Alerts mailing list. It is up to you to scan this list and decide which of these apply and what to do about them, but most of these are severe vulnerabilities, and we recommend patching them ASAP.

MAJOR security vulnerabilities: Meltdown and Spectre

  • https://meltdownattack.com/: Two major security vulnerabilities have been discovered that affect virtually all CPUs released in the last 20 years. These allow attackers to read any memory on your computer — potentially even from JavaScript code executing in your browser! Securing all the different attack vectors will likely take multiple patches. AWS has announced that it has patched just about all physical EC2 Instances on its end already. However, we strongly recommend that you also patch the OS running in your VMs, Docker containers, and personal computers ASAP. For example, for servers, run yum update kernel or apt-get upgrade in your Packer or Docker builds and roll out the new images to all your servers. For personal computers, install the latest Windows and OS X updates. More patches will likely be released in the near future, so keep your eyes open and be ready to update. We sent an email about this vulnerability to the security alerts mailing list on January 4, 2018.

Node.js

  • Security Release, December 2017: As an effect of CVE-2017-3737 in OpenSSL, Node.js was vulnerable to an attacker sending data directly to a Node.js application using the core TLS or HTTP/2 modules. This vulnerability did not affect the standard HTTP module or the HTTPS module, but did affect TLS in all active Node.js release lines and HTTP/2 in the Node.js 8.x and 9.x release lines.