Gruntwork Newsletter, October 2018
Once a month, we send out a newsletter to all Gruntwork customers that describes all the updates we’ve made in the last month, news in the DevOps industry, and important security updates. Note that many of the links below go to private repos in the Gruntwork Infrastructure as Code Library and Reference Architecture that are only accessible to customers.
Hello Grunts,
In the last month, we hit a big milestone at Gruntwork: $1 million in annual recurring revenue! Then, we got right back to work, and made a huge number of updates, including making major changes to our ELK code to work around NLB limitations, updating Terratest so it can take a “snapshot” of your configs and logs to make it easier to debug test failures, updating Terragrunt so it automatically retries on errors that are known to be transient, fixing the perpetual diffs issue with S3 bucket lifecycle settings, adding support for Oracle Cloud Infrastructure to Terratest, and a huge number of other fixes and improvements. In other news, you can now use Yubikeys with AWS and the Oracle JDK now requires a paid support contract for production usage, so you may need to change JDKs soon.
As always, if you have any questions or need help, email us at support@gruntwork.io!
Gruntwork Updates
Gruntwork is now generating $1 million in annual recurring revenue
Motivation: Our mission is to make it 10x easier to understand, build, and deploy software. To do that at scale, we realized that we needed to build a sustainable company.
Solution: We created Gruntwork and began offering access to world-class infrastructure code, DevOps software, training, and support as a part of a subscription. This subscription is now bringing in over $1 million in annual recurring revenue (ARR). We are deeply grateful to our customers for making this possible.
What to do about it: Check out How we got to $1 million in annual recurring revenue with $0 in fundraising for all the details.
Major Release: ELK Package
Motivation: While using our ELK code over the last couple of months, we hit a few limitations with using an NLB as the load balancer of choice for our inter-cluster communication:
- An NLB can’t route requests back to the same node that initiated the request.
- The NLB cannot be accessed via VPC peering connection from most instance types. This makes it impossible to access the NLB from, for example, a VPN server in another VPC.
Solution: We replaced the NLB with an ALB for communication between clusters. However, since Filebeat can only communicate with Logstash on a pure TCP protocol, and the ALB only supports HTTP/HTTPS, we can’t use the ALB with Filebeat. To get around this issue, we came up with an auto discovery mechanism that resides on the application server. It runs as a cron job on the server, periodically looking up Logstash EC2 instance IPs using the AWS APIs, updating the Filebeat configuration with the IPs of the returned instances, and restarting Filebeat to load the new configuration. We also rely on Filebeat’s built-in load balancing feature to distribute requests among the Logstash instances.
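To make the auto discovery loop more concrete, here is a minimal sketch in Python. It is not the code shipped in package-elk: the function names are hypothetical, the AWS lookup is left to the caller (it only takes a list of IPs), and it assumes the Filebeat output section lives in its own text that can be rewritten wholesale. The output.logstash, hosts, and loadbalance keys are standard Filebeat settings, and 5044 is the conventional Beats port.

```python
def render_output_section(ips, port=5044):
    """Render a Filebeat output.logstash section for the given Logstash IPs.

    IPs are de-duplicated and sorted so that the same set of instances
    always produces the same text, which makes change detection trivial.
    """
    hosts = ", ".join(f'"{ip}:{port}"' for ip in sorted(set(ips)))
    return (
        "output.logstash:\n"
        f"  hosts: [{hosts}]\n"
        "  loadbalance: true\n"
    )


def sync(current_text, ips, write, restart):
    """Rewrite the config and restart Filebeat only if the host list changed.

    `write` and `restart` are callables supplied by the caller (hypothetical
    hooks, e.g. writing a file and restarting the Filebeat service).
    Returns True if a restart was triggered.
    """
    new_text = render_output_section(ips)
    if new_text == current_text:
        return False
    write(new_text)
    restart()
    return True
```

A cron job could call sync on every run with the IPs returned by the AWS APIs. Skipping the restart when nothing changed avoids needlessly interrupting log shipping.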
What to do about it: This is a hugely backwards incompatible change and special care needs to be taken to ensure a smooth upgrade. The following steps are a good starting point:
- Remove your use of the nlb module and replace it with an alb. See the example here: https://github.com/gruntwork-io/package-elk/blob/master/examples/elk-multi-cluster/main.tf#L436
- Replace your use of the load-balancer-target-group module with the newly added load-balancer-alb-target-group. See an example of using the new module: https://github.com/gruntwork-io/package-elk/blob/master/examples/elk-multi-cluster/main.tf#L71
- Finally, update the various target_group_arns arguments passed to the cluster modules: https://github.com/gruntwork-io/package-elk/blob/master/examples/elk-multi-cluster/main.tf#L40
- If you’re using SSL with the ALB, you’ll need to take note of the ALB upgrade notes: module-load-balancer, v0.12.0
Terratest can now help take a snapshot of your config/logs
Motivation: When an infrastructure test fails, to understand what went wrong, you typically need the logs and config files from your deployed apps and services. Currently, getting at this information is a bit of a pain: you’d need some way to run the tests, “pause” (i.e., not tear down) the infrastructure after a failure, ssh to individual instances, and then view the logs and config files to see what went wrong. This is hard to do, especially when your tests are running automatically on a CI server.
Solution: Terratest can now automate the task of taking a “snapshot” of your whole deployment by grabbing a copy of log files, config files, and any other files useful for debugging. If you configure your CI server correctly, you can make this “snapshot” easy to browse. For example, when one of our ELK automated tests fails, we can browse the snapshot in CircleCI to debug what went wrong.
What to do about it: Update your code to use Terratest v0.13.0 and then take a look at our example readme for a full walk-through of the functionality and how to use it.
Terragrunt will now automatically retry on transient errors
Motivation: Occasionally, when you run a command like terraform apply, you get a transient/intermittent error, such as a TLS handshake timeout or CloudWatch concurrency error. If you just re-run apply, the error goes away, but having to deal with these intermittent failures is frustrating, especially in CI environments, and especially when running many commands at once (e.g., via apply-all).
Solution: We’ve updated Terragrunt to automatically retry commands when you hit an error that is known to be transient! There’s nothing for you to do to enable it: if Terragrunt recognizes the error, it will automatically re-run the last command up to a configurable number of times (default is 3) with a configurable sleep between retries (default is 5 seconds). You can find the list of known transient errors in auto_retry_options.go. We will add support for specifying a custom list of retryable errors in the future (if you want this feature soon, PRs are very welcome!).
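The retry behavior described above can be sketched in a few lines of Python. This only illustrates the idea and is not Terragrunt’s implementation (Terragrunt is written in Go). The pattern list is a hypothetical stand-in for the real list in auto_retry_options.go, and the sketch assumes the default of 3 counts total attempts.

```python
import re
import time

# Hypothetical pattern for illustration; the real list lives in Terragrunt's source.
TRANSIENT_PATTERNS = [re.compile(r"TLS handshake timeout")]


def run_with_retry(command, max_attempts=3, sleep_seconds=5, sleep=time.sleep):
    """Run `command` and re-run it while it fails with a known transient error.

    `command` is a callable returning (exit_code, output). Errors that do not
    match a known pattern are returned immediately, without retrying.
    """
    for attempt in range(1, max_attempts + 1):
        code, output = command()
        if code == 0:
            return code, output
        transient = any(p.search(output) for p in TRANSIENT_PATTERNS)
        if not transient or attempt == max_attempts:
            return code, output
        sleep(sleep_seconds)
```

Matching against a list of known errors, rather than retrying on every failure, keeps genuine problems such as syntax errors failing fast.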
What to do about it: Give Terragrunt v0.17.0 a shot and see if it makes your Terraform usage a little more stable and reliable. Check out the Auto Retry docs for more details, including how to configure retries and sleeps, and how to disable retry functionality if, for some reason, it doesn’t work with your use cases.
Fix perpetual diff errors with S3 buckets
Motivation: For a while, some of our modules that used S3 buckets with lifecycle settings would always show a diff when you ran plan, even though nothing had changed.
Solution: Thanks to the help of one of our customers, we believe we’ve figured out the cause: you should not set both the expired_object_delete_marker and days parameters in an expiration block. We’ve fixed this issue in our load-balancer-access-logs and cloudtrail modules.
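If you want to check your own lifecycle settings for the same problem, the rule is easy to express as code. The sketch below is an assumption-laden illustration in Python: it supposes you have already loaded your lifecycle rules into plain dictionaries mirroring the Terraform block structure, and the function name is hypothetical.

```python
def conflicting_expirations(lifecycle_rules):
    """Return the ids of rules whose expiration block sets both parameters.

    Each rule is a dict that may contain an "expiration" dict. A rule is
    flagged when both "days" and "expired_object_delete_marker" are present,
    regardless of their values.
    """
    conflicts = []
    for rule in lifecycle_rules:
        expiration = rule.get("expiration", {})
        if "days" in expiration and "expired_object_delete_marker" in expiration:
            conflicts.append(rule.get("id", "<unnamed>"))
    return conflicts
```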
What to do about it: To pick up these fixes, update to module-aws-monitoring, v0.9.3 and module-security, v0.15.2.
Terratest now supports OCI
Motivation: Terratest is Gruntwork’s Swiss Army knife for infrastructure testing. Last month, we updated Terratest with support for testing infrastructure on Google Cloud Platform (GCP). This month, someone wanted to use Terratest to test infrastructure on Oracle Cloud Infrastructure (OCI).
Solution: Terratest now has initial support for OCI! Check out packer_oci_example_test.go for an example.
What to do about it: Grab Terratest v0.12.0 and take the oci package for a spin.
Jenkins backup cleanup fix
Motivation: There was a bug in how we configured the code that cleans up old backups for Jenkins in the Reference Architecture. As a result, backups wouldn’t be cleaned up, and more and more snapshots would pile up over time.
Solution: The fix requires tweaking the value of a single parameter, delete_older_than, from 15 to 15d, as shown in this commit in the Acme sample Reference Architecture.
What to do about it: If you’re using Jenkins with the Reference Architecture:
- Update your delete_older_than parameter as shown above.
- Publish a new version of your infrastructure-modules repo.
- Run terragrunt apply in your infrastructure-live repo to deploy the changes.
Package SAM updates
Motivation: There were several small bugs and no way to pass environment variables to AWS SAM CLI while testing locally.
Solution: We implemented some bug fixes and also added support for passing environment variables to AWS SAM CLI through the Swagger file.
What to do about it: To pick up these fixes, update to package-sam, v0.1.7.
Gruntwork Houston updates
We’ve made a number of updates to Gruntwork Houston in the last month:
- Documentation for Okta: We’ve added step-by-step documentation for how to use Okta as an identity provider with Houston so that you can log in to AWS via the web, CLI, VPN, and SSH using your Okta credentials.
- houston-cli, v0.0.7: Added the ability to create and set up the houston configuration from the command line using the newly introduced houston configure command.
- houston-cli, v0.0.8: Improved the help text output and fixed a bug with houstonUrl in the config file to allow trailing slashes.
Are you interested in joining the Houston beta? Email us at info@gruntwork.io!
ELK updates
In addition to the NLB replacement mentioned at the top of this newsletter, we also made a number of other updates to package-elk in the last month:
- package-elk: v0.2.1: Added iam_role_id as an output variable for the logstash-cluster module. This variable is useful for adding ssh-grunt IAM policies to this ASG.
- package-elk: v0.2.2: Added a missing = character to a Terraform local declaration. The behavior was inconsistent: some customers reported issues as a result, while other tests ran and passed without issue.
- package-elk: v0.2.3: Added options to the Kibana cluster module to pass in UI and SSH security group IDs (along with the number of UI and SSH security group IDs).
- package-elk: v0.2.4: Added pass-through plumbing in logstash-cluster for passing allowed security groups for collectd and beats through to the underlying logstash-security-group-rules module. This is very handy for specifying allowed security groups without having to have a 2nd logstash-security-groups module.
- package-elk: v0.2.5: Fixed improperly passing allowed_ssh_security_group_ids to aws_launch_configuration resources in both the kibana and elastalert modules. Also added proper plumbing for allow_ssh_from_security_group_ids to be specified in the elastalert module and then passed all the way through to the underlying elastalert-security-group-rules module.
- package-elk: v0.2.6: This release addresses issue #57. kibana-cluster will now create egress rules for the security group that it creates. Stabilized the ELK tests. Added better documentation and clarified examples in our AMI and example code READMEs.
- package-elk: v0.2.7: Pass through the security groups allowed for service discovery so that we can set that right on the main module. Also renamed vars.tf to variables.tf.
Terragrunt updates
We made a number of other updates to Terragrunt in the last month:
- Terragrunt, v0.16.9: Add support for force_path_style in the S3 config. Add support for skipping S3 bucket versioning via the skip_bucket_versioning config.
- Terragrunt, v0.16.10: Terragrunt will now properly respect the shared_credentials_file config for S3 backends, using it when creating S3 buckets and DynamoDB tables.
- Terragrunt, v0.16.11: You can now tell Terragrunt to exclude specific subdirectories when running the xxx-all commands (e.g., apply-all) by using the --terragrunt-exclude-dir flag. This flag supports wildcard expressions and may be specified multiple times.
- Terragrunt, v0.16.12: Fix the prevent_destroy flag so it works even when configs are inherited from a parent .tfvars file.
- Terragrunt, v0.16.13: When you use extra_arguments, Terragrunt will no longer pass -var or -var-file arguments to Terraform when you call apply with a plan file.
- Terragrunt, v0.16.14: This is a follow-up to v0.16.13 that fixes a bug where -var and -var-file were still passed if you called apply with a plan file and other arguments in between (e.g., terragrunt apply <other args> <plan file>).
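To illustrate how wildcard-based exclusion with a repeatable flag (as in --terragrunt-exclude-dir from v0.16.11) might behave, here is a small Python sketch. It is not Terragrunt’s implementation: the function name is hypothetical, and the matching uses Python’s fnmatch rules, where * also matches across path separators. Terragrunt’s own glob semantics may differ, so treat this as a mental model only.

```python
from fnmatch import fnmatchcase


def filter_modules(module_dirs, exclude_patterns):
    """Return the module directories that match none of the exclude patterns.

    Multiple patterns model the flag being specified multiple times:
    a directory is skipped if any one of them matches.
    """
    return [
        d for d in module_dirs
        if not any(fnmatchcase(d, pattern) for pattern in exclude_patterns)
    ]
```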
Terratest updates
We made a number of other updates to Terratest in the last month:
- Terratest, v0.10.4: Terratest now has methods for running terraform plan and extracting the exit code, including InitAndPlan and PlanExitCode.
- Terratest, v0.12.1: Added new helper methods ScpFileFrom and ScpDirFrom that allow for the transfer of files from remote EC2 instances to the local machine. The main idea with these helper methods is to make it easy to tell terratest to grab all of the various log and config files from your app running on some remote machine in the case that a test is going to fail. We already had methods in terratest that would grab the contents of those files and return the contents as a string. The new methods introduced in this release expand upon that functionality and open up the possibility of easily grabbing and archiving all of the log and configuration files on your CI of choice.
- Terratest, v0.12.2: Added the WorkspaceSelectOrNew method that can be used to create and select Terraform workspaces at test time.
Other open source updates
- terraform-aws-consul, v0.4.0: Important updates to the way security group rules are managed in the consul-cluster, consul-security-group-rules, and consul-client-security-group-rules modules.
- terraform-aws-vault, v0.10.3: The vault-security-group-rules module now adds a self rule so that Vault servers can talk to each other via their API port.
- terraform-aws-nomad, v0.4.5: You can now add EBS Volumes to your Nomad cluster by configuring the new ebs_block_device parameter in the nomad-cluster module.
- gruntwork-cli: v0.2.0: Added a custom HelpPrinter function that will wrap help text at a specified line width, while preserving indentations in the output table. To use it, you can call entrypoint.NewApp() to construct the cli app, which will take care of applying the modifications, or manually apply the changes yourself on the cli app. You can also modify the line width by changing entrypoint.HelpTextLineWidth (defaults to 80).
Other updates
- module-load-balancer, v0.12.0: Updated the ALB module to accept two new variables, https_listener_ports_and_acm_ssl_certs_num and https_listener_ports_and_ssl_certs_num, to specify the length of the mappings between ports and their associated (non-)ACM certificates. This allows the values of the mappings to be dependent on dynamic resources. See: hashicorp/terraform#11482
- module-ecs, v0.8.4: The ecs-service-with-discovery module now outputs the security group ID via the output variable ecs_task_security_group_id.
- module-ecs, v0.8.5: You can now configure volumes for the ecs-service module using the new volumes parameter.
- package-lambda, v0.2.3: Add a new parameter called wait_for to the lambda module. None of the resources in the module will be created until wait_for is resolved, which allows you to execute other steps (e.g., create zip file) before this module runs. This is a workaround for the lack of depends_on for modules in Terraform.
- module-asg: v0.6.16: Handled a possible concurrency issue that can cause a fatal exception while multiple processes attempt to unzip the boto3 library zip file in get-desired-capacity.py. We will now attempt to unzip the archive and catch any exception, and if it is the exception related to our concurrency issue, simply sleep for 5 seconds and try again.
- cloud-nuke, v0.1.3: This release improves the nuking strategy by ensuring deletion functions don’t return when they encounter an error on just a single resource. It also adds the ability to nuke ASG launch configurations.
- module-aws-monitoring, v0.9.2: The cloudwatch-log-aggregation-scripts, cloudwatch-memory-disk-metrics-scripts, and syslog modules now support Amazon Linux 2.
- package-openvpn, v0.8.0: package-openvpn now uses bash-commons under the hood. The behavior is identical, but you must now install bash-commons before installing any of the package-openvpn modules.
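The unzip-and-retry approach described for module-asg v0.6.16 can be sketched as follows. This is not the code in get-desired-capacity.py: the function name and the number of attempts are hypothetical, and the sketch assumes the concurrency failure surfaces as an OSError or a zipfile.BadZipFile, since the newsletter does not say which exception is involved. Only the 5-second sleep comes from the release notes.

```python
import time
import zipfile


def unzip_with_retry(zip_path, dest_dir, attempts=3, sleep_seconds=5, sleep=time.sleep):
    """Extract zip_path into dest_dir, retrying if extraction fails.

    Returns the attempt number that succeeded. If every attempt fails,
    the last exception is re-raised so the caller still sees the error.
    """
    for attempt in range(1, attempts + 1):
        try:
            with zipfile.ZipFile(zip_path) as archive:
                archive.extractall(dest_dir)
            return attempt
        except (OSError, zipfile.BadZipFile):
            # Assumed to be the failure mode when another process is
            # unzipping the same archive at the same time.
            if attempt == attempts:
                raise
            sleep(sleep_seconds)
```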
DevOps News
AWS now supports Yubikey for MFA
What happened: AWS now supports the Yubikey as a Multi-Factor Auth device.
Why it matters: The Yubikey is a tiny hardware USB device that supports a range of security functionality, including generating one-time passwords that can be used for Multi-Factor Authentication (MFA). It’s easier to use and (arguably) more secure than other MFA options, such as using the Google Authenticator app on your phone.
The way it works is you (or your company) buy a Yubikey and register it with (a) Yubico’s online service and (b) the online service you’re trying to log into, such as AWS. Then, whenever you’re logging into your online service, it will ask you not only for a username and password, but also a Yubikey token. To enter the token, you simply click on the text field in your browser, push a button on the Yubikey itself, and it will automatically enter the token for you (the Yubikey behaves as a USB keyboard), without you having to take your phone out of your pocket or type anything in manually. The web service will then check your token with the Yubikey service, and if it’s valid, allow you to log in.
What to do about it: If you wish to start using a Yubikey with AWS, follow the instructions here.
As of version 11, Oracle JDK will no longer be free
What happened: Oracle has released Java 11, but the terms come with a catch: you may no longer use Oracle’s JDK for commercial or production purposes without a paid support contract from Oracle.
Why it matters: For many years, the Oracle JDK was the recommended JDK for most Java apps, as it was the best maintained, had all the bells and whistles, and gave you the option to purchase support from Oracle. While you can still use the Oracle JDK for developing, testing, prototyping, and learning, the support contract is now no longer optional for production or commercial usage.
What to do about it: If you don’t want to pay Oracle for a support contract, you need to move to one of the flavors of OpenJDK:
- Oracle OpenJDK (note, this is not the same as their commercial JDK!)
- AdoptOpenJDK
- Zulu OpenJDK
The good news is that OpenJDK is more or less identical to Oracle JDK these days, so this should not generally cause issues. We will be updating our code (namely, the JDK installer in package-zookeeper) to use one of the OpenJDK flavors in the future.
RDS now supports deletion protection
What happened: Amazon has added support for deletion protection for RDS and Aurora databases.
Why it matters: You can turn on deletion protection with a single click (or single line of code). Once enabled, if you try to delete a database with deletion protection, you get an error (the only way to delete such a database is to explicitly disable deletion protection). This provides an extra sanity check to help protect your production databases from accidental deletion (e.g., accidental terraform destroy).
What to do about it: You can enable deletion protection via the UI now. We’ll be exposing a flag to enable this feature in module-data-storage in the future (if you need it sooner, PRs are welcome!).
ElastiCache for Redis now supports read replicas for sharded Redis
What happened: Amazon has announced that ElastiCache for Redis now supports adding and removing read replica nodes for both sharded and non-sharded Redis clusters.
Why it matters: This makes it easier to scale your reads and improve availability for your Redis Cluster environments without requiring manual steps or needing to make application changes.
What to do about it: Check out the announcement blog post for the details.
Security Updates
Below is a list of critical security updates that may impact your services. We notify Gruntwork customers of these vulnerabilities as soon as we know of them via the Gruntwork Security Alerts mailing list. It is up to you to scan this list and decide which of these apply and what to do about them, but most of these are severe vulnerabilities, and we recommend patching them ASAP.
Jenkins
- Jenkins Security Advisory 2018-09-25: A number of vulnerabilities have been found in Jenkins plugins. We did not notify the Gruntwork Security Alerts mailing list, as most of these vulnerabilities are of “low” or “medium” severity, except for one: the Monitoring Plugin has a vulnerability that allows an attacker to send crafted requests to a web application for extraction of secrets from the file system, server-side request forgery, or denial-of-service attacks. If you are using this plugin, we recommend updating immediately.