Core Concepts

Overview

This module packages various scripts for infrastructure deployments (see What container is used for the deploy task? for more info) into an ECS task that streams its outputs to CloudWatch, with an AWS Lambda function that can invoke that task. CI servers can then be configured to directly invoke the lambda function to trigger the deployment and stream the output from CloudWatch.

The sequence of events is as follows:

[Diagram: ECS Deploy Task sequence]

By insulating the deploy script from the CI server, we avoid granting the CI servers the IAM permissions required to deploy against the target accounts. Instead, the CI servers only need enough permissions to trigger the deployment. Refer to the Threat Model for more information.

Threat model of the deploy runner

To implement a CI/CD pipeline for infrastructure code, the entity or system that ultimately runs the infrastructure code must have the permissions to deploy the infrastructure that the code defines. Unfortunately, to support arbitrary CI/CD workflows, it is necessary to grant wide-ranging permissions to the target environment. As such, it is important to consider ways to mitigate potential attacks against the various systems involved in the pipeline to avoid attackers gaining access to deploy targets, which could be catastrophic in the case of a breach of the production environment.

Here we define our threat model to explicitly cover which attacks are taken into consideration in the design, as well as which attacks are not considered. The goal of the threat model is to be realistic about the threats that are addressable with the tools available. By explicitly focusing attention on more likely and realistic threats, we can avoid overengineering and compromising the usability of the solution to defend against threats that are unlikely to exist (e.g., a 5-person startup with 100 end users is unlikely to be the subject of a targeted attack by a government agency).

In this design, the following threat assumptions are made:

  • Attackers can originate from both external and internal sources (in relation to the organization).
  • External attacks are limited to those that can get full access to a CI environment, but not the underlying source code. Note that any CI/CD solution can likely be compromised if an attacker has access to your source code.
  • Internal attackers are limited to those with restricted access to the environments. This means that the threat model does not consider attackers with enough privileges to already have access to the deploy target accounts (e.g., an internal ops admin with full access to the prod environment). However, an internal attacker with permissions in the dev environment trying to elevate their access to the prod environment is considered.
  • Similarly, internal attackers are limited to those with restricted access in the CI environment and git repository. A threat where the internal attackers can bypass admin approval in a CI pipeline or can force push deployment branches is not considered.
  • Internal attackers can have access to the CI environment and the underlying code of the infrastructure (e.g., the git repository).

Given the threat assumptions, the following mitigations are baked into the design:

  • Minimal access to target environments: Attackers that gain access to the underlying AWS secrets used by the CI environments will at most have the ability to run deployments against a predefined set of code. This means that external attackers who do not have access to the source code will at most be able to: (a) deploy code that has already been deployed before, (b) see the plan of the infrastructure between two points in time. They will not be able to write arbitrary infrastructure code to read DB secrets, for example. It is important to note that the IAM policies are set up such that the IAM user for CI only has access to trigger predefined events. They do not have access to arbitrarily invoke the ECS task, as that could potentially expose arbitrary deployments by modifying the command property (e.g., using command to echo some infrastructure code and run terraform).

    • Note that there is still a risk of rolling back the existing infrastructure by attempting to deploy a previous version. See below for potential ways to mitigate this type of attack.
    • Similarly, this alone does not mitigate threats from internal attackers who have access to the source code, as a potential attacker with access to the source code can write arbitrary code to destroy or look up arbitrary infrastructure in the target environment. See below for potential ways to mitigate this type of attack.
  • Minimal options for deployment: The Lambda function exposes a minimal interface for triggering deployments. Attackers will only be able to trigger a deployment against a known repo and known git refs (branches, tags, etc). To further limit the scope, the lambda function can be restricted to only allow references to repositories that match a predefined regular expression. This prevents attackers from creating an open source repo with malicious code that they subsequently deploy by pointing the deploy runner to it.

  • Restricted Refs for apply: Since many CI systems depend on the pipeline being managed as code in the same repository, internal attackers can easily circumvent approval flows by modifying the CI configuration on a test branch. This means that potential attackers can run an apply to destroy the environment or open backdoors by running infrastructure code from test branches without having the code approved. To mitigate this, the Lambda function allows specifying a list of git refs (branches, tags, etc) as the source of apply and apply-all. If you limit the source of apply to only protected branches (see below, and the sketch after this list), you prevent attackers from having the ability to run apply unless the code has been reviewed.

  • CI server does not need access to the source code: Since the deployments are being done remotely in an ECS task, the actual CI server does not need to clone the underlying repository to deploy the infrastructure. This means that you can design your CI pipeline to only have access to the webhook events and possibly the list of changed files (to know which module to deploy), but not the source code itself. This can further decrease the effect of a potential breach of the CI server, as the attacker will not have the ability to read or modify the infrastructure code to use the pipeline to their advantage.
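
As an illustration of the restricted apply refs, here is a minimal, hypothetical sketch of the relevant attribute in the standard configuration's terraform_applier block; the branch names are placeholders and all other required inputs are omitted:

terraform_applier = {
  # Only allow apply and apply-all to run from refs that cannot be changed
  # without review (e.g., protected branches).
  allowed_apply_git_refs = ["main", "master"]
}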

These mitigations alone will not prevent all attacks defined in the threat model. For example, an internal attacker with access to the source code can still do damage to the target environments by merging in code that removes all the infrastructure resources, thereby destroying all infrastructure when the apply command is run. Or, they could expose secrets by writing infrastructure code that leaks the secrets into the logs via a local-exec provisioner. Note that any CI/CD solution can likely be compromised if an attacker has full access to your source code.

For these types of threats, your best bet is to implement various policies and controls on the source control repository and build configurations:

  • Only deploy from protected branches: In most git hosting platforms, there is a concept of protected branches (see the GitHub docs for example). Protected branches allow you to implement policies for controlling what code can be merged in. For most platforms, you can protect a branch such that: (a) it can never be force pushed, (b) it can never be merged or committed to from the CLI, (c) merges require status checks to pass, (d) merges require approval from N reviewers. By only building CI pipelines from protected branches, you can add checks and balances to ensure review of potentially harmful infrastructure actions.

  • Require approval in CI build steps: If protected branches are not an option, you can implement an approval workflow in the CI server. This ensures that attackers need enough privileges on the CI server to approve builds in order to actually modify infrastructure. It can mitigate potential attacks where the attacker has access to the CI server to trigger arbitrary builds manually (e.g., re-running a previous job that deploys an older version to roll back the infrastructure), but does not have enough access to approve the job. Note that this will not mitigate potential threats from internal attackers who have enough permissions to approve builds.

  • Avoid logging secrets: Our threat model assumes that attackers can get access to the CI servers, which means they will have access to the deployment logs. This will include detailed outputs from a terraform plan or apply. While it is impossible to prevent terraform from leaking secrets into the state, it is possible to prevent terraform from logging sensitive information. Make use of PGP encryption functions or encrypted environment variables / config files (in the case of service deployments) to ensure sensitive data does not show up in the plan output. Additionally, tag sensitive outputs with the sensitive keyword so that terraform will mask the outputs (see the sketch after this list).

  • Consider a fork-based workflow for pull requests: For greater control, you can consider implementing a fork-based workflow. In this model, only your trusted admins have write access to the main infrastructure repo, but anyone on the team can read and fork the code. When non-admins want to implement changes, instead of branching from the infra repo, they fork the repo, implement the changes on their fork, and then open a PR from the fork. The advantage of this approach is that many CI platforms do not automatically run builds from a fork for security reasons. Instead, admins manually trigger a build by pushing the forked branch to an internal branch. While this is an inconvenience to devs, as you won't automatically see the plan, it prevents unwanted access to secrets by modifying the CI pipeline to log internal environment variables or by showing infrastructure secrets using external data sources.
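
As a concrete illustration of the sensitive keyword mentioned above, here is a minimal Terraform sketch; the output and resource names are hypothetical:

# Mark outputs that may carry secrets as sensitive so that terraform masks them
# in plan and apply output (note that the raw value is still stored in state).
output "db_password" {
  value     = aws_db_instance.example.password
  sensitive = true
}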

Operations

Which launch type should I use?

The ECS deploy runner supports both Fargate and EC2 launch types. When running in Fargate mode, each ECS task is spun up on demand for each invocation. This means that you will only pay for the container runtime for the duration of the task. Additionally, concurrency of the jobs is only limited by the maximum number of Fargate tasks AWS allows you to run at a given point in time (default is 100). This means that you don't need to worry about scaling your capacity on demand, allowing you to minimize your costs. This works best when you have the need to run many deployments in parallel across multiple containers, or if you have a sparse work schedule where your builds run for a limited time each day.

The EC2 launch type will deploy a cluster of EC2 instances to run the tasks on. This launch type reserves VMs to host the tasks, which cuts down the container image download time and VM boot up time of the ECS task. However, the faster start up time is traded off against the cost of keeping the instances running longer than the tasks themselves, as well as the inability to scale up and down on demand. This works best when you have short deployment times where the start up time of Fargate containers is relatively expensive.

The following is a table summarizing the differences:

Feature                 Fargate    EC2
Pay only for runtime    Yes        No
Serverless              Yes        No
Autoscaling             Yes        ⚠️ (Requires optimization for each environment)
Cached images           No         Yes
Time to boot            Minutes    10s of seconds

What container is used for the deploy task?

Any container specified in container_images can be used for the deploy task. You can also specify multiple containers for a single ECS Deploy Runner stack. This is useful when using specialized third party containers for deployment tasks that are not directly supported by the Gruntwork deploy runner container (e.g., kaniko for building Docker images).
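
As a rough sketch, a container_images input with two containers might look like the following; the exact attribute names (docker_image, docker_tag) are assumptions here and should be checked against variables.tf, and the image names and tags are placeholders:

container_images = {
  # Standard deploy runner used for terraform/terragrunt and packer tasks.
  "standard" = {
    docker_image = "gruntwork/ecs-deploy-runner"   # hypothetical image name
    docker_tag   = "v1.0.0"                        # hypothetical tag
  }
  # Specialized kaniko container for building Docker images on Fargate.
  "docker-image-builder" = {
    docker_image = "gruntwork/kaniko"              # hypothetical image name
    docker_tag   = "v1.0.0"
  }
}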

For convenience, we provide Dockerfiles (in the docker subfolder) to build containers with the set of tools most commonly used in infrastructure projects that depend on Gruntwork modules. There are two Dockerfiles in the folder:

deploy-runner

This container is an Ubuntu 18.04 image that contains the trigger scripts used by the standard configuration (e.g., infrastructure-deploy-script, build-packer-artifact, and terraform-update-variable), as well as the following tools:

  • git
  • terraform
  • terragrunt
  • kubergrunt
  • packer
  • git-add-commit-push (from the git-helpers module)

Note that you will only be allowed to invoke the scripts in the trigger directory (/opt/ecs-deploy-runner/scripts) if you use the standard configuration (see What configuration is recommended for container_images? for more details).

If your infrastructure code requires additional tools, you can customize the runtime environment by building a new container and providing the image reference to this module using the container_images input variable.

To build the Docker container, follow these steps (consolidated into the shell sketch after this list):

  1. Set the GITHUB_OAUTH_TOKEN environment variable to a GitHub personal access token for a read-only machine user with access to the Gruntwork repos.
  2. Change working directory to the docker folder of this module (modules/ecs-deploy-runner/docker from the root of the repo).
  3. Run: docker build --build-arg GITHUB_OAUTH_TOKEN --tag gruntwork/ecs-deploy-runner .
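
Putting the steps together, the build looks roughly like this; the token value is a placeholder, and the commands assume you start from the root of the repo:

# Placeholder token for a read-only machine user; never commit real tokens.
export GITHUB_OAUTH_TOKEN=xxxx
cd modules/ecs-deploy-runner/docker
docker build --build-arg GITHUB_OAUTH_TOKEN --tag gruntwork/ecs-deploy-runner .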

kaniko

The ECS Deploy Runner uses ECS Fargate to run the infrastructure code. However, ECS Fargate does not support bind-mounting the Docker socket to use Docker-in-Docker for building images. As such, it is currently not possible to build Docker images directly in ECS Fargate. Instead, we use an indirect method with a tool called kaniko. Kaniko is a binary that was originally built for building Docker images in Kubernetes, but it supports any platform where Docker-in-Docker is not available.

We need a specialized kaniko container for the ECS deploy runner that is set up to push the built Docker images to ECR. In addition to the kaniko command, our version contains:

  • A configuration file to set up the Amazon ECR Credential Helper so that kaniko can authenticate to AWS for pushing images to ECR.
  • A trigger command to wrap the kaniko command to simplify the args for AWS based CI/CD use cases.
  • An entrypoint script that is compatible with the ecs-deploy-runner for enforcing security restrictions around what commands can be invoked in the container.

The ECS Deploy Runner stack supports a wide range of configuration options for each container to maximize the security benefits of the stack. For example, we provide configuration options for controlling which options and arguments to allow for each script in a container. This flexibility allows the stack to adapt to almost all CI/CD use cases, but at the expense of requiring time and effort to figure out the best options to minimize the security risk of the stack.

For convenience, we provide container configurations that are distilled to a set of user friendly options (e.g., infrastructure_live_repositories as opposed to hardcoded_options) that you can use to configure a canonical ECS Deploy Runner stack that can be used with most infrastructure and application CI/CD workflows. You can use the ecs-deploy-runner-standard-configuration module for this purpose.

The standard configuration will set up:

  • A docker-image-builder ECS task using the kaniko container with recommended script configurations for restricting what repos can be used to build containers.
  • An ami-builder ECS task using the deploy-runner container that is restricted to only running build-packer-artifact. The task has recommended script configurations for restricting what repos can be used to build AMIs.
  • A terraform-planner ECS task using the deploy-runner container that has recommended script configurations to restrict the container to only allow running plan actions with the infrastructure-deploy-script.
  • A terraform-applier ECS task using the deploy-runner container that has recommended script configurations to restrict the container to only allow running apply actions with the infrastructure-deploy-script. Additionally, this container can be used to run terraform-update-variable if variables need to be updated for a deployment.
  • Secrets Manager entries that are passed into the containers as environment variables.

How do I use the ECS Deploy Runner with a private VCS system such as GitHub Enterprise?

If you try using the ECS Deploy Runner docker container with a private VCS system such as GitHub Enterprise, you might get an error message indicating that the SSH host was not verified. This is expected because we enable SSH host verification when accessing Git repos via SSH in the container. This means that the host keys must be validated beforehand at container creation time.

This is done by copying a precompiled list of host keys for each of the major VCS systems into the docker/known_hosts file. Each entry was added using the ssh-keyscan CLI utility that comes with openssh. To add the host key for your private VCS server to the known_hosts file, run the following command:

# Run at root of repo
ssh-keyscan -t rsa DOMAIN_OF_VCS_SERVER >> ./modules/ecs-deploy-runner/docker/known_hosts

Then, build the container using the steps outlined in What container is used for the deploy task?

What scripts can be invoked as part of the pipeline?

The pipeline assumes every Docker container is equipped with the deploy-runner entrypoint command (see the entrypoint directory for the source code). This is a small Go binary that enforces the configured trigger directory of the Docker container by making sure that the requested script actually resides in the trigger directory. This enforcement ensures that the ECS tasks with powerful IAM permissions can only be used for running specific, pre-defined scripts.

This entrypoint should be configured on the Docker container in the Dockerfile using the ENTRYPOINT directive so that the ECS task automatically passes through the command args without the option to override it.

You can install the entrypoint command and configure the trigger directory using the gruntwork-installer. Note that the install script assumes you have a working go compiler in the PATH. See the Dockerfile for the deploy-runner and the kaniko containers for an example of how to do this in your custom Dockerfile.

Once deployed, you can use the infrastructure-deployer CLI to look up the supported scripts in a given container. Refer to How do I invoke the ECS deploy runner for more information.

How do I restrict what args can be passed into the scripts?

This module exposes a detailed configuration object for each container passed into container_images that can be used to configure restrictions on the args that can be passed to the script. This is done through the script_config attribute in each entry of the container_images map. Refer to the variables.tf documentation for the script_config map to see the type signature and what attributes you can set on the configuration.

Each entry in the script_config map corresponds to a script in the trigger directory, with the key referencing the script name. These options can be used to implement complex restrictions for each script to avoid allowing a user to invoke arbitrary code with the assigned IAM credentials of the container. Note that, by default, if a script is not included in the configuration map, no args are allowed to be passed to it.

For example, the following is a simplified version of the script configuration setup for the infrastructure-deploy-script in the terraform-applier task:

infrastructure-deploy-script = {
  hardcoded_options = {
    repo                    = var.terraform_applier.infrastructure_live_repositories
    allowed-apply-refs-json = [jsonencode(var.terraform_applier.allowed_apply_git_refs)]
  }
  hardcoded_args        = []
  allow_positional_args = false
  allowed_options = [
    "--log-level",
    "--ref",
    "--deploy-path",
    "--binary",
    "--command",
    "--command-args",
  ]
  restricted_options = []
  restricted_options_regex = {
    command = "apply(-all)?"
  }
}

Note the following:

  • The configuration hardcodes the repo arg and disallows the user from setting that value. This ensures that a user cannot change the source of the code by passing in an arbitrary repository with --repo.
  • The configuration also hardcodes the allowed-apply-refs-json arg to ensure that the user cannot run apply from any git ref that isn't approved. This ensures that the user can't modify the CI script in the infrastructure repo to trigger an apply on unreviewed code.
  • The configuration also disables positional args. This isn't strictly necessary as the infrastructure-deploy-script does not support positional args, but is good practice to avoid potential vulnerabilities.
  • The configuration allows setting the deploy-path, binary, command, and command-args options, which allow for flexibility in the workflow (e.g., running terragrunt plan on a specific path with the -no-color option).
  • The configuration restricts the command option to only allow apply. This ensures that the user can't use this container for the plan action, which can run on any branch and thus allows arbitrary code execution with powerful IAM credentials intended for deploying infrastructure.

Here is another example from the standard configuration (build-packer-artifact):

build-packer-artifact = {
  hardcoded_options     = {}
  hardcoded_args        = []
  allow_positional_args = false
  allowed_options = [
    "--packer-template-path",
    "--build-name",
    "--var",
  ]
  restricted_options = []
  restricted_options_regex = {
    packer-template-path = "^git::(${local.ami_repositories_as_regex})//.+"
  }
}

This config:

  • Allows build-name, var, and packer-template-path to be set by the user.
  • packer-template-path is restricted to only build from a git repo, and only those repos that were passed in. However, any subpath and ref in those repos are allowed.

What are the IAM permissions necessary to trigger a deployment?

You can use the ecs-deploy-runner-invoke-iam-policy module to create an IAM policy that grants the minimal permissions necessary to trigger a deployment, check the status of the deployment, and stream the logs from that deployment.

How do I stream logs from the deployment task?

The ECS task is configured to stream the stdout and stderr logs from the underlying container running the deploy script to CloudWatch Logs under a deterministic name. You can use the predetermined name to find and stream the log outputs from the CloudWatch Log Group and Stream.

Note that this will be done automatically for you when you invoke a deployment using the infrastructure-deployer CLI.
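
If you need to follow the logs manually, you can also tail the CloudWatch Log Group with the AWS CLI (v2), as in the hedged example below; the log group name is a placeholder that should be replaced with the deterministic name used by your deployment:

# Tail the deploy task logs directly from CloudWatch Logs.
aws logs tail /hypothetical/ecs-deploy-runner --follow --region us-east-2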

How do I trigger a deployment?

This module configures an ECS task definition to run infrastructure deployments using the deploy script provided in the infrastructure-deploy-script module. Additionally, this module will configure an AWS Lambda function to be able to trigger the ECS task. You can read more about the architecture in the Overview and Threat Model sections of this doc, including the reasoning behind introducing Lambda instead of directly invoking the ECS task.

Given that, to trigger a deployment, you need to invoke the deployment Lambda function. This can be done by using the deployment CLI in the infrastructure-deployer module. For example, to invoke a plan action for the module dev/us-east-1/services/my-service with version v0.0.1 of the code using the standard configuration:

infrastructure-deployer --aws-region us-east-2 -- \
    terraform-planner \
    infrastructure-deploy-script \
    --ref v0.0.1 \
    --deploy-path dev/us-east-1/services/my-service \
    --command plan \
    --binary terraform

This will:

  • Invoke the deployment lambda function
  • Wait for the ECS task to start
  • Stream the logs from the ECS task to stdout and stderr
  • Wait until the task finishes
  • Exit with the exit code provided by the task

Refer to the infrastructure-deployer module doc for more information.

How do I trigger a deployment from CI?

AWS Lambda currently does not have direct integrations with version control tools. Therefore, there is no easy way to configure automated git flows to directly invoke the Lambda function. Instead, you should configure a CI build system (e.g., Jenkins, CircleCI, GitLab) to invoke the deployment task using the infrastructure-deployer CLI to perform the deployment actions. Refer to the How do I trigger a deployment? section of the docs for more information.

You can read more about the architecture in the Overview and Threat Model sections of this doc, including the reasoning behind introducing AWS Lambda instead of directly invoking the ECS task.

To summarize:

  • Use existing CI servers (e.g., Jenkins, CircleCI, GitLab) to integrate your workflow with version control
  • Use this module to set up an ECS task to run your deployments via a trigger Lambda function
  • Use the infrastructure-deployer CLI in your CI builds to invoke the ECS task (via Lambda) and stream the logs (see the sketch after this list)
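
For example, a CI build step that applies an approved release might look like the following sketch; the deploy path, ref, and CI_RELEASE_TAG environment variable are hypothetical, and the build job is assumed to have AWS credentials that grant permission to invoke the deploy runner:

# Hypothetical CI build step: apply the approved release of a single module.
infrastructure-deployer --aws-region us-east-2 -- \
    terraform-applier \
    infrastructure-deploy-script \
    --ref "$CI_RELEASE_TAG" \
    --deploy-path dev/us-east-1/services/my-service \
    --command apply \
    --binary terragrunt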

When you integrate all the components together, users can trigger deployments when they merge infrastructure code. For example, here is a workflow that can be configured (where USER denotes user actions, BUILD denotes CI build server actions, and ECS denotes actions by the ECS task):

  • USER: writes some Terraform code and commits it to a git branch
  • BUILD: the git commit triggers a build job in CI
  • USER: logs into the CI server to see the build job
  • USER: clicks the build job to see the build output
  • BUILD: calls out to the infrastructure-deployer to trigger a deployment task
  • BUILD: the infrastructure-deployer invokes the trigger Lambda function, which in turn creates the ECS task
  • ECS: runs the desired action (terraform/terragrunt plan/apply), streaming output to CloudWatch Logs
  • BUILD: the infrastructure-deployer finds the CloudWatch Logs and streams the logs to stdout of the build server
  • USER: sees the logs streamed from the infrastructure-deployer in the CI server UI
  • ECS: the task exits
  • BUILD: the infrastructure-deployer detects that the task has finished and exits with the same exit code as the task
  • USER: sees whether the deployment succeeded or failed

How do I provide access to private git repositories?

Since we are not running the deployment from the CI server directly, you can't use the SSH key management mechanisms provided by each CI server. Instead, you must store the private SSH key in AWS Secrets Manager so that it can be shared with the ECS task at runtime. This secret is automatically injected by the ECS container agent as an environment variable when the task is first started.

You can learn more about how the secret is added to ECS in the official documentation from AWS.

In the standard configuration, we set up the expected environment variables for each container based on the entries provided to the secrets_manager_env_vars input variable of the corresponding task configuration. We recommend the following settings for each container:

docker_image_builder = {
  secrets_manager_env_vars = {
    GIT_USERNAME = "ARN of secrets manager entry containing github personal access token for private repos containing Dockerfiles."
    GITHUB_OAUTH_TOKEN = "ARN of secrets manager entry containing github personal access token for use with gruntwork-install during docker build."
  }
}

ami_builder = {
  secrets_manager_env_vars = {
    GITHUB_OAUTH_TOKEN = "ARN of secrets manager entry containing github personal access token for use with gruntwork-install during docker build."
  }
}

terraform_planner = {
  secrets_manager_env_vars = {
    DEPLOY_SCRIPT_SSH_PRIVATE_KEY = "ARN of secrets manager entry containing raw contents of a SSH private key for accessing private repos containing infrastructure live configuration."
  }
}

terraform_applier = {
  secrets_manager_env_vars = {
    DEPLOY_SCRIPT_SSH_PRIVATE_KEY = "ARN of secrets manager entry containing raw contents of a SSH private key for accessing private repos containing infrastructure live configuration. This is also used when updating the config files with terraform-update-variable."
  }
}

For entries corresponding to SSH keys, you will need to make sure to store the contents of the SSH private key in AWS Secrets Manager in order for the ECS task to properly read and use the key. Note that currently the ECS deploy runner does not support PEM keys that require a passphrase.

You will also want to make sure to use a dedicated machine user with read-only privileges for accessing the source code. As mentioned in the threat model, write access to the source code will defeat almost any security measures employed for CI/CD of infrastructure code, so you will want to make sure that damage can be limited even if this secret were to leak. The exception is if you are implementing automated deployment workflows, in which case you will want to configure argument boundaries to ensure that you can't modify the input variables of arbitrary infrastructure configurations using terraform-update-variable.

To create a machine user and associate its SSH key:

  1. Create the machine user on your version control platform.

  2. Create a new SSH key pair on the command line using ssh-keygen:

    ssh-keygen -t rsa -b 4096 -C "MACHINE_USER_EMAIL"
    

    Make sure to set a different path to store the key (to avoid overwriting any existing key). Also avoid setting a passphrase on the key.

  3. Upload the SSH key pair to the machine user. See the following docs for the major VCS platforms:

    • GitHub
    • GitLab
    • Bitbucket (Note: you will need to expand one of the sections to see the full instructions for adding an SSH key to the machine user account)
  4. Create an AWS Secrets Manager entry with the contents of the private key. In the following example, we use the aws CLI to create the entry in us-west-2, sourcing the contents from the SSH private key file ~/.ssh/machine_user:

    cat ~/.ssh/machine_user \
        | xargs -0 aws secretsmanager create-secret --region us-west-2 --name "SSHPrivateKeyForECSDeployRunner" --secret-string
    

    When you run this command, you should see a JSON output with metadata about the created secret:

    {
        "ARN": "arn:aws:secretsmanager:us-west-2:000000000000:secret:SSHPrivateKeyForECSDeployRunner-SOME_RANDOM_STRING",
        "Name": "SSHPrivateKeyForECSDeployRunner",
        "VersionId": "21cda90e-84e0-4976-8914-7954cb6151bd"
    }
    
  5. Record the ARN from the output and set the relevant secrets_manager_env_vars or repo_access_ssh_key_secrets_manager_arn input variables in the standard configuration, as sketched below.
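
For example, wiring the recorded ARN into the terraform_planner block of the standard configuration might look like the following sketch (the ARN is the placeholder value from the output above):

terraform_planner = {
  secrets_manager_env_vars = {
    # ARN recorded from the create-secret output above.
    DEPLOY_SCRIPT_SSH_PRIVATE_KEY = "arn:aws:secretsmanager:us-west-2:000000000000:secret:SSHPrivateKeyForECSDeployRunner-SOME_RANDOM_STRING"
  }
}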

Contributing

Developing the Invoker Lambda function

The source code for the invoker lambda function lives in the invoker-lambda folder, which has the following structure:

  • invoker: A python package containing the lambda function handler.
  • dev_requirements.txt: Additional requirements for an enhanced developer experience (e.g., mypy and type stubs for static analysis).

Note that the invoker code requires Python 3.8 to run. This is primarily to take advantage of the enhanced static types that were added in Python 3.8. Since we can target a known environment (AWS Lambda), we trade off portability of the scripts for a better developer experience.

See the relevant docs for python local development for the infrastructure-deploy-script for information on how to set up your local environment for running the type checker.
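
A minimal local setup might look like the following sketch, assuming Python 3.8 is installed and that the type checker is invoked directly against the invoker package (the exact invocation is an assumption):

# Hedged sketch: install dev requirements and run the type checker locally.
cd modules/ecs-deploy-runner/invoker-lambda
pip install -r dev_requirements.txt
mypy invoker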
