How to Avoid Large OpenTofu/Terraform State Files

Zach Goldberg
Chief Technical Officer
Published February 5, 2025

OpenTofu/Terraform rely on state to keep track of infrastructure resources. Think of state as a detailed map of everything OpenTofu is managing. As your infrastructure grows and you add more resources, the state file gets larger. This leads to some challenges:

  • Degraded performance: Larger state files take longer to refresh before every plan or apply, slowing down local development and CI pipelines.
  • Increased risk: If your state file gets corrupted or there’s a data leak, the impact (or “blast radius”) is much greater with a larger file.
  • Larger review scope: Changes to your IaC may also impact a large set of resources, requiring extensive review from one or more platform teams.

To address these challenges, the obvious move is to break state files down into smaller, more manageable units. Doing so introduces new complexities, and there are several ways to do it, each with its own tradeoffs. For the sake of this article, let's call these different techniques IaC Topologies:

  • The All In One — aka Mega Modules or a Terralith
  • Workspaces — A single unit with multiple versions of state
  • Multi-Unit — Multiple, independently maintained units
  • Stack — Orchestrate multi-unit deployments as a single, versioned entity

All In One (Mega Modules or Terralith)

This is the default starting point for every OpenTofu/Terraform codebase. Everything lives in a single state file and is managed exclusively at runtime by OpenTofu/Terraform itself.

  • Pros: Simple to set up and OpenTofu automatically manages dependencies between modules using variables and outputs.
  • Cons: Terraliths become slow and unwieldy as your infrastructure grows, and a single mistake can impact a large number of resources. Their parts are difficult to reuse as individual modules, updates can require slow and expensive reviews, and concurrent changes from multiple developers are harder to integrate.
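
As a rough sketch of what this looks like in practice, a single root module can wire modules together through outputs, and OpenTofu derives the apply order from those references. The module sources reuse the hypothetical acmeco repositories from the examples later in this article, and the output names (vpc_id, loadbalancer_id) are assumptions rather than a real module's interface:

```hcl
# main.tf: one root module, one state file (illustrative sketch)
module "vpc" {
  source = "git::ssh://git@github.com/acmeco/modules/vpc"
}

module "loadbalancer" {
  source = "git::ssh://git@github.com/acmeco/modules/loadbalancer"

  # Referencing module.vpc creates an implicit dependency on the VPC
  vpc_id = module.vpc.vpc_id
}

module "app1" {
  source = "git::ssh://git@github.com/acmeco/modules/app1"

  vpc_id          = module.vpc.vpc_id
  loadbalancer_id = module.loadbalancer.loadbalancer_id
  app_name        = "app1"
}
```

The convenience is real, but so is the cost: every plan refreshes the resources of all three modules, even when only one of them changed.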

Workspaces

Workspaces are a built-in feature in OpenTofu/Terraform. The Workspace feature makes it easy to manage multiple distinct state files for the same OpenTofu code. For that exact use case the feature works well. Stretch it beyond that, however, and it quickly becomes apparent that workspaces aren’t a one-size-fits-all solution.

The original author of Workspaces, Martin Atkins, has written an RFC on OpenTofu to deprecate — and ultimately remove — Workspaces. In this proposal (which remains open for comment) he argues that, in the real world, Workspaces often cause more problems than they solve, and he proposes an alternative, more flexible solution for the future of OpenTofu. Summarizing his argument:

  • All workspaces require the same backend, e.g. you can’t put dev and prod state files into different buckets or in different accounts.
  • There is no way to define configuration per workspace, which leads to complex tfvars file workarounds.
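
To illustrate both points, here is a minimal sketch assuming an S3 backend. The bucket name, state key, and instance sizes are hypothetical. Every workspace shares the one backend block, and per-workspace settings have to be looked up by hand through terraform.workspace:

```hcl
terraform {
  # One backend for every workspace. With the S3 backend, non-default
  # workspaces land in the same bucket under an "env:/<workspace>/" prefix.
  backend "s3" {
    bucket = "acmeco-tfstate" # hypothetical bucket name
    key    = "myapp/terraform.tfstate"
    region = "us-east-1"
  }
}

locals {
  # Workaround: per-workspace configuration keyed by workspace name
  instance_type_by_workspace = {
    dev  = "t3.small"
    prod = "m5.large"
  }

  instance_type = local.instance_type_by_workspace[terraform.workspace]
}
```

Selecting a workspace that is missing from the map (including the default workspace) fails with an invalid index error, which hints at how quickly these workarounds accumulate.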

In summary, Workspaces are a fine solution for one narrow problem, but they are not a viable general-purpose way to break down state files.

Pros:

  • Convenient for the use case where you need an exact replica of your resources with only minor changes between each replica

Cons:

  • No way to configure important differences between replicas, such as backends, which often leads to complex, hard-to-maintain workarounds.
  • Overall not a very flexible way to break down state files.

Multi-Unit

The Multi-Unit approach involves creating multiple state files by organizing resources and modules into different folders. This resolves many of the problems with the All In One approach, but it introduces new challenges of its own.

  • Pros: Smaller state files, enables smaller and more manageable updates, and is easier to manage across multiple teams.
  • Cons: Dependencies between modules are no longer handled for free inside OpenTofu/Terraform, although they can be managed nicely with Terragrunt. Units also tend to be highly similar, which can cause code duplication.
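
To make the dependency problem concrete: with plain OpenTofu/Terraform, a unit typically reads another unit's outputs through the built-in terraform_remote_state data source. The sketch below assumes an S3 backend, and the bucket name and state key are hypothetical:

```hcl
# loadbalancer/main.tf (illustrative sketch without Terragrunt)

# Read the outputs of the separately applied VPC unit from its state file
data "terraform_remote_state" "vpc" {
  backend = "s3"

  config = {
    bucket = "acmeco-tfstate" # hypothetical bucket name
    key    = "dev/us-east-1/vpc/terraform.tfstate"
    region = "us-east-1"
  }
}

module "loadbalancer" {
  source = "git::ssh://git@github.com/acmeco/modules/loadbalancer"

  vpc_id = data.terraform_remote_state.vpc.outputs.vpc_id
}
```

Nothing in this configuration ensures the VPC unit has been applied first. Run ordering becomes the responsibility of whatever drives the two runs.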

Without additional tooling, multi-unit can work, but it requires custom orchestration of multiple runs and careful configuration to move data between state files. A more comprehensive solution is to use Terragrunt: you create multiple Terragrunt files in different folders and set up dependency blocks between them. For example, if you have an app with a load balancer and a VPC, you might have these three files:

myapp/
└── iac/
    └── dev/
        └── regional/
            └── us-east-1/
                ├── vpc/
                │   └── terragrunt.hcl
                ├── loadbalancer/
                │   └── terragrunt.hcl
                └── app1/
                    └── terragrunt.hcl

# /myapp/iac/dev/regional/us-east-1/vpc/terragrunt.hcl
terraform {
  source = "git::ssh://git@github.com/acmeco/modules/vpc"
}

# /myapp/iac/dev/regional/us-east-1/loadbalancer/terragrunt.hcl
terraform {
  source = "git::ssh://git@github.com/acmeco/modules/loadbalancer"
}

# Run the VPC unit first and expose its outputs to this unit
dependency "vpc" {
  config_path = "../vpc"
}

# inputs is an attribute, so it needs "="
inputs = {
  vpc_id = dependency.vpc.outputs.vpc_id
}

# /myapp/iac/dev/regional/us-east-1/app1/terragrunt.hcl
terraform {
  source = "git::ssh://git@github.com/acmeco/modules/app1"
}

dependency "loadbalancer" {
  config_path = "../loadbalancer"
}

dependency "vpc" {
  config_path = "../vpc"
}

inputs = {
  vpc_id          = dependency.vpc.outputs.vpc_id
  loadbalancer_id = dependency.loadbalancer.outputs.loadbalancer_id
  app_name        = "app1"
}

Now let’s look at the directory tree if we want another copy of this app, but perhaps in staging instead of in dev:

myapp/
└── iac/
    ├── dev/
    │   └── regional/
    │       └── us-east-1/
    │           ├── vpc/
    │           │   └── terragrunt.hcl
    │           ├── loadbalancer/
    │           │   └── terragrunt.hcl
    │           └── app1/
    │               └── terragrunt.hcl
    └── stage/
        └── regional/
            └── us-east-1/
                ├── vpc/
                │   └── terragrunt.hcl
                ├── loadbalancer/
                │   └── terragrunt.hcl
                └── app1/
                    └── terragrunt.hcl

To achieve this we’ve duplicated all three Terragrunt files in their entirety, with literally zero changes. Copy-pasted terragrunt.hcl files are not DRY, and the pattern becomes a challenge at scale, which leads us to our final IaC topology, Stacks.

Stacks

Stacks build upon the multi-unit approach, offering a way to manage sets of units in a parameterized and versioned manner.

  • Pros: All the benefits of multi-unit, plus easier code reuse and management at scale.
  • Cons: Requires supporting tooling, such as Terragrunt Stacks.

Note: The example here uses Terragrunt Stacks as defined in Terragrunt RFC 3313. As of the writing of this blog post, the feature is available only in limited experimental form, though it is making rapid progress. You can achieve much of this functionality without RFC 3313 by using the envcommon pattern; the major drawback is the lack of a versioned artifact to represent the stack.
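
For reference, the envcommon pattern can be sketched roughly as follows. A shared file holds everything the environments have in common, and each environment's terragrunt.hcl shrinks to an include. The _envcommon directory name and location are conventions assumed here, and the sketch also assumes that myapp/ is the root of the git repository:

```hcl
# /myapp/iac/_envcommon/loadbalancer.hcl (hypothetical shared file)
terraform {
  source = "git::ssh://git@github.com/acmeco/modules/loadbalancer"
}

dependency "vpc" {
  # get_terragrunt_dir() resolves to the directory of the including unit
  config_path = "${get_terragrunt_dir()}/../vpc"
}

inputs = {
  vpc_id = dependency.vpc.outputs.vpc_id
}

# /myapp/iac/dev/regional/us-east-1/loadbalancer/terragrunt.hcl
# The per-environment file only pulls in the shared definition
include "envcommon" {
  path = "${get_repo_root()}/iac/_envcommon/loadbalancer.hcl"
}
```

This removes most of the duplication, but the shared file is referenced by path rather than by version, so every environment picks up a change at the same time.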

To achieve the architecture in the prior example, we first define the stack and store it in our catalog of modules and stacks:

# acmeco/stacks/app1/terragrunt.stack.hcl @v0.0.1
locals {
  version = "v0.0.1"
}
unit "vpc" {
  source = "git::ssh://git@github.com/acmeco/modules/vpc?ref=${local.version}"
  path   = "vpc"
}

unit "loadbalancer" {
  source = "git::ssh://git@github.com/acmeco/modules/loadbalancer?ref=${local.version}"
  path   = "loadbalancer"
}

unit "app1" {
  source = "git::ssh://git@github.com/acmeco/modules/app1?ref=${local.version}"
  path   = "app1"
}

We can then reference the stack in our source tree:

myapp/
└── iac/
    ├── dev/
    │   ├── environment.hcl
    │   └── regional/
    │       └── us-east-1/
    │           └── app1/
    │               └── terragrunt.stack.hcl
    └── stage/
        ├── environment.hcl
        └── regional/
            └── us-east-1/
                └── app1/
                    └── terragrunt.stack.hcl

# /myapp/iac/dev/environment.hcl
locals {
  environment = "dev"
}

# /myapp/iac/stage/environment.hcl
locals {
  environment = "staging"
}

# /myapp/iac/dev/regional/us-east-1/app1/terragrunt.stack.hcl
locals {
  version = "v0.0.1"
}

unit "app1" {
  source = "git::ssh://git@github.com/acmeco/units/app1?ref=${local.version}"
  path   = "app1"
}

# /myapp/iac/stage/regional/us-east-1/app1/terragrunt.stack.hcl
locals {
  version = "v0.0.1"
}

unit "app1" {
  source = "git::ssh://git@github.com/acmeco/units/app1?ref=${local.version}"
  path   = "app1"
}

With this design we can now “stamp out” as many versions of app1 as we’d like in different environments and maintain a single, central, versioned reference to the definition of app1 itself.
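
The environment.hcl files are what let the same definition behave differently per environment. One way a unit's terragrunt.hcl might consume them is sketched below, using Terragrunt's read_terragrunt_config and find_in_parent_folders functions. The environment input name is an assumption about the underlying module, and the sketch assumes the unit is rendered somewhere beneath the environment directory:

```hcl
locals {
  # Walk up the directory tree to the nearest environment.hcl
  env = read_terragrunt_config(find_in_parent_folders("environment.hcl"))
}

inputs = {
  # Resolves to "dev" or "staging", depending on which
  # environment directory this unit lives under
  environment = local.env.locals.environment
}
```

Because the lookup is relative to where the unit runs, the unit definition itself stays identical across environments.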

IaC Topology Comparison

Table comparing the various IaC Topologies

Effectively managing your OpenTofu/Terraform state is crucial for maintaining healthy and scalable infrastructure. By understanding the different IaC Topologies and their tradeoffs, you can make informed decisions about how to structure your OpenTofu/Terraform projects. As you can see, setting up a more capable and scalable IaC structure requires some additional tools, but it does not have to mean adding significant complexity or overhead. On the contrary, tools like stacks allow for very DRY, concise, and maintainable architectures that work just as well for a team of one developer as for a team of thousands.