Apache Kafka and Confluent Tools

Deploy a cluster of Kafka brokers. Optionally deploy Confluent tools such as Schema Registry, REST Proxy, and Kafka Connect.

Kafka

This repo contains modules for deploying and managing a cluster of Apache Kafka brokers. Note that Kafka depends on Apache ZooKeeper, which you can deploy with package-zookeeper.

The main modules are:

  • kafka-cluster: A Terraform module to run a cluster of Kafka brokers with EBS Volumes attached, zero-downtime deployment, and auto-recovery of failed nodes.

  • install-kafka: Install Apache Kafka.

  • run-kafka: Configure and start Kafka.

The supporting modules are:

Click on each module above to see its documentation.

Getting started

Head over to the examples folder for working example code.

What is Kafka?

Apache Kafka is an open source distributed streaming platform. It lets you publish and subscribe to streams of records (a bit like a queue), store streams of records in a fault-tolerant way, and process streams of records as they occur. It is mainly used for building real-time streaming data pipelines (e.g., an ETL job that moves data into Hadoop or a data warehouse) and real-time streaming applications (e.g., detecting fraud in real time by processing a stream of event data).

To learn more about why Kafka exists and what it can be used for, check out The Log: What every software engineer should know about real-time data's unifying abstraction.

What is ZooKeeper?

Apache ZooKeeper is an open source, distributed, hierarchical key-value store. It implements a distributed consensus protocol, which makes it a useful primitive in other distributed systems (in particular, Kafka) as a building block for leader election, synchronization, coordination, locking, naming registries, and so on.

What is a Gruntwork module?

At Gruntwork, we've taken the thousands of hours we spent building infrastructure on AWS and condensed all that experience and code into pre-built packages or modules. Each module is a battle-tested, best-practices definition of a piece of infrastructure, such as a VPC, ECS cluster, or an Auto Scaling Group. Modules are versioned using Semantic Versioning to allow Gruntwork clients to keep up to date with the latest infrastructure best practices in a systematic way.

How do you use a module?

Most of our modules contain either:

  1. Terraform code
  2. Scripts & binaries

Using a Terraform Module

To use a module in your Terraform templates, create a module block and set its source argument to the Git URL of this repo. You should also set the ref parameter so you're pinned to a specific version of this repo, as the master branch may contain backwards-incompatible changes (see module sources).

For example, to use v1.0.8 of the standalone-server module, you would add the following:

module "standalone_server" {
  source = "git::git@github.com:gruntwork-io/module-server.git//modules/standalone-server?ref=v1.0.8"

  # Set the parameters for the standalone-server module here
}

Note: the double slash (//) is intentional and required. It's part of Terraform's Git syntax (see module sources).
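
To make the parts of that source string easier to see, here is a minimal shell sketch that assembles it from three pieces. The helper name module_source is hypothetical and is not part of Terraform or any Gruntwork tool; it only illustrates where the repo, the module folder (after the double slash), and the ref go.

```shell
#!/bin/sh
# module_source REPO MODULE_PATH REF
# Hypothetical helper (illustration only): prints a Terraform Git module
# source string of the form used in the example above.
module_source() {
  repo="$1"          # GitHub "org/name", e.g. gruntwork-io/module-server
  module_path="$2"   # folder inside the repo, placed after the double slash
  ref="$3"           # Git tag to pin to, e.g. v1.0.8

  printf 'git::git@github.com:%s.git//%s?ref=%s\n' "$repo" "$module_path" "$ref"
}

module_source gruntwork-io/module-server modules/standalone-server v1.0.8
# prints git::git@github.com:gruntwork-io/module-server.git//modules/standalone-server?ref=v1.0.8
```

Changing only the third argument is enough to move to a different tagged release, which is the reason for pinning ref instead of tracking master.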

See the module's documentation and vars.tf file for all the parameters you can set. Run terraform get -update to pull the latest version of this module from this repo before running the standard terraform plan and terraform apply commands.

Using scripts & binaries

You can install the scripts and binaries in the modules folder of any repo using the Gruntwork Installer. For example, if the scripts you want to install are in the modules/ecs-scripts folder of the https://github.com/gruntwork-io/module-ecs repo, you could install them as follows:

gruntwork-install --module-name "ecs-scripts" --repo "https://github.com/gruntwork-io/module-ecs" --tag "0.0.1"

See the docs for each script & binary for detailed instructions on how to use them.

Developing a module

Versioning

We follow the principles of Semantic Versioning. During initial development, the major version is 0 (e.g., 0.x.y), which indicates the code does not yet have a stable API. Once we hit 1.0.0, we will follow these rules:

  1. Increment the patch version for backwards-compatible bug fixes (e.g., v1.0.8 -> v1.0.9).
  2. Increment the minor version for new features that are backwards-compatible (e.g., v1.0.8 -> v1.1.0).
  3. Increment the major version for any backwards-incompatible changes (e.g., v1.0.8 -> v2.0.0).

The version is defined using Git tags. Use GitHub to create a release, which adds a Git tag.
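
The three versioning rules can be sketched as a small shell helper. The function name bump_version is an assumption made for illustration and is not a script shipped in this repo; the sketch also assumes plain vMAJOR.MINOR.PATCH tags with no pre-release or build suffix.

```shell
#!/bin/sh
# bump_version VERSION PART
# Hypothetical helper (illustration only): prints VERSION with the given
# PART (major, minor, or patch) incremented and lower parts reset to 0.
bump_version() {
  version="${1#v}"   # drop the leading "v" used in the Git tags
  part="$2"

  # Split "MAJOR.MINOR.PATCH" into its three numeric fields.
  major="${version%%.*}"
  rest="${version#*.}"
  minor="${rest%%.*}"
  patch="${rest#*.}"

  case "$part" in
    major) major=$((major + 1)); minor=0; patch=0 ;;  # backwards-incompatible change
    minor) minor=$((minor + 1)); patch=0 ;;           # backwards-compatible feature
    patch) patch=$((patch + 1)) ;;                    # backwards-compatible bug fix
    *) echo "unknown part: $part" >&2; return 1 ;;
  esac

  echo "v${major}.${minor}.${patch}"
}

bump_version v1.0.8 patch   # prints v1.0.9
bump_version v1.0.8 minor   # prints v1.1.0
bump_version v1.0.8 major   # prints v2.0.0
```

The printed value is the tag name you would then give the new GitHub release.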

Tests

See the test folder for details.

License

Please see LICENSE.txt for details on how the code in this repo is licensed.
