Browse the Repo

.circleci
.github
examples
modules
  kinesis
  msk
    README.md
    main.tf
    outputs.tf
    variables.tf
  sns-sqs-connection
  sns
  sqs-lambda-connection
  sqs
test
.gitignore
.pre-commit-config.yaml
CODEOWNERS
LICENSE.txt
README.md
terraform-cloud-enterprise-private-module-...

Kinesis

Create Kinesis streams with configurable or auto-calculated shard and retention settings.

Amazon Managed Streaming for Apache Kafka (Amazon MSK) Module

This Terraform module configures and launches an Amazon MSK cluster.

Amazon Managed Streaming for Apache Kafka (Amazon MSK) is a fully managed service that enables you to build and run applications that use Apache Kafka to process streaming data. Amazon MSK provides the control-plane operations, such as those for creating, updating, and deleting clusters. Managing all the data-plane operations, such as running producers and consumers, is up to you.

It runs open-source versions of Apache Kafka, meaning existing applications, tooling, and plugins from partners and the Apache Kafka community are supported without requiring changes to application code. You can read more about supported Apache Kafka versions in the official documentation.

Note that this module does not support Amazon MSK Serverless, which is still in preview.

How do you use this module?

  • See the root README for instructions on using Terraform modules.
  • See the examples folder for example usage.
  • See variables.tf for all the variables you can set on this module.
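
As a sketch, a minimal instantiation of the module might look like the following. The source path, version ref, and most argument names here are assumptions for illustration; initial_ebs_volume_size is documented below, but variables.tf is the authoritative list:

```hcl
module "msk" {
  # Hypothetical source path; pin to the actual repo and version you use
  source = "git::git@github.com:gruntwork-io/terraform-aws-messaging.git//modules/msk?ref=v0.1.0"

  # Illustrative argument names; check variables.tf for the real ones
  cluster_name  = "example-msk-cluster"
  kafka_version = "2.8.1"
  cluster_size  = 3
  instance_type = "kafka.m5.large"
  vpc_id        = var.vpc_id
  subnet_ids    = var.subnet_ids

  # Documented input variable of this module
  initial_ebs_volume_size = 100
}
```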

Cluster Configuration

Amazon MSK provides a default configuration for brokers, topics, and Apache ZooKeeper nodes. You can also create custom configurations with var.server_properties and use them to create new MSK clusters or to update existing clusters.
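
For example, var.server_properties could override broker defaults like so (the map-of-strings shape shown here is an assumption; check variables.tf for the exact type):

```hcl
# Custom broker configuration applied to the cluster
server_properties = {
  "auto.create.topics.enable"  = "false"
  "default.replication.factor" = "3"
  "min.insync.replicas"        = "2"
}
```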

Capacity Planning

When planning the capacity for your cluster, there are multiple factors that need to be taken into consideration, including:

  • Performance and throughput
  • Fault tolerance
  • Storage capacity

To ensure high availability for production workloads, it is recommended to use a topic replication factor greater than 1. Your topics are split into partitions that are distributed and replicated across multiple brokers in the cluster, giving you better fault tolerance and more parallelism for your consumers. As a rule of thumb, the optimal number of partitions for a topic should be equal to, or a multiple of, the number of brokers in your cluster. Note that the number of partitions can only be increased, not decreased.
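
The rule of thumb above can be expressed directly; a sketch using hypothetical locals:

```hcl
locals {
  broker_count = 3

  # Rule of thumb: make the partition count a multiple of the broker
  # count so partitions spread evenly across brokers.
  topic_partitions = local.broker_count * 2 # 6 partitions over 3 brokers

  # Replication factor > 1 for fault tolerance; it cannot exceed the
  # number of brokers in the cluster.
  replication_factor = 3
}
```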

See https://docs.aws.amazon.com/msk/latest/developerguide/bestpractices.html for further details on planning the capacity and configuration of your cluster.

Storage Auto Scaling

The amount of required EBS storage depends on multiple factors, such as the number of topics, the amount and size of your data, data retention, and the replication factor. As such, it is not possible to give an exact recommendation; instead, calculate the storage requirements based on your use case. It is important to monitor disk usage and increase disk size when needed.

The module sets the initial EBS volume size with the input variable initial_ebs_volume_size and automatically scales the broker volumes up until broker_storage_autoscaling_max_capacity is reached. You can optionally disable scale-in with the input variable disable_broker_storage_scale_in, and you can use broker_storage_autoscaling_target_percentage to control the scaling threshold.
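
Putting the documented storage variables together, a fragment with illustrative values:

```hcl
# Start each broker at 100 GiB and let auto scaling grow the volume up
# to 1000 GiB once utilization crosses the target percentage.
initial_ebs_volume_size                      = 100
broker_storage_autoscaling_max_capacity      = 1000
broker_storage_autoscaling_target_percentage = 70

# Scale-in of broker storage is often undesirable; disable it here.
disable_broker_storage_scale_in = true
```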

Monitoring

Monitoring With CloudWatch

Amazon MSK integrates with Amazon CloudWatch so that you can collect, view, and analyze metrics for your MSK cluster. You can set the monitoring level for an MSK cluster to one of the following: DEFAULT, PER_BROKER, PER_TOPIC_PER_BROKER, or PER_TOPIC_PER_PARTITION. You can read more about metrics and monitoring here: https://docs.aws.amazon.com/msk/latest/developerguide/metrics-details.html
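
Assuming the module passes the monitoring level through to the cluster (the variable name enhanced_monitoring here is a guess based on the underlying Terraform resource; check variables.tf), selecting a level might look like:

```hcl
# Hypothetical variable name; the value must be one of DEFAULT,
# PER_BROKER, PER_TOPIC_PER_BROKER, or PER_TOPIC_PER_PARTITION.
enhanced_monitoring = "PER_BROKER"
```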

Open Monitoring with Prometheus

You can also monitor your MSK cluster with Prometheus, an open-source monitoring system for time-series metric data. In addition, you can use tools that are compatible with Prometheus-formatted metrics or that integrate with Amazon MSK Open Monitoring, such as Datadog, Lenses, New Relic, and Sumo Logic. You can read more about Open Monitoring with Prometheus here: https://docs.aws.amazon.com/msk/latest/developerguide/open-monitoring.html

All metrics emitted by Apache Kafka to JMX are accessible using open monitoring with Prometheus. For information about Apache Kafka metrics, see Monitoring in the Apache Kafka documentation.

Encryption

Amazon MSK allows you to enable encryption at rest and in transit. The certificates that Amazon MSK uses for encryption must be renewed every 13 months. Amazon MSK automatically renews these certificates for all clusters.

Encryption at Rest

Amazon MSK integrates with AWS Key Management Service (KMS) to offer transparent server-side encryption. Amazon MSK always encrypts your data at rest. When you create an MSK cluster, you can specify the AWS KMS customer master key (CMK) with var.encryption_at_rest_kms_key_arn that you want Amazon MSK to use to encrypt your data at rest. If no key is specified, an AWS managed KMS (aws/msk managed service) key will be used for encrypting the data at rest.
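
As an illustrative fragment, a customer-managed key can be created and passed in via the documented input variable (the aws_kms_key resource and the module label "msk" are assumptions):

```hcl
# Customer-managed KMS key for encrypting MSK data at rest
resource "aws_kms_key" "msk" {
  description = "CMK for MSK encryption at rest"
}

module "msk" {
  # ... other arguments ...

  # Documented input variable of this module
  encryption_at_rest_kms_key_arn = aws_kms_key.msk.arn
}
```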

Encryption in Transit

Amazon MSK uses TLS 1.2. By default, it encrypts data in transit between the brokers of your MSK cluster. You can override this default with the var.encryption_in_transit_in_cluster input variable at the time you create the cluster. You can also control client-to-broker encryption with the var.encryption_in_transit_client_broker input variable.
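
Both documented variables together, as a fragment (the allowed client-broker values TLS, TLS_PLAINTEXT, and PLAINTEXT come from the MSK API):

```hcl
# Keep broker-to-broker traffic encrypted (the documented default)
encryption_in_transit_in_cluster = true

# Require TLS for all client-to-broker traffic
encryption_in_transit_client_broker = "TLS"
```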

Logging

Broker logs enable you to troubleshoot your Apache Kafka applications and to analyze their communications with your MSK cluster. You can deliver Apache Kafka broker logs to one or more of the following destination types:

  • Amazon CloudWatch Logs
  • Amazon S3
  • Amazon Kinesis Data Firehose
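
A sketch of enabling broker log delivery; every variable name below is hypothetical, so check variables.tf for the module's actual log delivery inputs:

```hcl
# Hypothetical variable names for illustration only
enable_cloudwatch_logs = true
cloudwatch_log_group   = aws_cloudwatch_log_group.msk.name
enable_s3_logs         = false
enable_firehose_logs   = false
```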

You can read more about MSK logging here: https://docs.aws.amazon.com/msk/latest/developerguide/msk-logging.html

Authentication and Authorization

You can use IAM to authenticate clients and to allow or deny Apache Kafka actions. Alternatively, you can use TLS or SASL/SCRAM to authenticate clients, and Apache Kafka ACLs to allow or deny actions. You can read more about available authentication and authorization options here: https://docs.aws.amazon.com/msk/latest/developerguide/kafka_apis_iam.html

Connecting to Kafka brokers

Once you've used this module to deploy the Kafka brokers, you'll want to connect to them from Kafka clients (e.g., Kafka consumers and producers in your apps) to read and write data. To do this, you typically need to configure the bootstrap.servers property for your Kafka client with the addresses of a few of your Kafka brokers (you don't need all of them, as the rest will be discovered automatically via the cluster metadata):

bootstrap.servers=10.0.0.4:9092,10.0.0.5:9092,10.0.0.6:9092

Depending on which client authentication method you configured, a number of output variables (bootstrap_brokers_*) provide you with a list of bootstrap servers. You can also get the list of bootstrap servers using the AWS CLI:

$ aws kafka get-bootstrap-brokers --cluster-arn ClusterArn

{
    "BootstrapBrokerStringSaslIam": "b-1.myTestCluster.123z8u.c2.kafka.us-west-1.amazonaws.com:9098,b-2.myTestCluster.123z8u.c2.kafka.us-west-1.amazonaws.com:9098"
}
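
From a root module, the bootstrap_brokers_* outputs can be re-exported for clients; a sketch assuming the module is instantiated as module "msk" and exposes an output with this exact name (check outputs.tf for the real names):

```hcl
output "bootstrap_brokers_sasl_iam" {
  description = "Comma-separated host:port pairs for SASL/IAM clients"
  value       = module.msk.bootstrap_brokers_sasl_iam
}
```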

MSK Connect

MSK Connect is a feature of Amazon MSK that makes it easy for developers to stream data to and from their Apache Kafka clusters. With MSK Connect, you can deploy fully managed connectors built for Kafka Connect that move data into or pull data from popular data stores such as Amazon S3 and Amazon OpenSearch Service. You can deploy connectors developed by third parties, such as Debezium for streaming change logs from databases into an Apache Kafka cluster, or deploy an existing connector with no code changes. Connectors automatically scale to adjust for changes in load, and you pay only for the resources that you use.

Kafka Cluster Migration

You can mirror or migrate your cluster using MirrorMaker, which ships with Apache Kafka. MirrorMaker is a utility that replicates data between two Apache Kafka clusters, within or across regions.

For further information about migrating Kafka clusters, see: https://docs.aws.amazon.com/msk/latest/developerguide/migration.html

ZooKeeper

Kafka depends on ZooKeeper to work. Amazon MSK manages the Apache ZooKeeper nodes for you. Each Amazon MSK cluster includes the appropriate number of Apache ZooKeeper nodes for your Apache Kafka cluster at no additional cost.

Controlling Access to Apache ZooKeeper

For security reasons you may want to limit access to the Apache ZooKeeper nodes that are part of your Amazon MSK cluster. To limit access to the nodes, you can assign a separate security group to them. You can then decide who gets access to that security group.

As ZooKeeper security group configuration requires manual actions, this module does not include support for that. To change the security group for ZooKeeper, follow these instructions: https://docs.aws.amazon.com/msk/latest/developerguide/zookeeper-security.html
