Apache ZooKeeper

Deploy an Apache ZooKeeper cluster. Supports automatic bootstrap, Exhibitor, zero-downtime rolling deployment, and auto healing.

ZooKeeper Cluster

This folder contains a Terraform module for running a cluster of Apache ZooKeeper nodes. Under the hood, the cluster is powered by the server-group module, so it supports attaching ENIs and EBS Volumes, zero-downtime rolling deployment, and auto-recovery of failed nodes. This module assumes that you are deploying an AMI that has both ZooKeeper and Exhibitor installed.

Quick start

Key considerations for using this module

Here are the key things to take into account when using this module:

ZooKeeper AMI

You specify the AMI to run in the cluster using the ami_id input variable. We recommend creating a Packer template to define the AMI with the following modules installed:

See the zookeeper-ami example for working sample code.

User Data

When your servers are booting, you need to tell them to start Exhibitor (which, in turn, will start ZooKeeper). The easiest way to do that is to specify a User Data script via the user_data input variable that runs the run-exhibitor script. See user-data.sh for an example.

Cluster size

Although you can run ZooKeeper on just a single server, in production, we strongly recommend running multiple ZooKeeper servers in a cluster (called an ensemble) so that:

  1. ZooKeeper replicates your data to all servers in the ensemble, so if one server dies, you don't lose any data, and the other servers can continue serving requests.
  2. Since the data is replicated across all the servers, any of the ZooKeeper nodes can respond to a read request, so you can scale to more read traffic by increasing the size of the cluster.

Note that ZooKeeper achieves consensus by using a majority vote, which has three implications:

  1. Your cluster should have an odd number of servers. An even-sized cluster still needs a strict majority to operate, so it tolerates no more failures than the next smaller odd-sized cluster (e.g., 4 nodes tolerate 1 failure, the same as 3 nodes).
  2. A ZooKeeper cluster can continue to operate as long as a majority of the servers are operational. That means a cluster with n nodes can tolerate (n - 1) / 2 failed servers, rounded down. So a 1-node cluster cannot tolerate any failed servers, a 3-node cluster can tolerate 1 failed server, a 5-node cluster can tolerate 2 failed servers, and a 7-node cluster can tolerate 3 failed servers.
  3. Larger clusters actually make writes slower, since you have to wait on more servers to respond to the vote. Most use cases are much more read-heavy than write-heavy, so this is typically a good trade-off. In practice, because writes get more expensive as the cluster grows, it's unusual to see a ZooKeeper cluster with more than 7 servers.

Putting all of this together, we recommend that in production, you always use a 3, 5, or 7 node cluster depending on your availability and scalability requirements.
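The majority arithmetic above can be checked with a few lines of Python. This is a minimal sketch for illustration only; the function names `quorum_size` and `tolerated_failures` are made up here and are not part of this module or of ZooKeeper.

```python
def quorum_size(n: int) -> int:
    """Smallest number of servers that forms a strict majority of n."""
    if n < 1:
        raise ValueError("an ensemble needs at least one server")
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Servers that can fail while a majority remains: (n - 1) / 2, rounded down."""
    return n - quorum_size(n)


if __name__ == "__main__":
    # Compare the recommended odd sizes with an even size (4) for contrast.
    for n in (1, 3, 4, 5, 7):
        print(f"{n} nodes: quorum={quorum_size(n)}, tolerates={tolerated_failures(n)}")
```

Note that going from 3 to 4 nodes raises the quorum from 2 to 3 without raising the number of failures the cluster can survive, which is why odd sizes are preferred.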

Health checks

We strongly recommend associating an Elastic Load Balancer (ELB) with your ZooKeeper cluster and configuring it to perform TCP health checks on the ZooKeeper client port (2181 by default). The zookeeper-cluster module allows you to associate an ELB with ZooKeeper, using the ELB's health checks to perform zero-downtime deployments (i.e., ensuring the previous node is passing health checks before deploying the next one) and to detect when a server is down and needs to be automatically replaced.

Note that we do NOT recommend connecting to the ZooKeeper cluster via the ELB. That's because you access the ELB via its domain name, and most ZooKeeper clients (including Kafka) cache DNS entries forever. So if the underlying IPs stored in DNS for the ELB change (which could happen at any time!), the ZooKeeper clients won't find out about it until after a restart. You should always connect directly to the ZooKeeper nodes themselves via their static IP addresses.

Check out the zookeeper-cluster example for working sample code that includes an ELB.
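To make the idea of a TCP health check concrete, here is a rough Python equivalent of what such a check does: it only verifies that a TCP connection to the client port can be opened. The function name `tcp_health_check` and the timeout value are assumptions for illustration; the ELB performs this check itself, and nothing like this ships with the module.

```python
import socket


def tcp_health_check(host: str, port: int = 2181, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A successful connection shows only that something is listening on the port. It does not prove that the node has joined the ensemble or is serving requests.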

Rolling deployments

To deploy updates to a ZooKeeper cluster, such as rolling out a new version of the AMI, you need to do the following:

  1. Shut down ZooKeeper on one server.
  2. Deploy the new code on the same server.
  3. Wait for the new code to come up successfully and start passing health checks.
  4. Repeat the process with the remaining servers.

This module can do this process for you automatically by using the server-group module's support for zero-downtime rolling deployment.
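The four steps above amount to a simple loop. The sketch below is not how the server-group module implements rolling deployment; it is a hypothetical illustration in which `stop`, `deploy`, and `is_healthy` are callbacks you would supply, and the polling parameters are arbitrary.

```python
import time
from typing import Callable, Iterable


def rolling_deploy(
    servers: Iterable[str],
    stop: Callable[[str], None],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    poll_seconds: float = 5.0,
    max_polls: int = 60,
) -> None:
    """Update servers one at a time, waiting for each to pass health checks."""
    for server in servers:
        stop(server)    # Step 1: shut down ZooKeeper on one server.
        deploy(server)  # Step 2: deploy the new code on the same server.
        # Step 3: wait until the server passes health checks.
        for _ in range(max_polls):
            if is_healthy(server):
                break
            time.sleep(poll_seconds)
        else:
            # Never move on while a node is unhealthy; that could cost the quorum.
            raise RuntimeError(f"{server} did not become healthy; aborting rollout")
        # Step 4: the outer loop repeats the process for the next server.
```

Aborting on the first unhealthy node is a deliberate choice in this sketch: with only one node out at a time, the rest of the ensemble still holds a majority.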

Static IPs and ENIs

To connect to ZooKeeper, either from other ZooKeeper servers or from ZooKeeper clients such as Kafka, you need to provide the list of IP addresses for your ZooKeeper servers. Most ZooKeeper clients read this list of IPs during boot and never update it afterward. That means you need a static list of IP addresses for your ZooKeeper nodes.
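ZooKeeper clients typically take this list as a connection string of comma-separated `host:port` pairs. As a small illustration, the helper below builds one from a list of IPs; the function name `zk_connect_string` and the IP addresses in the usage note are made up for this example.

```python
from typing import Iterable


def zk_connect_string(ips: Iterable[str], port: int = 2181) -> str:
    """Join server IPs into a ZooKeeper connection string of host:port pairs."""
    return ",".join(f"{ip}:{port}" for ip in ips)
```

For example, `zk_connect_string(["10.0.1.10", "10.0.2.10", "10.0.3.10"])` returns `"10.0.1.10:2181,10.0.2.10:2181,10.0.3.10:2181"`.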

This is a problem in a dynamic cloud environment, where any of the ZooKeeper nodes could be replaced (either due to an outage or deployment) with a different server, with a different IP address, at any time. Using DNS doesn't help, as most ZooKeeper clients (including Kafka!) cache DNS results forever, so if the underlying IPs stored in the DNS record change, those clients won't find out about it until they are restarted.

Our solution is to use Elastic Network Interfaces (ENIs). An ENI is a virtual network interface with a static private IP address that you can attach to any server. This module creates an ENI for each ZooKeeper server and gives each (server, ENI) a matching eni-0 tag. You can use the attach-eni script in the User Data of each server to find an ENI with a matching eni-0 tag and attach it to the server during boot. That way, if a server goes down and is replaced, its replacement reattaches the same ENI and gets the same IP address.

See user-data.sh for an example.
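The tag-matching idea can be sketched in plain Python. This is not the attach-eni script and it makes no AWS calls: it assumes that "matching" means the server and the ENI carry the same value for the eni-0 tag, and it uses made-up in-memory dictionaries in place of real AWS API responses.

```python
from typing import Dict, List, Optional


def find_matching_eni(
    server_tags: Dict[str, str],
    enis: List[dict],
    tag_key: str = "eni-0",
) -> Optional[str]:
    """Return the id of the first ENI whose `tag_key` value equals the server's."""
    wanted = server_tags.get(tag_key)
    if wanted is None:
        return None  # The server has no such tag, so there is nothing to match.
    for eni in enis:
        # Each ENI is a hypothetical dict: {"id": "...", "tags": {...}}.
        if eni.get("tags", {}).get(tag_key) == wanted:
            return eni["id"]
    return None
```

Because the tag lives on the ENI and not on any particular instance, a replacement server that boots with the same tag value finds the same ENI and therefore keeps the same IP address.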

Transaction logs and EBS Volumes

Every write to a ZooKeeper server is immediately persisted to disk for durability in ZooKeeper's transaction log. We recommend using a separate EBS Volume to store these transaction logs. This ensures the hard drive used for transaction logs does not have to contend with any other disk operations, which can significantly improve ZooKeeper performance.

This module creates an EBS Volume for each ZooKeeper server and gives each (server, EBS Volume) a matching ebs-volume-0 tag. You can use the persistent-ebs-volume module in the User Data of each server to find an EBS Volume with a matching ebs-volume-0 tag and attach it to the server during boot. That way, if a server goes down and is replaced, its replacement reattaches the same EBS Volume.

See user-data.sh for an example.

Exhibitor

This module assumes that you are running an AMI with Exhibitor installed. Exhibitor performs several functions, including acting as a process supervisor for ZooKeeper and cleaning up old transaction logs. Exhibitor also exposes a UI you can use to see what's stored in your ZooKeeper cluster and to manage it. By default, this UI is available at port 8080 of every ZooKeeper server. We also expose Exhibitor at port 80 via the ELB used for health checks in the zookeeper-cluster example.

Data backup

ZooKeeper's primary mechanism for backing up data is the replication within the cluster, since every node has a copy of all the data. It is rare to back up data beyond that, as the type of data typically stored in ZooKeeper is ephemeral in nature (e.g., the leader of a cluster), and it's unusual for older data to be of any use.

That said, if you need additional backups, you can create them from the Exhibitor UI, which offers Backup/Restore functionality that allows you to index the ZooKeeper transaction log and back up and restore specific transactions.
