Kinesis

Create Kinesis streams with configurable or auto-calculated shard and retention settings.

Kinesis Data Stream Module

This module creates a Kinesis Data Stream.

About Kinesis Data Stream

A Kinesis data stream is a set of shards. Each shard has a sequence of data records. Each data record has a sequence number that is assigned by Kinesis Data Streams.

  • data record: A data record is the unit of data stored in a Kinesis data stream. Data records are composed of a sequence number, a partition key, and a data blob, which is an immutable sequence of bytes.
  • shard: A shard is a uniquely identified sequence of data records in a stream. A stream is composed of one or more shards, each of which provides a fixed unit of capacity.
  • sequence number: Each data record has a sequence number that is unique per partition-key within its shard. Kinesis Data Streams assigns the sequence number after you write to the stream with client.putRecords or client.putRecord. Sequence numbers for the same partition key generally increase over time. The longer the time period between write requests, the larger the sequence numbers become.

Sharding / Partitioning in Kinesis Data Stream

Kinesis Data Streams achieves scalability by using shards. The data capacity of your stream is a function of the number of shards that you specify for the stream. The total capacity of the stream is the sum of the capacities of its shards.
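
For example, using the per-shard limits listed under Limitations below, a stream with two shards supports up to 2 MB/sec of write throughput and 4 MB/sec of read throughput in total.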

How to Set Shard Size

You can configure the initial number of shards in two ways:

  • direct specification: specify number_of_shards directly
  • indirect specification: specify the average_data_size_in_kb, records_per_second, and number_of_consumers variables and let the module calculate the initial number of shards.

Note: the module calculates the initial number of shards as follows:

  1. Calculate the incoming write bandwidth in KB per second (incoming_write_bandwidth_in_kb), which is equal to average_data_size_in_kb multiplied by records_per_second.
  2. Calculate the outgoing read bandwidth in KB per second (outgoing_read_bandwidth_in_kb), which is equal to incoming_write_bandwidth_in_kb multiplied by number_of_consumers.
  3. Calculate the initial number of shards (number_of_shards) your data stream needs using the following formula: number_of_shards = max(incoming_write_bandwidth_in_kb / 1000, outgoing_read_bandwidth_in_kb / 2000)
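
For example, suppose average_data_size_in_kb = 20, records_per_second = 10, and number_of_consumers = 10 (the same values used in the second usage example at the bottom of this page). Here is a minimal sketch of the calculation in Terraform, assuming shard counts round up to whole numbers:

locals {
  # Hypothetical workload figures.
  average_data_size_in_kb = 20
  records_per_second      = 10
  number_of_consumers     = 10

  # Step 1: 20 KB * 10 records/sec = 200 KB/sec incoming.
  incoming_write_bandwidth_in_kb = local.average_data_size_in_kb * local.records_per_second

  # Step 2: 200 KB/sec * 10 consumers = 2,000 KB/sec outgoing.
  outgoing_read_bandwidth_in_kb = local.incoming_write_bandwidth_in_kb * local.number_of_consumers

  # Step 3: max(200 / 1000, 2000 / 2000) = max(0.2, 1), rounded up to 1 shard.
  number_of_shards = max(
    ceil(local.incoming_write_bandwidth_in_kb / 1000),
    ceil(local.outgoing_read_bandwidth_in_kb / 2000)
  )
}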

Refer to the Kinesis Data Streams FAQ on calculating the initial number of shards for more information.

How Does Data Partitioning Work

A partition key is used to group data by shard within a stream. Kinesis Data Streams segregates the data records belonging to a stream into multiple shards. It uses the partition key that is associated with each data record to determine which shard a given data record belongs to. When an application puts data into a stream, it must specify a partition key.

With a single shard, all data goes into that one shard, so custom partitioning logic has no effect.

How to Re-Shard a Stream

Reconfiguring the number of shards through this module will result in destroying the old Kinesis data stream and creating a new one. To prevent this, consider using the UpdateShardCount API. Updating the shard count is an asynchronous operation. To update the shard count, Kinesis Data Streams performs splits or merges on individual shards. This can cause short-lived shards to be created, in addition to the final shards. These short-lived shards count towards your total shard limit for your account in the Region. You can find more information in the AWS UpdateShardCount documentation.

Limitations

Here are some limitations of Kinesis Data Streams you might be interested in:

  • Data Payload Size: The maximum size of a record's data payload before base64 encoding is 1 MB.
  • Retention Period: The maximum value of a stream's retention period is 8760 hours (365 days).
  • Shard Throughput: Each shard supports up to 1 MB/sec or 1,000 records/sec of write throughput, and up to 2 MB/sec of read throughput.

You can find the latest and full list of limitations and quotas on the AWS Quotas and Limits page.

Encryption

Amazon Kinesis Data Streams can automatically encrypt sensitive data as it enters a stream. Kinesis Data Streams uses AWS KMS master keys for encryption. With server-side encryption, your Kinesis stream producers and consumers don't need to manage master keys or cryptographic operations. Your data is automatically encrypted as it enters and leaves the Kinesis Data Streams service, so your data at rest is encrypted. For more information, see Data Protection in Amazon Kinesis Data Streams.

How to Enable Encryption

You can enable encryption in two ways:

  • default encryption: set encryption_type = "KMS". This will use the default AWS service key for Kinesis, aws/kinesis.
  • custom key encryption: If you need to use a Customer Managed Key (CMK), see the master key module as well as the documentation on user-generated KMS master keys for information on how to create one. You can specify the key using kms_key_id = "alias/<my_cmk_alias>", as shown in the sketch after this list.
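
For example, a minimal sketch of a stream encrypted with a customer managed key (the stream name and key alias here are hypothetical):

module "kinesis" {
  source = "git::git@github.com:gruntwork-io/terraform-aws-messaging.git//modules/kinesis?ref=v0.0.1"

  name             = "my-encrypted-stream"
  number_of_shards = 1

  # Server-side encryption with a hypothetical customer managed key alias.
  encryption_type = "KMS"
  kms_key_id      = "alias/my-cmk-alias"
}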

How to Change KMS Key

You can change the KMS key by reconfiguring the encryption with the kms_key_id and encryption_type variables.
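
For example, updating kms_key_id in the sketch above and re-applying switches the stream to the new key (both aliases are hypothetical):

module "kinesis" {
  source = "git::git@github.com:gruntwork-io/terraform-aws-messaging.git//modules/kinesis?ref=v0.0.1"

  name             = "my-encrypted-stream"
  number_of_shards = 1

  encryption_type = "KMS"
  kms_key_id      = "alias/my-new-cmk-alias" # switched from "alias/my-cmk-alias"
}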

Please note that changing the KMS key for a Kinesis Data Stream does not retroactively re-encrypt previously encrypted data in the stream with the new KMS key. Any data that was previously encrypted with the old KMS key will remain encrypted with that key. However, any new data added to the stream after the KMS key change will be encrypted with the new KMS key.

If you need to re-encrypt the previously encrypted data in the stream with the new KMS key, you will need to manually copy the data to a new stream that is configured to use the new KMS key for encryption. Alternatively, you can use AWS Lambda or other AWS services to read the data from the original stream, decrypt it using the old KMS key, and then re-encrypt it with the new KMS key before writing it to a new stream or another data store.

Replication

Amazon Kinesis Data Streams does not support replication out of the box. One way to implement replication is to use Lambda. You can find more information in this AWS article: Build highly available streams with Amazon Kinesis Data Streams

There is also a sample prototype from AWS that demonstrates change data capture (CDC) to replicate data across regions: https://github.com/aws-samples/aws-kinesis-data-streams-replicator

How do you use this module?

Examples

Aside from the example module linked above, here are some examples of how you might deploy a Kinesis stream with this module:

module "kinesis" {
  source = "git::git@github.com:gruntwork-io/terraform-aws-messaging.git//modules/kinesis?ref=v0.0.1"

  name             = "my-stream"
  retention_period = 48

  number_of_shards    = 1
  shard_level_metrics = [
    "IncomingBytes",
    "IncomingRecords",
    "IteratorAgeMilliseconds",
    "OutgoingBytes",
    "OutgoingRecords",
    "ReadProvisionedThroughputExceeded",
    "WriteProvisionedThroughputExceeded"
  ]
}

# Example 2: let the module calculate the number of shards
module "kinesis" {
  source           = "git::git@github.com:gruntwork-io/terraform-aws-messaging.git//modules/kinesis?ref=v0.0.1"
  name             = "my-stream"
  retention_period = 48

  average_data_size_in_kb = 20
  records_per_second      = 10
  number_of_consumers     = 10

  shard_level_metrics = [
    "ReadProvisionedThroughputExceeded",
    "WriteProvisionedThroughputExceeded"
  ]
}
