Part 1 - Kafka Key Terminology

This section focuses on the key terminology and concepts about Apache Kafka. And also note that each has its contribution to Apache Kafka Architecture.

Distributed Messaging System

Kafka is a distributed messaging system which is designed for:

Producers / Publisher

A publisher creates some data and sends it to a broker. Producers are simply an applications that you write to send the messages.

Consumers / Subscriber

A subscriber retrieves the message and processes it from the broker to which it is interested and authorized. Consumers are simply applications that you write to receive the messages.

Topics

It is the collection of messages or grouping of messages. And every topic has its name, which is used by producers and consumers to send/receive messages. Topics can be defined upfront or on-demand as needed.

Broker or Worker or Node

Kafka broker is a software process also referred to as executable or daemon service that runs on a machine that can be physical or virtual.

Cluster

A kafka cluster is the grouping of multiple kafka brokers which can be on a single machine or different machines.

Controller

A controller is just a worker node or broker like any other. It just happened to be elected from amongst its peer to officiate in the administrative capacity of a controller. Below are the critical responsibilities of the controller:

Leader

A leader is also a worker node or broker like any other. It is elected by the controller for assigning the task. It will peer with other worker nodes based on the replication factor to complete its task. Once a peer has committed to its leader, the quorum is formed. And these peer takes all the task that is assigned to its leader.

Communication and Consensus

Virtually every component within a cluster has to keep some form of communication going between themselves. In a distributed system, these communications are called consensus. Besides the obvious data payloads being transferred as messages, there are other types of network communication happening that keep the cluster operating normally. These are some of them:

Zookeeper

In the context of Apache Kafka, ZooKeeper serves as a centralized service for maintaining metadata about a cluster of distributed nodes, which are:

Also, it is the responsibility of the Zookeeper to assign a broker to be responsible for the topic.

Topic message order

Every message that comes to the broker is grouped by topics and is stored inside a partition. And each partition stores these messages in an ordered sequence (by time) and immutable by nature means cannot be updated once it is received by the broker. But in the case of multiple partitions, an order cannot be guaranteed and it’s the responsibility of the consumer to handle this use case.

Event Souring

An architectural style or approach to maintaining an application’s state by capturing all changes as a a sequence of time-ordered, immutable events.

Message Content

Every message that is consumed by the broker contains these three properties:

Offset

It is the last read message position that is maintained by the Kafka consumer. It also corresponds to the message identifier.

Message Retention Policy

It is the period for how long messages will be stored in the broker. By default, it is set to 168 hours or seven days. The retention period is defined on a per topic basis. Physical storage can also constrain the message retention policy.

Distributed Commit Log (Transaction or Commit Logs)

It is the series of commit logs that can also be used for event sourcing. Following are the key properties of it:

Kafka Partitions

It is the physical representation of the topic as a commit log stored on one or more brokers. It can be found inside /tmp/kafka-logs/{topic}-{partition} directory, as per our example, /tmp/kafka-logs/my_topic-0 directory will be created. This directory will contain an index, log files, etc. These are the main properties of partitions:

Replication Factor

It is the factor that decides how much data redundancy should happen inside the cluster. And this is configured on a per-topic basis. These are the features that it provides:

Notes

Series