A Brief Introduction to Kafka

Posted By: Kundan Ray Akela | 30th December 2018

In this blog, I am going to give an introduction to Apache Kafka, which has become very popular.

Apache Kafka is a distributed streaming platform with three key capabilities:

Publishing and subscribing to streams of records, similar to a message queue
Storing streams of records in a fault-tolerant way
Processing streams of records as they occur

Kafka is mainly used for two kinds of applications:

Building real-time data pipelines that reliably capture data from applications or systems
Building real-time applications that use and react to streams of data

Let’s look at how Kafka does this job so well:

Kafka runs as a cluster on one or more servers, which can span multiple datacenters.
The Kafka cluster stores streams of records in categories called topics.
Each record consists of a key, a value, and a timestamp.

Image source: https://kafka.apache.org/intro
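To make the record structure concrete, here is a minimal Python sketch of a record as an immutable value. It is a toy model, not a Kafka client class: the `Record` name, the use of raw bytes for key and value, and the millisecond timestamp that defaults to the current time are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional
import time


def _now_ms() -> int:
    # Milliseconds since the Unix epoch (an assumption of this sketch).
    return int(time.time() * 1000)


@dataclass(frozen=True)
class Record:
    """Toy model of a Kafka record: an optional key, a value, and a timestamp.

    frozen=True makes instances immutable, mirroring the fact that a record
    is not changed once it has been written.
    """
    key: Optional[bytes]  # may be None when the producer sends no key
    value: bytes
    timestamp_ms: int = field(default_factory=_now_ms)


r = Record(key=b"user-42", value=b'{"action": "login"}')
print(r.key, r.value, r.timestamp_ms > 0)
```

In this sketch the key is optional, so a record such as `Record(None, b"ping")` is also valid.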

Let’s start with the core abstraction that stores streams of records: the topic.

A topic can be understood as a feed name to which records are published. Topics in Kafka are multi-subscriber, which means a topic can have zero, one, or many consumers that subscribe to the data written to it.
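The multi-subscriber idea can be sketched in a few lines of Python: the topic keeps one shared, append-only list of records, and each subscriber tracks its own read position, so reading by one subscriber does not affect any other. This is a toy model; the `Topic` class and its `publish`, `subscribe`, and `poll` methods are hypothetical and are not the Kafka client API. Starting new subscribers at the beginning of the topic is also an assumption of this sketch.

```python
from typing import Dict, List


class Topic:
    """Toy topic: an append-only list of values plus one read position per subscriber."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._records: List[str] = []
        self._positions: Dict[str, int] = {}  # subscriber name -> next index to read

    def publish(self, value: str) -> None:
        self._records.append(value)

    def subscribe(self, subscriber: str) -> None:
        # In this sketch a new subscriber starts from the beginning of the topic.
        self._positions.setdefault(subscriber, 0)

    def poll(self, subscriber: str) -> List[str]:
        # Return everything this subscriber has not seen yet; others are unaffected.
        start = self._positions[subscriber]
        batch = self._records[start:]
        self._positions[subscriber] = len(self._records)
        return batch


orders = Topic("orders")
orders.publish("order-1")  # publishing works even with zero subscribers
orders.subscribe("billing")
orders.subscribe("shipping")
orders.publish("order-2")

print(orders.poll("billing"))   # ['order-1', 'order-2']
print(orders.poll("shipping"))  # ['order-1', 'order-2']
print(orders.poll("billing"))   # []
```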

For each topic, Kafka maintains a partitioned log that looks like this:

Image source: https://kafka.apache.org/intro

In the diagram, the topic has three partitions (0, 1, and 2). Each partition is an ordered, immutable sequence of records that is continually appended to a structured commit log. Each record in a partition is assigned a sequential id number called the offset, which uniquely identifies the record within that partition.
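The partitioned log can be modelled as a list of append-only lists, one per partition. The Python sketch below is an illustration only, not Kafka's storage format: the `PartitionedLog` class and its methods are hypothetical. It shows that offsets start at 0 and grow sequentially within each partition, that the pair (partition, offset) identifies a record, and that records are only ever appended (the class offers no update or delete operation).

```python
from typing import List


class PartitionedLog:
    """Toy model of a topic split into partitions, each an append-only log."""

    def __init__(self, num_partitions: int) -> None:
        self._partitions: List[List[str]] = [[] for _ in range(num_partitions)]

    def append(self, partition: int, value: str) -> int:
        log = self._partitions[partition]
        log.append(value)
        return len(log) - 1  # offset of the new record within this partition

    def read(self, partition: int, offset: int) -> str:
        # (partition, offset) uniquely identifies a record.
        return self._partitions[partition][offset]

    def end_offset(self, partition: int) -> int:
        # Offset that the next appended record in this partition will receive.
        return len(self._partitions[partition])


log = PartitionedLog(num_partitions=3)
print(log.append(0, "a"))  # 0
print(log.append(0, "b"))  # 1
print(log.append(1, "c"))  # 0  (offsets are counted per partition)
print(log.read(0, 1))      # b
```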

Next, let’s explore producers:

In simple words, a producer publishes data to a particular topic. The producer is responsible for choosing which partition within the topic each record is assigned to. This can be done in a round-robin manner simply to balance the load, or according to a partitioning function, for example one based on the record key.
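Both partitioning strategies can be sketched with a small Python class. It is a toy under stated assumptions, not the Kafka producer: the `ToyPartitioner` name is hypothetical, keyed records are hashed with `zlib.crc32` (the Java client's default partitioner uses murmur2, so real partition numbers will differ), and unkeyed records follow strict round-robin (newer Kafka producers use a "sticky" strategy for unkeyed records instead).

```python
import itertools
import zlib
from typing import Iterator, Optional


class ToyPartitioner:
    """Chooses a partition for each record, in the spirit of a Kafka producer."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self._round_robin: Iterator[int] = itertools.cycle(range(num_partitions))

    def partition(self, key: Optional[bytes]) -> int:
        if key is None:
            # No key: rotate through the partitions to balance the load.
            return next(self._round_robin)
        # Keyed: the same key always maps to the same partition, so records
        # with that key stay in order relative to each other.
        return zlib.crc32(key) % self.num_partitions


p = ToyPartitioner(num_partitions=3)
print([p.partition(None) for _ in range(5)])                 # [0, 1, 2, 0, 1]
print(p.partition(b"user-42") == p.partition(b"user-42"))    # True
```

Round-robin spreads records evenly but gives no per-key ordering; key-based partitioning trades some balance for the guarantee that all records with one key land in one ordered partition.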

Now let’s look at consumers:

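In simple words, consumers read data from a particular topic. Every consumer belongs to a consumer group, and each record published to a topic is delivered to one consumer instance within each subscribing consumer group. The consumer instances can run in separate processes or on separate servers.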

Image source: https://kafka.apache.org/intro

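The diagram shows a two-server Kafka cluster hosting four partitions (P0, P1, P2, and P3) and two consumer groups, A and B. Group A has two consumer instances and group B has four. Both groups subscribe to the topic, so each partition is read by one consumer in group A and one consumer in group B.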
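The consumer-group setup from the Kafka documentation's diagram (four partitions; group A with two consumers, group B with four) can be reproduced with a simple assignment function. This Python sketch is a simplified stand-in for Kafka's configurable partition assignors (range, round-robin, sticky, and others), which may produce different layouts; the function name `assign_round_robin` and the consumer names are hypothetical.

```python
from typing import Dict, List


def assign_round_robin(partitions: List[str], consumers: List[str]) -> Dict[str, List[str]]:
    """Give each partition to exactly one consumer in the group, round-robin."""
    assignment: Dict[str, List[str]] = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


partitions = ["P0", "P1", "P2", "P3"]

# Hypothetical consumer names: group A has two consumers, group B has four.
groups = {
    "A": ["C1", "C2"],
    "B": ["C3", "C4", "C5", "C6"],
}

for group, members in groups.items():
    # Each group is assigned independently, so every partition is read once per group.
    print(group, assign_round_robin(partitions, members))
# A {'C1': ['P0', 'P2'], 'C2': ['P1', 'P3']}
# B {'C3': ['P0'], 'C4': ['P1'], 'C5': ['P2'], 'C6': ['P3']}
```

Within one group a partition is never shared, which is how each record reaches exactly one consumer per group; adding consumers beyond the number of partitions would leave the extra ones idle.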

I hope this helps in understanding the basic concepts of Kafka. I will publish more about it in subsequent blogs.

Thanks,

Kundan Ray
