A Brief Introduction to Kafka

Posted By: Kundan Ray Akela | 30th December 2018

In this blog post, I present a brief introduction to Apache Kafka, which has become very popular in recent years.

Apache Kafka is a distributed streaming platform with three main capabilities:

  • Message queueing through the publish and subscribe paradigm
  • Storing streams of records in a fault-tolerant way
  • Processing streams of records

 

Kafka is mainly used in areas such as:

  • Reliably capturing data from applications or systems in real time, and
  • Building applications that consume and act on real-time data

 

Let’s look at how Kafka does this job:

  • It runs as a cluster on one or more servers, which can span multiple datacenters.
  • Each Kafka cluster stores streams of records in categories called topics.
  • Each record consists of a key, a value, and a timestamp.

Image source: https://kafka.apache.org/intro
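The record structure described above can be sketched in a few lines of Python. This is only a toy model for illustration, not the Kafka client API: the `Record` class, the `timestamp_ms` field name, and the `page-views` topic name are all assumptions made for this example.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Record:
    """Toy stand-in for a Kafka record: a key, a value, and a timestamp."""
    key: Optional[bytes]
    value: bytes
    # Milliseconds since the epoch, filled in at creation time if not given.
    timestamp_ms: int = field(default_factory=lambda: int(time.time() * 1000))


# Topics modelled as named categories, each holding its published records.
# "page-views" is a hypothetical topic name.
topics: Dict[str, List[Record]] = {"page-views": []}
topics["page-views"].append(Record(key=b"user-42", value=b"/home"))
```

The class is frozen so that a record cannot be modified once it has been created, which mirrors the idea that published records are not changed afterwards.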

 

Let’s look at the main abstraction that stores streams of records: the topic.

A topic can be understood as a named feed to which records are published. Kafka topics are multi-subscriber, which means a topic can have zero, one, or many consumers subscribed to the stream of data written to it.

For each topic, Kafka maintains a partitioned log, which looks like this:
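The multi-subscriber idea can be illustrated with a small in-memory sketch. It is not how Kafka is implemented (real Kafka persists records on brokers); the `Topic` class, its method names, and the `orders`, `billing`, and `shipping` names are hypothetical and exist only for this example.

```python
class Topic:
    """Toy in-memory feed that any number of subscribers can read."""

    def __init__(self, name):
        self.name = name
        self.records = []    # everything published so far, in order
        self.positions = {}  # subscriber name -> index of next record to read

    def publish(self, value):
        self.records.append(value)

    def subscribe(self, subscriber):
        # In this sketch a new subscriber starts from the beginning.
        self.positions.setdefault(subscriber, 0)

    def poll(self, subscriber):
        """Return the records this subscriber has not seen yet."""
        start = self.positions[subscriber]
        self.positions[subscriber] = len(self.records)
        return self.records[start:]


topic = Topic("orders")
topic.subscribe("billing")
topic.subscribe("shipping")
topic.publish("order-1")
topic.publish("order-2")

billing_batch = topic.poll("billing")    # billing reads both records
shipping_batch = topic.poll("shipping")  # shipping reads the same records on its own
```

Because each subscriber keeps its own read position, one subscriber reading a record does not remove it for the others.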

Image source: https://kafka.apache.org/intro

 

In the diagram, the topic has three partitions (0, 1, and 2). Each partition is an ordered, immutable sequence of records that is continually appended to, forming a structured commit log. Each record is assigned a sequential id number called the offset, which uniquely identifies the record within its partition.
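A minimal sketch of such a partition log is shown below. The `PartitionLog` class and its method names are assumptions made for illustration; they are not part of Kafka.

```python
class PartitionLog:
    """Toy append-only log: records are only added at the end, never changed."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Store the record and return the offset assigned to it."""
        offset = len(self._records)  # offsets are 0, 1, 2, ... in arrival order
        self._records.append(record)
        return offset

    def read(self, offset):
        """Look up a record by its offset within this partition."""
        return self._records[offset]


# Three partitions, numbered 0, 1 and 2 as in the diagram.
partitions = {0: PartitionLog(), 1: PartitionLog(), 2: PartitionLog()}

first = partitions[0].append("a")
second = partitions[0].append("b")
other = partitions[1].append("c")  # offsets start again from 0 in each partition
```

Note that an offset is only unique inside one partition, so locating a record within a topic takes both the partition number and the offset.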

Next, let’s explore producers:

In simple words, a producer publishes data to a particular topic. The producer is responsible for choosing which partition of the topic each record is written to. This can be done in a round-robin fashion simply to balance the load, or according to a partitioning function, for example one based on the record key.
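Partition selection can be sketched as follows. This is a toy model rather than Kafka's own partitioner: the `ToyPartitioner` class is hypothetical, the keyed branch is included to illustrate that Kafka producers can also route records by key, and `zlib.crc32` is a stand-in hash chosen only because it is deterministic.

```python
import itertools
import zlib


class ToyPartitioner:
    """Picks a partition for each record; a sketch, not Kafka's partitioner."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._round_robin = itertools.cycle(range(num_partitions))

    def partition(self, key=None):
        if key is None:
            # No key: spread records evenly across partitions.
            return next(self._round_robin)
        # Keyed record: the same key always maps to the same partition.
        # crc32 is a stand-in hash chosen for this example.
        return zlib.crc32(key) % self.num_partitions


partitioner = ToyPartitioner(num_partitions=3)
unkeyed = [partitioner.partition() for _ in range(6)]          # [0, 1, 2, 0, 1, 2]
keyed = [partitioner.partition(b"user-42") for _ in range(3)]  # one partition, repeated
```

Round-robin keeps the partitions evenly loaded, while key-based routing keeps all records for one key in a single partition and therefore in order.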

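Now let’s look at consumers:

In simple words, consumers read data from a particular topic. Every consumer belongs to a consumer group, and each record published to a topic is delivered to one consumer within each subscribing group. The consumers in a group can run in separate processes or on separate machines.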

Image source: https://kafka.apache.org/intro

In the diagram, a two-server Kafka cluster hosts four partitions (P0, P1, P2, and P3), and there are two consumer groups, A and B. Both groups subscribe to the topic, so every partition is read by both consumer groups.
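The way partitions are shared out inside each group can be sketched as below. The `assign_partitions` function is a simplified stand-in for Kafka's group assignment strategies, and the consumer names and group sizes are assumed for the example.

```python
def assign_partitions(partitions, consumers):
    """Split partitions among the consumers of one group, round-robin style."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = ["P0", "P1", "P2", "P3"]

# Consumer names and group sizes are assumptions made for this example.
group_a = assign_partitions(partitions, ["C1", "C2"])
group_b = assign_partitions(partitions, ["C3", "C4", "C5", "C6"])
```

Each group covers all four partitions independently: in group A every consumer handles two partitions, while in group B every consumer handles exactly one.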

I hope this helps in understanding the basic concepts of Kafka. I will publish more about it in subsequent blog posts.

 

Thanks,

Kundan Ray


About Author

Kundan Ray Akela

Kundan has good programming and problem-solving skills. He is very good at explaining ideas clearly and at producing sound system designs and plans. His hobbies are playing cricket and travelling.
