You're reading from  Building Data Streaming Applications with Apache Kafka

Product type: Book
Published in: Aug 2017
Publisher: Packt
ISBN-13: 9781787283985
Edition: 1st Edition
Authors (2):
Chanchal Singh

Chanchal Singh has over half a decade of experience in product development and architecture design. He has worked closely with the leadership teams of various companies, including directors, CTOs, and founding members, to define their technical road maps. He is the founder of, and a speaker at, the meetup groups Big Data and AI Pune Meetup and Experience Speaks. He is a co-author of the book Building Data Streaming Applications with Apache Kafka. He has a Bachelor's degree in Information Technology from the University of Mumbai and a Master's degree in Computer Applications from Amity University. He was also part of the Entrepreneur Cell at IIT Mumbai. His LinkedIn profile can be found under the username Chanchal Singh.

Manish Kumar

Manish Kumar works as Director of Technology and Architecture at VSquare. He has over 13 years' experience in providing technology solutions to complex business problems. He has worked extensively on web application development, IoT, big data, cloud technologies, and blockchain. Aside from this book, Manish has co-authored three books (Mastering Hadoop 3, Artificial Intelligence for Big Data, and Building Streaming Applications with Apache Kafka).

Deep Dive into Kafka Consumers

Every messaging system has two types of data flows: one that pushes data into Kafka queues, and another that reads data from those queues. In the previous chapter, our focus was on the first type, pushing data to Kafka queues using the producer APIs. After reading it, you should have sufficient knowledge to publish data to Kafka queues using the producer APIs in your application. In this chapter, our focus is on the second type of data flow: reading data from Kafka queues.

Before we dive deep into Kafka consumers, you should clearly understand that reading data from Kafka queues involves many different concepts, and these may differ from those of traditional queuing systems.

With Kafka, every consumer has a unique identity and they are in full control...

Kafka consumer internals

In this section, we will cover different Kafka consumer concepts and the various data flows involved in consuming messages from Kafka queues. As already mentioned, consuming messages from Kafka is a bit different from other messaging systems. However, when you write consumer applications using the consumer APIs, most of these details are abstracted away; the bulk of the internal work is done by the Kafka consumer libraries used by your application.

Even though you do not have to code most of this internal work yourself, you should understand it thoroughly. These concepts will help you debug consumer applications and make the right design choices for them.

Understanding the responsibilities...

Kafka consumer APIs

Like the producer APIs, Kafka provides a rich set of APIs to develop consumer applications. In the previous sections of this chapter, you learned about the internal concepts of a consumer, how consumers work within a consumer group, and partition rebalancing. In this section, we will see how these concepts help in building a good consumer application, covering:

  • Consumer configuration
  • KafkaConsumer object
  • Subscription and polling
  • Commit and offset
  • Additional configuration

Consumer configuration

Creating a Kafka consumer also requires a few mandatory properties to be set. There are basically four such properties:

  • bootstrap.servers: This property is similar to what we defined in Chapter 3, Deep Dive into Kafka Producers, for producer configuration...
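To make the full set concrete, here is a minimal sketch of a mandatory consumer configuration. It assumes the usual four settings (bootstrap.servers, group.id, and the key and value deserializers); the broker address and group name are placeholder values, not ones prescribed by the book:

```java
import java.util.Properties;

public class ConsumerConfigDemo {

    // Builds the minimal set of properties a Kafka consumer needs.
    public static Properties consumerConfig() {
        Properties props = new Properties();
        // Broker address the consumer first connects to (placeholder host/port).
        props.put("bootstrap.servers", "localhost:9092");
        // Consumer group identity; consumers sharing a group.id split the topic's partitions.
        props.put("group.id", "Demo_Group");
        // Deserializers turn the raw bytes of keys and values back into Java objects.
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        consumerConfig().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

These same four properties reappear in the Java and Scala consumer programs later in the chapter.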

Java Kafka consumer

The following program is a simple Java consumer that consumes data from the topic test1. Make sure data is already available in that topic; otherwise, no records will be consumed.

import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;
import org.apache.log4j.Logger;

import java.util.*;

public class DemoConsumer {
    private static final Logger log = Logger.getLogger(DemoConsumer.class);

    public static void main(String[] args) throws Exception {

        String topic = "test1";
        List<String> topicList = new ArrayList<>();
        topicList.add(topic);

        Properties consumerProperties = new Properties();
        consumerProperties.put("bootstrap.servers", "localhost:9092");
        consumerProperties.put("group.id", "Demo_Group");
        consumerProperties.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProperties.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> demoKafkaConsumer = new KafkaConsumer<>(consumerProperties);
        demoKafkaConsumer.subscribe(topicList);
        log.info("Subscribed to topic " + topic);

        try {
            while (true) {
                ConsumerRecords<String, String> records = demoKafkaConsumer.poll(100);
                for (ConsumerRecord<String, String> record : records) {
                    log.info("offset = " + record.offset() + ", key = " + record.key() + ", value = " + record.value());
                }
            }
        } finally {
            demoKafkaConsumer.close();
        }
    }
}

Scala Kafka consumer

This is the Scala version of the previous program and works the same way. Kafka allows you to write consumers in many languages, including Scala.

import org.apache.kafka.clients.consumer._
import org.apache.kafka.common.TopicPartition
import org.apache.log4j.Logger

import java.util._

object DemoConsumer {
  private val log: Logger = Logger.getLogger(DemoConsumer.getClass)

  @throws[Exception]
  def main(args: Array[String]) {
    val topic: String = "test1"
    val topicList: List[String] = new ArrayList[String]
    topicList.add(topic)

    val consumerProperties: Properties = new Properties
    consumerProperties.put("bootstrap.servers", "10.200.99.197:6667")
    consumerProperties.put("group.id", "Demo_Group")
    consumerProperties.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    consumerProperties.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

    val demoKafkaConsumer: KafkaConsumer[String, String] = new KafkaConsumer[String, String](consumerProperties)
    demoKafkaConsumer.subscribe(topicList)

    try {
      while (true) {
        val records: ConsumerRecords[String, String] = demoKafkaConsumer.poll(100)
        val it = records.iterator()
        while (it.hasNext) {
          val record: ConsumerRecord[String, String] = it.next()
          log.info("offset = " + record.offset + ", key = " + record.key + ", value = " + record.value)
        }
      }
    } finally {
      demoKafkaConsumer.close()
    }
  }
}

Common message consuming patterns

Here are a few of the common message-consuming patterns:

  • Consumer group - continuous data processing: In this pattern, once a consumer is created and subscribes to a topic, it starts receiving messages from the current offset. At a regular, configured interval, the consumer checks whether it is time to commit, and if so, commits the latest offsets based on the count of messages received in the batch. The offset commit can happen synchronously or asynchronously, or the pattern can rely on the auto-commit feature of the consumer API.

The key point to understand in this pattern is that the consumer does not control the message flow. It is driven by the current offset of the partition in a consumer group: it receives messages from that offset and commits offsets as messages are received, after regular...
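The "is it time to commit?" check in this pattern is plain interval arithmetic. Here is a rough, broker-free sketch of that decision logic; the interval value is a placeholder (analogous to a setting like auto.commit.interval.ms), and the actual consumer calls are shown only as comments since they need a running broker:

```java
public class CommitIntervalCheck {

    // Returns true when at least commitIntervalMs have elapsed since the last commit.
    static boolean timeToCommit(long lastCommitMs, long nowMs, long commitIntervalMs) {
        return nowMs - lastCommitMs >= commitIntervalMs;
    }

    public static void main(String[] args) {
        long commitIntervalMs = 5000;                      // placeholder interval
        long lastCommitMs = 0;
        long[] pollTimes = {1000, 4000, 6000, 9000, 12000}; // simulated poll timestamps
        for (long now : pollTimes) {
            // records = consumer.poll(...); process(records);  // needs a live broker
            if (timeToCommit(lastCommitMs, now, commitIntervalMs)) {
                // consumer.commitAsync();  // or commitSync() for stronger guarantees
                System.out.println("commit at " + now);
                lastCommitMs = now;
            }
        }
    }
}
```

Running the simulation prints commits only at 6000 and 12000, illustrating that commits are paced by the interval, not by every poll.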

Best practices

After going through this chapter, it is important to note a few best practices. They are listed as follows:

  • Exception handling: Just as with producers, it is the sole responsibility of consumer programs to decide on program flow with respect to exceptions. A consumer application should define different exception classes and, as per your business requirements, decide on the actions to take.
  • Handling rebalances: Whenever a new consumer joins a consumer group, or an existing consumer shuts down, a partition rebalance is triggered. Whenever a consumer is about to lose ownership of a partition, it should commit the offsets of the last events it has received from Kafka. For example, it should process and commit any in-memory buffered datasets before losing partition ownership. Similarly, it should close any open file...
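To hook into rebalances, the consumer API provides a ConsumerRebalanceListener whose onPartitionsRevoked callback fires before ownership is lost. The following is a minimal sketch of committing processed offsets at that point; the topic, group, and broker address are placeholders, and a running broker is required to actually execute it:

```java
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;

import java.util.*;

public class RebalanceAwareConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "Demo_Group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false"); // we commit manually on revocation

        final KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        // Offsets of records we have fully processed, per partition.
        final Map<TopicPartition, OffsetAndMetadata> processedOffsets = new HashMap<>();

        consumer.subscribe(Collections.singletonList("test1"), new ConsumerRebalanceListener() {
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                // About to lose ownership: flush any buffered work, then commit
                // synchronously so no processed record is replayed elsewhere.
                consumer.commitSync(processedOffsets);
            }

            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                // New ownership acquired; nothing to do in this sketch.
            }
        });

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records) {
                // process(record) would go here
                processedOffsets.put(
                        new TopicPartition(record.topic(), record.partition()),
                        new OffsetAndMetadata(record.offset() + 1)); // next offset to read
            }
        }
    }
}
```

Note that the committed offset is record.offset() + 1: a consumer group resumes reading from the committed offset, so you commit the position of the next record to consume.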

Summary

This concludes our discussion of Kafka consumers, one of the key components of Kafka message flows. The major focus was on understanding the consumer's internal workings, and on how the number of consumers in a group, together with the number of topic partitions, can be used to improve throughput and latency. We also covered how to create consumers using the consumer APIs and how to handle message offsets when a consumer fails.
We started with the Kafka consumer APIs and covered synchronous and asynchronous offset commits along with their advantages and disadvantages. We explained how to increase the throughput of a consumer application. We then went through the consumer rebalancing concept, when it gets triggered, and how to create our own rebalance listener. We also looked at the different patterns used in consumer applications. We focused on...
