
You're reading from  Building Data Streaming Applications with Apache Kafka

Product type: Book
Published in: Aug 2017
Publisher: Packt
ISBN-13: 9781787283985
Edition: 1st
Authors (2):
Chanchal Singh

Chanchal Singh has over half a decade of experience in product development and architecture design. He has worked closely with the leadership teams of various companies, including directors, CTOs, and founding members, to define their technical road maps. He is the founder of and a speaker at the meetup groups Big Data and AI Pune and Experience Speaks. He is a co-author of the book Building Data Streaming Applications with Apache Kafka. He has a Bachelor's degree in Information Technology from the University of Mumbai and a Master's degree in Computer Application from Amity University. He was also part of the Entrepreneur Cell at IIT Mumbai. He can be found on LinkedIn under the username Chanchal Singh.
Read more about Chanchal Singh

Manish Kumar

Manish Kumar works as Director of Technology and Architecture at VSquare. He has over 13 years' experience in providing technology solutions to complex business problems. He has worked extensively on web application development, IoT, big data, cloud technologies, and blockchain. Aside from this book, Manish has co-authored three books (Mastering Hadoop 3, Artificial Intelligence for Big Data, and Building Streaming Applications with Apache Kafka).
Read more about Manish Kumar


Building Streaming Applications Using Kafka Streams

In the previous chapter, you learned about Kafka Connect and how it simplifies importing data into and exporting data from Kafka. You also learned how Kafka Connect can be used as the extract and load processor in an ETL pipeline. In this chapter, we will focus on Kafka Streams, a lightweight streaming library for building streaming applications that work with Kafka. Kafka Streams can act as the transformer in the ETL phase.

We will cover the following topics in this chapter:

  • Introduction to Kafka Streams
  • Kafka Streams architecture
  • Advantages of using Kafka Streams
  • Introduction to KStream and KTable
  • Use case example

Introduction to Kafka Streams

Data processing strategies have evolved over time, and several of them are still in use today. The following are the important processing styles to understand before we discuss Kafka Streams:

  • Request/response: In this type of processing, you send a single request as request data; the server processes it and returns the response data as a result. Take the example of REST servers, where processing is done per request and the response is sent to the client after processing. Processing may involve filtering, cleansing, aggregation, or lookup operations. Scaling such a processing engine requires adding more servers in order to handle the increase in traffic.
  • Batch processing: This is a process where you send a bounded set of input data in batches, and the processing engine returns the response in batches after processing. In batch processing, data is already...
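The contrast between batch processing and the record-at-a-time style that stream processors such as Kafka Streams use can be sketched in plain Java. This is an illustration only, with no Kafka involved: the batch method needs the whole bounded input before it starts, while the streaming processor updates its state incrementally as each record arrives.

```java
import java.util.List;

public class ProcessingStyles {
    // Batch: the entire bounded input is available before processing starts,
    // and a result exists only after the whole batch is consumed.
    static int batchSum(List<Integer> boundedInput) {
        return boundedInput.stream().mapToInt(Integer::intValue).sum();
    }

    // Record-at-a-time: state is updated per record, so an up-to-date
    // result is available after every record, not only at the end.
    static class RunningSum {
        private int total = 0;

        int onRecord(int record) {
            total += record;
            return total;
        }
    }

    public static void main(String[] args) {
        System.out.println(batchSum(List.of(1, 2, 3))); // 6, in one shot

        RunningSum stream = new RunningSum();
        stream.onRecord(1);
        stream.onRecord(2);
        System.out.println(stream.onRecord(3)); // 6, built incrementally
    }
}
```

Both styles compute the same answer; the difference is when results become available and how state is held between records.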

Kafka Streams architecture

Kafka Streams internally uses the Kafka producer and consumer libraries. It is tightly coupled with Apache Kafka and allows you to leverage the capabilities of Kafka to achieve data parallelism, fault tolerance, and many other powerful features.

In this section, we will discuss how Kafka Streams works internally and the different components involved in building streaming applications with it. The following figure is an internal representation of how Kafka Streams works:

Kafka Stream architecture

A Streams instance consists of multiple tasks, where each task processes non-overlapping subsets of the records. If you want to increase parallelism, you can simply add more instances, and Kafka Streams will automatically distribute work among the different instances.
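The "non-overlapping subsets" idea can be illustrated with stdlib Java. This is an analogy, not the actual Streams internals: records of the same key hash to the same partition (as Kafka's default partitioner does, modulo the exact hash function), and partitions are divided into disjoint groups, one group per task, so adding tasks redistributes partitions without any record being processed twice.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TaskAssignmentSketch {
    // Same key always lands in the same partition, which is what keeps
    // per-key processing on a single task.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Split partitions into non-overlapping groups, one per task.
    // Round-robin here is a simplification of the real assignor.
    static Map<Integer, List<Integer>> assign(int numPartitions, int numTasks) {
        Map<Integer, List<Integer>> assignment = new HashMap<>();
        for (int p = 0; p < numPartitions; p++) {
            assignment.computeIfAbsent(p % numTasks, t -> new ArrayList<>()).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // 6 partitions over 2 tasks: {0=[0, 2, 4], 1=[1, 3, 5]} -- disjoint sets.
        System.out.println(assign(6, 2));
    }
}
```

Scaling from 2 tasks to 3 simply re-runs the assignment with `numTasks = 3`; no partition ever appears in two groups, which is why instances never step on each other's records.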

Let's discuss a few important components seen in the previous figure:

  • Stream...

Integrated framework advantages

Kafka Streams is tightly integrated with Apache Kafka. It provides a rich set of APIs and offers powerful features for building stream processing applications. If you are using Kafka as a centralized storage layer for your data and want to do stream processing over it, then Kafka Streams should be preferred for the following reasons:

  • Deployment: An application built using Kafka Streams does not require any extra cluster setup to run. It can be run from a single-node machine or from your laptop. This is a huge advantage over other processing tools, such as Spark, Storm, and so on, which require clusters to be ready before you can run the application. Kafka Streams uses Kafka's producer and consumer libraries.

If you want to increase parallelism, you just need to add more instances of the application, and Kafka Streams will...

Understanding tables and Streams together

Before we start discussing tables and streams, let's walk through a simple word count program written in Java using the Kafka Streams API; then we will look into the concepts of KStream and KTable and their internals.
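Setting the Kafka Streams API aside for a moment, the KStream/KTable distinction can be illustrated with plain Java collections. This is an analogy, not the actual API: a stream is the complete sequence of records, where every record is a fact in its own right, while a table is the ever-updating latest-value-per-key view derived from that stream.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StreamTableDuality {
    // A "stream" keeps every record: ("alice",1) and ("alice",3) are two
    // independent facts. A "table" keeps only the latest value per key:
    // each new record for a key upserts (overwrites) the previous one.
    static Map<String, Integer> toTable(List<Map.Entry<String, Integer>> stream) {
        Map<String, Integer> table = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> record : stream) {
            table.put(record.getKey(), record.getValue()); // newer record wins
        }
        return table;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> stream = List.of(
                Map.entry("alice", 1),
                Map.entry("bob", 2),
                Map.entry("alice", 3));
        // The stream holds 3 records; the table holds 2 keys with latest values.
        System.out.println(toTable(stream)); // {alice=3, bob=2}
    }
}
```

This is the "duality" often described for Kafka Streams: replaying the stream from the beginning always reconstructs the table, and every table update can itself be emitted as a new stream record.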

Maven dependency

A Kafka Streams application can be run from anywhere. You just need to add the library dependency and start developing your program. We are using Maven to build our application. Add the following dependency to your project:

<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-streams...
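The snippet above is cut off. For reference, the standard Maven coordinates for Kafka Streams are shown below; the version shown is an assumption for illustration, and you should use the release that matches your Kafka installation:

```xml
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-streams</artifactId>
  <!-- version is an assumption; align it with your Kafka brokers -->
  <version>0.11.0.0</version>
</dependency>
```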

Use case example of Kafka Streams

We will take the same example of IP fraud detection that we used in Chapter 5, Building Spark Streaming Applications with Kafka, and Chapter 6, Building Storm Applications with Kafka. Let's look at how we can build the same application using Kafka Streams. We will start with the code; the producer and lookup code from Chapter 6, Building Storm Applications with Kafka, can be reused here as well.
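Independent of which framework drives it, the heart of this use case is a lookup of each record's IP address against a set of flagged ranges. The following is a minimal plain-Java sketch of that lookup; the prefix-matching scheme and the blacklist contents are assumptions for illustration, not the book's actual lookup code.

```java
import java.util.Set;

public class IPFraudLookup {
    // Hypothetical blacklist of flagged IP prefixes (first three octets).
    private final Set<String> fraudPrefixes;

    IPFraudLookup(Set<String> fraudPrefixes) {
        this.fraudPrefixes = fraudPrefixes;
    }

    // Flag a record as fraudulent if its IP's first three octets
    // match a blacklisted prefix.
    boolean isFraud(String ip) {
        String prefix = ip.substring(0, ip.lastIndexOf('.'));
        return fraudPrefixes.contains(prefix);
    }

    public static void main(String[] args) {
        IPFraudLookup lookup = new IPFraudLookup(Set.of("10.24.46"));
        System.out.println(lookup.isFraud("10.24.46.7"));  // true
        System.out.println(lookup.isFraud("192.168.1.1")); // false
    }
}
```

In a Streams application, a check like this would sit inside the transformation step, filtering or tagging each record as it flows from the input topic to the output topic.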

Maven dependency of Kafka Streams

The best part of Kafka Streams is that it does not require any extra dependencies apart from the Streams library itself. Add the dependency to your pom.xml:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http...

Summary

In this chapter, you learned about Kafka Streams and why it makes sense to use it for transformation when Kafka is already in the pipeline. We also went through the architecture, internal workings, and integrated framework advantages of Kafka Streams. We covered KStream and KTable briefly and saw how they differ from each other. A detailed explanation of the Kafka Streams API is beyond the scope of this book.

In the next chapter, we will cover the internals of Kafka clusters, capacity planning, single-cluster and multi-cluster deployment, and adding and removing brokers.
