Securing Kafka

In all the earlier chapters, you learned how to use Kafka. In this chapter, our focus shifts to securing Kafka, which is one of the most important aspects of enterprise adoption. Organizations hold a lot of sensitive information that must be stored in a secure environment to meet security compliance requirements. In this chapter, we focus on ways of securing sensitive information in Kafka. We will focus on the different security aspects of Apache Kafka and will cover the following topics:

  • An overview of securing Kafka
  • Wire encryption using SSL
  • Kerberos SASL for authentication
  • Understanding ACL and authorization
  • Understanding Zookeeper authentication
  • Apache Ranger for authorization
  • Best practices for Kafka security

An overview of securing Kafka

Kafka is used as a centralized event data store, receiving data from various sources, such as microservices and databases.

In any enterprise deployment of Kafka, security should be looked at from five paradigms. They are as follows:

  • Authentication: This establishes the identity of the client (producer or consumer) that is trying to use Kafka services. Kafka supports the Kerberos authentication mechanism.
  • Authorization: This establishes what kind of permissions the client (producer or consumer) has on topics. Kafka supports ACLs for authorization. Apache tools, such as Ranger, can also be used for Kafka authorization.
  • Wire encryption: This ensures that any sensitive data traveling over the network is encrypted rather than sent in plain text. Kafka supports SSL communication between the client (producer or consumer) and the broker. Even inter-broker...

Wire encryption using SSL

In Kafka, you can enable support for Secure Sockets Layer (SSL) wire encryption. Any data communication over the network in Kafka can be SSL encrypted. Therefore, you can encrypt any communication between Kafka brokers (replication) or between client and broker (reads and writes).

The following diagram represents how SSL encryption works in Kafka:

The preceding diagram depicts how communication between broker and client is encrypted. This is valid for both producer and consumer communications. Every broker or client maintains its own keys and certificates. Each also maintains a truststore containing the certificates used for authentication. Whenever a certificate is presented for authentication, it is verified against the certificates stored in the truststore of the respective component.
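To make this concrete, here is a minimal configuration sketch for enabling SSL on the broker and client sides. The hostnames, file paths, and passwords are illustrative assumptions; the property names themselves are standard Kafka settings:

# server.properties (broker) -- host, paths, and passwords are illustrative
listeners=SSL://broker1.example.com:9093
security.inter.broker.protocol=SSL
ssl.keystore.location=/var/private/ssl/kafka.broker.keystore.jks
ssl.keystore.password=keystore-secret
ssl.key.password=key-secret
ssl.truststore.location=/var/private/ssl/kafka.broker.truststore.jks
ssl.truststore.password=truststore-secret
# require clients to present certificates too (mutual TLS)
ssl.client.auth=required

# client.properties (producer or consumer)
security.protocol=SSL
ssl.truststore.location=/var/private/ssl/kafka.client.truststore.jks
ssl.truststore.password=truststore-secret
# keystore settings are needed only because the broker sets ssl.client.auth=required
ssl.keystore.location=/var/private/ssl/kafka.client.keystore.jks
ssl.keystore.password=keystore-secret
ssl.key.password=key-secret

With ssl.client.auth=required, each side verifies the other's certificate against its truststore, which matches the verification flow described above.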

...

Kerberos SASL for authentication

Kerberos is a mechanism for authenticating clients and servers over a secure network. It provides authentication without transferring the password over the network. It works by using time-sensitive tickets that are generated using symmetric key cryptography.

It was chosen over the more widely used SSL-based authentication because Kerberos has the following advantages (a minimal broker configuration sketch follows the list):

  • Better performance: Kerberos uses symmetric key operations, which are faster than the public key operations used in SSL-based authentication, so authentication completes more quickly.
  • Easy integration with Enterprise Identity Server: Kerberos is a well-established authentication mechanism, and identity servers such as Active Directory support it. This way, services such as Kafka can be easily integrated with centralized authentication servers.
  • Simpler user management: Creating, deleting, and updating users...
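As a minimal sketch of what Kerberos (SASL/GSSAPI) configuration for a Kafka broker typically looks like, the broker is given a JAAS login configuration and a few SASL-related broker properties. The principal, realm, keytab path, and hostname below are illustrative assumptions:

// kafka_server_jaas.conf -- the broker's Kerberos identity
KafkaServer {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    storeKey=true
    keyTab="/etc/security/keytabs/kafka.service.keytab"
    principal="kafka/broker1.example.com@EXAMPLE.COM";
};

# server.properties -- enable SASL/GSSAPI (Kerberos) on the broker
listeners=SASL_PLAINTEXT://broker1.example.com:9092
security.inter.broker.protocol=SASL_PLAINTEXT
sasl.enabled.mechanisms=GSSAPI
sasl.mechanism.inter.broker.protocol=GSSAPI
sasl.kerberos.service.name=kafka

The JAAS file is passed to the broker JVM at startup, for example via KAFKA_OPTS="-Djava.security.auth.login.config=/etc/kafka/kafka_server_jaas.conf".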

Understanding ACL and authorization

Apache Kafka comes with a pluggable authorizer and a command-line interface for managing Access Control Lists (ACLs), which are used for defining users and allowing or denying them access to its various APIs. The default behavior is that only a superuser is allowed to access all the resources of the Kafka cluster; no other user can access those resources unless a proper ACL is defined for that user. The general format in which a Kafka ACL is defined is as follows:

Principal P is Allowed OR Denied Operation O From Host H On Resource R.

The terms used in this definition are as follows:

  • Principal is the user who can access Kafka
  • Operation is read, write, describe, delete, and so on
  • Host is the IP address of the Kafka client that is trying to connect to the broker
  • Resource refers to Kafka resources such as a topic, group, or cluster
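Assuming the broker has an authorizer enabled (for example, authorizer.class.name=kafka.security.auth.SimpleAclAuthorizer in server.properties), such an ACL can be added with the kafka-acls.sh tool that ships with Kafka. The principal, topic, and host below are illustrative:

# Allow principal User:Alice to read topic 'payments' from host 10.0.0.5
bin/kafka-acls.sh --authorizer-properties zookeeper.connect=localhost:2181 \
  --add --allow-principal User:Alice \
  --operation Read --topic payments --allow-host 10.0.0.5

# List the ACLs currently defined on the topic
bin/kafka-acls.sh --authorizer-properties zookeeper.connect=localhost:2181 \
  --list --topic payments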

Let's discuss a few common ACL...

Understanding Zookeeper authentication

Zookeeper is the metadata service for Kafka. SASL-enabled Zookeeper services authenticate clients before granting access to the metadata stored in Zookeeper. Kafka brokers need to authenticate themselves to Zookeeper using Kerberos. If a valid Kerberos ticket is presented, Zookeeper provides access to the metadata stored in it. After valid authentication, Zookeeper establishes the identity of the connecting user or service. This identity is then used to authorize access to metadata Znodes guarded by ACLs.

One important thing for you to understand is that Zookeeper ACLs only restrict modifications to Znodes; Znodes can be read by any client. The philosophy behind this behavior is that sensitive data is not stored in Zookeeper, but modifications by an unauthorized user could disrupt your cluster's behavior. Hence, Znodes are world readable, but...
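As a minimal sketch of how this is typically wired up (the keytab path and principal are illustrative assumptions), the broker's JAAS file gets a Client login section that is used for the Zookeeper connection, and the broker is told to create its Znodes with secure ACLs:

// Client section in the broker's JAAS file, used when connecting to Zookeeper
Client {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    storeKey=true
    keyTab="/etc/security/keytabs/kafka.service.keytab"
    principal="kafka/broker1.example.com@EXAMPLE.COM";
};

# server.properties -- create Kafka's Znodes with secure (SASL-based) ACLs
zookeeper.set.acl=true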

Apache Ranger for authorization

Ranger is a tool used to monitor and manage security across the Hadoop ecosystem. It provides a centralized platform from which to create and manage security policies across the cluster. We will look at how we can use Ranger to create policies for the Kafka cluster.

Adding Kafka Service to Ranger

The following screenshot shows the user interface in Ranger that is used to add services. We will add the Kafka service here so that we can configure policies for it later:


Let's look at the Service Details:

  • Service name: The name of the service, which needs to be set up in the agent config. For example, in this case, it can be Kafka
  • Description: This describes what the service will do
  • Active Status: This refers to enabling...

Best practices

Here is a list of best practices to optimize your experience with Kafka:

  • Enable detailed logs for Kerberos: Troubleshooting Kerberos issues can be a nightmare for technical stakeholders. It is sometimes difficult to understand why Kerberos authentication is not working, and the errors are often not very informative; you usually get to the root cause only by looking at the actual authentication flows. Hence, you need a proper debugging setup for Kerberos. In Kafka, or in fact in any Java Kerberos-enabled application, you can set the Kerberos debug level using the following property:
sun.security.krb5.debug=true
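For example, assuming you start brokers with Kafka's standard shell scripts (which honor the KAFKA_OPTS environment variable), the flag can be passed as a JVM option:

export KAFKA_OPTS="-Dsun.security.krb5.debug=true"
bin/kafka-server-start.sh config/server.properties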
  • Integrate with Enterprise Identity Server: You should always integrate your Kerberos authentication with Enterprise Identity Servers. This has many benefits: you do not have to manage more than one version of users, and any user deletion activity...

Summary

In this chapter, we covered different Kafka security paradigms. Our goal with this chapter was to ensure that you understand the different paradigms of securing Kafka. We wanted you first to understand the different areas you should evaluate while securing Kafka, and after that to address how to implement each part of it. One thing to note here is that authentication and authorization are things you always have to implement in a secure Kafka cluster; without these two, your Kafka cluster is not secure. SSL can be optional, but it is strongly recommended for highly sensitive data. Please keep note of the best practices for securing Kafka, as they are largely gathered from practical industry experience of implementing Kafka security.
