You're reading from Apache Kafka 1.0 Cookbook
The Confluent Platform is a full stream data system. It enables you to organize and manage data from several sources in one high-performance and reliable system. As mentioned in the first few chapters, the goal of an enterprise service bus is not only to provide the system a means to transport messages and data but also to provide all the tools that are required to connect the data origins (data sources), applications, and data destinations (data sinks) to the platform.
The Confluent Platform has these parts:
- Confluent Platform open source
- Confluent Platform enterprise
- Confluent Cloud
The Confluent Platform open source has the following components:
- Apache Kafka core
- Kafka Streams
- Kafka Connect
- Kafka clients
- Kafka REST Proxy
- Kafka Schema Registry
The Confluent Platform enterprise has the following components:
- Confluent Control Center
- Confluent support, professional services, and consulting
All the components are open source except the Confluent Control Center, which is proprietary software from Confluent...
In order to use the REST proxy and the Schema Registry, we need to install the Confluent Platform. Also, the Confluent Platform has important administration, operation, and monitoring features fundamental for modern Kafka production systems.
At the time of writing this book, the Confluent Platform Version is 4.0.0.
Currently, the supported operating systems are:
- Debian 8
- Red Hat Enterprise Linux
- CentOS 6.8 or 7.2
- Ubuntu 14.04 LTS and 16.04 LTS
macOS is currently supported only for testing and development purposes, not for production environments. Windows is not yet supported. Oracle Java 1.7 or higher is required.
The default ports for the components are:
- 2181: Apache ZooKeeper
- 8081: Schema Registry (REST API)
- 8082: Kafka REST Proxy
- 8083: Kafka Connect (REST API)
- 9021: Confluent Control Center
- 9092: Apache Kafka brokers
It is important to have these ports, or the ports where the components are going to run, open.
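Before starting the platform, it can help to check which of these ports are already reachable. Here is a minimal sketch in Python; the host (`localhost`) and timeout are assumptions for a local installation:

```python
import socket

# Default Confluent Platform ports, as listed above.
DEFAULT_PORTS = {
    2181: "Apache ZooKeeper",
    8081: "Schema Registry (REST API)",
    8082: "Kafka REST Proxy",
    8083: "Kafka Connect (REST API)",
    9021: "Confluent Control Center",
    9092: "Apache Kafka brokers",
}

def is_port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    for port, component in sorted(DEFAULT_PORTS.items()):
        state = "open" if is_port_open("localhost", port) else "closed"
        print(f"{port:5d} {component}: {state}")
```

A port reported as open before installation usually means another process is already bound to it, and the corresponding component should be moved to a different port.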
With the Confluent Platform installed, the administration, operation, and monitoring of Kafka become very simple. Let's review how to operate Kafka with the Confluent Platform.
The commands in this section should be executed from the directory where the Confluent Platform is installed:
- To start ZooKeeper, Kafka, and the Schema Registry with one command, run:
$ confluent start schema-registry
The output of this command should be:
Starting zookeeper
zookeeper is [UP]
Starting kafka
kafka is [UP]
Starting schema-registry
schema-registry is [UP]
Note
To execute the commands outside the installation directory, add Confluent's bin directory to PATH:
$ export PATH=<path_to_confluent>/bin:$PATH
- To manually start each service with its own command, run:
$ ./bin/zookeeper-server-start ./etc/kafka/zookeeper.properties
$ ./bin/kafka-server-start ./etc/kafka/server.properties
$ ./bin/schema-registry-start...
This recipe shows you how to use the metrics reporter of the Confluent Control Center.
The execution of the previous recipe is needed.
Before starting the Control Center, configure the metrics reporter:
- Back up the server.properties file located at:
<confluent_path>/etc/kafka/server.properties
- In the server.properties file, uncomment the following lines:
metric.reporters=io.confluent.metrics.reporter.ConfluentMetricsReporter
confluent.metrics.reporter.bootstrap.servers=localhost:9092
confluent.metrics.reporter.topic.replicas=1
- Back up the Kafka Connect configuration located in:
<confluent_path>/etc/schema-registry/connect-avro-distributed.properties
- Add the following lines at the end of the connect-avro-distributed.properties file:
consumer.interceptor.classes=io.confluent.monitoring.clients.interceptor.MonitoringConsumerInterceptor
producer.interceptor.classes=io.confluent.monitoring.clients.interceptor.MonitoringProducerInterceptor...
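These edits can also be scripted, which is handy when configuring several nodes. A minimal sketch (the helper name and the idea of skipping lines that already exist are mine, not from the recipe):

```python
from pathlib import Path

# The interceptor lines from the recipe above.
MONITORING_LINES = [
    "consumer.interceptor.classes=io.confluent.monitoring.clients.interceptor.MonitoringConsumerInterceptor",
    "producer.interceptor.classes=io.confluent.monitoring.clients.interceptor.MonitoringProducerInterceptor",
]

def append_missing_lines(path: str, lines: list) -> int:
    """Append each line not already present in the file; return how many were added.

    Running this twice is safe: the second run adds nothing.
    """
    p = Path(path)
    existing = p.read_text().splitlines() if p.exists() else []
    missing = [line for line in lines if line not in existing]
    if missing:
        with p.open("a") as f:
            for line in missing:
                f.write(line + "\n")
    return len(missing)
```

Remember to back up the original properties file first, as the recipe instructs.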
The Schema Registry is a repository: a metadata-serving layer for schemas. It provides a REST interface for storing and retrieving Avro schemas, keeps a versioned history of each schema, and performs compatibility checks to support schema evolution.
Remember that the Schema Registry exposes a REST interface; in this recipe we use Java to make the HTTP requests, but the REST interface exists precisely to promote language and platform neutrality.
Remember the Customer sees BTC price Avro schema of Doubloon:
{
  "name": "customer_sees_btcprice",
  "namespace": "doubloon.avro",
  "type": "record",
  "fields": [
    { "name": "event", "type": "string" },
    { "name": "customer",
      "type": {
        "type": "record",
        "name": "customer",
        "fields": [
          { "name": "id", "type": "long" },
          { "name": "name", "type": "string" },
          { "name": "ipAddress", "type": "string...
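Although the recipe's requests are written in Java, the same call can be built from any language. Below is a sketch in Python that constructs the standard Schema Registry request for registering a schema under a subject (the subject name and the simplified schema are illustrative, not the full Doubloon schema above):

```python
import json

SCHEMA_REGISTRY_URL = "http://localhost:8081"  # default Schema Registry port

def build_register_request(subject: str, schema: dict):
    """Build (url, headers, body) for registering an Avro schema
    via POST /subjects/{subject}/versions on the Schema Registry."""
    url = f"{SCHEMA_REGISTRY_URL}/subjects/{subject}/versions"
    headers = {"Content-Type": "application/vnd.schemaregistry.v1+json"}
    # The registry expects the Avro schema as a JSON-encoded string
    # inside the request body.
    body = json.dumps({"schema": json.dumps(schema)})
    return url, headers, body

# A simplified schema for illustration only:
example_schema = {
    "name": "customer_sees_btcprice",
    "namespace": "doubloon.avro",
    "type": "record",
    "fields": [{"name": "event", "type": "string"}],
}
url, headers, body = build_register_request(
    "customer_sees_btcprice-value", example_schema)
```

The resulting request can then be sent with any HTTP client; the registry replies with the numeric ID assigned to the schema.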
What happens if we want to use Kafka from an environment that is not yet supported? Think of languages such as JavaScript or PHP.
For this and other programming challenges, the Kafka REST Proxy provides a RESTful interface to a Kafka cluster.
From a REST interface, one can produce and consume messages, view the state of the cluster, and perform administrative actions without using the native Kafka protocol or clients.
The example use cases are:
- Sending data to Kafka from a frontend app built in a non-supported language (yes, think of the JavaScript and PHP fronts, for example).
- The need to communicate with Kafka from an environment that doesn't support Kafka (think in terms of mainframes and legacy systems).
- Scripting administrative actions. Think of a DevOps team in charge of a Kafka system and a sysadmin who doesn't know the supported languages (Java, Scala, Python, Go, or C/C++).
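To make the idea concrete, here is a sketch of how a client in any language would produce messages through the REST Proxy's v2 JSON API. Python stands in for the unsupported language; the topic name and message payload are illustrative:

```python
import json

REST_PROXY_URL = "http://localhost:8082"  # default Kafka REST Proxy port

def build_produce_request(topic: str, values: list):
    """Build (url, headers, body) for producing JSON messages through
    the Kafka REST Proxy: POST /topics/{topic} with v2 JSON content."""
    url = f"{REST_PROXY_URL}/topics/{topic}"
    headers = {"Content-Type": "application/vnd.kafka.json.v2+json"}
    # Each record goes under a "value" key; an optional "key" field
    # could be added per record as well.
    body = json.dumps({"records": [{"value": v} for v in values]})
    return url, headers, body

# Hypothetical topic and event, for illustration:
url, headers, body = build_produce_request(
    "btc-prices", [{"event": "customer_sees_btcprice"}])
```

Any HTTP client, including one in JavaScript or PHP, can send this same request; no native Kafka client is involved.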
As mentioned, Kafka Connect is a framework used to connect Kafka with external systems such as key-value stores (think of Riak, Coherence, and Dynamo), databases (Cassandra), search indexes (Elastic), and filesystems (HDFS).
In this book, there is a whole chapter about Kafka connectors, but this recipe is part of the Confluent Platform.
To read a data file with Kafka Connect:
- To list the installed connectors:
$ confluent list connectors
Bundled Predefined Connectors (edit configuration under etc/):
elasticsearch-sink
file-source
file-sink
jdbc-source
jdbc-sink
hdfs-sink
s3-sink
- The configuration file is located at ./etc/kafka/connect-file-source.properties. It has these values:
  - The instance name: name=file_source
  - The implementer class: connector.class=FileStreamSource
  - The number of tasks of this connector instance: tasks.max=1
  - The input file: file=continuous...
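Connector configuration files like this one use simple key=value lines, so they are easy to inspect programmatically. A minimal sketch of a parser (the sample text reuses the keys from the recipe; the truncated input-file value is omitted):

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value property lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

# The values shown in the recipe:
sample = """name=file_source
connector.class=FileStreamSource
tasks.max=1
"""
config = parse_properties(sample)
```

This only covers the flat key=value form used here; a full Java properties parser also handles escapes and line continuations.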