You're reading from Apache Kafka 1.0 Cookbook
Life is not discrete; it is a continuous flow. The first four chapters were focused on how to deal with a data pipeline manipulating every message individually. But what happens when we need to find a pattern or make a calculation over a subset of messages?
In the data world, a stream is linked to the most important abstractions. A stream depicts a continuously updating and unbounded process. Here, unbounded means unlimited size. By definition, a stream is a fault-tolerant, replayable, and ordered sequence of immutable data records. A data record is defined as a key-value pair.
Before we proceed, some concepts need to be defined:
- Stream processing application: Any program that utilizes the Kafka streams library is known as a stream processing application.
- Processor topology: This is a topology that defines the computational logic of the data processing that a stream processing application requires to be performed. A topology is a graph of stream processors (nodes) connected by streams...
This recipe sets the project to use Kafka streams in the Treu application project.
- Open the
build.gradle
file on the Treu project generated in Chapter 4, Message Enrichment, and add these lines:
apply plugin: 'java' apply plugin: 'application' sourceCompatibility = '1.8' mainClassName = 'treu.StreamingApp' repositories { mavenCentral() } version = '0.1.0' dependencies { compile 'org.apache.kafka:kafka-clients:1.0.0' compile 'org.apache.kafka:kafka-streams:1.0.0' compile 'org.apache.avro:avro:1.7.7' } jar { manifest { attributes 'Main-Class': mainClassName } from { configurations.compile.collect { it.isDirectory() ? it : zipTree(it) } } { exclude "META-INF/*.SF" exclude "META-INF/*.DSA" exclude "META-INF/*.RSA" } }
- To rebuild the app, from the project root directory, run this command:
$ gradle jar...
In the previous recipe, the first version of the streaming app was coded. Now, in this recipe, everything is compiled and executed.
The streaming app doesn't receive arguments from the command line:
- To build the project, from the
treu
directory, run the following command:
$ gradle jar
If everything is OK, the output should be:
...BUILD SUCCESSFULTotal time: ...
- To run the project, we have four different command-line windows. The following diagram shows what the arrangement of command-line windows should look like:
Figure 6.1: The four Terminals to test the streaming application—Confluent Control Center, Message producer, Message consumer, and the application itself
- In the first command-line Terminal, run the control center:
$ <confluent-path>/bin/confluent start
- In the second command-line Terminal, create the two topics needed:
$ bin/kafka-topics --create --topic src-topic...