Monitoring real-time data processing with Apache Spark Structured Streaming
In this recipe, you will learn how to do the following:
- Use the status and recentProgress attributes of a streaming query to get information about the input rate, processing rate, latency, state size, and more
- Use the StreamingQueryListener API to register a custom listener that can handle events related to the start, progress, and termination of a streaming query
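The numbers exposed through recentProgress are easiest to read once they are aggregated. The sketch below is a minimal example that rests on one assumption: each progress report is a dict shaped like the progress JSON that PySpark 3.x returns from query.recentProgress, with fields such as numInputRows, inputRowsPerSecond, processedRowsPerSecond, durationMs, and stateOperators. The helper names summarize_progress and is_falling_behind are hypothetical. The lag check is a simple heuristic, not a Spark feature.

```python
from statistics import mean
from typing import Any, Dict, List


# Hypothetical helper: assumes each report is a dict shaped like the
# Structured Streaming progress JSON (as returned by PySpark 3.x
# query.recentProgress / query.lastProgress).
def summarize_progress(reports: List[Dict[str, Any]]) -> Dict[str, float]:
    """Aggregate rate, latency, and state-size metrics over progress reports."""
    if not reports:
        return {}

    # Spark omits a rate field from the JSON when its value is NaN
    # (e.g. for the first batch), so missing rates default to 0.0.
    input_rates = [r.get("inputRowsPerSecond", 0.0) for r in reports]
    processed_rates = [r.get("processedRowsPerSecond", 0.0) for r in reports]

    # durationMs["triggerExecution"] is the total time spent on one trigger.
    latencies = [r.get("durationMs", {}).get("triggerExecution", 0) for r in reports]

    # State size is taken from the most recent report only; each stateful
    # operator (aggregation, join, deduplication) contributes one entry.
    latest_state = reports[-1].get("stateOperators", [])
    state_rows = sum(op.get("numRowsTotal", 0) for op in latest_state)
    state_bytes = sum(op.get("memoryUsedBytes", 0) for op in latest_state)

    return {
        "batches": len(reports),
        "total_input_rows": sum(r.get("numInputRows", 0) for r in reports),
        "avg_input_rows_per_sec": mean(input_rates),
        "avg_processed_rows_per_sec": mean(processed_rates),
        "max_trigger_latency_ms": max(latencies),
        "state_rows_total": state_rows,
        "state_memory_bytes": state_bytes,
    }


# Hypothetical heuristic, not part of Spark: a query is likely falling
# behind when rows arrive faster, on average, than they are processed.
def is_falling_behind(summary: Dict[str, float]) -> bool:
    return summary.get("avg_input_rows_per_sec", 0.0) > summary.get(
        "avg_processed_rows_per_sec", 0.0
    )
```

In a notebook, you might call summarize_progress(query.recentProgress) periodically while the query runs, alongside print(query.status), which reports a message plus the isDataAvailable and isTriggerActive flags. Inside a custom StreamingQueryListener, json.loads(event.progress.json) in onQueryProgress should yield the same dict shape, so the helper could be reused there as well. Newer PySpark releases may return progress objects rather than plain dicts, so check the behavior of the version you run.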
To help you monitor the performance and progress of your streaming queries, Structured Streaming exposes a range of metrics, along with APIs for accessing them.
Getting ready
Before we start, we need to make sure that we have a Kafka cluster running and a topic that receives streaming data. For simplicity, we will use a single-node Kafka cluster and a topic named users. Open the 5.0 user-gen-kafka.ipynb notebook and execute its cell. This notebook produces a user record every few seconds and puts it on a Kafka topic called...