Learning Apache Apex

Designing and writing real-time streaming applications with Apache Apex

Thomas Weise et al.

Book Details

ISBN 13: 9781788296403
Paperback: 290 pages

Book Description

Apache Apex is a next-generation stream processing framework designed to operate on data at large scale, with minimum latency, maximum reliability, and strict correctness guarantees.

Half of the book consists of Apex applications that show key aspects of data processing pipelines, such as connectors for sources and sinks and common data transformations. The other half is split evenly between explaining the Apex framework and covering how to tune, test, and scale Apex applications.

Much of our economic world depends on growing streams of data, such as social media feeds, financial records, and data from mobile devices, sensors, and machines (the Internet of Things, or IoT). The projects in the book show how to process such streams to gain valuable, timely, and actionable insights. Traditional use cases such as ETL, which currently consume a significant share of data engineering resources, are also covered.

The final chapter looks at future possibilities emerging in the streaming space and how Apache Apex can contribute to them.

Table of Contents

Chapter 1: Introduction to Apex
Unbounded data and continuous processing
Use cases and case studies
Application Model and API
Value proposition of Apex
Summary
Chapter 2: Getting Started with Application Development
Development process and methodology
Setting up the development environment
Creating a new Maven project
Application specifications
Custom operator development
Application configuration
Testing in the IDE
Running the application on YARN
Working on the cluster
Summary
Chapter 3: The Apex Library
An overview of the library
Integrations
Transformations
Summary
Chapter 4: Scalability, Low Latency, and Performance
Partitioning and how it works
Elasticity
Partitioning toolkit
Custom dynamic partitioning
Performance optimizations
Low-latency versus throughput
Sample application for dynamic partitioning
Performance – other aspects for custom operators
Summary
Chapter 5: Fault Tolerance and Reliability
Distributed systems need to be resilient
Fault-tolerance components and mechanism in Apex
Checkpointing
Processing guarantees
Summary
Chapter 6: Example Project – Real-Time Aggregation and Visualization
Streaming ETL and beyond
The application pattern in a real-world use case
Analyzing Twitter feed
Running the application
The Pub/Sub server
Grafana visualization
Summary
Chapter 7: Example Project – Real-Time Ride Service Data Processing
The goal
Datasource
The pipeline
Simulation of a real-time feed using historical data
Parsing the data
Looking up of the zip code and preparing for the windowing operation
Windowed operator configuration
Serving the data with WebSocket
Running the application
Running the application on GCP Dataproc
Summary
Chapter 8: Example Project – ETL Using SQL
The application pipeline
Building and running the application
Application configuration
The application code
Partitioning
Application testing
Understanding application logs
Calcite integration
Summary
Chapter 9: Introduction to Apache Beam
Introduction to Apache Beam
Beam concepts
WordCount in Apache Beam
Running Apache Beam WordCount on Apache Apex
Summary
Chapter 10: The Future of Stream Processing
Lower barrier for building streaming pipelines
Summary

What You Will Learn

  • Put together a functioning Apex application from scratch
  • Scale an Apex application and configure it for optimal performance
  • Understand how to deal with failures via the fault tolerance features of the platform
  • Use Apex via other frameworks such as Beam
  • Understand the DevOps implications of deploying Apex
