Practical Real-time Data Processing and Analytics

A practical guide to help you tackle different real-time data processing and analytics problems using the best tools for each scenario

Shilpi Saxena, Saurabh Gupta


eBook: $5.00 (RRP $39.99, save 87%)
Print + eBook: $49.99 (RRP $49.99)

Book Details

ISBN-13: 9781787281202
Paperback: 360 pages

Book Description

With the rise of Big Data, there is an increasing need to process large amounts of data continuously, with a shorter turnaround time. Real-time data processing involves continuous input, processing and output of data, with the condition that the time required for processing is as short as possible.

This book covers most of the existing and evolving open source technology stack for real-time processing and analytics. You will learn about every aspect of a real-time solution, from the source to presentation to persistence, and gain a clear understanding of how to solve these challenges on your own.

We’ll cover topics such as how to set up components, basic executions, integrations, advanced use cases, alerts, and monitoring. You’ll be exposed to the popular tools used in real-time processing today, such as Apache Spark, Apache Flink, and Apache Storm. Finally, you will apply what you have learned by implementing these techniques in a real-world use case.

By the end of this book, you will have a solid understanding of all the aspects of real-time data processing and analytics, and will know how to deploy the solutions in production environments in the best possible manner.

Table of Contents

Chapter 1: Introducing Real-Time Analytics
What is big data?
Big data infrastructure
Real-time analytics – the myth and the reality
Near real-time solution – an architecture that works
Lambda architecture – analytics possibilities
IOT – thoughts and possibilities
Cloud – considerations for NRT and IOT
Summary
Chapter 2: Real Time Applications – The Basic Ingredients
The NRT system and its building blocks
NRT – high-level system view
NRT – technology view
Summary
Chapter 3: Understanding and Tailing Data Streams
Understanding data streams
Setting up infrastructure for data ingestion
Tapping data from source to the processor – expectations and caveats
Comparing and choosing what works best for your use case
Do it yourself
Summary
Chapter 4: Setting up the Infrastructure for Storm
Overview of Storm
Storm architecture and its components
Setting up and configuring Storm
Real-time processing job on Storm
Summary
Chapter 5: Configuring Apache Spark and Flink
Setting up and a quick execution of Spark
Setting up and a quick execution of Flink
Setting up and a quick execution of Apache Beam
Balancing in Apache Beam
Summary
Chapter 6: Integrating Storm with a Data Source
RabbitMQ – messaging that works
RabbitMQ exchanges
RabbitMQ – integration with Storm
PubNub data stream publisher
String together Storm-RMQ-PubNub sensor data topology
Summary
Chapter 7: From Storm to Sink
Setting up and configuring Cassandra
Storm and Cassandra topology
Storm and IMDB integration for dimensional data
Integrating the presentation layer with Storm
Do It Yourself
Summary
Chapter 8: Storm Trident
State retention and the need for Trident
Basic Storm Trident topology
Trident internals
Trident operations
DRPC
Do It Yourself
Summary
Chapter 9: Working with Spark
Spark overview
Distinct advantages of Spark
Spark – use cases
Spark architecture - working inside the engine
Spark pragmatic concepts
Spark 2.x – advent of data frames and datasets
Summary
Chapter 10: Working with Spark Operations
Spark – packaging and API
RDD pragmatic exploration
Shared variables – broadcast variables and accumulators
Summary
Chapter 11: Spark Streaming
Spark Streaming concepts
Spark Streaming - introduction and architecture
Packaging structure of Spark Streaming
Connecting Kafka to Spark Streaming
Summary
Chapter 12: Working with Apache Flink
Flink architecture and execution engine
Flink basic components and processes
Integration of source stream to Flink
Flink processing and computation
Flink persistence
FlinkCEP
Pattern API
Gelly
DIY
Summary
Chapter 13: Case Study
Introduction
Data modeling
Tools and frameworks
Setting up the infrastructure
Implementing the case study
Running the case study
Summary

What You Will Learn

  • Get an introduction to the established real-time stack
  • Understand the key integration of all the components
  • Get a thorough understanding of the basic building blocks for designing real-time solutions
  • Enhance the search and visualization aspects of your real-time solution
  • Get conceptually and practically acquainted with real-time analytics
  • Be well equipped to apply the knowledge and create your own solutions
