You're reading from Building Big Data Pipelines with Apache Beam: Use a single programming model for both batch and stream data processing

Product type Paperback

Published in Jan 2022

Publisher Packt

ISBN-13 9781800564930

Length 342 pages

Edition 1st Edition

Languages

Python

Tools

Apache Beam

Concepts

Big Data

Author (1):

Lukavský

Table of Contents (13) Chapters

Preface

1. Section 1 Apache Beam: Essentials

2. Chapter 1: Introduction to Data Processing with Apache Beam

3. Chapter 2: Implementing, Testing, and Deploying Basic Pipelines

4. Chapter 3: Implementing Pipelines Using Stateful Processing

5. Section 2 Apache Beam: Toward Improving Usability

6. Chapter 4: Structuring Code for Reusability

7. Chapter 5: Using SQL for Pipeline Implementation

8. Chapter 6: Using Your Preferred Language with Portability

9. Section 3 Apache Beam: Advanced Concepts

10. Chapter 7: Extending Apache Beam's I/O Connectors

11. Chapter 8: Understanding How Runners Execute Pipelines

12. Other Books You May Enjoy

Summary

In this chapter, we learned how unbounded streams of data can be viewed as time-varying relations and, as such, can be queried using SQL. We saw that standard SQL needs to be adjusted to fit streaming needs: we introduced three special functions, TUMBLE, HOP, and SESSION, which are used in the GROUP BY clause of a SQL statement to apply a windowing strategy.
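To make the windowing idea concrete, here is a minimal sketch in plain Java of the arithmetic behind assigning a timestamp to tumbling and hopping windows. It is only an illustration: the class `WindowSketch`, its methods, and the parameter names `size` and `period` are hypothetical, not part of Apache Beam, and session windows are left out because their boundaries depend on the gaps between elements rather than on a fixed grid.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Apache Beam: shows the arithmetic behind
// fixed (tumbling) and sliding (hopping) window assignment.
class WindowSketch {

    // Start of the single tumbling window of length `size` that contains `ts`.
    static long tumbleStart(long ts, long size) {
        return ts - Math.floorMod(ts, size);
    }

    // Starts of all hopping windows of length `size`, opened every `period`,
    // that contain `ts`, in ascending order.
    static List<Long> hopStarts(long ts, long size, long period) {
        List<Long> starts = new ArrayList<>();
        // Latest window start that is not after `ts`.
        long latest = ts - Math.floorMod(ts, period);
        // Walk backwards while the window [s, s + size) still covers `ts`.
        for (long s = latest; s > ts - size; s -= period) {
            starts.add(0, s);
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(tumbleStart(125, 60));   // prints 120
        System.out.println(hopStarts(125, 60, 30)); // prints [90, 120]
    }
}
```

With a window length of 60 and a period of 30, the timestamp 125 falls into exactly one tumbling window but into two overlapping hopping windows, which is why a hopping aggregation can count the same element more than once.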

We saw that the prerequisite for applying Apache Beam SQL to a PCollection is to create a PCollection<Row>, where Row represents the relational view of a stream, broken down into a structure with a given Schema that describes the individual (possibly nested) fields of the data elements inside the PCollection. We also learned how to infer a schema automatically from a given type using the @DefaultSchema annotation with a SchemaProvider such as JavaFieldSchema or JavaBeanSchema. When we cannot (or do not want to) use @DefaultSchema, we can set the schema on a PCollection manually...