You're reading from Building Big Data Pipelines with Apache Beam

Product type: Book
Published in: Jan 2022
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781800564930
Edition: 1st Edition
Author (1)
Jan Lukavský

Jan Lukavský is a freelance big data architect and engineer who is also a committer of Apache Beam. He is a certified Apache Hadoop professional. He is working on open source big data systems combining batch and streaming data pipelines in a unified model, enabling the rise of real-time, data-driven applications.

Further development of Apache Beam SQL

In this section, we will summarize the likely further development of Apache Beam SQL and point out which parts are currently missing or incomplete.

At the end of the previous chapter, we described retract and upsert streams and defined time-varying relations on top of them. Although the Apache Beam model does include generic retractions, they are not implemented at the moment. The same is true for SQL. Among other things, this means that Apache Beam SQL currently does not support full stream-to-stream joins, only windowed joins.

A windowed join does not, by itself, guarantee that retractions will not be needed. However, with the default trigger and no allowed lateness (or with a trigger that fires only after the end of the window plus the allowed lateness has passed), no retractions are needed. The reason for this is that all the data is projected onto the timestamp at the end of the window, and the window ends...
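The idea of a windowed join that emits each result exactly once can be illustrated with a small simulation. This is a minimal sketch in plain Python, not Apache Beam code: the helper names `fixed_window_end` and `windowed_join`, the fixed window size, and the sample records are all assumptions made for illustration. It assumes each window is evaluated a single time after it closes and that late data is dropped, which is why no earlier output ever has to be revised.

```python
from collections import defaultdict


def fixed_window_end(timestamp, size):
    """Return the (exclusive) end of the fixed window containing `timestamp`."""
    return (timestamp // size + 1) * size


def windowed_join(left, right, size):
    """Join (key, value, timestamp) records that share a key and a fixed window.

    Hypothetical helper: each window is evaluated exactly once, as if a
    single firing happened after the window closed and late data was dropped.
    """
    # Group both inputs by (window end, key).
    left_panes = defaultdict(list)
    right_panes = defaultdict(list)
    for key, value, ts in left:
        left_panes[(fixed_window_end(ts, size), key)].append(value)
    for key, value, ts in right:
        right_panes[(fixed_window_end(ts, size), key)].append(value)

    results = []
    # Emit every (window, key) pair once, in window order; nothing is revised later.
    for window_end, key in sorted(left_panes.keys() & right_panes.keys()):
        for left_value in left_panes[(window_end, key)]:
            for right_value in right_panes[(window_end, key)]:
                # The output carries the window end, not the input timestamps.
                results.append((window_end, key, left_value, right_value))
    return results


left = [("a", "l1", 1), ("a", "l2", 12), ("b", "l3", 3)]
right = [("a", "r1", 7), ("a", "r2", 15), ("c", "r3", 4)]
joined = windowed_join(left, right, size=10)
print(joined)
```

With a window size of 10, the sample input yields `[(10, 'a', 'l1', 'r1'), (20, 'a', 'l2', 'r2')]`. Records with keys `b` and `c` have no partner in their window and produce nothing, and `l1` is never joined with `r2` because they fall into different windows. Stamping the output with the exclusive window end is a simplification chosen for this sketch.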
