
You're reading from Building Big Data Pipelines with Apache Beam

Product type: Book
Published in: Jan 2022
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781800564930
Edition: 1st Edition
Author: Jan Lukavský

Jan Lukavský is a freelance big data architect and engineer who is also a committer of Apache Beam. He is a certified Apache Hadoop professional. He is working on open source big data systems combining batch and streaming data pipelines in a unified model, enabling the rise of real-time, data-driven applications.

Task 16 – Implementing SQLSportTrackerMotivation

In this task, we will explore the benefits that the SQL DSL brings to more complex pipelines composed of several aggregations, joins, and so on. As before, let's first restate the problem definition.

Problem definition

Given a GPS location stream per workout (the same as in the previous task), create another stream indicating whether the runner increased or decreased their pace in the past minute by more than 10% compared to the average pace over the last 5 minutes. Again, use the SQL DSL as much as possible.
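The decision rule in the problem definition can be sketched in plain Java, outside of Beam, to make the arithmetic concrete. This is a minimal sketch under assumptions that are not from the book: `GpsPoint` and `PaceMotivation` are hypothetical names, pace is measured in seconds per kilometer, distance uses the haversine formula, the GPS points are sorted by timestamp, and the 10% threshold is applied to the ratio of the two paces. In the actual task, this logic would be expressed with the SQL DSL over windowed streams rather than over an in-memory list.

```java
import java.util.List;

// Hypothetical helper, not the book's implementation: compares the average pace
// of the last minute with the average pace of the last five minutes.
final class PaceMotivation {

  // One GPS sample of a workout (hypothetical shape).
  static final class GpsPoint {
    final double latitude;
    final double longitude;
    final long timestampMillis;

    GpsPoint(double latitude, double longitude, long timestampMillis) {
      this.latitude = latitude;
      this.longitude = longitude;
      this.timestampMillis = timestampMillis;
    }
  }

  private static final double EARTH_RADIUS_METERS = 6_371_000.0;
  private static final double THRESHOLD = 0.10; // 10% change, per the problem definition

  // Great-circle distance between two samples (haversine formula).
  static double distanceMeters(GpsPoint a, GpsPoint b) {
    double dLat = Math.toRadians(b.latitude - a.latitude);
    double dLon = Math.toRadians(b.longitude - a.longitude);
    double h = Math.pow(Math.sin(dLat / 2), 2)
        + Math.cos(Math.toRadians(a.latitude)) * Math.cos(Math.toRadians(b.latitude))
            * Math.pow(Math.sin(dLon / 2), 2);
    return 2 * EARTH_RADIUS_METERS * Math.asin(Math.sqrt(h));
  }

  // Average pace in seconds per kilometer over samples with timestamps in [fromMillis, toMillis].
  // Only segments whose both endpoints fall in the range are counted.
  // Assumes the samples are sorted by timestamp; returns NaN if no distance was covered.
  static double averagePace(List<GpsPoint> points, long fromMillis, long toMillis) {
    double meters = 0;
    long millis = 0;
    GpsPoint previous = null;
    for (GpsPoint p : points) {
      if (p.timestampMillis < fromMillis || p.timestampMillis > toMillis) {
        continue;
      }
      if (previous != null) {
        meters += distanceMeters(previous, p);
        millis += p.timestampMillis - previous.timestampMillis;
      }
      previous = p;
    }
    return meters == 0 ? Double.NaN : (millis / 1000.0) / (meters / 1000.0);
  }

  // Returns "faster" or "slower" if the last-minute pace differs from the five-minute
  // average by more than 10%, or null if the change is within the threshold.
  static String motivation(List<GpsPoint> points, long nowMillis) {
    double lastMinute = averagePace(points, nowMillis - 60_000L, nowMillis);
    double lastFiveMinutes = averagePace(points, nowMillis - 300_000L, nowMillis);
    if (Double.isNaN(lastMinute) || Double.isNaN(lastFiveMinutes)) {
      return null;
    }
    double ratio = lastMinute / lastFiveMinutes;
    if (ratio < 1 - THRESHOLD) {
      return "faster"; // fewer seconds per kilometer means running faster
    }
    if (ratio > 1 + THRESHOLD) {
      return "slower";
    }
    return null;
  }
}
```

For example, a runner who covers 30 m every 10 seconds for four minutes and then 40 m every 10 seconds for the last minute has a last-minute pace of 250 s/km against a five-minute average of 312.5 s/km. That is a ratio of 0.8, so the sketch reports `faster`.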

The tests and deployment are the same as in the corresponding SportTracker task, so we will skip them here. Instead, we will demonstrate how SQL (and schemas) can help us when dealing with joins, which is what we did when implementing our SportTrackerMotivation example. So, let's reimplement that as well!
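The join mentioned above amounts to matching two per-workout aggregates (the last-minute pace and the five-minute average pace) on the workout ID. The following plain-Java sketch illustrates those inner-join semantics with an in-memory hash join. `PaceJoin` and `joinAndCompare` are hypothetical names, the inputs are assumed to be already aggregated per workout, and this is not Beam's SQL API. In Beam SQL, the equivalent step would presumably be written as a `JOIN ... ON` clause over the two aggregated inputs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not Beam code: inner-joins two per-workout aggregates on the
// workout ID, roughly what an SQL clause such as
//   ... FROM lastMinute s JOIN lastFive l ON s.workoutId = l.workoutId
// expresses, and then compares the joined values.
final class PaceJoin {

  // Keys are workout IDs; values are paces in seconds per kilometer.
  // Returns only workouts present in both inputs whose pace changed by more than threshold.
  static Map<String, String> joinAndCompare(
      Map<String, Double> lastMinutePace,
      Map<String, Double> lastFiveMinutesPace,
      double threshold) {
    Map<String, String> result = new LinkedHashMap<>();
    for (Map.Entry<String, Double> shortTerm : lastMinutePace.entrySet()) {
      // Hash-join probe: look up the matching row on the other side.
      Double longTerm = lastFiveMinutesPace.get(shortTerm.getKey());
      if (longTerm == null) {
        continue; // inner join: unmatched workouts are dropped
      }
      double ratio = shortTerm.getValue() / longTerm;
      if (ratio < 1 - threshold) {
        result.put(shortTerm.getKey(), "faster");
      } else if (ratio > 1 + threshold) {
        result.put(shortTerm.getKey(), "slower");
      }
    }
    return result;
  }
}
```

Workouts present in only one of the inputs are dropped, just as an SQL inner join drops unmatched rows.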

Problem decomposition discussion

In the original...
