You're reading from Building Big Data Pipelines with Apache Beam

Product type: Book
Published in: Jan 2022
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781800564930
Edition: 1st Edition
Author: Jan Lukavský

Jan Lukavský is a freelance big data architect and engineer who is also a committer of Apache Beam. He is a certified Apache Hadoop professional. He is working on open source big data systems combining batch and streaming data pipelines in a unified model, enabling the rise of real-time, data-driven applications.

Ensuring pipeline upgradability

First, be aware that Beam (as of version 2.28.0) does not offer an abstraction for transferring a pipeline between runners along with its state. The pipeline's code can be transferred, but that means a new pipeline is created and any computation done on the previous runner is lost. This is because pipeline upgrades are currently a runner-specific task, so the details may differ slightly depending on which runner we choose.

That is the bad news. The good news is that the pipeline upgrade process is generally subject to constraints that are mostly runner-independent, so the chances are high that very similar rules will apply to the majority of runners.

Let's look at the tasks that a runner must perform to upgrade a pipeline:

  1. The complete state (including the timers) of all transforms must be stored in a durable and persistent location. This...
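To make step 1 concrete, here is a minimal, runner-agnostic sketch in plain Python of what snapshotting and restoring per-transform state (including timers) to durable storage might look like. All names, the JSON layout, and the file-per-transform scheme are illustrative assumptions, not Beam APIs; real runners implement this internally (for example, Flink via savepoints). The key idea is that state is keyed by a stable transform name, so the upgraded pipeline can re-attach to it:

```python
import json
import tempfile
from pathlib import Path


def snapshot_state(transforms, snapshot_dir):
    """Persist each transform's state and timers to durable storage.

    `transforms` is a hypothetical mapping of stable transform name to
    a dict with "state" and "timers" entries.
    """
    snapshot_dir = Path(snapshot_dir)
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    for name, transform in transforms.items():
        payload = {"state": transform["state"], "timers": transform["timers"]}
        # The stable name is the lookup key: if it changes between
        # pipeline versions, the runner cannot map the state back.
        (snapshot_dir / f"{name}.json").write_text(json.dumps(payload))


def restore_state(transform_names, snapshot_dir):
    """Re-attach stored state to the transforms of the upgraded pipeline."""
    snapshot_dir = Path(snapshot_dir)
    restored = {}
    for name in transform_names:
        path = snapshot_dir / f"{name}.json"
        if path.exists():
            restored[name] = json.loads(path.read_text())
        else:
            # A transform that is new in the upgraded pipeline starts empty.
            restored[name] = {"state": {}, "timers": []}
    return restored


# Illustrative usage: snapshot one stateful transform, then restore into
# an upgraded pipeline that also contains a brand-new transform.
transforms = {"CountWords": {"state": {"beam": 3}, "timers": [1000]}}
with tempfile.TemporaryDirectory() as d:
    snapshot_state(transforms, d)
    restored = restore_state(["CountWords", "NewStage"], d)
```

Note how the restore side is driven by the transform names of the *new* pipeline: state for matching names is recovered, while transforms without a stored snapshot simply start from empty state.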