
You're reading from  Building Big Data Pipelines with Apache Beam

Product type: Book
Published in: Jan 2022
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781800564930
Edition: 1st Edition
Author (1)
Jan Lukavský

Jan Lukavský is a freelance big data architect and engineer who is also a committer of Apache Beam. He is a certified Apache Hadoop professional. He is working on open source big data systems combining batch and streaming data pipelines in a unified model, enabling the rise of real-time, data-driven applications.

Using side inputs

We have already seen how to use side outputs, and side inputs are analogous to them. Besides the single main input, a ParDo transform can have multiple additional side inputs, as shown in the following figure:

Figure 3.13 – Side inputs

We have multiple ways of declaring a side input to a ParDo object. For instance, consider the following example:

ParDo.of(new MyDoFn())

Side inputs are declared in much the same way as side outputs: we must provide the side input to the ParDo by calling withSideInput, as follows:
input.apply(ParDo.of(new MyDoFn())
    .withSideInput("side-input", sideInput));

Because a ParDo may have multiple side inputs, we need a way to distinguish them. If we assign a name to a side input, we can later access it in the DoFn through a parameter marked with the @SideInput annotation:

@ProcessElement
public void processElement(
    @Element .. element,
    ...
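
To make the idea of named side inputs concrete without depending on the truncated listing above, here is a minimal plain-Java sketch. It is not the Apache Beam API: ElementFn, applyWithSideInputs, and the example values are hypothetical names introduced only for illustration. It models the behaviour described in the text, where each element of the main input is processed with access to additional read-only data looked up by name. In Beam itself, the side input is a PCollectionView and the runner is responsible for delivering it to the DoFn.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java model of named side inputs. NOT the Apache Beam API;
// every name here is hypothetical and exists only for illustration.
final class SideInputSketch {

  // Stand-in for the body of a DoFn: it receives one main-input element
  // and a read-only lookup of side inputs keyed by their assigned names.
  interface ElementFn<I, O> {
    O process(I element, Map<String, Object> sideInputs);
  }

  // Stand-in for applying a ParDo with side inputs: every element of the
  // main input is processed with the same side inputs available.
  static <I, O> List<O> applyWithSideInputs(
      List<I> mainInput, Map<String, Object> sideInputs, ElementFn<I, O> fn) {
    // Side inputs are read-only from the point of view of the function.
    Map<String, Object> readOnly = Map.copyOf(sideInputs);
    List<O> output = new ArrayList<>();
    for (I element : mainInput) {
      output.add(fn.process(element, readOnly));
    }
    return output;
  }

  // Multiplies each main-input element by a factor supplied as a side input.
  static List<Integer> example() {
    // The name "side-input" mirrors the name used in the text above.
    Map<String, Object> sideInputs = Map.of("side-input", 10);
    return applyWithSideInputs(
        List.of(1, 2, 3),
        sideInputs,
        (element, sides) -> {
          // The side input is distinguished from others by its name.
          int factor = (Integer) sides.get("side-input");
          return element * factor;
        });
  }

  public static void main(String[] args) {
    System.out.println(example()); // prints [10, 20, 30]
  }
}
```

The name-based lookup is what lets a single function consume several side inputs without confusing them, which is the reason the text gives for assigning a name in withSideInput.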