
You're reading from Building Big Data Pipelines with Apache Beam

Product type: Book
Published in: Jan 2022
Reading Level: Beginner
Publisher: Packt
ISBN-13: 9781800564930
Edition: 1st Edition
Author: Jan Lukavský

Jan Lukavský is a freelance big data architect and engineer who is also a committer of Apache Beam. He is a certified Apache Hadoop professional. He is working on open source big data systems combining batch and streaming data pipelines in a unified model, enabling the rise of real-time, data-driven applications.

The legacy Source API and the Read transform

Before splittable DoFns were introduced, Beam used the Source API and its associated Read transform. Although this transform is now deprecated and should not be used to implement new sources, it is still supported. On some runners and under specific conditions, the deprecated Read transform might even be preferable. We have already seen an example of this: the use_deprecated_read experiment, passed through the --experiments flag when using Python's ReadFromKafka transform.
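
The experiment travels as an ordinary pipeline option, for example --experiments=use_deprecated_read. As a rough illustration of what that option carries, the following self-contained Java sketch scans command-line arguments for --experiments values (repeated flags or comma-separated lists) and checks whether use_deprecated_read is among them. The class and method names are hypothetical; in a real Beam pipeline, PipelineOptionsFactory parses the arguments and ExperimentalOptions.hasExperiment performs the equivalent check.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for reading the list-valued --experiments
// pipeline option. Beam's real parsing is done by PipelineOptionsFactory.
class ExperimentsSketch {

    private static final String PREFIX = "--experiments=";

    // Collects all experiment names, accepting both repeated flags
    // (--experiments=a --experiments=b) and comma-separated lists (--experiments=a,b).
    static List<String> parseExperiments(String[] args) {
        List<String> experiments = new ArrayList<>();
        for (String arg : args) {
            if (!arg.startsWith(PREFIX)) {
                continue; // not an --experiments flag, ignore it
            }
            for (String value : arg.substring(PREFIX.length()).split(",")) {
                String name = value.trim();
                if (!name.isEmpty()) {
                    experiments.add(name);
                }
            }
        }
        return experiments;
    }

    // True if the deprecated, Source-based Read expansion was requested.
    static boolean useDeprecatedRead(String[] args) {
        return parseExperiments(args).contains("use_deprecated_read");
    }

    public static void main(String[] args) {
        String[] example = {"--runner=FlinkRunner", "--experiments=use_deprecated_read"};
        System.out.println(useDeprecatedRead(example)); // prints "true"
    }
}
```

The sketch only detects the flag; acting on it is left to the SDK and the runner.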

The Read transform accepts a single parameter: an object of either the BoundedSource or the UnboundedSource type. Whether the source is bounded or unbounded then determines whether the resulting PCollection is bounded or unbounded.
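
The following self-contained Java sketch models only that one rule: the boundedness of the output follows from the type of the source. All types in it (Source, BoundedSource, UnboundedSource, PCollection, and the read method) are simplified, hypothetical stand-ins rather than Beam's classes; only the two IsBounded values mirror Beam's PCollection.IsBounded enum. As with Beam's own Read.from, which has separate overloads for BoundedSource and UnboundedSource, the choice is resolved at compile time through overloading.

```java
// Simplified, hypothetical model of the Read transform's boundedness rule.
// None of these types are Beam's real classes; they only mirror the idea.
class BoundednessSketch {

    // Same two values as Beam's PCollection.IsBounded enum.
    enum IsBounded { BOUNDED, UNBOUNDED }

    // Common marker supertype for all sources.
    interface Source<T> {}

    // A source with a finite amount of data.
    interface BoundedSource<T> extends Source<T> {}

    // A source whose data may never end.
    interface UnboundedSource<T> extends Source<T> {}

    // Toy PCollection that records only whether it is bounded.
    static final class PCollection<T> {
        private final IsBounded isBounded;

        PCollection(IsBounded isBounded) {
            this.isBounded = isBounded;
        }

        IsBounded isBounded() {
            return isBounded;
        }
    }

    // One overload per source kind: the source type alone fixes the
    // boundedness of the resulting collection.
    static <T> PCollection<T> read(BoundedSource<T> source) {
        return new PCollection<>(IsBounded.BOUNDED);
    }

    static <T> PCollection<T> read(UnboundedSource<T> source) {
        return new PCollection<>(IsBounded.UNBOUNDED);
    }

    public static void main(String[] args) {
        BoundedSource<String> files = new BoundedSource<String>() {};
        UnboundedSource<String> events = new UnboundedSource<String>() {};
        System.out.println(read(files).isBounded());  // prints "BOUNDED"
        System.out.println(read(events).isBounded()); // prints "UNBOUNDED"
    }
}
```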

We apply the Read transform as follows:

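import org.apache.beam.sdk.io.Read;
Pipeline p = ...;
// MyUnboundedSource is a user-defined subclass of UnboundedSource
p.apply(Read.from(new MyUnboundedSource()));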

We will not go into the details of BoundedSource or UnboundedSource, mostly because...
