
You're reading from Building Big Data Pipelines with Apache Beam

Product type: Book
Published in: Jan 2022
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781800564930
Edition: 1st Edition
Author: Jan Lukavský

Jan Lukavský is a freelance big data architect and engineer who is also a committer of Apache Beam. He is a certified Apache Hadoop professional. He is working on open source big data systems combining batch and streaming data pipelines in a unified model, enabling the rise of real-time, data-driven applications.

Introducing the Join library DSL

Before we proceed, let's recall what a relational JOIN is. A relation can be viewed as a table. The table can have an arbitrary number of columns, but for the sake of this discussion, only three of them matter, as shown in the following table:

Table 4.1 – A sample relationship between individuals

This table defines a relation between a set of individuals (alice, bob), a set of genders (female, male), and a set of other properties with the values foo and bar. If the table contained more than three columns, we could treat all the remaining columns together as a single value column. The actual structure and data type of the value column are not relevant to the discussion, so we can assume that the table has only a single value column.
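To make the idea of collapsing extra columns concrete, here is a minimal plain-Java sketch (Java 16 or newer, for records). It is not Beam code and not the book's implementation: the `Person` record, the `fromWideRow` helper, and the input row are all hypothetical. The row keeps the two columns we reason about, name and gender, and folds every other column into one opaque value.

```java
import java.util.List;

public class RelationSketch {

    // One row of a relation such as Table 4.1: the two columns we reason
    // about (name and gender) plus a single opaque value column.
    public record Person(String name, String gender, String value) {}

    // Hypothetical helper: folds every column after name and gender into
    // the single value column. Joining with "," is an arbitrary choice; the
    // point is only that the extra columns travel together as one value.
    public static Person fromWideRow(List<String> columns) {
        if (columns.size() < 3) {
            throw new IllegalArgumentException(
                "expected name, gender and at least one value column");
        }
        String value = String.join(",", columns.subList(2, columns.size()));
        return new Person(columns.get(0), columns.get(1), value);
    }

    public static void main(String[] args) {
        // Hypothetical wide row with two columns beyond name and gender.
        Person alice = fromWideRow(List.of("alice", "female", "foo", "extra"));
        System.out.println(alice.name() + " -> " + alice.value()); // prints "alice -> foo,extra"
    }
}
```

Whatever the value column holds, a join only ever inspects the key column, which is why its internal structure can be ignored.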

Let's assume we have another table:

Table 4.2 – A relationship of average heights based on gender

This table is a relationship between gender...
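The following plain-Java sketch shows what an inner JOIN of the two relations on the gender column computes. It is an illustration under stated assumptions, not the Beam API: the class, record, and method names are hypothetical, and the rows and heights in `main` are made up rather than taken from Tables 4.1 and 4.2. It assumes gender is unique in the second relation (one average height per gender), which lets that relation be held in a `Map`. As far as we know, Beam's Java join library (`org.apache.beam.sdk.extensions.joinlibrary.Join`) likewise works on key-value pairs, so gender plays the role of the key here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class GenderJoinSketch {

    // Left-hand relation: name, gender, and the collapsed value column.
    public record Person(String name, String gender, String value) {}

    // Result row: the person's columns extended with the matching height.
    public record PersonWithHeight(
            String name, String gender, String value, double averageHeight) {}

    // Inner join on gender. People whose gender has no entry in the map are
    // dropped, which is what makes this an inner rather than an outer join.
    public static List<PersonWithHeight> innerJoinOnGender(
            List<Person> people, Map<String, Double> averageHeightByGender) {
        List<PersonWithHeight> joined = new ArrayList<>();
        for (Person person : people) {
            Double height = averageHeightByGender.get(person.gender());
            if (height != null) {
                joined.add(new PersonWithHeight(
                        person.name(), person.gender(), person.value(), height));
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // Hypothetical rows and made-up heights, for illustration only.
        List<Person> people = List.of(
                new Person("alice", "female", "foo"),
                new Person("bob", "male", "bar"));
        Map<String, Double> averageHeightByGender =
                Map.of("female", 165.0, "male", 178.0);

        for (PersonWithHeight row : innerJoinOnGender(people, averageHeightByGender)) {
            System.out.println(row);
        }
    }
}
```

In a pipeline, both inputs would be distributed collections keyed by gender rather than an in-memory `List` and `Map`, but the result rows have the same shape.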
