Building Big Data Pipelines with Apache Beam

By Jan Lukavský

About this book

Apache Beam is an open source, unified programming model for defining and executing data processing pipelines, including extract, transform, and load (ETL), batch, and stream processing.

This book will help you confidently build data processing pipelines with Apache Beam. You'll start with an overview of Apache Beam and learn how to implement basic pipelines with it. The book covers techniques for loading data, performing transformations, and storing the results. You will also learn how to test and run pipelines effectively. As you progress, you will explore how to implement your own domain-specific language (DSL) and get to grips with the Euphoria DSL. Later chapters show you how to query your data using SQL and how to run a pipeline on a portable runner. Finally, you will learn advanced Apache Beam concepts, such as implementing your own IO connectors.

By the end of this book, you will be able to confidently implement both batch and streaming data pipelines using Apache Beam.

Publication date:
November 2021

About the Author

  • Jan Lukavský

Jan Lukavský is a freelance big data architect and engineer and a contributor to Apache Beam. He is a Certified Professional for Apache Hadoop. He works on open source big data systems that combine batch and streaming data pipelines in a unified model, enabling the rise of real-time, data-driven applications.
