You're reading from Data Engineering with Google Cloud Platform - Second Edition

Product type Book

Published in Apr 2024

Publisher Packt

ISBN-13 9781835080115

Pages 476 pages

Edition 2nd Edition

Concepts

Data Engineering

Author (1):

Adi Wijaya

Exercise – deploying a dummy workflow with Vertex AI Pipelines

Before we continue with the hands-on exercise, let’s understand what Vertex AI Pipelines is. Vertex AI Pipelines is a tool for orchestrating ML workflows. Under the hood, it uses an open source tool called Kubeflow Pipelines. The relationship is similar to that between Cloud Composer and Airflow, or Dataproc and Hadoop: to understand Vertex AI Pipelines, we need to be familiar with Kubeflow Pipelines.

Kubeflow Pipelines is a platform for building and deploying portable, scalable ML workflows based on Docker containers. Containers matter more for ML workflows than for data workflows. For example, in data workflows, it’s typical to load the BigQuery, GCS, and pandas libraries in every step, because the same libraries are used from the upstream steps through to the downstream ones. In ML, the upstream step is data loading, while a later step builds models and needs specific libraries, such as TensorFlow or scikit...
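To make the idea of per-step environments concrete, here is a minimal sketch in plain Python. It is not the Kubeflow Pipelines API; the `Step` class, the step names, the `python:3.10` image, and the package lists are all hypothetical and serve only to illustrate that each step can declare its own container and libraries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One pipeline step and the container environment it would run in."""
    name: str
    image: str                      # hypothetical base image for this step
    packages: tuple[str, ...] = ()  # libraries needed only inside this step


# Hypothetical two-step ML workflow: each step declares only what it needs.
pipeline = [
    Step("load_data", "python:3.10", ("google-cloud-bigquery", "pandas")),
    Step("train_model", "python:3.10", ("tensorflow",)),
]


def shared_packages(steps):
    """Return the packages that every step in the workflow has in common."""
    package_sets = [set(step.packages) for step in steps]
    return set.intersection(*package_sets) if package_sets else set()


for step in pipeline:
    print(f"{step.name}: image={step.image}, packages={', '.join(step.packages)}")
```

In this sketch the two steps have no libraries in common, so `shared_packages(pipeline)` returns an empty set. That is the situation where isolating each step in its own container is most useful, because the data-loading step does not need to carry the model-building dependencies.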