Data Engineering with Google Cloud Platform - Second Edition

Product type: Book
Published in: Apr 2024
Publisher: Packt
ISBN-13: 9781835080115
Edition: 2nd Edition
Author: Adi Wijaya

Adi Wijaya is a strategic cloud data engineer at Google. He holds a bachelor's degree in computer science from Binus University and co-founded DataLabs in Indonesia. Currently, he dedicates himself to big data and analytics and has spent a good chunk of his career helping global companies in different industries.

Preface

There is too much information: too many options and plans. Things are complicated. We live in a world where there is more and more information, which can be as problematic as too little. I’m aware that the same applies when people want to start doing data engineering in the cloud and, in the case of this book, on Google Cloud Platform (GCP).

When people want to embark on a career in data, there are so many different roles whose definitions sometimes vary from one company to the next.

When someone chooses to be a data engineer, there are many technology options – cloud versus non-cloud, a big data database versus a transactional database, a self-managed service versus a managed service, and so on.

Upon deciding to use the cloud on GCP, you will find that the public documentation contains a wide variety of product options and tutorials.

Rather than adding yet more dimensions to data engineering and GCP products, this book aims to help you narrow down the information. It will help you focus on the important concepts and components amid the vast array of information available on the internet.

The guidance and exercises are based on the author’s experience in the field. By reading the book and following the exercises, you will learn the clearest and most relevant path to start and boost your career in data engineering using GCP.

Readers of the first edition have consistently reported using the book to successfully achieve GCP certification and launch careers as GCP data engineers. Building on this proven approach, in this second edition, we’ve updated all information to reflect the latest GCP landscape, while preserving the book’s successful format.

Who this book is for

This book is intended for anyone working in the data and analytics space, including IT developers, data analysts, data scientists, and people in any other related role who want to move into data engineering.

This book is also intended for data engineers who want to start using GCP, prepare for certification, and get practical examples based on real-world scenarios.

Finally, this book will be of interest to anyone who wants to understand the thought process, get practical guidance, and follow a clear path through the technology components in order to get started, achieve certification, and gain a practical perspective on data engineering with GCP.

What this book covers

This book is divided into 3 parts and 13 chapters. Each part is a collection of independent chapters that have one objective.

Chapter 1, Fundamentals of Data Engineering, explains the role of data engineers and how data engineering relates to GCP.

Chapter 2, Big Data Capabilities on GCP, introduces the relevant GCP services related to data engineering.

Chapter 3, Building a Data Warehouse in BigQuery, covers the data warehouse concepts using BigQuery.

Chapter 4, Building Workflows for Batch Data Loading Using Cloud Composer, explains data orchestration using Cloud Composer.

Chapter 5, Building a Data Lake Using Dataproc, details the data lake concept with Hadoop using Dataproc.

Chapter 6, Processing Streaming Data with Pub/Sub and Dataflow, explains the concept of streaming data using Pub/Sub and Dataflow.

Chapter 7, Visualizing Data to Make Data-Driven Decisions with Looker Studio, covers how to utilize data from BigQuery to visualize it as charts in Looker Studio.

Chapter 8, Building Machine Learning Solutions on GCP, sets out the concepts of MLOps using Vertex AI.

Chapter 9, User and Project Management in GCP, explains the fundamentals of GCP identity and access management as well as GCP project structures.

Chapter 10, Data Governance in GCP, explains the concept of data governance and how to utilize Dataplex and Dataform to implement some of the foundations.

Chapter 11, Cost Strategy in GCP, covers how to estimate the cost of an overall data solution using GCP.

Chapter 12, CI/CD on GCP for Data Engineers, explains the concept of CI/CD and its relevance to data engineers.

Chapter 13, Boosting Your Confidence as a Data Engineer, prepares you for the GCP certification and offers some final thoughts that summarize what you have learned in this book.

To get the most out of this book

To successfully follow the examples in this book, you need a GCP account and project. If, at this point, you don’t have a GCP account and project, don’t worry. We will cover that as part of the exercises in this book.

Occasionally, we will use the free tier from GCP for practice, but be aware that some products might not have free tiers. Notes will be provided if this is the case.

All the exercises in this book can be completed without any additional software installation. The exercises will be done in the GCP console, which you can open from any operating system using your favorite browser.

You should be familiar with basic programming concepts. In this book, I will focus on using Python and the Linux command line.

If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

Always remember that this book is not positioned to replace GCP public documentation. Hence, comprehensive information on every single feature of GCP services might not be available in this book. We also won’t use all the GCP services that are available. For such information, you can always check the public documentation.

Remember that the main goal of this book is to help you narrow down information. Use this book as your step-by-step guide to building solutions to common challenges facing data engineers. Pay attention to the patterns in the exercises, the relationships between concepts, the important GCP services, and best practices. Always do the hands-on exercises so that you can experience working with GCP.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Data-Engineering-with-Google-Cloud-Platform-Second-Edition. If there’s an update to the code, it will be updated in the GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: “To do that, find the credit_card_default table in the bigquery-public-data project, under the ml_datasets dataset.”

A block of code is set as follows:

from sklearn.ensemble import RandomForestClassifier

random_forest_classifier = RandomForestClassifier(n_estimators=100)
random_forest_classifier.fit(X_train, y_train)

Any command-line input or output is written as follows:

$ pip install -r requirements

Bold: Indicates a new term, an important word, or words that you see on screen. For instance, words in menus or dialog boxes appear in bold. Here is an example: “First, you must create a Vertex AI notebook – either Colab Enterprise or Workbench.”

Tips or important notes

Appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, email us at customercare@packtpub.com and mention the book title in the subject of your message.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packt.com with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Share Your Thoughts

Once you’ve read Data Engineering with Google Cloud Platform, we’d love to hear your thoughts! Please visit the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.

Download a free PDF copy of this book

Thanks for purchasing this book!

Do you like to read on the go but are unable to carry your print books everywhere?

Is your eBook purchase not compatible with the device of your choice?

Don’t worry. With every Packt book, you now get a DRM-free PDF version of that book at no cost.

Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.

The perks don’t stop there. You can also get exclusive access to discounts, newsletters, and great free content in your inbox daily.

Follow these simple steps to get the benefits:

  1. Scan the QR code or visit the link below:

     https://packt.link/free-ebook/9781835080115

  2. Submit your proof of purchase.
  3. That’s it! We’ll send your free PDF and other benefits directly to your email.