You're reading from Data Engineering with Google Cloud Platform

Product type: Book
Published in: Mar 2022
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781800561328
Edition: 1st Edition
Author (1)
Adi Wijaya

Adi Wijaya is a strategic cloud data engineer at Google. He holds a bachelor's degree in computer science from Binus University and co-founded DataLabs in Indonesia. Currently, he dedicates himself to big data and analytics and has spent a good chunk of his career helping global companies in different industries.

Preface

There is too much information, there are too many plans, and it's complicated. We live in a world where too much information is as problematic as too little, and I'm aware that this applies when people want to start doing data engineering in the cloud, specifically on Google Cloud Platform (GCP), the focus of this book.

When people want to embark on a career in data, they find many different roles whose definitions vary from one company to the next.

When someone chooses to be a data engineer, there are a great number of technology options: cloud versus non-cloud; big data database versus traditional; self-managed versus a managed service; and many more.

When they decide to use GCP, the public documentation contains a wide variety of product options and tutorials.

Instead of adding further dimensions to data engineering and GCP products, the main goal of this book is to help you narrow down the information. It distills the important concepts and components from the vast array of information available on the internet. The guidance and exercises are based on the writer's experience in the field and will give you a clear focus. By reading the book and following the exercises, you will learn the most relevant and clearest path to start and boost your career in data engineering using GCP.

Who this book is for

This book is intended for anyone involved in the data and analytics space, including IT developers, data analysts, data scientists, and anyone in a related role who wants to get a jump start in the data engineering field.

This book is also intended for data engineers who want to start using GCP, prepare for certification, and get practical examples based on real-world scenarios.

Finally, this book will be of interest to anyone who wants to understand the thought process, get practical guidance, and follow a clear path through the technology components in order to get started, achieve certification, and gain a practical perspective on data engineering with GCP.

What this book covers

This book is divided into 3 sections and 12 chapters. Each section is a collection of independent chapters that have one objective:

Chapter 1, Fundamentals of Data Engineering, explains the role of data engineers and how data engineering relates to GCP.

Chapter 2, Big Data Capabilities on GCP, introduces the relevant GCP services related to data engineering.

Chapter 3, Building a Data Warehouse in BigQuery, covers the data warehouse concept using BigQuery.

Chapter 4, Building Orchestration for Batch Data Loading Using Cloud Composer, explains data orchestration using Cloud Composer.

Chapter 5, Building a Data Lake Using Dataproc, details the data lake concept with Hadoop using Dataproc.

Chapter 6, Processing Streaming Data with Pub/Sub and Dataflow, explains the concept of streaming data using Pub/Sub and Dataflow.

Chapter 7, Visualizing Data for Making Data-Driven Decisions with Data Studio, covers how to visualize BigQuery data as charts in Data Studio.

Chapter 8, Building Machine Learning Solutions on Google Cloud Platform, sets out the concept of MLOps using Vertex AI.

Chapter 9, User and Project Management in GCP, explains the fundamentals of GCP Identity and Access Management and project structures.

Chapter 10, Cost Strategy in GCP, covers how to estimate the overall cost of a data solution on GCP.

Chapter 11, CI/CD on Google Cloud Platform for Data Engineers, explains the concept of CI/CD and its relevance to data engineers.

Chapter 12, Boosting Your Confidence as a Data Engineer, prepares you for the GCP certification and offers some final thoughts that summarize what you've learned in this book.

To get the most out of this book

To successfully follow the examples in this book, you need a GCP account and project. If, at this point, you don't have a GCP account and project, don't worry. We will cover that as part of the exercises in this book.

Occasionally, we will use the free tier from GCP for practice, but be aware that some products might not have free tiers. Notes will be provided if this is the case.

All the exercises in this book can be completed without any additional software installation. The exercises are done in the GCP console, which you can open from any operating system using your favorite browser.

You should be familiar with basic programming. In this book, I will focus on using Python and the Linux command line.

If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book's GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

This book is not positioned to replace GCP public documentation. Hence, comprehensive information on every single feature of GCP services might not be available in this book. We also won't use all the GCP services that are available. For such information, you can always check the public documentation.

Remember that the main goal of this book is to help you narrow down information. Use this book as your step-by-step guide to building solutions to common challenges facing data engineers. Pay attention to the patterns in the exercises, the relationships between concepts, the important GCP services, and best practices. Always do the hands-on exercises so that you experience working with GCP.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Data-Engineering-with-Google-Cloud-Platform. If there's an update to the code, it will be updated in the GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots and diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781800561328_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in the text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system."

A block of code is set as follows:

html, body, #map {
 height: 100%; 
 margin: 0;
 padding: 0
}

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

[default]
exten => s,1,Dial(Zap/1|30)
exten => s,2,Voicemail(u100)
exten => s,102,Voicemail(b100)
exten => i,1,Voicemail(s0)

Any command-line input or output is written as follows:

$ mkdir css
$ cd css

Bold: Indicates a new term, an important word, or words that you see on screen. For instance, words in menus or dialog boxes appear in bold. Here is an example: "Select System info from the Administration panel."

Tips or Important Notes

Appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, email us at customercare@packtpub.com and mention the book title in the subject of your message.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packt.com with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Share Your Thoughts

Once you’ve read Data Engineering with Google Cloud Platform, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.
