You're reading from Data Engineering with Google Cloud Platform - Second Edition

Product type: Book
Published in: Apr 2024
Publisher: Packt
ISBN-13: 9781835080115
Edition: 2nd Edition

Author: Adi Wijaya

Adi Wijaya is a strategic cloud data engineer at Google. He holds a bachelor's degree in computer science from Binus University and co-founded DataLabs in Indonesia. Currently, he dedicates himself to big data and analytics and has spent a good chunk of his career helping global companies in different industries.

User and Project Management in GCP

In this chapter, we will learn how to design and structure users and projects in Google Cloud Platform (GCP). By understanding user and project management in GCP, you will learn how to turn a development solution into a production-ready one.

In a production-ready solution, it’s very important to manage security by only allowing access to the right users. To do this efficiently, however, we need to understand the underlying principles and strategies.

Managing production-ready solutions is almost impossible without understanding how a GCP project works. Understanding how to design GCP projects is another important aspect of an efficient solution.

In addition, this chapter includes an example approach to provisioning GCP services automatically using Terraform, an infrastructure-building tool.

Specifically, in this chapter, we will cover the following topics:

  • Understanding Identity and Access Management (IAM) in GCP
  • Planning...

Technical requirements

In this chapter’s exercises, we will use the following GCP services:

  • IAM
  • BigQuery
  • Google Cloud Storage (GCS)

If you’ve never opened any of these services in your GCP console, open them and enable their application programming interfaces (APIs). We will also use open source software called Terraform to help us provision GCP services using code. It can be downloaded from its public website at https://www.terraform.io/downloads.html. The step-by-step installation will be discussed in the Exercise – creating and running basic Terraform scripts section.

Make sure you have your GCP console, Cloud Shell, and Cloud Shell Editor ready.

Download the example code and the dataset from https://github.com/PacktPublishing/Data-Engineering-with-Google-Cloud-Platform-Second-Edition/tree/main/chapter-9/code.

Understanding IAM in GCP

IAM is the central service that controls who can access what (in other words, authorization) across all of GCP. The concept is simple: you grant roles to accounts so that the accounts have the permissions required to access specific GCP services. Here is a diagram for an account that needs to query a table in BigQuery:

Figure 9.1 – IAM roles, permissions, and GCP service correlation

In the example shown in the preceding diagram, to query a BigQuery table, an account needs, at a minimum, two roles: BigQuery Data Viewer and BigQuery Job User. Each role bundles multiple permissions, and each permission allows a specific operation in BigQuery.
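To make the relationship between roles and permissions concrete, here is a minimal Python sketch. It is a toy model, not how IAM is implemented: the permission sets are a small illustrative subset of what the predefined roles roles/bigquery.dataViewer and roles/bigquery.jobUser contain, and the helper names (effective_permissions, can_query_table) and the set of permissions treated as "needed to query" are assumptions made for this example.

```python
# Simplified, illustrative subset of the permissions bundled in two
# predefined BigQuery roles; the real roles contain more permissions.
ROLE_PERMISSIONS = {
    "roles/bigquery.dataViewer": {
        "bigquery.datasets.get",
        "bigquery.tables.get",
        "bigquery.tables.getData",
    },
    "roles/bigquery.jobUser": {
        "bigquery.jobs.create",
    },
}

# Assumption for this sketch: querying a table needs one job-level
# permission and one data-level permission.
QUERY_TABLE_PERMISSIONS = {"bigquery.jobs.create", "bigquery.tables.getData"}


def effective_permissions(roles):
    """Return the union of the permissions granted by the given roles."""
    permissions = set()
    for role in roles:
        # Unknown roles contribute no permissions in this toy model.
        permissions |= ROLE_PERMISSIONS.get(role, set())
    return permissions


def can_query_table(roles):
    """Check whether the roles cover every permission needed to query."""
    return QUERY_TABLE_PERMISSIONS <= effective_permissions(roles)


if __name__ == "__main__":
    print(can_query_table(["roles/bigquery.dataViewer"]))
    print(can_query_table(["roles/bigquery.dataViewer", "roles/bigquery.jobUser"]))
```

With only the data viewer role, the check fails because nothing grants the permission to create a job; adding the job user role closes that gap, which is why the diagram shows both roles.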

Let’s go through each of the important terms that we use in the IAM space:

  • Account: Accounts in GCP fall into two types – user accounts and service accounts:
    • User account: This is the user’s email address. It can be a corporate or personal email address, depending...

Planning a GCP project structure

After working through the exercises in the previous chapters, I believe you have become familiar with GCP. From those exercises, you’ve learned about GCP services, their positioning, and how to use them. In this section, we will take a step back and look at those GCP services from a higher-level point of view.

In all the previous exercises throughout this book, we used only one project. All the GCP services, including BigQuery, GCS buckets, Cloud Composer, and the other services that we used, were enabled and provisioned in one project. My project is called packt-gcp-data-eng. Likewise, you should have your own project, either the default project or the new one you created in Chapter 2, Big Data Capabilities on GCP. That’s a good enough starting point for learning and development, but in reality, an organization usually has more than one project. There are many scenarios and variations on how...

Understanding the GCP organization, folder, and project hierarchy

A GCP project organizes all your Google Cloud resources. Resources in GCP can be services, billing, accounts, authentications, logs, and monitoring. Resources in one project can be used and accessed by resources in other projects. As long as the permissions on the resources are set correctly, there is no restriction on accessing them between projects.

For example, look at Figure 9.3. The Cloud SQL database from the core-apps-and-db project can be accessed by Cloud Composer in dwh-project. Let’s look at another example – a user account that was created in the core-apps-and-db project can access data from BigQuery in the data project. Note that accounts and authentications are also resources. The key point here is that resources in GCP projects are not isolated.
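The cross-project access described above can be sketched as an IAM policy binding. The following Python example only builds a dictionary in the shape of an IAM policy (a list of bindings, each with a role and its members); it does not call any GCP API. The service account name composer-sa is hypothetical, and using roles/cloudsql.client as the role to grant is our assumption about what the Cloud Composer workload would need.

```python
def service_account_member(name, project_id):
    """Build the IAM member string for a service account owned by a project."""
    return f"serviceAccount:{name}@{project_id}.iam.gserviceaccount.com"


def add_binding(policy, role, member):
    """Add a member to a role binding in an IAM-policy-shaped dict."""
    for binding in policy["bindings"]:
        if binding["role"] == role:
            # Reuse the existing binding and avoid duplicate members.
            if member not in binding["members"]:
                binding["members"].append(member)
            return policy
    policy["bindings"].append({"role": role, "members": [member]})
    return policy


if __name__ == "__main__":
    # Policy of the project that owns the Cloud SQL database.
    core_apps_policy = {"bindings": []}

    # Hypothetical service account that lives in a different project.
    composer_sa = service_account_member("composer-sa", "dwh-project")

    add_binding(core_apps_policy, "roles/cloudsql.client", composer_sa)
    print(core_apps_policy)
```

The point to notice is that the member's email contains dwh-project while the policy belongs to core-apps-and-db: the account lives in one project and the grant lives in another.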

Now, let’s talk about the GCP folder. One GCP folder can contain one to many GCP projects. GCP folders can also contain one to...

Controlling user access to our data warehouse

Now that we’ve learned about user access at the organization, folder, and project levels, we will look specifically at access control lists (ACLs) in BigQuery. An ACL expresses the same concept as IAM, but the term is more commonly used in the data space. Planning an ACL in BigQuery means planning who can access what in BigQuery.

At a very high level, there are two main types of GCP permissions in BigQuery, as follows:

  • Job permissions: BigQuery has job-level permissions. For example, to run a query inside a project, a user needs the bigquery.jobs.create permission.

    Note that being able to run a query job doesn’t mean having access to the data. Access to the data is managed by other permissions, which are explained next.

  • Access permissions: These are a little more complicated than job permissions. If we talk about data access, we need to understand that the main goal...
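The separation between job permissions and data access can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not how BigQuery evaluates access internally: the users, the sales dataset, and the grant tables are all hypothetical, and we assume the job permission is granted at the project level while the data permission is granted per dataset.

```python
# Hypothetical project-level grants: who may create (run) BigQuery jobs.
PROJECT_GRANTS = {
    "runner@example.com": {"bigquery.jobs.create"},
    "reader@example.com": set(),
    "analyst@example.com": {"bigquery.jobs.create"},
}

# Hypothetical dataset-level grants: who may read data in which dataset.
DATASET_GRANTS = {
    ("sales", "reader@example.com"): {"bigquery.tables.getData"},
    ("sales", "analyst@example.com"): {"bigquery.tables.getData"},
}


def can_run_jobs(user):
    """True if the user holds the job permission in the project."""
    return "bigquery.jobs.create" in PROJECT_GRANTS.get(user, set())


def can_read_dataset(user, dataset):
    """True if the user holds the data permission on the dataset."""
    return "bigquery.tables.getData" in DATASET_GRANTS.get((dataset, user), set())


def can_query(user, dataset):
    """Querying needs both: permission to run a job and access to the data."""
    return can_run_jobs(user) and can_read_dataset(user, dataset)


if __name__ == "__main__":
    for user in PROJECT_GRANTS:
        print(user, can_run_jobs(user), can_read_dataset(user, "sales"),
              can_query(user, "sales"))
```

In this model, runner@example.com can start a query job but cannot read the sales data, and reader@example.com has the opposite problem; only the account holding both kinds of permission can actually query the dataset.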

Practicing the concept of IaC using Terraform

Infrastructure as code (IaC) is the process of provisioning and managing resources using code. In our GCP case, the resources can be the GCP project, BigQuery datasets, GCS buckets, IAM, and all other resources that we’ve learned about throughout this book.

So far, we’ve created our resources using the GCP console’s user interface (UI) or the gcloud command. Imagine that you need to do that manually one by one using the UI for hundreds to thousands of objects throughout a large organization. That can be very painful – not only from a provisioning point of view but also in terms of managing it.

Without an IaC approach, common issues include inconsistency (for example, in naming conventions), forgotten configuration parameters (such as location), and losing track of resources that have been created.
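As a rough illustration of how declaring resources in code addresses these issues, the following Python sketch generates a configuration in Terraform's JSON syntax (a file ending in .tf.json) for a set of BigQuery datasets. It is not part of this chapter's exercise code: the environment prefix naming convention, the default location of US, and the helper names are assumptions chosen for the example, and the provider and project configuration that a real Terraform run needs is left out.

```python
import json

# Assumed default for this sketch; pick the location your organization uses.
DEFAULT_LOCATION = "US"


def dataset_resource(env, name, location=DEFAULT_LOCATION):
    """Return (resource_name, arguments) for one google_bigquery_dataset."""
    # Hypothetical naming convention: <environment>_<dataset name>.
    dataset_id = f"{env}_{name}"
    return dataset_id, {"dataset_id": dataset_id, "location": location}


def terraform_config(env, dataset_names):
    """Build a Terraform JSON configuration dict for the given datasets."""
    datasets = dict(dataset_resource(env, name) for name in dataset_names)
    return {"resource": {"google_bigquery_dataset": datasets}}


if __name__ == "__main__":
    config = terraform_config("dev", ["raw", "dwh"])
    # The output could be saved as, for example, datasets.tf.json.
    print(json.dumps(config, indent=2))
```

Because every dataset goes through the same function, the naming convention and the location cannot be forgotten, and the generated file doubles as a record of what has been created.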

With an IaC approach, we can use code to provision our resources. The advantage of using code is that you can implement software...

Summary

In this chapter, we covered three important topics in GCP – namely, IAM, project structure, and BigQuery ACLs. Additionally, we learned about IaC.

Understanding these four topics takes you from being a data engineer toward becoming a cloud data architect. People with these skills can think not only about the data pipeline but also about the higher-level architecture, which is a very important role in any organization.

Always remember the principle of least privilege, which underpins the design of IAM, project structure, and BigQuery ACLs. Always make sure you give only the right access to the right user.

In the next chapter, you’ll discover how data governance using GCP services can unlock the full potential of your data, ensuring usability, security, and accountability.
