Data Engineering with Google Cloud Platform


Product type: Book
Published: Mar 2022
Publisher: Packt
ISBN-13: 9781800561328
Pages: 440
Edition: 1st Edition
Author: Adi Wijaya

Table of Contents (17 entries)

  • Preface
  • Section 1: Getting Started with Data Engineering with GCP
  • Chapter 1: Fundamentals of Data Engineering
  • Chapter 2: Big Data Capabilities on GCP
  • Section 2: Building Solutions with GCP Components
  • Chapter 3: Building a Data Warehouse in BigQuery
  • Chapter 4: Building Orchestration for Batch Data Loading Using Cloud Composer
  • Chapter 5: Building a Data Lake Using Dataproc
  • Chapter 6: Processing Streaming Data with Pub/Sub and Dataflow
  • Chapter 7: Visualizing Data for Making Data-Driven Decisions with Data Studio
  • Chapter 8: Building Machine Learning Solutions on Google Cloud Platform
  • Section 3: Key Strategies for Architecting Top-Notch Data Pipelines
  • Chapter 9: User and Project Management in GCP
  • Chapter 10: Cost Strategy in GCP
  • Chapter 11: CI/CD on Google Cloud Platform for Data Engineers
  • Chapter 12: Boosting Your Confidence as a Data Engineer
  • Other Books You May Enjoy

Chapter 10: Cost Strategy in GCP

This chapter covers one of the questions stakeholders ask most often: what will the solution cost? Each GCP service has its own pricing mechanism. In this chapter, we will look at the information you need to calculate cost.

On top of that, we will have a section dedicated to BigQuery. We will discuss the difference between the two BigQuery pricing models, on-demand and flat-rate. Finally, we will revisit the BigQuery features for partitioned and clustered tables. Understanding these features can significantly reduce your future BigQuery costs.

The following topics will be covered in this chapter: 

  • Estimating the cost of your end-to-end data solution in GCP
  • Tips for optimizing BigQuery using partitioned and clustered tables

Technical requirements

For this chapter's exercises, we will use BigQuery and the GCP pricing calculators available online, which you can open in any browser.

Estimating the cost of your end-to-end data solution in GCP

While trying out the exercises in this book, we've briefly discussed the cost that might be incurred when using various GCP services. You may have already been billed for some of the resources, so you may be wondering, "How much will it cost in a full production system?" This question is important and is often asked by stakeholders.

As a data engineer, it is valuable to be able to estimate the cost of the end-to-end data solution upfront. To estimate cost in GCP, we first need to understand that not all GCP services use the same pricing calculation.

Note

The pricing model that's described in this book is based on the latest information at the time of writing. Google Cloud can change the pricing model for any service at any time.

There are three types of pricing models:

  • Machine (VM)-based: There are GCP resources that are billed with the machines. The bills that are generated...
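As a rough illustration of the machine-based model, a monthly estimate can be sketched as the number of machines multiplied by their running hours and an hourly rate. The function name, its parameters, and the rate used in the example are hypothetical placeholders rather than real GCP prices; actual rates should come from the GCP pricing calculator.

```python
def estimate_vm_monthly_cost(
    num_machines: int,
    hours_per_day: float,
    hourly_rate_usd: float,
    days_per_month: int = 30,
) -> float:
    """Estimate a monthly bill for resources billed by machine running time.

    Simplifying assumption: every machine runs the same number of hours
    each day and is billed at the same flat hourly rate.
    """
    if num_machines < 0 or hours_per_day < 0 or hourly_rate_usd < 0:
        raise ValueError("inputs must be non-negative")
    if hours_per_day > 24:
        raise ValueError("hours_per_day cannot exceed 24")
    return num_machines * hours_per_day * days_per_month * hourly_rate_usd


# Example: three machines running around the clock at a made-up rate.
HYPOTHETICAL_RATE_USD = 0.10  # placeholder, not a quoted GCP price
monthly_cost = estimate_vm_monthly_cost(3, 24, HYPOTHETICAL_RATE_USD)
print(f"Estimated monthly cost: {monthly_cost:.2f} USD")
```

The useful property of this shape of estimate is that cost scales with how long the machines run, not with how much data they process.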

Tips for optimizing BigQuery using partitioned and clustered tables 

BigQuery tables can store anything from zero bytes to petabytes of data, and creating a small table is no different from creating a large one. For illustration purposes only, let's say a small table ranges from KBs to 100 GB, and a large table ranges from 100 GB to PBs of data. Technically, both tables are the same, but to optimize performance and cost, we can configure tables using two features: BigQuery partitioned tables and BigQuery clustered tables.

These features are helpful for both on-demand and flat-rate pricing. With on-demand pricing, the features cut the billed bytes, which reduces the overall cost because that cost is calculated from the billed bytes. With flat-rate pricing, the features don't affect the cost directly. Remember that the cost of flat-rate pricing is flat per period. But when you're using features,...
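The link between billed bytes and on-demand cost can be sketched with a little arithmetic. The following is a minimal sketch, not BigQuery's actual billing logic: the price per TiB is a hypothetical placeholder, the function names are invented for illustration, partitions are assumed to be equally sized, and any minimums or rounding that BigQuery may apply are ignored.

```python
TIB = 1024 ** 4  # bytes in one tebibyte


def on_demand_cost(bytes_billed: int, price_per_tib_usd: float) -> float:
    """Cost of one query: billed bytes converted to TiB, times the unit price."""
    if bytes_billed < 0 or price_per_tib_usd < 0:
        raise ValueError("inputs must be non-negative")
    return bytes_billed / TIB * price_per_tib_usd


def bytes_read_with_pruning(
    table_bytes: int, total_partitions: int, partitions_read: int
) -> int:
    """Bytes read when a filter limits the query to some of the partitions.

    Simplifying assumption: all partitions are the same size.
    """
    if total_partitions <= 0 or not 0 <= partitions_read <= total_partitions:
        raise ValueError("partitions_read must be between 0 and total_partitions")
    return table_bytes * partitions_read // total_partitions


PRICE_PER_TIB_USD = 5.0  # hypothetical placeholder, not a quoted price
table_bytes = 10 * TIB  # an example large table

# Without partitioning, a query reads the whole table.
full_scan = on_demand_cost(table_bytes, PRICE_PER_TIB_USD)

# With daily partitions, a one-week filter reads 7 of 365 partitions.
pruned_bytes = bytes_read_with_pruning(table_bytes, 365, 7)
pruned = on_demand_cost(pruned_bytes, PRICE_PER_TIB_USD)

print(f"Full scan: {full_scan:.2f} USD; 7 of 365 partitions: {pruned:.2f} USD")
```

Under these assumptions, the cost drops in proportion to the share of partitions the query touches, which is why a filter on the partition column matters so much for on-demand pricing.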

Summary

In this chapter, we learned two things. First, we learned how to estimate the cost of an end-to-end data solution. Second, we saw how BigQuery partitioned and clustered tables play a significant role in that cost.

Data engineers usually need these two topics in different situations. Understanding how to calculate the cost helps in the early stages of a GCP implementation. This is usually an important step when an organization is deciding on its future solution.

The second topic usually comes up when you're designing BigQuery tables or evaluating a running BigQuery solution. Even though the benefits of partitioned and clustered tables are obvious, it's not surprising to find that, in a big organization, many tables are not optimized and can be improved.

Lastly, we performed an experiment using the three different tables. It proved that using the...
