Authors (2):
Hamid Mahmood Qureshi

Hamid Qureshi is a senior cloud and data warehouse professional with almost two decades of total experience, having architected, designed, and led the implementation of several data warehouse and business intelligence solutions. He has extensive experience and certifications across various data analytics platforms, ranging from Teradata, Oracle, and Hadoop to modern, cloud-based tools such as Snowflake. Having worked extensively with traditional technologies, combined with his knowledge of modern platforms, he has accumulated substantial practical expertise in data warehousing and analytics in Snowflake, which he has subsequently captured in his publications.

Hammad Sharif

Hammad Sharif is an experienced data architect with more than a decade of experience in the information domain, covering governance, warehousing, data lakes, streaming data, and machine learning. He has worked with a leading data warehouse vendor for a decade as part of a professional services organization, advising customers in the telco, retail, life sciences, and financial industries located in Asia, Europe, and Australia during presales and post-sales implementation cycles. Hammad holds an MSc in computer science and has published conference papers in the domains of machine learning, sensor networks, software engineering, and remote sensing.


Chapter 6: Performance and Cost Optimization

Snowflake has built-in capabilities to optimize queries and performance through out-of-the-box features such as caching, auto-scaling, and automatic clustering of tables. However, there is usually still room to improve performance by tuning table structures, introducing physicalization techniques, and right-sizing your compute. In this chapter, we will explore techniques for making a Snowflake-based data warehouse run more efficiently and, therefore, at a lower cost, along with strategies for reducing unnecessary storage costs.

The following recipes are included in this chapter:

  • Examining table schemas and deriving an optimal structure for a table
  • Identifying query plans and bottlenecks
  • Weeding out inefficient queries through analysis
  • Identifying and reducing unnecessary Fail-safe and Time Travel storage usage
  • Projections in Snowflake for performance
  • Reviewing query plans to modify table clustering
  • Optimizing virtual warehouse scale

Technical requirements

This chapter requires access to a modern internet browser (Chrome, Edge, Firefox, and so on) and access to the internet to connect to your Snowflake instance in the cloud.

The code for this chapter can be found at https://github.com/PacktPublishing/Snowflake-Cookbook/tree/master/Chapter06.

Examining table schemas and deriving an optimal structure for a table

This recipe walks you through analyzing a table's structure in conjunction with the data it contains and provides suggestions on optimizing the table structure.

Getting ready

This recipe uses a public S3 bucket for a sample file that is loaded into an example table to demonstrate the concepts. You will need to be connected to your Snowflake instance via the web UI or the SnowSQL client to execute this recipe successfully.

How to do it…

We will create a new table with a not-so-optimal structure and load it with sample data. We will then create an optimized version of the table, load it with the same data, and analyze the storage differences between the two tables. The steps for this recipe are as follows:

  1. We will start by creating a new database and a table that will hold the sample data:
    CREATE DATABASE C6_R1;
    CREATE TABLE CUSTOMER
    (
      CustomerID VARCHAR(100),
      FName VARCHAR(1024),
     ...
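The remaining columns are truncated above. As an illustration only of where the recipe is heading (the column names and types beyond those shown are assumptions, not the book's exact DDL), the optimized version of this table would assign each column a precise type rather than a generic VARCHAR, which helps Snowflake compress data and prune micro-partitions more effectively:

    -- Illustrative sketch only; not the book's exact DDL.
    CREATE TABLE CUSTOMER_OPTIMIZED
    (
      CustomerID NUMBER(10,0),  -- numeric key instead of VARCHAR(100)
      FName      VARCHAR(50)    -- right-sized instead of VARCHAR(1024)
    );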

Identifying query plans and bottlenecks

In this recipe, you will get to know Snowflake's query plans and learn how to identify bottlenecks and inefficiencies by reading them.

Getting ready

You will need to be connected to your Snowflake instance via the web UI or the SnowSQL client to execute this recipe.

How to do it…

We will run a sample query against the TPCH sample dataset that is provided with Snowflake. The intent is to run an inefficient query, review its query plan, and identify which steps use the most compute and contribute most to the overall execution time. The steps are as follows:

  1. We will start by executing a sample query on the TPCH dataset. We are running this query on an X-Small virtual warehouse, so it may take around 15–20 minutes to complete; it will likely finish faster on a larger virtual warehouse. Note that the sample data is present in the SNOWFLAKE_SAMPLE_DATA...
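    The book's exact query is truncated above. As an illustration only, a deliberately heavy query of this kind against the TPCH sample data might look like the following (the scale factor and the query itself are assumptions, not the book's example):

    -- Illustrative only: a large three-way join and aggregation that will
    -- stress an X-Small warehouse on the SF1000 sample schema.
    USE SCHEMA SNOWFLAKE_SAMPLE_DATA.TPCH_SF1000;
    SELECT C.C_NAME,
           SUM(L.L_EXTENDEDPRICE * (1 - L.L_DISCOUNT)) AS REVENUE
    FROM CUSTOMER C
    JOIN ORDERS O ON C.C_CUSTKEY = O.O_CUSTKEY
    JOIN LINEITEM L ON O.O_ORDERKEY = L.L_ORDERKEY
    GROUP BY C.C_NAME
    ORDER BY REVENUE DESC;

    Once the query completes, you can open it from the History tab and review its profile to see which steps consumed the most time.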

Weeding out inefficient queries through analysis

In this recipe, we will learn techniques for identifying potentially inefficient queries. The identified queries can then be redesigned to be more efficient.

Getting ready

You will need to be connected to your Snowflake instance via the web UI or the SnowSQL client to execute this recipe.

How to do it…

We will query the QUERY_HISTORY view under the SNOWFLAKE database and ACCOUNT_USAGE schema to identify queries that have taken a long time to execute or have scanned a lot of data. Based on the result set, we can identify which queries are potentially inefficient. The steps for this recipe are as follows:

  1. We will start by simply selecting all rows from the QUERY_HISTORY view and order them by the time taken to execute:
    USE ROLE ACCOUNTADMIN;
    USE SNOWFLAKE;
    SELECT QUERY_ID, QUERY_TEXT, EXECUTION_TIME, USER_NAME
    FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
    ORDER BY EXECUTION_TIME DESC;

    You...
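    A natural follow-up (illustrative, not from the book) is to sort by data scanned rather than execution time, since heavy scans are another common signal of inefficiency:

    -- Illustrative variation: queries from the last 7 days that scanned
    -- the most data.
    SELECT QUERY_ID, QUERY_TEXT, BYTES_SCANNED, EXECUTION_TIME
    FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
    WHERE START_TIME >= DATEADD(day, -7, CURRENT_TIMESTAMP())
    ORDER BY BYTES_SCANNED DESC
    LIMIT 20;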

Identifying and reducing unnecessary Fail-safe and Time Travel storage usage

Through this recipe, we will learn how to identify tables that may be used for ETL-like workloads and therefore do not need Fail-safe and Time Travel storage capabilities. Such tables can be altered to remove Fail-safe and Time Travel storage, resulting in lower overall storage costs.

Getting ready

You will need to be connected to your Snowflake instance via the web UI or the SnowSQL client to execute this recipe.

How to do it…

We will simulate a fictitious ETL process that uses an interim table to hold some data. Data from the interim table is then processed and aggregated into a target table. Once the target table is loaded, the ETL process deletes the data from the interim table. The purpose is to determine the best table type for such interim ETL tables. The steps for this recipe are as follows:

  1. We will start by creating a new database and a table that will...
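    The remaining steps are truncated here. As an illustrative sketch of where the recipe is heading (the database and table names are assumptions following the chapter's naming pattern), an interim ETL table can be created as transient, which removes Fail-safe storage, and with a retention time of zero, which disables Time Travel storage:

    -- Illustrative sketch only; names are assumptions.
    CREATE DATABASE C6_R4;
    -- Transient tables carry no Fail-safe storage; setting the data
    -- retention time to 0 disables Time Travel storage as well.
    CREATE TRANSIENT TABLE C6_R4.PUBLIC.ETL_INTERIM
    (
      TXN_ID STRING,
      AMOUNT NUMBER(18,2)
    )
    DATA_RETENTION_TIME_IN_DAYS = 0;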

Projections in Snowflake for performance

Snowflake offers materialized views (MVs) for optimizing different access patterns. MVs allow you to decouple a table's design from evolving access paths. This recipe provides guidance on using MVs and covers their limitations and implications.

Getting ready

This recipe shows how Snowflake MVs can be constructed from a table and how query latency can be reduced. Note that these steps can be run in either the Snowflake web UI or the SnowSQL command-line client.

How to do it…

Let's start by creating a table in a database, followed by generating a large dataset to demonstrate how MVs improve efficiency. The steps for this recipe are as follows:

  1. We will start by creating a new database:
    CREATE DATABASE C6_R5;

    The database should be created successfully.

  2. Next, we will disable result caching for the session so that cached results do not affect the following steps:
    ALTER SESSION SET USE_CACHED_RESULT=FALSE;
  3. ...
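    The remaining steps are truncated here. As an illustrative sketch only (the table, view, and row count are assumptions, not the book's exact steps), the pattern the recipe builds toward is a large base table plus an MV that pre-aggregates it; note that MVs require the Enterprise Edition or higher:

    -- Illustrative sketch: generate a large table, then pre-aggregate it in
    -- a materialized view so repeated aggregate queries avoid a full scan.
    CREATE TABLE C6_R5.PUBLIC.SALES AS
    SELECT SEQ8() AS SALE_ID,
           UNIFORM(1, 1000, RANDOM()) AS PRODUCT_ID,
           UNIFORM(1, 500, RANDOM())::NUMBER(10,2) AS AMOUNT
    FROM TABLE(GENERATOR(ROWCOUNT => 10000000));

    CREATE MATERIALIZED VIEW C6_R5.PUBLIC.SALES_BY_PRODUCT AS
    SELECT PRODUCT_ID, SUM(AMOUNT) AS TOTAL_AMOUNT
    FROM C6_R5.PUBLIC.SALES
    GROUP BY PRODUCT_ID;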

Reviewing query plans to modify table clustering

Snowflake provides the option to configure clustering keys on tables so that larger tables can benefit from partition pruning. In this recipe, we will analyze query plans in conjunction with table structures to identify whether a new clustering key will improve query performance.

Getting ready

The steps in this recipe can be run either in the Snowflake web UI or the SnowSQL command-line client.

How to do it…

Let's start by creating and populating a table in Snowflake. We will simulate data being inserted into the table at regular intervals, resulting in increased size on disk and a growing number of micro-partitions. The steps for this recipe are as follows:

  1. Create a new database, followed by the creation of a table that will hold the transaction data:
    CREATE DATABASE C6_R6;
    CREATE TABLE TRANSACTIONS
    (
      TXN_ID STRING,
      TXN_DATE DATE,
      CUSTOMER_ID STRING,
      QUANTITY DECIMAL...
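    The remaining columns and steps are truncated above. As an illustration of where this recipe leads (the choice of clustering column is an assumption), once the table has grown large, you can add a clustering key on the column most queries filter by and then check the clustering quality:

    -- Illustrative: cluster on the transaction date, then inspect how well
    -- the micro-partitions line up with that key.
    ALTER TABLE TRANSACTIONS CLUSTER BY (TXN_DATE);
    SELECT SYSTEM$CLUSTERING_INFORMATION('TRANSACTIONS', '(TXN_DATE)');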

Optimizing virtual warehouse scale

This recipe explores how to expand the number of concurrent queries that a virtual warehouse can process and how to identify the optimal sizing for your virtual warehouses. This entails analyzing query usage for each virtual warehouse to determine whether it can handle additional concurrent queries or is constrained. If a warehouse is struggling to keep up with its workload, we will see how to resize the warehouse or its cluster to a size that is optimal from both a processing and a billing perspective.
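As a hedged illustration of the two levers involved (the warehouse name MY_WH is an assumption, and multi-cluster warehouses require the Enterprise Edition or higher), scaling up makes individual queries faster, while scaling out adds clusters to absorb more concurrent queries:

    -- Illustrative only: scale up for faster individual queries...
    ALTER WAREHOUSE MY_WH SET WAREHOUSE_SIZE = 'MEDIUM';
    -- ...or scale out so additional clusters spin up under concurrent load.
    ALTER WAREHOUSE MY_WH SET MIN_CLUSTER_COUNT = 1
                              MAX_CLUSTER_COUNT = 3
                              SCALING_POLICY = 'STANDARD';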

Getting ready

In this recipe, we will use Snowflake's web UI and the benchmarking queries that ship with Snowflake to put a load on a warehouse. We will then explore the analytics provided in the web UI to understand the workload and the actions we can take based on that analysis.

How to do it…

Let's start with the...
