Authors (2):
Hamid Mahmood Qureshi

Hamid Qureshi is a senior cloud and data warehouse professional with almost two decades of total experience, having architected, designed, and led the implementation of several data warehouse and business intelligence solutions. He has extensive experience and certifications across various data analytics platforms, ranging from Teradata, Oracle, and Hadoop to modern, cloud-based tools such as Snowflake. Having worked extensively with traditional technologies, combined with his knowledge of modern platforms, he has accumulated substantial practical expertise in data warehousing and analytics in Snowflake, which he has subsequently captured in his publications.

Hammad Sharif

Hammad Sharif is an experienced data architect with more than a decade of experience in the information domain, covering governance, warehousing, data lakes, streaming data, and machine learning. He has worked with a leading data warehouse vendor for a decade as part of a professional services organization, advising customers in the telco, retail, life sciences, and financial industries located in Asia, Europe, and Australia during presales and post-sales implementation cycles. Hammad holds an MSc in computer science and has published conference papers in the domains of machine learning, sensor networks, software engineering, and remote sensing.


Chapter 6: Performance and Cost Optimization

Snowflake has built-in capabilities to optimize queries and performance through out-of-the-box features such as caching, auto-scaling, and automatic clustering of tables. However, there is usually still room to improve performance by tuning table structures, introducing physicalization techniques, and right-sizing your compute. In this chapter, we will explore techniques for making a Snowflake-based data warehouse run more efficiently and, therefore, at a lower cost, along with strategies for reducing unnecessary storage costs.

The following recipes are included in this chapter:

  • Examining table schemas and deriving an optimal structure for a table
  • Identifying query plans and bottlenecks
  • Weeding out inefficient queries through analysis
  • Identifying and reducing unnecessary Fail-safe and Time Travel storage usage
  • Projections in Snowflake for performance
  • Reviewing query plans to modify table clustering
  • Optimizing virtual warehouse scale

Technical requirements

This chapter requires access to a modern internet browser (Chrome, Edge, Firefox, and so on) and access to the internet to connect to your Snowflake instance in the cloud.

The code for this chapter can be found at https://github.com/PacktPublishing/Snowflake-Cookbook/tree/master/Chapter06.

Examining table schemas and deriving an optimal structure for a table

This recipe walks you through analyzing a table's structure in conjunction with the data it contains and provides suggestions on optimizing the table structure.

Getting ready

This recipe uses a public S3 bucket for a sample file that is loaded into an example table to demonstrate the concepts. You will need to be connected to your Snowflake instance via the web UI or the SnowSQL client to execute this recipe successfully.

How to do it…

We will create a new table with a not-so-optimal structure and load it with sample data. We will then create an optimized version of the table, load it with the same data, and analyze the storage differences between the two tables. The steps for this recipe are as follows:

  1. We will start by creating a new database and a table that will hold the sample data:
    CREATE DATABASE C6_R1;
    CREATE TABLE CUSTOMER
    (
      CustomerID VARCHAR(100),
      FName VARCHAR(1024),
     ...
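The remaining columns are truncated above. As an illustration only of where the recipe is heading (the column names and types beyond those shown are assumptions, not the book's exact DDL), the optimized version of this table would assign each column a precise type rather than a generic VARCHAR, which helps Snowflake compress data and prune micro-partitions more effectively:

    -- Illustrative sketch only; not the book's exact DDL.
    CREATE TABLE CUSTOMER_OPTIMIZED
    (
      CustomerID NUMBER(10,0),  -- numeric key instead of VARCHAR(100)
      FName      VARCHAR(50)    -- right-sized instead of VARCHAR(1024)
    );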

Identifying query plans and bottlenecks

In this recipe, you will get to know Snowflake's query plans and learn how to identify bottlenecks and inefficiencies by reading them.

Getting ready

You will need to be connected to your Snowflake instance via the web UI or the SnowSQL client to execute this recipe.

How to do it…

We will run a sample query against the TPCH sample dataset that is provided with Snowflake. The intent is to run an inefficient query, review its query plan, and identify which steps use the most compute and contribute most to the overall execution time. The steps are as follows:

  1. We will start by executing a sample query on the TPCH dataset. We are running this query on an X-Small virtual warehouse, so it may take around 15–20 minutes to complete; it will likely finish faster on a larger virtual warehouse. Note that the sample data is present in the SNOWFLAKE_SAMPLE_DATA...
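    The book's exact query is truncated above. As an illustration only, a deliberately heavy query of this kind against the TPCH sample data might look like the following (the scale factor and the query itself are assumptions, not the book's example):

    -- Illustrative only: a large three-way join and aggregation that will
    -- stress an X-Small warehouse on the SF1000 sample schema.
    USE SCHEMA SNOWFLAKE_SAMPLE_DATA.TPCH_SF1000;
    SELECT C.C_NAME,
           SUM(L.L_EXTENDEDPRICE * (1 - L.L_DISCOUNT)) AS REVENUE
    FROM CUSTOMER C
    JOIN ORDERS O ON C.C_CUSTKEY = O.O_CUSTKEY
    JOIN LINEITEM L ON O.O_ORDERKEY = L.L_ORDERKEY
    GROUP BY C.C_NAME
    ORDER BY REVENUE DESC;

    Once the query completes, you can open it from the History tab and review its profile to see which steps consumed the most time.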

Weeding out inefficient queries through analysis

In this recipe, we will learn techniques for identifying potentially inefficient queries. The identified queries can then be redesigned to be more efficient.

Getting ready

You will need to be connected to your Snowflake instance via the web UI or the SnowSQL client to execute this recipe.

How to do it…

We will query the QUERY_HISTORY view under the SNOWFLAKE database and ACCOUNT_USAGE schema to identify queries that have taken a long time to execute or have scanned a lot of data. Based on the result set, we can identify which queries are potentially inefficient. The steps for this recipe are as follows:

  1. We will start by simply selecting all rows from the QUERY_HISTORY view and order them by the time taken to execute:
    USE ROLE ACCOUNTADMIN;
    USE SNOWFLAKE;
    SELECT QUERY_ID, QUERY_TEXT, EXECUTION_TIME, USER_NAME
    FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
    ORDER BY EXECUTION_TIME DESC;

    You...
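    A natural follow-up (illustrative, not from the book) is to sort by data scanned rather than execution time, since heavy scans are another common signal of inefficiency:

    -- Illustrative variation: queries from the last 7 days that scanned
    -- the most data.
    SELECT QUERY_ID, QUERY_TEXT, BYTES_SCANNED, EXECUTION_TIME
    FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
    WHERE START_TIME >= DATEADD(day, -7, CURRENT_TIMESTAMP())
    ORDER BY BYTES_SCANNED DESC
    LIMIT 20;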

Identifying and reducing unnecessary Fail-safe and Time Travel storage usage

Through this recipe, we will learn how to identify tables that may be used for ETL-like workloads and therefore do not need Fail-safe and Time Travel storage capabilities. Such tables can be altered to remove Fail-safe and Time Travel storage, resulting in lower overall storage costs.

Getting ready

You will need to be connected to your Snowflake instance via the web UI or the SnowSQL client to execute this recipe.

How to do it…

We will simulate a fictitious ETL process that uses an interim table to hold some data. Data from the interim table is then processed and aggregated into a target table. Once the target table is loaded, the ETL process deletes the data from the interim table. The purpose is to determine the best table type for such interim ETL tables. The steps for this recipe are as follows:

  1. We will start by creating a new database and a table that will...
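    The remaining steps are truncated here. As an illustrative sketch of where the recipe is heading (the database and table names are assumptions following the chapter's naming pattern), an interim ETL table can be created as transient, which removes Fail-safe storage, and with a retention time of zero, which disables Time Travel storage:

    -- Illustrative sketch only; names are assumptions.
    CREATE DATABASE C6_R4;
    -- Transient tables carry no Fail-safe storage; setting the data
    -- retention time to 0 disables Time Travel storage as well.
    CREATE TRANSIENT TABLE C6_R4.PUBLIC.ETL_INTERIM
    (
      TXN_ID STRING,
      AMOUNT NUMBER(18,2)
    )
    DATA_RETENTION_TIME_IN_DAYS = 0;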

Projections in Snowflake for performance

Snowflake offers materialized views (MVs) for optimizing different access patterns. MVs allow you to decouple a table's design from evolving access paths. This recipe provides guidance on using MVs and covers their limitations and implications.

Getting ready

This recipe shows how Snowflake MVs can be constructed from a table and how query latency can be reduced. Note that these steps can be run in either the Snowflake web UI or the SnowSQL command-line client.

How to do it…

Let's start by creating a table in a database, followed by generating a large dataset to demonstrate how MVs improve efficiency. The steps for this recipe are as follows:

  1. We will start by creating a new database:
    CREATE DATABASE C6_R5;

    The database should be created successfully.

  2. Next, we will disable result caching for the session so that cached results do not affect the following steps:
    ALTER SESSION SET USE_CACHED_RESULT=FALSE;
  3. ...
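    The remaining steps are truncated here. As an illustrative sketch only (the table, view, and row count are assumptions, not the book's exact steps), the pattern the recipe builds toward is a large base table plus an MV that pre-aggregates it; note that MVs require the Enterprise Edition or higher:

    -- Illustrative sketch: generate a large table, then pre-aggregate it in
    -- a materialized view so repeated aggregate queries avoid a full scan.
    CREATE TABLE C6_R5.PUBLIC.SALES AS
    SELECT SEQ8() AS SALE_ID,
           UNIFORM(1, 1000, RANDOM()) AS PRODUCT_ID,
           UNIFORM(1, 500, RANDOM())::NUMBER(10,2) AS AMOUNT
    FROM TABLE(GENERATOR(ROWCOUNT => 10000000));

    CREATE MATERIALIZED VIEW C6_R5.PUBLIC.SALES_BY_PRODUCT AS
    SELECT PRODUCT_ID, SUM(AMOUNT) AS TOTAL_AMOUNT
    FROM C6_R5.PUBLIC.SALES
    GROUP BY PRODUCT_ID;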

Reviewing query plans to modify table clustering

Snowflake provides the option to configure clustering keys on tables so that larger tables can benefit from partition pruning. In this recipe, we will analyze query plans in conjunction with table structures to identify whether a new clustering key will improve query performance.

Getting ready

The steps in this recipe can be run either in the Snowflake web UI or the SnowSQL command-line client.

How to do it…

Let's start by creating and populating a table in Snowflake. We will simulate data being inserted into the table at regular intervals, resulting in increased size on disk and a growing number of micro-partitions. The steps for this recipe are as follows:

  1. Create a new database, followed by the creation of a table that will hold the transaction data:
    CREATE DATABASE C6_R6;
    CREATE TABLE TRANSACTIONS
    (
      TXN_ID STRING,
      TXN_DATE DATE,
      CUSTOMER_ID STRING,
      QUANTITY DECIMAL...
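    The remaining columns and steps are truncated above. As an illustration of where this recipe leads (the choice of clustering column is an assumption), once the table has grown large, you can add a clustering key on the column most queries filter by and then check the clustering quality:

    -- Illustrative: cluster on the transaction date, then inspect how well
    -- the micro-partitions line up with that key.
    ALTER TABLE TRANSACTIONS CLUSTER BY (TXN_DATE);
    SELECT SYSTEM$CLUSTERING_INFORMATION('TRANSACTIONS', '(TXN_DATE)');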

Optimizing virtual warehouse scale

This recipe explores how to expand the number of concurrent queries that a virtual warehouse can process and how to identify the optimal sizing for your virtual warehouses. This entails analyzing query usage for each virtual warehouse to determine whether it can handle additional concurrent queries or is constrained. If a warehouse is struggling to keep up with its workload, we will see how to resize the warehouse or its cluster to a size that is optimal from both a processing and a billing perspective.
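As a hedged illustration of the two levers involved (the warehouse name MY_WH is an assumption, and multi-cluster warehouses require the Enterprise Edition or higher), scaling up makes individual queries faster, while scaling out adds clusters to absorb more concurrent queries:

    -- Illustrative only: scale up for faster individual queries...
    ALTER WAREHOUSE MY_WH SET WAREHOUSE_SIZE = 'MEDIUM';
    -- ...or scale out so additional clusters spin up under concurrent load.
    ALTER WAREHOUSE MY_WH SET MIN_CLUSTER_COUNT = 1
                              MAX_CLUSTER_COUNT = 3
                              SCALING_POLICY = 'STANDARD';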

Getting ready

In this recipe, we will use Snowflake's web UI and the benchmarking queries that ship with Snowflake to put a load on a warehouse. We will then explore the analytics provided in the web UI to understand the workload and the actions we can take based on that analysis.

How to do it…

Let's start with the...
