Reader small image

You're reading from  Snowflake Cookbook

Product typeBook
Published inFeb 2021
Reading LevelBeginner
PublisherPackt
ISBN-139781800560611
Edition1st Edition
Languages
Concepts
Right arrow
Authors (2):
Hamid Mahmood Qureshi
Hamid Mahmood Qureshi
author image
Hamid Mahmood Qureshi

Hamid Qureshi is a senior cloud and data warehouse professional with almost two decades of total experience, having architected, designed, and led the implementation of several data warehouse and business intelligence solutions. He has extensive experience and certifications across various data analytics platforms, ranging from Teradata, Oracle, and Hadoop to modern, cloud-based tools such as Snowflake. Having worked extensively with traditional technologies, combined with his knowledge of modern platforms, he has accumulated substantial practical expertise in data warehousing and analytics in Snowflake, which he has subsequently captured in his publications.
Read more about Hamid Mahmood Qureshi

Hammad Sharif
Hammad Sharif
author image
Hammad Sharif

Hammad Sharif is an experienced data architect with more than a decade of experience in the information domain, covering governance, warehousing, data lakes, streaming data, and machine learning. He has worked with a leading data warehouse vendor for a decade as part of a professional services organization, advising customers in telco, retail, life sciences, and financial industries located in Asia, Europe, and Australia during presales and post-sales implementation cycles. Hammad holds an MSc. in computer science and has published conference papers in the domains of machine learning, sensor networks, software engineering, and remote sensing.
Read more about Hammad Sharif

View More author details
Right arrow

Chapter 9: Advanced SQL Techniques

In this chapter, you'll learn how to use multiple advanced SQL techniques using the Snowflake data warehouse. These SQL techniques are essential from a data warehousing perspective, and include trend analysis, temporal analytics, managing sequences, unique counts, and managing processes as transactions.

The following recipes will be covered in this chapter:

  • Managing timestamp data
  • Shredding date data to extract calendar information
  • Unique counts and Snowflake
  • Managing transactions in Snowflake
  • Ordered analytics over window frames
  • Generating sequences in Snowflake

Let's get started!

Technical requirements

The recipes in this chapter work on Snowflake's web UI, as well as SnowSQL. The code for this chapter can be found at the following GitHub URL: https://github.com/PacktPublishing/Snowflake-Cookbook/tree/master/Chapter09.

Managing timestamp data

The recipe will provide you with an example of how files containing timestamp data should be managed using the timestamp functions available in Snowflake, as well as how formats are managed for the standard ISO format and non-standard formats. This recipe shall also provide you with some useful examples of how to avoid mixups that may occur if a source system provides inconsistent date and time data, or different zones are involved, as well as how to apply time zones using session.

Getting ready

Note that this recipe's steps can be run either in the Snowflake web UI or the SnowSQL command-line client.

How to do it…

Let's look at some practical examples of common date and time queries and calculations. We shall start with date and time data types, followed by timestamps. We shall investigate how time zones can be handled in Snowflake. In practical scenarios, data is received in text files, which will contain date and time data available...

Shredding date data to extract Calendar information

In this recipe, you'll learn how the different functions that are available in Snowflake can help us extract parts of data, such as the day, day of the week, week of the year, name of the day, and important days in business reporting cycles, such as the last day of the month or the first day of the month. These capabilities are essential in data warehousing as most of the data uses time and date as dimensions. Along the way, this recipe shall introduce some pitfalls related to the start of the week configuration in Snowflake. You'll learn how to develop a utility calendar table with different columns related to the date.

Getting ready

Note that this recipe's steps can be run either in the Snowflake WebUI or the SnowSQL command-line client.

How to do it…

As part of this recipe, we shall start with a simple date generator. The generator function will be used to generate calendar dates for the calendar...

Unique counts and Snowflake

Estimating unique counts for rows in distributed systems is a compute-intensive process. Snowflake uses a distributed algorithm that is state of the art. This technique is different from those used in other databases. It is faster but, at the same time, approximate. The recipe shall show you how to use the uniqueness functions that are available in Snowflake.

Getting ready

Note that this recipe's steps can be run either in the Snowflake WebUI or the SnowSQL command-line client. An extra small warehouse will be used in the recipe.

How to do it…

As part of this recipe, we shall explore the different count functions/capabilities of Snowflake. We shall start with the typical count and distinct count functions and how different combinations can yield different results. Then, we shall explore the HyperLogLog algorithm implementation in Snowflake, which can efficiently approximate count over groups. This is recommended for use cases where...

Managing transactions in Snowflake

Snowflake is an ACID-compliant database system. ACID is an acronym for atomicity, consistency, isolation, and durability, a set of properties for database transactions that ensure that the data in a database remains valid during operations that change the data, regardless of errors and failures. This recipe will provide you with insights into how to use transactions in Snowflake. The different concepts around isolation will be covered in this recipe. Snowflake's limitations will also be explained, particularly around the scope of a transaction, how commits and rollbacks work, and how the session is important in managing transaction behavior.

Getting ready

Note that this recipe's steps can be run either in the Snowflake WebUI or the SnowSQL command-line client.

How to do it…

The recipe will start by introducing Snowflake constructs for enabling transactions. We shall investigate how transactions can be started, as well...

Ordered analytics over window frames

The recipe will provide you with insight into Snowflake's ability to run ordered or simple analytics over subsets of rows. Such analytics are typically used in marketing analytics applications, where moving average or cumulative functions are applied to data to identify trends. These capabilities help data scientists wrangle large datasets.

Getting ready

Note that this recipe's steps can be run either in the Snowflake WebUI or the SnowSQL command-line client. We shall be generating data that we intend to use in this recipe. The dataset will have three columns: customer_id, deposit_dt, and deposit. This data will capture deposits that have been made by a customer on a particular date.

How to do it…

Let's start by generating some sample data. We shall create a view with the logic to generate data:

  1. The following query will be used to generate base data that will be used to implement a view. We will be using...

Generating sequences in Snowflake

Sequence generation is a common practice in data warehousing that's required in different scenarios where row uniqueness is needed. This recipe will demonstrate what sequence value variations can be made with Snowflake sequences, while also demonstrating how to configure a column in a table so that it defaults to an auto-increment value.

Getting ready

Note that this recipe's steps can be run either in the Snowflake WebUI or the SnowSQL command-line client.

How to do it…

In this recipe, we shall be exploring the sequence creation process and the various parameters that control a sequence's behavior. Then, we will create and populate tables where their column values are based on values from sequences. Let's get started:

  1. We will start by creating a database where we will create the objects for this recipe. Within this database, we will create a basic sequence object, as shown here:
    CREATE DATABASE C9_R6;...
lock icon
The rest of the chapter is locked
You have been reading a chapter from
Snowflake Cookbook
Published in: Feb 2021Publisher: PacktISBN-13: 9781800560611
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Authors (2)

author image
Hamid Mahmood Qureshi

Hamid Qureshi is a senior cloud and data warehouse professional with almost two decades of total experience, having architected, designed, and led the implementation of several data warehouse and business intelligence solutions. He has extensive experience and certifications across various data analytics platforms, ranging from Teradata, Oracle, and Hadoop to modern, cloud-based tools such as Snowflake. Having worked extensively with traditional technologies, combined with his knowledge of modern platforms, he has accumulated substantial practical expertise in data warehousing and analytics in Snowflake, which he has subsequently captured in his publications.
Read more about Hamid Mahmood Qureshi

author image
Hammad Sharif

Hammad Sharif is an experienced data architect with more than a decade of experience in the information domain, covering governance, warehousing, data lakes, streaming data, and machine learning. He has worked with a leading data warehouse vendor for a decade as part of a professional services organization, advising customers in telco, retail, life sciences, and financial industries located in Asia, Europe, and Australia during presales and post-sales implementation cycles. Hammad holds an MSc. in computer science and has published conference papers in the domains of machine learning, sensor networks, software engineering, and remote sensing.
Read more about Hammad Sharif