Chapter 4: Building Data Pipelines in Snowflake

Snowflake, like other data platforms, offers tools and abstractions that let developers build data pipelines for data processing and analytics. As a cloud database, however, it handles pipelines in its own way. A typical data pipeline needs a way to execute a piece of code, to sequence pieces of code one after another, and to create dependencies within the pipeline and on the environment. Snowflake structures pipelines around two concepts: tasks and streams. A pipeline is a sequence of data processes, each represented by a task; a task encapsulates a logically atomic unit of data processing. A stream, in turn, makes data processing applications change-aware, triggering processing when the underlying data changes.
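
Before diving into the recipes, here is a minimal sketch of the two concepts side by side (the object names, warehouse, and schedule are illustrative assumptions, not taken from the book): a stream records changes to a table, and a task runs a SQL statement on a schedule.

    -- A stream records inserts/updates/deletes made to a table
    CREATE OR REPLACE STREAM customer_changes ON TABLE customer;

    -- A task runs a statement on a schedule; tasks are created suspended
    CREATE OR REPLACE TASK process_changes
      WAREHOUSE = compute_wh   -- assumes a warehouse named compute_wh exists
      SCHEDULE  = '5 MINUTE'
    AS
      INSERT INTO customer_processed
      SELECT id, name FROM customer_changes; -- consuming a stream in DML advances its offset

    ALTER TASK process_changes RESUME; -- activate the task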

This chapter deals with setting up pipelines using tasks and streams and applying different techniques for transforming data...

Technical requirements

This chapter assumes that you have a Snowflake account already set up. The code for this chapter can be found at the following GitHub URL:

https://github.com/PacktPublishing/Snowflake-Cookbook/tree/master/Chapter04

Creating and scheduling a task

In this recipe, we will create a new task that runs a set of data processing steps and configure it to execute on a set schedule.

Getting ready

Note that the steps for this recipe can be run either in the Snowflake web UI or the SnowSQL command-line client.

How to do it…

To demonstrate the concept of a task, we will first create an aggregation query that we assume is used in a report. Because the query takes a long time to run, we will save its results to a physical table and refresh that table periodically through a scheduled task. Let's see how to run tasks:

  1. To simplify the process for you, we have used the sample data provided by Snowflake and created an aggregation query on top of that. (Please note that sample data is included with your Snowflake instance and can be found under the SNOWFLAKE_SAMPLE_DATA database.) We will be using a fictitious query on the sample...
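
The excerpt cuts off above, but the pattern the recipe builds toward can be sketched as follows. The aggregation runs over the TPCH sample schema that ships with Snowflake; the table, task, and warehouse names are illustrative assumptions:

    -- Persist the expensive aggregation as a physical table
    CREATE OR REPLACE TABLE ordersummary AS
    SELECT o_orderdate, SUM(o_totalprice) AS total_price
    FROM snowflake_sample_data.tpch_sf1.orders
    GROUP BY o_orderdate;

    -- Refresh that table periodically through a scheduled task
    CREATE OR REPLACE TASK refresh_ordersummary
      WAREHOUSE = compute_wh      -- assumed warehouse name
      SCHEDULE  = '60 MINUTE'     -- alternatively: 'USING CRON 0 * * * * UTC'
    AS
      INSERT OVERWRITE INTO ordersummary
      SELECT o_orderdate, SUM(o_totalprice)
      FROM snowflake_sample_data.tpch_sf1.orders
      GROUP BY o_orderdate;

    ALTER TASK refresh_ordersummary RESUME; -- new tasks start suspended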

Connecting pipelines through a task tree

In this recipe, we will connect multiple tasks together in a tree to produce a data pipeline that performs multiple functions as it executes.

Getting ready

The following steps describe how to create a tree of tasks that execute in sequence. Note that these steps can be run either in the Snowflake web UI or the SnowSQL command-line client.

How to do it…

To demonstrate the concept of a task tree, we will first create an aggregation query that we assume is used in a report. Because the query takes a long time to run, we will save its results to a physical table and refresh that table periodically through a scheduled task. The steps are as follows:

  1. To simplify the process for you, we have used the sample data provided by Snowflake and created an aggregation query on top of that. (Please note that sample data is included with your Snowflake instance and can be found under the SNOWFLAKE_SAMPLE_DATA...
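
The excerpt cuts off above; the distinguishing feature of a task tree is the AFTER clause, which only the non-root tasks carry. A minimal sketch with assumed table, task, and warehouse names:

    CREATE OR REPLACE TABLE stage_t (v INTEGER);
    CREATE OR REPLACE TABLE final_t (v INTEGER);

    -- Only the root task carries a schedule
    CREATE OR REPLACE TASK root_task
      WAREHOUSE = compute_wh
      SCHEDULE  = '60 MINUTE'
    AS
      INSERT INTO stage_t VALUES (1);

    -- A child task declares its parent and runs when the parent completes
    CREATE OR REPLACE TASK child_task
      WAREHOUSE = compute_wh
      AFTER root_task
    AS
      INSERT INTO final_t SELECT v FROM stage_t;

    -- Resume children before the root so the whole tree is active
    ALTER TASK child_task RESUME;
    ALTER TASK root_task RESUME;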

Querying and viewing the task history

In this recipe, we will explore techniques that can be used to view the history of task execution, using the TASK_HISTORY table function.

Getting ready

The following steps describe ways to view and analyze the execution history of a single task as well as a series of tasks. Note that these steps can be run either in the Snowflake web UI or the SnowSQL command-line client.

To proceed with this recipe, ensure that you have already created and executed a few tasks; otherwise, no results will be returned.

How to do it…

To perform this recipe, let's try out the following steps:

  1. We will use the task_history table function to query the history of task execution. The function takes several parameters, all of which are optional, so to start, we will run a query without any parameters. This returns the execution history of all tasks:
    SELECT * FROM TABLE(information_schema...
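
The statement is truncated in the excerpt; based on the documented signature of the TASK_HISTORY table function, the complete call, and a variant narrowed by its optional parameters, look like this (the task name shown is an assumption):

    -- Execution history of all tasks visible to the current role
    SELECT * FROM TABLE(information_schema.task_history());

    -- Optional named parameters narrow the result set
    SELECT *
    FROM TABLE(information_schema.task_history(
           scheduled_time_range_start => DATEADD('hour', -24, CURRENT_TIMESTAMP()),
           result_limit => 10,
           task_name => 'REFRESH_ORDERSUMMARY'));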

Exploring the concept of streams to capture table-level changes

In this recipe, we will explore the concept of streams, configure a stream on a table, and capture the changes that occur at the table level. Streams are Snowflake's way of performing change data capture on Snowflake tables and can be useful in data pipeline implementation.

Getting ready

The steps for this recipe can be run either in the Snowflake web UI or the SnowSQL command-line client.

How to do it…

The steps for this recipe are as follows:

  1. Let's start by creating a database and a staging table on which we will create our stream object. We create a staging table to simulate data arriving from outside Snowflake and being processed further through a stream object:
    CREATE DATABASE stream_demo;
    USE DATABASE stream_demo;
    CREATE TABLE customer_staging
    (
      ID INTEGER,
      Name STRING,
      State STRING,
      Country STRING
    );
  2. The process of creating...
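
The excerpt cuts off above; a minimal sketch of creating the stream and seeing what it captures (the stream name is an assumption):

    -- Create a stream that records changes to customer_staging
    CREATE OR REPLACE STREAM customer_changes ON TABLE customer_staging;

    -- Simulate data arriving in the staging table
    INSERT INTO customer_staging VALUES (1, 'Jane', 'NSW', 'Australia');

    -- The stream exposes the changed rows plus metadata columns
    SELECT id, name, metadata$action, metadata$isupdate
    FROM customer_changes; -- metadata$action is INSERT for the new row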

Combining the concept of streams and tasks to build pipelines that process changed data on a schedule

In this recipe, we will combine the concept of streams and tasks and set up a scheduled Snowflake data pipeline that processes only changed data into a target table.

How to do it…

The following steps describe how to set up a stream to track and process changes that occur on table data. The steps are as follows:

  1. Let's start by creating a database and a staging table on which we will create our stream object. We will be creating a staging table to simulate data arriving from outside Snowflake and being processed further through a stream object:
    CREATE DATABASE stream_demo;
    USE DATABASE stream_demo;
    CREATE TABLE customer_staging
    (
      ID INTEGER,
      Name STRING,
      State STRING,
      Country STRING
    );
  2. Next, create a stream on the table that captures only the inserts. The insert-only mode is achieved by setting APPEND_ONLY to...
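
The excerpt cuts off above; a hedged sketch of the insert-only stream together with a task that consumes it only when changes exist (all names apart from customer_staging are assumptions):

    -- APPEND_ONLY = TRUE captures inserts only; updates and deletes are ignored
    CREATE OR REPLACE STREAM customer_inserts
      ON TABLE customer_staging
      APPEND_ONLY = TRUE;

    CREATE OR REPLACE TABLE customer_target LIKE customer_staging;

    -- The WHEN clause skips scheduled runs while the stream is empty
    CREATE OR REPLACE TASK process_customer_inserts
      WAREHOUSE = compute_wh   -- assumed warehouse name
      SCHEDULE  = '5 MINUTE'
      WHEN SYSTEM$STREAM_HAS_DATA('CUSTOMER_INSERTS')
    AS
      INSERT INTO customer_target
      SELECT id, name, state, country FROM customer_inserts;

    ALTER TASK process_customer_inserts RESUME;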

Converting data types and Snowflake's failure management

SQL queries frequently need to convert between data types, and conversion brings the possibility of failure, for example, when a value does not match the target type. This recipe provides examples of conversion along with Snowflake's structured approach to handling and recovering from such failures, which lets you build data processing pipelines that avoid errors where possible and handle them gracefully where not. Let's look at Snowflake's approach to avoiding errors during query execution by using the TRY_ variants of the conversion functions.

How to do it…

The following steps walk you through various data type conversion scenarios:

  1. Let's start with the common scenario of converting a number stored as a string into a numeric value. We will explore the example of converting to...
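
The excerpt cuts off above; a minimal sketch of the TRY_ pattern using conversion functions Snowflake documents (the input values are illustrative):

    -- CAST raises an error on a bad value; TRY_CAST returns NULL instead
    SELECT TRY_CAST('123' AS INTEGER);   -- 123
    SELECT TRY_CAST('12ab' AS INTEGER);  -- NULL rather than a failed query

    -- Dedicated TRY_ variants exist for common conversions
    SELECT TRY_TO_NUMBER('15.4', 10, 2); -- 15.40
    SELECT TRY_TO_DATE('2021-02-30');    -- NULL: not a valid calendar date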

Managing context using different utility functions

This recipe provides examples of managing context through Snowflake's context functions, which enable the contextual data processing commonly required in ETL.

Getting ready

The following steps explore the various contextual functions available, their intent, and how they may be used in broader processing.

How to do it…

Perform the following steps to try this recipe:

  1. This step covers one of the most frequently used pieces of contextual information: the current date. Snowflake provides the CURRENT_DATE function, which, as the name suggests, returns the current date in the default date format:
    SELECT CURRENT_DATE();

    A result set showing the output of CURRENT_DATE looks as follows:

    Figure 4.40 – Output of CURRENT_DATE

  2. We can also combine the output of CURRENT_DATE with other processing logic. As an example, the following statement extracts the day name from...
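
The excerpt cuts off above; a sketch of the pattern using the documented DAYNAME function, alongside a few other context functions commonly combined with it (the combination shown is illustrative):

    -- Extract the day name from the current date
    SELECT DAYNAME(CURRENT_DATE());  -- e.g. 'Fri'

    -- Other context functions frequently used in ETL logic
    SELECT CURRENT_TIMESTAMP(),      -- session timestamp
           CURRENT_USER(),           -- connected user
           CURRENT_ROLE(),           -- active role
           CURRENT_WAREHOUSE();      -- warehouse executing the query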
Authors (2)

Hamid Mahmood Qureshi

Hamid Qureshi is a senior cloud and data warehouse professional with almost two decades of total experience, having architected, designed, and led the implementation of several data warehouse and business intelligence solutions. He has extensive experience and certifications across various data analytics platforms, ranging from Teradata, Oracle, and Hadoop to modern, cloud-based tools such as Snowflake. Having worked extensively with traditional technologies, combined with his knowledge of modern platforms, he has accumulated substantial practical expertise in data warehousing and analytics in Snowflake, which he has subsequently captured in his publications.

Hammad Sharif

Hammad Sharif is an experienced data architect with more than a decade of experience in the information domain, covering governance, warehousing, data lakes, streaming data, and machine learning. He worked for a leading data warehouse vendor for a decade as part of its professional services organization, advising customers in the telco, retail, life sciences, and financial industries across Asia, Europe, and Australia during presales and post-sales implementation cycles. Hammad holds an MSc in computer science and has published conference papers in the domains of machine learning, sensor networks, software engineering, and remote sensing.