Chapter 4: Building Data Pipelines in Snowflake

Snowflake, like other data platforms, offers tools and abstractions that let developers build data pipelines for data processing and analytics. As a cloud database, however, it handles pipelines in its own way. A typical data pipeline needs a way to execute a piece of code, to sequence pieces of code one after another, and to create dependencies within the pipeline and on the environment. Snowflake structures pipelines around two concepts: tasks and streams. A pipeline is a sequence of data processes, each represented by a task; a task encapsulates a logically atomic unit of data processing. A stream, in turn, makes data processing applications change-aware, triggering processing when the underlying data changes.
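
Before diving into the recipes, here is a minimal sketch of the two concepts side by side (the object names, warehouse, and schedule are illustrative assumptions, not taken from the book): a stream records changes to a table, and a task runs a SQL statement on a schedule.

    -- A stream records inserts/updates/deletes made to a table
    CREATE OR REPLACE STREAM customer_changes ON TABLE customer;

    -- A task runs a statement on a schedule; tasks are created suspended
    CREATE OR REPLACE TASK process_changes
      WAREHOUSE = compute_wh   -- assumes a warehouse named compute_wh exists
      SCHEDULE  = '5 MINUTE'
    AS
      INSERT INTO customer_processed
      SELECT id, name FROM customer_changes; -- consuming a stream in DML advances its offset

    ALTER TASK process_changes RESUME; -- activate the task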

This chapter deals with setting up pipelines using tasks and streams and applying different techniques for transforming data...

Technical requirements

This chapter assumes that you have a Snowflake account already set up. The code for this chapter can be found at the following GitHub URL:

https://github.com/PacktPublishing/Snowflake-Cookbook/tree/master/Chapter04

Creating and scheduling a task

In this recipe, we will create a new task that runs a set of data processing steps and configure it to execute on a set schedule.

Getting ready

Note that the steps for this recipe can be run either in the Snowflake web UI or the SnowSQL command-line client.

How to do it…

To demonstrate the concept of a task, we will first create an aggregation query that we assume is used in a report. Because the query takes a long time to run, we will save its results to a physical table and refresh that table periodically through a scheduled task. Let's see how to run tasks:

  1. To simplify the process for you, we have used the sample data provided by Snowflake and created an aggregation query on top of that. (Please note that sample data is included with your Snowflake instance and can be found under the SNOWFLAKE_SAMPLE_DATA database.) We will be using a fictitious query on the sample...
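
The excerpt cuts off above, but the pattern the recipe builds toward can be sketched as follows. The aggregation runs over the TPCH sample schema that ships with Snowflake; the table, task, and warehouse names are illustrative assumptions:

    -- Persist the expensive aggregation as a physical table
    CREATE OR REPLACE TABLE ordersummary AS
    SELECT o_orderdate, SUM(o_totalprice) AS total_price
    FROM snowflake_sample_data.tpch_sf1.orders
    GROUP BY o_orderdate;

    -- Refresh that table periodically through a scheduled task
    CREATE OR REPLACE TASK refresh_ordersummary
      WAREHOUSE = compute_wh      -- assumed warehouse name
      SCHEDULE  = '60 MINUTE'     -- alternatively: 'USING CRON 0 * * * * UTC'
    AS
      INSERT OVERWRITE INTO ordersummary
      SELECT o_orderdate, SUM(o_totalprice)
      FROM snowflake_sample_data.tpch_sf1.orders
      GROUP BY o_orderdate;

    ALTER TASK refresh_ordersummary RESUME; -- new tasks start suspended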

Connecting pipelines through a task tree

In this recipe, we will connect multiple tasks together in a tree to produce a data pipeline that performs multiple functions as it executes.

Getting ready

The following steps describe how to create a tree of tasks that execute in sequence. Note that these steps can be run either in the Snowflake web UI or the SnowSQL command-line client.

How to do it…

To demonstrate the concept of a task tree, we will first create an aggregation query that we assume is used in a report. Because the query takes a long time to run, we will save its results to a physical table and refresh that table periodically through a scheduled task. The steps are as follows:

  1. To simplify the process for you, we have used the sample data provided by Snowflake and created an aggregation query on top of that. (Please note that sample data is included with your Snowflake instance and can be found under the SNOWFLAKE_SAMPLE_DATA...
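
The excerpt cuts off above; the distinguishing feature of a task tree is the AFTER clause, which only the non-root tasks carry. A minimal sketch with assumed table, task, and warehouse names:

    CREATE OR REPLACE TABLE stage_t (v INTEGER);
    CREATE OR REPLACE TABLE final_t (v INTEGER);

    -- Only the root task carries a schedule
    CREATE OR REPLACE TASK root_task
      WAREHOUSE = compute_wh
      SCHEDULE  = '60 MINUTE'
    AS
      INSERT INTO stage_t VALUES (1);

    -- A child task declares its parent and runs when the parent completes
    CREATE OR REPLACE TASK child_task
      WAREHOUSE = compute_wh
      AFTER root_task
    AS
      INSERT INTO final_t SELECT v FROM stage_t;

    -- Resume children before the root so the whole tree is active
    ALTER TASK child_task RESUME;
    ALTER TASK root_task RESUME;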

Querying and viewing the task history

In this recipe, we will explore techniques that can be used to view the history of task execution, using the TASK_HISTORY table function.

Getting ready

The following steps describe ways to view and analyze the execution history of a single task as well as a series of tasks. Note that these steps can be run either in the Snowflake web UI or the SnowSQL command-line client.

To proceed with this recipe, ensure that you have already created and executed a few tasks; otherwise, no results will be returned.

How to do it…

To perform this recipe, let's try out the following steps:

  1. We will use the task_history table function to query the history of task execution. The function takes several parameters, all of which are optional, so to start, we will run a query without any parameters. This returns the execution history of all tasks:
    SELECT * FROM TABLE(information_schema...
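
The statement is truncated in the excerpt; based on the documented signature of the TASK_HISTORY table function, the complete call, and a variant narrowed by its optional parameters, look like this (the task name shown is an assumption):

    -- Execution history of all tasks visible to the current role
    SELECT * FROM TABLE(information_schema.task_history());

    -- Optional named parameters narrow the result set
    SELECT *
    FROM TABLE(information_schema.task_history(
           scheduled_time_range_start => DATEADD('hour', -24, CURRENT_TIMESTAMP()),
           result_limit => 10,
           task_name => 'REFRESH_ORDERSUMMARY'));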

Exploring the concept of streams to capture table-level changes

In this recipe, we will explore the concept of streams, configure a stream on a table, and capture the changes that occur at the table level. Streams are Snowflake's way of performing change data capture on Snowflake tables and can be useful in data pipeline implementation.

Getting ready

The steps for this recipe can be run either in the Snowflake web UI or the SnowSQL command-line client.

How to do it…

The steps for this recipe are as follows:

  1. Let's start by creating a database and a staging table on which we will create our stream object. We create a staging table to simulate data arriving from outside Snowflake and being processed further through a stream object:
    CREATE DATABASE stream_demo;
    USE DATABASE stream_demo;
    CREATE TABLE customer_staging
    (
      ID INTEGER,
      Name STRING,
      State STRING,
      Country STRING
    );
  2. The process of creating...
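
The excerpt cuts off above; a minimal sketch of creating the stream and seeing what it captures (the stream name is an assumption):

    -- Create a stream that records changes to customer_staging
    CREATE OR REPLACE STREAM customer_changes ON TABLE customer_staging;

    -- Simulate data arriving in the staging table
    INSERT INTO customer_staging VALUES (1, 'Jane', 'NSW', 'Australia');

    -- The stream exposes the changed rows plus metadata columns
    SELECT id, name, metadata$action, metadata$isupdate
    FROM customer_changes; -- metadata$action is INSERT for the new row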

Combining the concept of streams and tasks to build pipelines that process changed data on a schedule

In this recipe, we will combine the concept of streams and tasks and set up a scheduled Snowflake data pipeline that processes only changed data into a target table.

How to do it…

The following steps describe how to set up a stream to track and process changes that occur on table data. The steps are as follows:

  1. Let's start by creating a database and a staging table on which we will create our stream object. We will be creating a staging table to simulate data arriving from outside Snowflake and being processed further through a stream object:
    CREATE DATABASE stream_demo;
    USE DATABASE stream_demo;
    CREATE TABLE customer_staging
    (
      ID INTEGER,
      Name STRING,
      State STRING,
      Country STRING
    );
  2. Next, create a stream on the table that captures only the inserts. The insert-only mode is achieved by setting APPEND_ONLY to...
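
The excerpt cuts off above; a hedged sketch of the insert-only stream together with a task that consumes it only when changes exist (all names apart from customer_staging are assumptions):

    -- APPEND_ONLY = TRUE captures inserts only; updates and deletes are ignored
    CREATE OR REPLACE STREAM customer_inserts
      ON TABLE customer_staging
      APPEND_ONLY = TRUE;

    CREATE OR REPLACE TABLE customer_target LIKE customer_staging;

    -- The WHEN clause skips scheduled runs while the stream is empty
    CREATE OR REPLACE TASK process_customer_inserts
      WAREHOUSE = compute_wh   -- assumed warehouse name
      SCHEDULE  = '5 MINUTE'
      WHEN SYSTEM$STREAM_HAS_DATA('CUSTOMER_INSERTS')
    AS
      INSERT INTO customer_target
      SELECT id, name, state, country FROM customer_inserts;

    ALTER TASK process_customer_inserts RESUME;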

Converting data types and Snowflake's failure management

SQL queries frequently need to convert between data types, and conversion brings the possibility of failure, for example, when a value does not match the target type. This recipe provides examples of conversion along with Snowflake's structured approach to handling and recovering from such failures, which lets you build data processing pipelines that avoid errors where possible and handle them gracefully where not. Let's look at Snowflake's approach to avoiding errors during query execution by using the TRY_ variants of the conversion functions.

How to do it…

The following steps walk you through various data type conversion scenarios:

  1. Let's start with the common scenario of converting a number stored as a string into a numeric value. We will explore the example of converting to...
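
The excerpt cuts off above; a minimal sketch of the TRY_ pattern using conversion functions Snowflake documents (the input values are illustrative):

    -- CAST raises an error on a bad value; TRY_CAST returns NULL instead
    SELECT TRY_CAST('123' AS INTEGER);   -- 123
    SELECT TRY_CAST('12ab' AS INTEGER);  -- NULL rather than a failed query

    -- Dedicated TRY_ variants exist for common conversions
    SELECT TRY_TO_NUMBER('15.4', 10, 2); -- 15.40
    SELECT TRY_TO_DATE('2021-02-30');    -- NULL: not a valid calendar date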

Managing context using different utility functions

This recipe provides examples of managing context through Snowflake's context functions, which enable the contextual data processing commonly required in ETL.

Getting ready

The following steps explore the various contextual functions available, their intent, and how they may be used in broader processing.

How to do it…

Perform the following steps to try this recipe:

  1. This step covers one of the most frequently used pieces of contextual information: the current date. Snowflake provides the CURRENT_DATE function, which, as the name suggests, returns the current date in the default date format:
    SELECT CURRENT_DATE();

    A result set showing the output of CURRENT_DATE looks as follows:

    Figure 4.40 – Output of CURRENT_DATE

  2. We can also combine the output of CURRENT_DATE with other processing logic. As an example, the following statement extracts the day name from...
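
The excerpt cuts off above; a sketch of the pattern using the documented DAYNAME function, alongside a few other context functions commonly combined with it (the combination shown is illustrative):

    -- Extract the day name from the current date
    SELECT DAYNAME(CURRENT_DATE());  -- e.g. 'Fri'

    -- Other context functions frequently used in ETL logic
    SELECT CURRENT_TIMESTAMP(),      -- session timestamp
           CURRENT_USER(),           -- connected user
           CURRENT_ROLE(),           -- active role
           CURRENT_WAREHOUSE();      -- warehouse executing the query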
Authors (2)

Hamid Mahmood Qureshi

Hamid Qureshi is a senior cloud and data warehouse professional with almost two decades of total experience, having architected, designed, and led the implementation of several data warehouse and business intelligence solutions. He has extensive experience and certifications across various data analytics platforms, ranging from Teradata, Oracle, and Hadoop to modern, cloud-based tools such as Snowflake. Having worked extensively with traditional technologies, combined with his knowledge of modern platforms, he has accumulated substantial practical expertise in data warehousing and analytics in Snowflake, which he has subsequently captured in his publications.

Hammad Sharif

Hammad Sharif is an experienced data architect with more than a decade of experience in the information domain, covering governance, warehousing, data lakes, streaming data, and machine learning. He worked for a leading data warehouse vendor for a decade as part of its professional services organization, advising customers in the telco, retail, life sciences, and financial industries across Asia, Europe, and Australia during presales and post-sales implementation cycles. Hammad holds an MSc in computer science and has published conference papers in the domains of machine learning, sensor networks, software engineering, and remote sensing.