Snowflake Cookbook

Published in Feb 2021 by Packt
ISBN-13: 9781800560611
1st Edition | Reading level: Beginner

Authors (2):

Hamid Mahmood Qureshi

Hamid Qureshi is a senior cloud and data warehouse professional with almost two decades of total experience, having architected, designed, and led the implementation of several data warehouse and business intelligence solutions. He has extensive experience and certifications across various data analytics platforms, ranging from Teradata, Oracle, and Hadoop to modern, cloud-based tools such as Snowflake. Having worked extensively with traditional technologies, combined with his knowledge of modern platforms, he has accumulated substantial practical expertise in data warehousing and analytics in Snowflake, which he has subsequently captured in his publications.

Hammad Sharif

Hammad Sharif is an experienced data architect with more than a decade of experience in the information domain, covering governance, warehousing, data lakes, streaming data, and machine learning. He has worked with a leading data warehouse vendor for a decade as part of a professional services organization, advising customers in telco, retail, life sciences, and financial industries located in Asia, Europe, and Australia during presales and post-sales implementation cycles. Hammad holds an MSc. in computer science and has published conference papers in the domains of machine learning, sensor networks, software engineering, and remote sensing.

Preface

Understanding the underlying technology is an important first step before embarking on delivering data analytics solutions, particularly in the cloud. This book introduces the Snowflake tools and techniques you can use to tame the challenges associated with data management, warehousing, and analytics.

The cloud provides a quick onboarding mechanism, but for novice users who lack the knowledge to use Snowflake efficiently, building and maintaining a data warehouse by trial and error can lead to higher bills. This book provides a practical introduction and guidance for those who have used other technologies, on-premises or in the cloud, for analytics and data warehousing, and who are keen to transfer their skills to this new technology.

The book covers the tasks typically involved in data warehousing and analytics in a simple way, supported by code examples. It takes you through the user interface and management console offered by Snowflake and shows how to get started by creating an account. It then works through examples of loading data and delivering analytics using different Snowflake capabilities, and touches on extending Snowflake using stored procedures and user-defined functions. The book also covers integrating Snowflake with Java and Apache Spark to allow it to coexist with a data lake.

By the end of this book, you will be able to build applications on Snowflake that can serve as the building blocks of a larger solution, while handling security, governance, the data life cycle, and the distribution of data on Snowflake.

Who this book is for

The book acts as a reference for users who want to learn about Snowflake using a practical approach. The recipe-based approach allows the different personas in data management to pick and choose what they want to learn, as and when required. The recipes are independent and start by helping you to understand the environment. The recipes require basic SQL and data warehousing knowledge.

What this book covers

Chapter 1, Getting Started with Snowflake, walks you through the process of setting up an account and connecting to Snowflake via its web UI. The chapter introduces the concept of a virtual warehouse and shows how one can be scaled elastically to match the complexity of user queries. The chapter then moves on to ecosystem support for Snowflake, showing how Snowflake can be connected to and used with different tools through different interfaces.
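As a taste of the elasticity discussed in that chapter, a virtual warehouse can be created, resized, and configured to suspend itself when idle. The following is an illustrative sketch (the warehouse name and settings are hypothetical, not taken from the book):

```sql
-- Create a small warehouse that pauses itself when idle
CREATE WAREHOUSE analytics_wh
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND   = 60        -- suspend after 60 seconds of inactivity
  AUTO_RESUME    = TRUE;     -- resume automatically when a query arrives

-- Scale up for a complex workload, then back down when finished
ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'LARGE';
ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'XSMALL';
```

Because compute is billed per second while a warehouse runs, auto-suspend and right-sizing are the primary levers for controlling cost.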

Chapter 2, Managing the Data Life Cycle, provides a set of recipes that introduce how data can be managed with Snowflake. The chapter talks about typical DBMSes and data warehouse concepts and introduces nuances specific to Snowflake.

Chapter 3, Loading and Extracting Data into and out of Snowflake, guides you through staging and loading data. In this chapter, we explore the methods Snowflake provides for loading data into a table, covering familiar cloud sources such as Amazon S3, Azure Blob Storage, and Google Cloud Storage, as well as on-premises hosts. The chapter also provides an example of how near-real-time data loading works on Snowflake.
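The typical loading pattern stages files in cloud storage and then copies them into a table. A minimal sketch, with hypothetical stage, bucket, and table names (a private bucket would additionally need credentials or a storage integration):

```sql
-- Point an external stage at a cloud storage location (illustrative URL)
CREATE STAGE my_s3_stage URL = 's3://my-bucket/data/';

-- Bulk-load a CSV file from the stage into a target table
COPY INTO customers
  FROM @my_s3_stage/customers.csv
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);
```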

Chapter 4, Building Data Pipelines in Snowflake, explains Snowflake's capabilities for processing a string of SQL statements through the concept of tasks. Tasks allow developers to create data pipelines that process data and perform various functions as they progress through the execution sequence. Tasks combined with streams enable users to manage complex orchestration and scheduling patterns. This chapter deals with setting up pipelines using tasks and streams and applying different techniques for transforming data within tasks.
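To sketch how tasks and streams fit together: a stream records changes on a source table, and a scheduled task consumes those changes only when there is something to process. The object names below are hypothetical:

```sql
-- A stream captures inserts/updates/deletes on the source table
CREATE STREAM orders_stream ON TABLE orders;

-- A task runs on a schedule and fires only when the stream has data
CREATE TASK process_orders
  WAREHOUSE = etl_wh
  SCHEDULE  = '5 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('orders_stream')
AS
  INSERT INTO orders_history
  SELECT * FROM orders_stream;

ALTER TASK process_orders RESUME;  -- tasks are created in a suspended state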

Chapter 5, Data Protection and Security in Snowflake, walks you through handling authentication and authorization on Snowflake. Authentication refers to verifying a user's identity when connecting, and authorization refers to determining which objects a connected user can access. Snowflake provides granular controls to limit access for out-of-the-box and custom roles. This chapter will help you set up role hierarchies, add custom roles, and set default roles for users.
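A typical pattern from that chapter, sketched with hypothetical role, database, and user names: create a custom role, grant it narrow privileges, and roll it up into the built-in hierarchy:

```sql
-- Create a custom role with read-only access to one schema
CREATE ROLE analyst_role;
GRANT USAGE  ON DATABASE sales_db               TO ROLE analyst_role;
GRANT USAGE  ON SCHEMA   sales_db.public        TO ROLE analyst_role;
GRANT SELECT ON ALL TABLES IN SCHEMA sales_db.public TO ROLE analyst_role;

-- Roll the role up under SYSADMIN and assign it to a user
GRANT ROLE analyst_role TO ROLE sysadmin;
GRANT ROLE analyst_role TO USER jane;
ALTER USER jane SET DEFAULT_ROLE = analyst_role;
```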

Chapter 6, Performance and Cost Optimization, enables you to exploit Snowflake's capabilities for optimizing queries and performance through various built-in features, such as caching, autoscaling, and automatic table clustering. There is often an opportunity to improve performance further by tweaking table structures, introducing physicalization techniques, and right-sizing your compute resources. In this chapter, we explore some of these techniques, which can be used to make a Snowflake-based data warehouse run more efficiently and therefore at a lower cost. The chapter also explores strategies for reducing unnecessary storage costs.
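One such technique is defining a clustering key so that frequently filtered columns keep related rows co-located, which helps Snowflake prune micro-partitions. A brief sketch with hypothetical names:

```sql
-- Cluster a large table on the column most queries filter by
ALTER TABLE sales CLUSTER BY (sale_date);

-- Inspect how well the table is clustered on that key
SELECT SYSTEM$CLUSTERING_INFORMATION('sales', '(sale_date)');
```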

Chapter 7, Secure Data Sharing, details how to share data with other Snowflake customers as well as non-Snowflake customers. Traditional warehouse solutions share data by extracting the data out and sending it over transport mechanisms, compromising data security and leading to inefficiencies. Another downside to this is that as soon as the data is extracted, it is already out of date. Snowflake overcomes this by providing a unique data sharing solution that ensures reduced costs, reduced operational overhead, and always up-to-date data.
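In outline, a share is a named object to which you grant read access on specific objects and then attach consumer accounts; no data is copied or moved. The names below are hypothetical:

```sql
-- Create a share and expose one table through it
CREATE SHARE sales_share;
GRANT USAGE  ON DATABASE sales_db              TO SHARE sales_share;
GRANT USAGE  ON SCHEMA   sales_db.public       TO SHARE sales_share;
GRANT SELECT ON TABLE    sales_db.public.orders TO SHARE sales_share;

-- Make the share available to a consumer account (illustrative identifier)
ALTER SHARE sales_share ADD ACCOUNTS = partner_account;
```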

Chapter 8, Back to the Future with Time Travel, equips you to deal with unpleasant data issues, such as working out when data was changed or recovering data that has been lost altogether. Snowflake provides a unique way of going back in time through its Time Travel feature. This chapter explores various applications of Time Travel and combines it with cloning to tackle common data loss and debugging issues.
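A flavor of what Time Travel makes possible, sketched with hypothetical table names and an illustrative timestamp:

```sql
-- Query a table as it looked one hour ago
SELECT * FROM orders AT(OFFSET => -60*60);

-- Restore an accidentally dropped table
UNDROP TABLE orders;

-- Clone a table as of a point in time, for debugging
CREATE TABLE orders_debug CLONE orders
  AT(TIMESTAMP => '2021-02-01 09:00:00'::TIMESTAMP_LTZ);
```

The clone consumes no extra storage until it diverges from the source, which makes point-in-time debugging cheap.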

Chapter 9, Advanced SQL Techniques, introduces multiple advanced SQL techniques for a Snowflake data warehouse that are essential from a data warehousing perspective, such as trend analysis, temporal analytics, managing sequences, counting unique values, and managing processes as transactions.
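As a small taste of the trend-analysis style of query covered there, a window function can compare each period with the previous one (the table and columns are hypothetical):

```sql
-- Month-over-month change in sales using the LAG window function
SELECT order_month,
       monthly_sales,
       monthly_sales - LAG(monthly_sales) OVER (ORDER BY order_month)
         AS change_from_previous_month
FROM monthly_revenue
ORDER BY order_month;
```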

Chapter 10, Extending Snowflake's Capabilities, provides you with techniques for extending a Snowflake data warehouse and integrating it to coexist and work with other technologies. The chapter walks you through user-defined functions, through which custom functionality can be achieved. It also covers connecting Snowflake with Apache Spark and demonstrates how to perform data processing on the Spark engine.
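User-defined functions let you package custom logic behind an ordinary SQL call. A minimal sketch of a SQL UDF (the function name and logic are illustrative):

```sql
-- A simple scalar SQL UDF; the body is a SQL expression
CREATE FUNCTION price_with_tax(price FLOAT)
RETURNS FLOAT
AS
$$
  price * 1.1
$$;

-- Use it like any built-in function
SELECT price_with_tax(amount) FROM orders;
```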

To get the most out of this book

You must have some knowledge of SQL, as that is the primary language used in Snowflake. A few examples also require knowledge of Java, Spark, and Linux.

If you are using the digital version of this book, we advise you to type the code yourself or access the code via the GitHub repository (link available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Snowflake-Cookbook.

In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781800560611_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "We will start by simply selecting all rows from the QUERY_HISTORY view."

A block of code is set as follows:

CREATE TRANSIENT SCHEMA temporary_data 
DATA_RETENTION_TIME_IN_DAYS = 0
COMMENT = 'Schema containing temporary data used by ETL processes';

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

USE ROLE ACCOUNTADMIN;
USE SNOWFLAKE;
SELECT QUERY_ID, QUERY_TEXT, EXECUTION_TIME, USER_NAME
FROM SNOWFLAKE.ACCOUNT_USAGE.query_history 
ORDER BY EXECUTION_TIME DESC;

Any command-line input or output is written as follows:

$ mkdir css
$ cd css

Bold: Indicates a new term, an important word, or words that you see on screen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Select System info from the Administration panel."

Tips or important notes

Appear like this.

Sections

In this book, you will find several headings that appear frequently (Getting ready, How to do it..., How it works..., There's more..., and See also).

To give clear instructions on how to complete a recipe, these sections are used as follows:

Getting ready

This section tells you what to expect in the recipe and describes how to set up any software or any preliminary settings required for the recipe.

How to do it…

This section contains the steps required to follow the recipe.

How it works…

This section usually consists of a detailed explanation of what happened in the previous section.

There's more…

This section consists of additional information about the recipe in order to make you more knowledgeable about the recipe.

See also

This section provides helpful links to other useful information for the recipe.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at customercare@packtpub.com.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report it to us. Please visit www.packtpub.com/support/errata, select your book, click on the Errata Submission Form link, and enter the details.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packt.com with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packt.com.
