Snowflake Cookbook

Published in Feb 2021 by Packt
ISBN-13: 9781800560611
1st Edition | Reading level: Beginner

Authors (2):

Hamid Mahmood Qureshi

Hamid Qureshi is a senior cloud and data warehouse professional with almost two decades of total experience, having architected, designed, and led the implementation of several data warehouse and business intelligence solutions. He has extensive experience and certifications across various data analytics platforms, ranging from Teradata, Oracle, and Hadoop to modern, cloud-based tools such as Snowflake. Having worked extensively with traditional technologies, combined with his knowledge of modern platforms, he has accumulated substantial practical expertise in data warehousing and analytics in Snowflake, which he has subsequently captured in his publications.

Hammad Sharif

Hammad Sharif is an experienced data architect with more than a decade of experience in the information domain, covering governance, warehousing, data lakes, streaming data, and machine learning. He has worked with a leading data warehouse vendor for a decade as part of a professional services organization, advising customers in telco, retail, life sciences, and financial industries located in Asia, Europe, and Australia during presales and post-sales implementation cycles. Hammad holds an MSc. in computer science and has published conference papers in the domains of machine learning, sensor networks, software engineering, and remote sensing.

Preface

Understanding the underlying technology is an important first step before embarking on delivering data analytics solutions, particularly in the cloud. This book introduces the Snowflake tools and techniques you can use to tame the challenges associated with data management, warehousing, and analytics.

The cloud provides a quick onboarding mechanism, but for novice users who lack the knowledge to use Snowflake efficiently, building and maintaining a data warehouse by trial and error can lead to higher bills. This book provides a practical introduction and guidance for those who have used other technologies, on-premises or in the cloud, for analytics and data warehousing, and who are keen to transfer their skills to this new technology.

The book covers the tasks typically involved in data warehousing and analytics in a simple way, supported by code examples. It takes you through the user interface and management console offered by Snowflake and shows how to get started by creating an account. It then works through examples of loading data and delivering analytics using different Snowflake capabilities, and touches on extending Snowflake using stored procedures and user-defined functions. The book also covers integrating Snowflake with Java and Apache Spark to allow it to coexist with a data lake.

By the end of this book, you will be able to build applications on Snowflake that can serve as the building blocks of a larger solution, while handling security, governance, the data life cycle, and the distribution of data on Snowflake.

Who this book is for

The book acts as a reference for users who want to learn about Snowflake using a practical approach. The recipe-based approach allows the different personas in data management to pick and choose what they want to learn, as and when required. The recipes are independent and start by helping you to understand the environment. The recipes require basic SQL and data warehousing knowledge.

What this book covers

Chapter 1, Getting Started with Snowflake, walks you through the process of setting up an account and connecting to Snowflake via its web UI. The chapter introduces the concept of a virtual warehouse and shows how one can be scaled elastically to match the complexity of user queries. The chapter then moves on to ecosystem support for Snowflake, showing how Snowflake can be connected to and used with different tools through different interfaces.
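As a taste of the elasticity discussed in that chapter, a virtual warehouse can be created, resized, and configured to suspend itself when idle. The following is an illustrative sketch (the warehouse name and settings are hypothetical, not taken from the book):

```sql
-- Create a small warehouse that pauses itself when idle
CREATE WAREHOUSE analytics_wh
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND   = 60        -- suspend after 60 seconds of inactivity
  AUTO_RESUME    = TRUE;     -- resume automatically when a query arrives

-- Scale up for a complex workload, then back down when finished
ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'LARGE';
ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'XSMALL';
```

Because compute is billed per second while a warehouse runs, auto-suspend and right-sizing are the primary levers for controlling cost.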

Chapter 2, Managing the Data Life Cycle, provides a set of recipes that introduce how data can be managed with Snowflake. The chapter talks about typical DBMSes and data warehouse concepts and introduces nuances specific to Snowflake.

Chapter 3, Loading and Extracting Data into and out of Snowflake, guides you through staging and loading data. In this chapter, we explore the methods Snowflake provides for loading data into a table, covering familiar cloud sources such as Amazon S3, Azure Blob Storage, and Google Cloud Storage, as well as on-premises hosts. The chapter also provides an example of how near-real-time data loading works on Snowflake.
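The typical loading pattern stages files in cloud storage and then copies them into a table. A minimal sketch, with hypothetical stage, bucket, and table names (a private bucket would additionally need credentials or a storage integration):

```sql
-- Point an external stage at a cloud storage location (illustrative URL)
CREATE STAGE my_s3_stage URL = 's3://my-bucket/data/';

-- Bulk-load a CSV file from the stage into a target table
COPY INTO customers
  FROM @my_s3_stage/customers.csv
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);
```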

Chapter 4, Building Data Pipelines in Snowflake, explains Snowflake's capabilities for processing a string of SQL statements through the concept of tasks. Tasks allow developers to create data pipelines that process data and perform various functions as they progress through the execution sequence. Tasks combined with streams enable users to manage complex orchestration and scheduling patterns. This chapter deals with setting up pipelines using tasks and streams and applying different techniques for transforming data within tasks.
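To sketch how tasks and streams fit together: a stream records changes on a source table, and a scheduled task consumes those changes only when there is something to process. The object names below are hypothetical:

```sql
-- A stream captures inserts/updates/deletes on the source table
CREATE STREAM orders_stream ON TABLE orders;

-- A task runs on a schedule and fires only when the stream has data
CREATE TASK process_orders
  WAREHOUSE = etl_wh
  SCHEDULE  = '5 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('orders_stream')
AS
  INSERT INTO orders_history
  SELECT * FROM orders_stream;

ALTER TASK process_orders RESUME;  -- tasks are created in a suspended state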

Chapter 5, Data Protection and Security in Snowflake, walks you through handling authentication and authorization on Snowflake. Authentication refers to verifying a user's identity when connecting, and authorization refers to determining which objects a connected user can access. Snowflake provides granular controls to limit access for out-of-the-box and custom roles. This chapter will help you set up role hierarchies, add custom roles, and set default roles for users.
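A typical pattern from that chapter, sketched with hypothetical role, database, and user names: create a custom role, grant it narrow privileges, and roll it up into the built-in hierarchy:

```sql
-- Create a custom role with read-only access to one schema
CREATE ROLE analyst_role;
GRANT USAGE  ON DATABASE sales_db               TO ROLE analyst_role;
GRANT USAGE  ON SCHEMA   sales_db.public        TO ROLE analyst_role;
GRANT SELECT ON ALL TABLES IN SCHEMA sales_db.public TO ROLE analyst_role;

-- Roll the role up under SYSADMIN and assign it to a user
GRANT ROLE analyst_role TO ROLE sysadmin;
GRANT ROLE analyst_role TO USER jane;
ALTER USER jane SET DEFAULT_ROLE = analyst_role;
```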

Chapter 6, Performance and Cost Optimization, enables you to exploit Snowflake's capabilities for optimizing queries and performance through various built-in features, such as caching, autoscaling, and automatic table clustering. There is often an opportunity to improve performance further by tweaking table structures, introducing physicalization techniques, and right-sizing your compute resources. In this chapter, we explore some of these techniques, which can be used to make a Snowflake-based data warehouse run more efficiently and therefore at a lower cost. The chapter also explores strategies for reducing unnecessary storage costs.
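One such technique is defining a clustering key so that frequently filtered columns keep related rows co-located, which helps Snowflake prune micro-partitions. A brief sketch with hypothetical names:

```sql
-- Cluster a large table on the column most queries filter by
ALTER TABLE sales CLUSTER BY (sale_date);

-- Inspect how well the table is clustered on that key
SELECT SYSTEM$CLUSTERING_INFORMATION('sales', '(sale_date)');
```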

Chapter 7, Secure Data Sharing, details how to share data with other Snowflake customers as well as non-Snowflake customers. Traditional warehouse solutions share data by extracting the data out and sending it over transport mechanisms, compromising data security and leading to inefficiencies. Another downside to this is that as soon as the data is extracted, it is already out of date. Snowflake overcomes this by providing a unique data sharing solution that ensures reduced costs, reduced operational overhead, and always up-to-date data.
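In outline, a share is a named object to which you grant read access on specific objects and then attach consumer accounts; no data is copied or moved. The names below are hypothetical:

```sql
-- Create a share and expose one table through it
CREATE SHARE sales_share;
GRANT USAGE  ON DATABASE sales_db              TO SHARE sales_share;
GRANT USAGE  ON SCHEMA   sales_db.public       TO SHARE sales_share;
GRANT SELECT ON TABLE    sales_db.public.orders TO SHARE sales_share;

-- Make the share available to a consumer account (illustrative identifier)
ALTER SHARE sales_share ADD ACCOUNTS = partner_account;
```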

Chapter 8, Back to the Future with Time Travel, equips you to deal with unpleasant data issues, such as working out when data was changed or recovering data that has been lost altogether. Snowflake provides a unique way of going back in time through its Time Travel feature. This chapter explores various applications of Time Travel and combines it with cloning to tackle common data loss and debugging issues.
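A flavor of what Time Travel makes possible, sketched with hypothetical table names and an illustrative timestamp:

```sql
-- Query a table as it looked one hour ago
SELECT * FROM orders AT(OFFSET => -60*60);

-- Restore an accidentally dropped table
UNDROP TABLE orders;

-- Clone a table as of a point in time, for debugging
CREATE TABLE orders_debug CLONE orders
  AT(TIMESTAMP => '2021-02-01 09:00:00'::TIMESTAMP_LTZ);
```

The clone consumes no extra storage until it diverges from the source, which makes point-in-time debugging cheap.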

Chapter 9, Advanced SQL Techniques, introduces multiple advanced SQL techniques for a Snowflake data warehouse that are essential from a data warehousing perspective, such as trend analysis, temporal analytics, managing sequences, counting unique values, and managing processes as transactions.
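As a small taste of the trend-analysis style of query covered there, a window function can compare each period with the previous one (the table and columns are hypothetical):

```sql
-- Month-over-month change in sales using the LAG window function
SELECT order_month,
       monthly_sales,
       monthly_sales - LAG(monthly_sales) OVER (ORDER BY order_month)
         AS change_from_previous_month
FROM monthly_revenue
ORDER BY order_month;
```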

Chapter 10, Extending Snowflake's Capabilities, provides you with techniques for extending a Snowflake data warehouse and integrating it to coexist and work with other technologies. The chapter walks you through user-defined functions, through which custom functionality can be achieved. It also covers connecting Snowflake with Apache Spark and demonstrates how to perform data processing on the Spark engine.
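User-defined functions let you package custom logic behind an ordinary SQL call. A minimal sketch of a SQL UDF (the function name and logic are illustrative):

```sql
-- A simple scalar SQL UDF; the body is a SQL expression
CREATE FUNCTION price_with_tax(price FLOAT)
RETURNS FLOAT
AS
$$
  price * 1.1
$$;

-- Use it like any built-in function
SELECT price_with_tax(amount) FROM orders;
```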

To get the most out of this book

You must have some knowledge of SQL, as that is the primary language used in Snowflake. A few examples also require knowledge of Java, Spark, and Linux.

If you are using the digital version of this book, we advise you to type the code yourself or access the code via the GitHub repository (link available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Snowflake-Cookbook.

In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781800560611_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "We will start by simply selecting all rows from the QUERY_HISTORY view."

A block of code is set as follows:

CREATE TRANSIENT SCHEMA temporary_data 
DATA_RETENTION_TIME_IN_DAYS = 0
COMMENT = 'Schema containing temporary data used by ETL processes';

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

USE ROLE ACCOUNTADMIN;
USE SNOWFLAKE;
SELECT QUERY_ID, QUERY_TEXT, EXECUTION_TIME, USER_NAME
FROM SNOWFLAKE.ACCOUNT_USAGE.query_history 
ORDER BY EXECUTION_TIME DESC;

Any command-line input or output is written as follows:

$ mkdir css
$ cd css

Bold: Indicates a new term, an important word, or words that you see on screen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Select System info from the Administration panel."

Tips or important notes

Appear like this.

Sections

In this book, you will find several headings that appear frequently (Getting ready, How to do it..., How it works..., There's more..., and See also).

To give clear instructions on how to complete a recipe, these sections are used as follows:

Getting ready

This section tells you what to expect in the recipe and describes how to set up any software or any preliminary settings required for the recipe.

How to do it…

This section contains the steps required to follow the recipe.

How it works…

This section usually consists of a detailed explanation of what happened in the previous section.

There's more…

This section consists of additional information about the recipe in order to make you more knowledgeable about the recipe.

See also

This section provides helpful links to other useful information for the recipe.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at customercare@packtpub.com.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report it to us. Please visit www.packtpub.com/support/errata, select your book, click on the Errata Submission Form link, and enter the details.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packt.com with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packt.com.
