Mastering Snowflake Objects

The previous chapter introduced the innovative architecture that powers the Data Cloud and unlocks possibilities unavailable in other databases. In this chapter, we will explore the database objects that Snowflake customers will use in their modeling journey. Objects such as tables and views will undoubtedly be familiar to most people who have previously worked with SQL. Still, even here, Snowflake's unique features unlock potential efficiencies in designing physical models, so users should be well acquainted with them.

Due to its variable-spend pricing, Snowflake data modeling requires users to understand the cost/performance trade-offs of its supported objects and their parameters. For those who have never worked with Snowflake, exclusive features such as streams and external tables may still be unfamiliar. But making good design decisions demands a thorough understanding of Snowflake...

Stages

Before data can make its way into tables, it must be loaded into Snowflake using a stage. Stages are logical objects that abstract cloud filesystems so they can be used in a standard manner to load data into Snowflake. Two types of stages can be defined: external and internal. External stages are created on top of a cloud location (in one of the supported cloud storage services) outside Snowflake and are used to load data from external source systems. Internal stages are created within the Snowflake account and therefore use the storage type of the hosting provider. Internal stages are used to stage files that originate from within the Snowflake account and cannot be used to load external data.

Supported cloud storage services for external stages are as follows:

  • Amazon S3 buckets
  • Google Cloud Storage buckets
  • Microsoft Azure containers
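
As a minimal sketch, here is how each stage type might be created (the stage names, bucket URL, and credentials are hypothetical placeholders):

-- External stage over an Amazon S3 bucket (hypothetical URL and credentials)
CREATE STAGE my_ext_stage
  URL = 's3://my-bucket/data/'
  CREDENTIALS = (AWS_KEY_ID = '<key_id>' AWS_SECRET_KEY = '<secret_key>');
-- Internal stage backed by the account's own storage
CREATE STAGE my_int_stage;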

The following figure shows the layout of external stages in relation to internal stage types.

...

File formats

File formats can be thought of as templates that describe a file's type and properties for reading or writing data. A file format defines properties such as the delimiter type (e.g., comma, tab, or other), date/time formats, and encoding (for example, Hex, Base64, and UTF-8).

Snowflake provides several structured and semi-structured file formats, such as CSV, JSON, and Parquet, with commonly used defaults. However, users are free to create their own named file formats when loading or unloading files of a specified type. File format properties may also be specified at runtime when using the COPY INTO command.
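
As a minimal sketch, a named file format might be created and then referenced during a load as follows (the format, table, and stage names are hypothetical):

-- Named file format for comma-delimited files with a header row
CREATE FILE FORMAT my_csv_format
  TYPE = CSV
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1;
-- Reference the named format when loading from a stage
COPY INTO my_table
  FROM @my_stage
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');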

A description of supported file types and file formats for data loading is available in the Snowflake documentation: https://docs.snowflake.com/en/user-guide/data-load-prepare.html.

For a complete overview of data loading of all kinds, including streaming, see the guide on the Snowflake website: https://docs.snowflake.com/en/user-guide-data-load.html.

Having seen...

Tables

Data in Snowflake is stored in tables, which, as discussed, are one of the fundamental components of data modeling. However, before exploring them in a modeling context, we should understand the various table types that exist in Snowflake and their costs.

The previous chapter described Snowflake’s Time Travel, a feature that allows restoring dropped objects or querying data at a prior point in time. However, Time Travel comes with associated storage costs, and the number of available Time Travel days—known as the retention period—depends on the table type, as we’ll shortly review in detail.
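
For example, the retention period is set per table; the following sketch (with hypothetical names) contrasts a permanent table with a transient one, which supports at most one day of Time Travel:

-- Permanent table: retention of up to 90 days on Enterprise edition
CREATE TABLE sales_history (id INT, amount NUMBER)
  DATA_RETENTION_TIME_IN_DAYS = 30;
-- Transient table: retention of 0 or 1 day only
CREATE TRANSIENT TABLE sales_staging (id INT, amount NUMBER)
  DATA_RETENTION_TIME_IN_DAYS = 0;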

Snowflake also offers a managed type of Time Travel, known as Fail-safe. All permanent tables have a Fail-safe period of seven days. Unlike Time Travel, which the user can access, Fail-safe is managed by and accessible only to Snowflake to protect user data from disasters such as system failures and data breaches. To recover data stored in Fail-safe, users...

Snowflake views

Snowflake views behave as views do in most relational databases: they store a SELECT statement over physical objects as an object in a schema. Storing a SELECT statement as a shareable database object offers several advantages to users of the system (a short example follows the list):

  • The SELECT statement does not need to be written from scratch each time it is required, saving time and improving maintainability through reusable, modular code for data pipelines and analytics.
  • Filters and business logic remain consistent for everyone with access to the view.
  • Views are database objects separate from the data sources they reference and can therefore carry different permissions that do not expose all the underlying data.
  • Views can select data from multiple sources at once.
  • Views always show the latest data from source tables without having to refresh it (as opposed to materializing the results as a physical table).
  • Zero storage cost, as data is not physically copied but read in real time when...
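
For instance, a view can consolidate logic from multiple tables behind a single, permission-controlled object; a minimal sketch with hypothetical table and column names:

-- Reusable, access-controlled business logic over two sources
CREATE VIEW current_eu_orders AS
SELECT o.order_id, c.customer_name
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
WHERE c.region = 'EU';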

Materialized views

Not exactly views, yet not quite tables, materialized views sit at the intersection of these two object types and offer interesting benefits with one important caveat.

As the name suggests, materialized views are actually physical tables that store the results of a view. When the underlying data changes, the result is re-materialized automatically. This means that materialized views combine the performance of a table and a cached query result with all the security benefits of a regular view.

The only trade-off is the cost. Since materialized views store query results in physical tables, they incur related storage costs. Additionally, materialized views will use credits when refreshing the data.
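
A materialized view is defined much like a regular view; here is a minimal sketch with hypothetical table and column names (note that materialized views require Enterprise edition or higher):

-- Pre-aggregated results that Snowflake keeps up to date automatically
CREATE MATERIALIZED VIEW daily_sales_mv AS
SELECT sale_date, SUM(amount) AS total_amount
FROM sales
GROUP BY sale_date;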

Materialized views have certain limitations in the logic they permit in the SELECT statement compared to regular views. Chief among them is that materialized views can only be built over a single data source (stages are allowed), so they cannot use...

Streams

Streams are logical objects that capture data changes in underlying sources, including the previously mentioned objects (physical tables, views, and external and directory tables). Whenever a DML operation occurs in the source object, a stream tracks the changes (inserts, deletions, and the before/after images of updates). Streams achieve this through an offset storage technique: logically taking an initial snapshot of the data and then tracking changes through metadata columns. Although a stream can be queried like a table, it does not store a separate copy of the table data.

When a stream is created, metadata columns are appended to the source object and begin tracking changes. The following table describes the metadata fields and their contents:

Figure 4.4 – Stream metadata columns

The following command creates a stream on a table:

CREATE STREAM <stream_name> ON TABLE <table_name>

For every subsequent DML...
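
To illustrate, here is a hypothetical sketch of how a stream surfaces a change (the table and stream names are placeholders):

-- A DML operation on the source table...
INSERT INTO src_table VALUES ('A1', CURRENT_DATE);
-- ...appears in the stream with METADATA$ACTION = 'INSERT'
SELECT * FROM src_stream;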

Change tracking

While a stream is created as a standalone object in a schema, change tracking is enabled directly on tables, allowing Snowflake users to query CDC metadata. Change tracking uses the same metadata fields found in streams but appends them directly to a table. Unlike streams, the changes are not eliminated if they are used to update downstream objects; instead, the change tracking persists for the data retention time of the table.

Here is an example of how change tracking is enabled and queried for a table. In this example, three records are inserted into a change tracking-enabled table. Subsequently, one record is updated and another is deleted:

create or replace table myTable (
   myTable_id varchar(10),
   my_date date
);
-- Enable change tracking
alter table myTable set change_tracking = true;
-- Initialize session variables for the current
-- timestamp and date
set cDts  = (select current_timestamp());
set cDate = (select...
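
The tracked changes can then be queried with the CHANGES clause; a minimal sketch that reuses the cDts variable set above (the cast is included as a precaution):

-- Read all change rows recorded since the saved timestamp
SELECT * FROM myTable
  CHANGES (INFORMATION => DEFAULT)
  AT (TIMESTAMP => $cDts::timestamp_ltz);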

Tasks

Snowflake uses tasks to schedule and automate data loading and transformation. Although data movement is not tracked in relational modeling, it is an integral part of transformational modeling and is covered here for completeness.

Tasks automate data pipelines by executing SQL in serial or parallel steps. Tasks can be combined with streams for continuous ELT workflows to process recently changed table rows. This can be done serverlessly (using auto-scalable Snowflake-managed compute clusters that do not require an active warehouse) or using a dedicated user-defined warehouse.

The code for creating a task is as follows:

CREATE TASK <task_name>
...
[ AFTER <parent_task_1> [ , <parent_task_2> , ... ] ]
[ WHEN <boolean_expr> ]
AS <sql>

Tasks are simple to understand—they run a SQL command (or execute a stored procedure) on a schedule or when called as part of a parent task. The following figure shows how tasks can be chained serially...
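
To make this concrete, the following sketch chains two tasks and gates the root task on a stream; the warehouse, stream, table, column, and procedure names are all hypothetical:

-- Root task: checks every 5 minutes, runs only when the stream has data
CREATE TASK load_changes
  WAREHOUSE = transform_wh
  SCHEDULE = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('src_stream')
AS
  INSERT INTO tgt_table (id, load_date)
  SELECT id, load_date FROM src_stream;

-- Child task: runs only after the root task completes
CREATE TASK refresh_aggregates
  WAREHOUSE = transform_wh
  AFTER load_changes
AS
  CALL refresh_agg_proc();

Note that tasks are created in a suspended state and must be resumed with ALTER TASK ... RESUME before they begin running.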

Summary

In this chapter, we demonstrated that Snowflake objects pack a lot of features, even familiar ones like tables and views. A table in Snowflake can store more than just the data: depending on its settings, it can also hold months of historical and disaster recovery backups and offer offset change tracking for CDC. Views, likewise, exceed expectations by providing change tracking and automated re-materialization.

We saw how stages mark the entry point for data to make its way from external sources to Snowflake tables. Stages also provide helpful features, such as external table access for reading file contents without copying them to a table beforehand.

Finally, to coordinate incoming data, establish automated ELT pipelines, and streamline CDC, Snowflake pairs tasks with streams to give its users full serverless or managed control—tying stages, tables, views, and all the connective transformational logic together.

Having understood the strengths and...

References

You can refer to the following resources for additional information:
