Mastering Snowflake’s Architecture

For as long as databases have existed, they have faced recurring challenges in managing concurrency and scalability in the face of growing data volume and processing demands. Many innovative designs have been attempted over the years and have been met with varying degrees of success. However, that success often came with fresh drawbacks.

The Snowflake team saw that overcoming the age-old challenges of handling independent consumption demands of data storage and analysis required a radically new approach. The team decided to design a database that could operate natively on top of cloud computing platforms and thereby offer near-limitless scalability. Their efforts resulted in the creation of what Snowflake calls the Data Cloud—a platform that enables real-time data sharing and on-demand workload sizing through the separation of storage and compute.

In this chapter, we will cover the following topics:

  • Explore how databases...

Traditional architectures

To appreciate the innovation of the Snowflake Data Cloud, we have to take a step back and recall the designs and related limitations associated with its predecessors. Long before the advent of the cloud, databases started as physical on-premises appliances and, since their inception, have all faced the same challenge: scalability.

In the past, databases were confined to a physical server on which they relied for storage and processing power. As usage increased, memory would fill up, and CPU demand would reach the available limit, forcing the user to add more resources to the server or buy a new one altogether. As either response involved maintenance and downtime, hardware purchases had to be forward-looking, anticipating database growth several years into the future.

The following figure outlines the structure and key pieces of a traditional database. Although processing power, memory, and disk space were all customizable to a degree, they came packaged...

Snowflake’s solution

To address the scalability issue that has plagued databases since their inception, the Snowflake team set out to formulate a new approach that would not be tied down by the limitations of past designs. They developed a modern platform built natively for the cloud, one that leverages the cloud’s unique capabilities to enable concurrency, scalability, and real-time collaboration.

Snowflake’s innovative cloud architecture still relies on physical disks, but it integrates them logically to make centralized storage available to its computing clusters without concurrency bottlenecks or data replication overhead. The result is the best of what shared-disk and shared-nothing architectures promised: data is separated from compute workloads, and each can be provisioned and resized independently.
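To see what this separation means in practice, the following is a minimal sketch of creating and resizing a virtual warehouse (compute) independently of the data it queries; the warehouse name and sizes are illustrative:

  -- Create a compute cluster; no storage is tied to it
  CREATE WAREHOUSE IF NOT EXISTS analytics_wh
    WAREHOUSE_SIZE = 'XSMALL'
    AUTO_SUSPEND = 60      -- suspend after 60 seconds of inactivity
    AUTO_RESUME = TRUE;

  -- Scale compute up for a heavy workload without touching the stored data
  ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'LARGE';

Resizing applies to newly submitted queries, while the data in centralized storage remains untouched.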

Snowflake runs entirely on virtually provisioned resources from cloud platforms (Amazon, Microsoft, and Google Cloud). Snowflake handles all interactions with the cloud provider transparently, abstracting the...

Snowflake’s three-tier architecture

Snowflake architecture consists of three layers: storage, compute, and cloud services. Snowflake manages all three layers so that interactions with the underlying cloud architecture are transparent to the users.

The following is an illustration of how Snowflake’s architecture runs on top of cloud data platforms and separates disk from virtual compute clusters while managing a separate operational services layer (so you, the user, don’t have to).

Figure 3.4 – Snowflake hybrid cloud architecture

Now, let us get to know each of the three layers before explaining how they combine to enable Snowflake’s innovative features, such as zero-copy cloning and Time Travel.

Storage layer

The storage layer physically stores data on disks in the cloud provider hosting the Snowflake account. As data is loaded into Snowflake, it is compressed, encrypted, and logically organized into tables...
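As a simple illustration of how data enters Snowflake-managed storage, the sketch below creates a table and loads a staged file into it; the table, stage, and file names are hypothetical:

  -- Define a table; Snowflake compresses and encrypts the data it stores
  CREATE TABLE customer (
      customer_id   INTEGER,
      customer_name VARCHAR,
      created_at    TIMESTAMP_NTZ
  );

  -- Load a file from a (hypothetical) named stage into the table
  COPY INTO customer
    FROM @my_stage/customer.csv
    FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);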

Snowflake’s features

With its revolutionary cloud architecture, Snowflake continues to innovate and (pleasantly) surprise its users with game-changing performance enhancements beyond those that made it famous from its inception. While this is by no means a comprehensive list, the following sections highlight some of the most exciting and relevant features when it comes to data modeling.

Zero-copy cloning

Zero-copy cloning allows Snowflake users to clone data without physically duplicating it. Because no data is copied, cloning happens almost instantly, whether for a single table or an entire database. Cloned objects are virtual copies of their source, so they do not initially incur storage costs. Once data changes occur in the clone or its source, the changed data is written to new storage, and the clone begins consuming storage resources for those changes.
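A minimal sketch of cloning in practice (object names are illustrative):

  -- Clone a single table instantly; no data is physically copied
  CREATE TABLE orders_dev CLONE orders;

  -- The same syntax works for schemas and databases, e.g., a test environment
  CREATE DATABASE analytics_test CLONE analytics_prod;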

Cloning is an ideal way to create system backups and testing environments—achieving in seconds what used to take days. At the object level, cloning is...

Costs to consider

Unlike on-premises databases, which are purchased upfront and used for the duration of their life cycle, Snowflake employs a consumption-based model known as variable spend (commonly referred to as pay-as-you-go). Variable spend enables teams to do rapid prototyping or experiment with proofs of concept without any upfront investment and to control their costs by monitoring and adjusting usage patterns. Here, we will break down the types of costs associated with using the Snowflake platform so that we can make informed design decisions later on.
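One common way to keep variable spend in check is a resource monitor that caps warehouse credit consumption; the quota, thresholds, and warehouse name in this sketch are assumptions:

  -- Cap credit consumption and suspend the warehouse at the limit
  CREATE RESOURCE MONITOR monthly_cap
    WITH CREDIT_QUOTA = 100
    TRIGGERS ON 80 PERCENT DO NOTIFY
             ON 100 PERCENT DO SUSPEND;

  ALTER WAREHOUSE analytics_wh SET RESOURCE_MONITOR = monthly_cap;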

Let us begin with the cost of storing data in the cloud.

Storage costs

Snowflake bills its customers based on the daily average volume of data stored in the platform. Since Snowflake automatically compresses data as it is loaded, customers enjoy lower storage costs without sacrificing performance. However, it is not just the raw data that counts toward the storage bill. Time Travel and fail-safe...
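To see where storage costs come from, the ACCOUNT_USAGE views break table storage down into active, Time Travel, and fail-safe bytes. A minimal query sketch, assuming access to the SNOWFLAKE.ACCOUNT_USAGE share:

  -- Break down storage by table, including Time Travel and fail-safe bytes
  SELECT table_name,
         active_bytes,
         time_travel_bytes,
         failsafe_bytes
  FROM snowflake.account_usage.table_storage_metrics
  ORDER BY active_bytes DESC;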

Saving cash by using cache

With on-premises databases, inefficient operations simply resulted in longer execution times. In Snowflake’s variable spend model, that extra time translates directly into extra cost. Besides writing efficient SQL, Snowflake users should understand the various caches associated with the services and virtual compute layers so they know where they can take advantage of pre-calculated results. A firm grasp of Snowflake caching will also inform decisions when modeling and building data pipelines.

Let us start with the services layer and familiarize ourselves with the caches it manages and offers its users.

Services layer

The services layer handles two types of cache: the metadata cache and the query results cache.

Metadata cache

The services layer manages object metadata, such as structure, row counts, and distinct values by column. Reviewing this metadata through related SQL functions or the Snowflake UI will not require a running warehouse and does not...
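For example, statements like the following are typically answered from metadata alone; the object names are hypothetical, and exact behavior can vary by query:

  -- Object metadata: no running warehouse required
  SHOW TABLES IN SCHEMA analytics.public;

  -- Simple counts (and min/max values) can be served from table metadata
  SELECT COUNT(*) FROM analytics.public.customer;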

Summary

Snowflake’s hybrid, cloud-native design, built for the cloud from the ground up, enables real-time data sharing and on-demand workload sizing, giving its users unparalleled flexibility and overcoming many of the scalability limitations of previous database architectures. Snowflake’s architecture allows secure data sharing between organizations across regions and cloud providers as quickly as it does between databases in the same account.

By understanding each of the layers that make up Snowflake’s cloud architecture (storage, compute, and services), we gained insight into how they enable powerful features such as zero-copy cloning, Time Travel, and Unistore hybrid tables for hybrid transactional/analytical processing (HTAP), and how they open the gates to interacting with semi-structured and unstructured data.

This chapter also outlined the costs of each of the three architecture layers and how to keep them in check. Furthermore, we discussed how various caching...

Further reading

For the definitive guide (says so right there in the title!) on all of Snowflake’s features and object types, beyond the modeling-related content covered in this book, consider Joyce Avila’s excellent and complete reference:

  • Avila, Joyce. Snowflake: The Definitive Guide: Architecting, Designing, and Deploying on the Snowflake Data Cloud. O’Reilly Media, 2022.