You're reading from Data Modeling with Snowflake

Product type: Book
Published in: May 2023
Publisher: Packt
ISBN-13: 9781837634453
Edition: 1st Edition
Author: Serge Gershkovich

Serge Gershkovich is a seasoned data architect with decades of experience designing and maintaining enterprise-scale data warehouse platforms and reporting solutions. He is a leading subject matter expert, speaker, content creator, and Snowflake Data Superhero. Serge earned a bachelor of science degree in information systems from the State University of New York (SUNY) Stony Brook. Throughout his career, Serge has worked in model-driven development from SAP BW/HANA to dashboard design to cost-effective cloud analytics with Snowflake. He currently serves as product success lead at SqlDBM, an online database modeling tool.

Scaling Data Models through Modern Techniques

After covering theory, architecture, terminology, methodology, and Snowflake-centered transformation strategies throughout the book, this chapter builds upon that foundational knowledge to address common data management challenges in large, complex environments. Specifically, it explores the Data Vault 2.0 and Data Mesh methodologies, two popular solutions that have emerged in response to some of the biggest challenges facing large organizations today. Despite their similar names, Data Vault and Data Mesh tackle very different challenges and are often used together.

Data Vault is a methodology that focuses on the efficient and flexible storage of data, with a primary focus on auditing and effortless scalability. It is made up of three pillars: modeling, methodology, and architecture. Its standardized, repeatable design patterns can be applied regardless of the complexity of the data or how many source systems...

Technical requirements

The scripts used to instantiate and load the examples in this chapter are available in the following GitHub repo: https://github.com/PacktPublishing/Data-Modeling-with-Snowflake/tree/main/ch17. While key sections of the script are highlighted in this chapter, please refer to the ch_17_data_vault.sql file for the complete code required to follow the Data Vault exercise, as it is too long to reprint here in full.

Demystifying Data Vault 2.0

Data Vault emerged in the early 2000s as a response to the extensibility limitations of warehouses built using 3NF and star schema (discussed later in the chapter) models. Data Vault overcame these limitations while retaining the strengths of 3NF and star schema architectures by using a methodology especially suited to meet the needs of large enterprises. Around 2013, Data Vault was expanded to accommodate the growing demand for distributed computing and NoSQL databases, giving rise to its current iteration, Data Vault 2.0.

Data Vault uses a pattern-based design methodology to build an auditable and extensible data warehouse. When most people refer to Data Vault, they are referring to the Raw Vault, which consists of Hub, Link, and Satellite tables. Atop the Raw Vault sits the Business Vault, a business-centric layer that abstracts the technical complexities of the underlying data sources and uses constructs such as Point-in-Time...

Modeling the data marts

This section will explore the Star and Snowflake schemas—popular options for architecting user-facing self-service schemas and data marts due to their efficiency and ease of understanding. Both approaches are designed to optimize the performance of data analysis by organizing data into a structure that makes it easy to query and analyze. But first, a quick overview of what a data mart is.

Data mart versus data warehouse

A data warehouse and a data mart are repositories for storing and managing data, but they differ in scope, purpose, and design. A data warehouse is a large, centralized repository of integrated data used to support decision-making and analysis across an entire organization. Data warehouses are optimized for complex queries and often use Kimball’s dimensional modeling technique or Inmon’s 3NF approach (described in his book Building the Data Warehouse). On the other hand, a data mart is a subset of a data warehouse designed...

Discovering Data Mesh

Data Mesh (DM) is an approach to organizing and managing data in large, complex organizations, introduced in 2019 by Zhamak Dehghani, a thought leader in the field of data architecture.

The DM approach advocates for decentralized data ownership and governance, with data treated as a product owned and managed by the teams using it. This contrasts with the traditional centralized (or, as Zhamak calls it, monolithic) approach to data management, where a single team or department is responsible for all data-related activities.

In a DM architecture, data is organized into self-contained domains, each responsible for its own data curation and sharing. These domains are often organized around business capabilities or processes and are staffed by cross-functional teams that include technical and business experts.
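
The idea of self-contained domains that curate and share their own data can be illustrated with a small Python sketch. Everything here is hypothetical and for illustration only: the Domain, DataProduct, and Catalog classes, their fields, and the "sales.orders" naming scheme are assumptions, not part of any Data Mesh specification or Snowflake API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class DataProduct:
    """A dataset published by exactly one domain (hypothetical structure)."""
    name: str
    owner_domain: str
    schema: Dict[str, str]  # column name -> simple type label


@dataclass
class Domain:
    """A business-aligned team that curates and shares its own data."""
    name: str
    team: List[str]
    products: Dict[str, DataProduct] = field(default_factory=dict)

    def publish(self, name: str, schema: Dict[str, str]) -> DataProduct:
        # The publishing domain is recorded as the owner of the product.
        product = DataProduct(name, self.name, dict(schema))
        self.products[name] = product
        return product


class Catalog:
    """A shared lookup so other domains can discover published products."""

    def __init__(self) -> None:
        self._products: Dict[str, DataProduct] = {}

    def register(self, product: DataProduct) -> str:
        key = f"{product.owner_domain}.{product.name}"
        if key in self._products:
            raise ValueError(f"{key} is already registered")
        self._products[key] = product
        return key

    def find(self, key: str) -> DataProduct:
        return self._products[key]


sales = Domain("sales", ["business analyst", "data engineer"])
catalog = Catalog()
orders = sales.publish("orders", {"order_id": "string", "amount": "number"})
catalog.register(orders)
```

In this sketch, ownership stays with the publishing domain, while the catalog only records where a product lives. A consumer in another domain would call catalog.find("sales.orders") rather than reaching into the sales team's internal tables.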

DM consists of four principles that aim to enable effective communication and collaboration between domains: domain-driven design, self-service, and...

Summary

Data Vault 2.0 is designed to address the challenges of managing large, complex, and rapidly changing data environments. It is a hybrid approach that combines elements of 3NF and star schema and uses a standardized, repeatable design pattern that can be applied to any dataset, regardless of size or complexity.

Data Vault design begins by defining the business model and constructing the base layer, known as the Raw Vault. The Raw Vault contains the following elements:

  • Hubs – store the natural keys that identify business entities
  • Links – store the interactions between business entities
  • Satellites – store the descriptions and attributes of business entities
  • Reference tables – store descriptive information and metadata
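
The Hub, Link, and Satellite elements listed above can be sketched as a minimal, runnable example. It is not the book's ch_17_data_vault.sql script: it uses Python's built-in sqlite3 module in place of Snowflake, and the table names, column names, sample rows, and the MD5-based hash_key helper are all assumptions chosen for illustration. Reference tables are omitted for brevity.

```python
import hashlib
import sqlite3


def hash_key(*business_keys: str) -> str:
    """Derive a deterministic key from one or more business keys (assumed MD5)."""
    normalized = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_customer (            -- Hub: one row per business entity
    customer_hk   TEXT PRIMARY KEY,
    customer_id   TEXT NOT NULL UNIQUE, -- natural (business) key
    load_dts      TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE hub_order (
    order_hk      TEXT PRIMARY KEY,
    order_id      TEXT NOT NULL UNIQUE,
    load_dts      TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE link_customer_order (     -- Link: interaction between hubs
    customer_order_hk TEXT PRIMARY KEY,
    customer_hk   TEXT NOT NULL REFERENCES hub_customer (customer_hk),
    order_hk      TEXT NOT NULL REFERENCES hub_order (order_hk),
    load_dts      TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE sat_customer_details (    -- Satellite: descriptive attributes
    customer_hk   TEXT NOT NULL REFERENCES hub_customer (customer_hk),
    load_dts      TEXT NOT NULL,
    name          TEXT,
    city          TEXT,
    record_source TEXT NOT NULL,
    PRIMARY KEY (customer_hk, load_dts)
);
""")

src = "crm_demo"  # hypothetical source system name
cust_hk = hash_key("C-100")
order_hk = hash_key("O-9001")
conn.execute("INSERT INTO hub_customer VALUES (?, ?, ?, ?)",
             (cust_hk, "C-100", "2023-05-01", src))
conn.execute("INSERT INTO hub_order VALUES (?, ?, ?, ?)",
             (order_hk, "O-9001", "2023-05-01", src))
conn.execute("INSERT INTO link_customer_order VALUES (?, ?, ?, ?, ?)",
             (hash_key("C-100", "O-9001"), cust_hk, order_hk, "2023-05-01", src))
# A changed attribute is added as a new satellite row, never updated in place,
# so earlier versions remain available for auditing.
conn.executemany("INSERT INTO sat_customer_details VALUES (?, ?, ?, ?, ?)", [
    (cust_hk, "2023-05-01", "Ada", "Lisbon", src),
    (cust_hk, "2023-06-01", "Ada", "Porto", src),
])


def current_customer(customer_id: str):
    """Return the most recently loaded attributes for one customer."""
    return conn.execute("""
        SELECT h.customer_id, s.name, s.city
        FROM hub_customer AS h
        JOIN sat_customer_details AS s ON s.customer_hk = h.customer_hk
        WHERE h.customer_id = ?
        ORDER BY s.load_dts DESC
        LIMIT 1
    """, (customer_id,)).fetchone()
```

Calling current_customer("C-100") joins the hub to its satellite and returns the latest version of the customer's attributes, while both historical rows stay in sat_customer_details.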

On top of the Raw Vault, a Business Vault is constructed to meet changing business needs and requirements without disrupting the overall data architecture. Next, domain-oriented information marts are built to meet...
