You're reading from Data Modeling with Snowflake

Product type: Book
Published in: May 2023
Publisher: Packt
ISBN-13: 9781837634453
Edition: 1st Edition
Concepts: Data Engineering
Author: Serge Gershkovich

Scaling Data Models through Modern Techniques

After covering theory, architecture, terminology, methodology, and Snowflake-centered transformation strategies throughout the book, this chapter builds on that foundational knowledge to address common data management challenges in large, complex environments. Specifically, it explores the Data Vault 2.0 and Data Mesh methodologies, two popular solutions that have emerged in response to some of the biggest challenges facing large organizations today. Despite their similar names, Data Vault and Data Mesh tackle very different challenges and are often used together.

Data Vault is a methodology focused on the efficient and flexible storage of data, with an emphasis on auditability and effortless scalability. It is made up of three pillars: modeling, methodology, and architecture. Its standardized, repeatable design patterns can be applied regardless of the complexity of the data or how many source systems...

Technical requirements

The scripts used to instantiate and load the examples in this chapter are available in the following GitHub repository: https://github.com/PacktPublishing/Data-Modeling-with-Snowflake/tree/main/ch17. While key sections of the script are highlighted in this chapter, please refer to the ch_17_data_vault.sql file for the complete code required to follow the Data Vault exercise, as it is too long to reprint here in full.

Demystifying Data Vault 2.0

Data Vault emerged in the early 2000s in response to the extensibility limitations of warehouses built on 3NF and star schema models (star schemas are discussed later in this chapter). Data Vault overcomes these limitations while retaining the strengths of 3NF and star schema architectures, using a methodology especially suited to the needs of large enterprises. Around 2013, Data Vault was expanded to accommodate the growing demand for distributed computing and NoSQL databases, giving rise to its current iteration, Data Vault 2.0.

Data Vault uses a pattern-based design methodology to build an auditable and extensible data warehouse. When most people refer to Data Vault, they mean the Raw Vault, which consists of Hub, Link, and Satellite tables. Atop the Raw Vault sits the Business Vault, a business-centric layer that abstracts the technical complexities of the underlying data sources and uses constructs such as Point-in-Time...
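To make the Hub/Link/Satellite split more concrete, here is a minimal Python sketch; the book's own scripts are SQL for Snowflake, so treat this only as an illustration. It assumes a hypothetical flat source feed of customer orders and the common Data Vault 2.0 practice of hashing business keys (here with MD5) into surrogate keys. The function and column names (`hash_key`, `load_raw_vault`, `customer_hk`, and so on) are hypothetical, not taken from the book.

```python
import hashlib
from datetime import datetime, timezone


def hash_key(*business_keys: str) -> str:
    """Derive a deterministic surrogate key from one or more business keys.

    Assumption: keys are trimmed, upper-cased, joined with '||', and MD5-hashed.
    This mirrors a common Data Vault 2.0 convention, not necessarily the book's.
    """
    normalized = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def load_raw_vault(source_rows, record_source, load_dts):
    """Split flat source rows into hub, link, and satellite records."""
    hub_customer, hub_order, link_customer_order, sat_customer = {}, {}, {}, {}
    for row in source_rows:
        customer_hk = hash_key(row["customer_id"])
        order_hk = hash_key(row["order_id"])
        # A link key is hashed from the combination of business keys it relates
        link_hk = hash_key(row["customer_id"], row["order_id"])
        audit = {"load_dts": load_dts, "record_source": record_source}

        # Hubs: one row per unique business key (setdefault keeps the first row seen)
        hub_customer.setdefault(
            customer_hk, {"customer_hk": customer_hk, "customer_id": row["customer_id"], **audit}
        )
        hub_order.setdefault(order_hk, {"order_hk": order_hk, "order_id": row["order_id"], **audit})

        # Link: one row per unique customer/order relationship
        link_customer_order.setdefault(
            link_hk,
            {"customer_order_hk": link_hk, "customer_hk": customer_hk, "order_hk": order_hk, **audit},
        )

        # Satellite: descriptive attributes keyed by hub key + load timestamp
        sat_customer.setdefault(
            (customer_hk, load_dts),
            {"customer_hk": customer_hk, "customer_name": row["customer_name"], **audit},
        )

    return {
        "hub_customer": list(hub_customer.values()),
        "hub_order": list(hub_order.values()),
        "link_customer_order": list(link_customer_order.values()),
        "sat_customer": list(sat_customer.values()),
    }


rows = [
    {"customer_id": "C-001", "order_id": "O-100", "customer_name": "Acme"},
    {"customer_id": "C-001", "order_id": "O-101", "customer_name": "Acme"},
    {"customer_id": "C-002", "order_id": "O-102", "customer_name": "Globex"},
]
vault = load_raw_vault(rows, record_source="crm", load_dts=datetime(2023, 5, 1, tzinfo=timezone.utc))
```

Here the three source rows yield two customer hub rows, three order hub rows, three link rows, and two satellite rows. Because the keys are deterministic hashes, any source system that supplies the same business key produces the same hub key, which is what lets new sources be added without reworking existing tables.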

Modeling the data marts

This section will explore the Star and Snowflake schemas—popular options for architecting user-facing self-service schemas and data marts due to their efficiency and ease of understanding. Both approaches are designed to optimize the performance of data analysis by organizing data into a structure that makes it easy to query and analyze. But first, a quick overview of what a data mart is.
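The difference between the two schemas can be sketched with Python's built-in sqlite3 module. The tables and data below are hypothetical: the star schema keeps the category name inside the product dimension, while the snowflake schema normalizes it into a separate category table. Both variants share a single fact table here only to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Star schema: the product dimension is denormalized (category name inline)
    CREATE TABLE star_dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category_name TEXT);

    -- Snowflake schema: the category attribute is normalized into its own table
    CREATE TABLE sf_dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE sf_dim_product (
        product_key INTEGER PRIMARY KEY,
        product_name TEXT,
        category_key INTEGER REFERENCES sf_dim_category (category_key)
    );

    -- Fact table holding the measures, shared by both variants for brevity
    CREATE TABLE fact_sales (product_key INTEGER, quantity INTEGER, amount REAL);

    INSERT INTO star_dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware'), (3, 'Manual', 'Books');
    INSERT INTO sf_dim_category VALUES (10, 'Hardware'), (20, 'Books');
    INSERT INTO sf_dim_product VALUES (1, 'Widget', 10), (2, 'Gadget', 10), (3, 'Manual', 20);
    INSERT INTO fact_sales VALUES (1, 2, 20.0), (2, 1, 15.0), (3, 4, 40.0), (1, 1, 10.0);
""")

# Star schema: a single join from the fact table to the dimension
star = conn.execute("""
    SELECT d.category_name, SUM(f.amount)
    FROM fact_sales f
    JOIN star_dim_product d ON f.product_key = d.product_key
    GROUP BY d.category_name
    ORDER BY d.category_name
""").fetchall()

# Snowflake schema: an extra join is needed to reach the normalized category table
snowflake = conn.execute("""
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales f
    JOIN sf_dim_product p ON f.product_key = p.product_key
    JOIN sf_dim_category c ON p.category_key = c.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()

print(star)       # [('Books', 40.0), ('Hardware', 45.0)]
print(snowflake)  # [('Books', 40.0), ('Hardware', 45.0)]
```

Both queries return the same totals; the snowflake version simply needs one more join. That is the usual trade-off: less redundancy in the dimensions in exchange for somewhat more complex queries.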

Data mart versus data warehouse

Data warehouses and data marts are both repositories for storing and managing data, but they differ in scope, purpose, and design. A data warehouse is a large, centralized repository of integrated data used to support decision-making and analysis across an entire organization. Data warehouses are optimized for complex queries and often use Kimball's dimensional modeling technique or Inmon's 3NF approach (described in Inmon's book Building the Data Warehouse). On the other hand, a data mart is a subset of a data warehouse designed...
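As a rough illustration of the "subset" relationship, the sketch below defines a data mart as a filtered, aggregated view over a warehouse table using sqlite3. The EMEA region, table names, and figures are hypothetical; in practice a mart might instead be a separate schema or a set of physically materialized tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Enterprise-wide warehouse table covering every region
    CREATE TABLE dw_sales (sale_id INTEGER PRIMARY KEY, region TEXT, product TEXT, amount REAL);
    INSERT INTO dw_sales VALUES
        (1, 'EMEA', 'Widget', 100.0),
        (2, 'AMER', 'Widget', 250.0),
        (3, 'EMEA', 'Gadget', 75.0),
        (4, 'APAC', 'Gadget', 60.0);

    -- Data mart: a narrower, pre-filtered and pre-aggregated subset for one audience
    CREATE VIEW mart_emea_sales AS
        SELECT product, SUM(amount) AS total_amount
        FROM dw_sales
        WHERE region = 'EMEA'
        GROUP BY product;
""")

mart_rows = conn.execute("SELECT product, total_amount FROM mart_emea_sales ORDER BY product").fetchall()
print(mart_rows)  # [('Gadget', 75.0), ('Widget', 100.0)]
```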

Discovering Data Mesh

Data Mesh (DM) is an approach to organizing and managing data in large, complex organizations, introduced in 2019 by Zhamak Dehghani, a thought leader in the field of data architecture.

The DM approach advocates for decentralized data ownership and governance, with data treated as a product owned and managed by the teams using it. This contrasts with the traditional centralized (or, as Zhamak calls it, monolithic) approach to data management, where a single team or department is responsible for all data-related activities.

In a DM architecture, data is organized into self-contained domains, each responsible for its own data curation and sharing. These domains are often organized around business capabilities or processes and are staffed by cross-functional teams that include technical and business experts.
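The idea of domain-owned data products can be sketched in a few lines of Python. Everything here (the `DataProduct` and `DataProductRegistry` classes, their fields, and the rule that only the owning domain may publish or replace a product) is a hypothetical illustration of the ownership principle, not an API from any Data Mesh tool or from the book.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class DataProduct:
    """A dataset a domain publishes for other domains to consume (hypothetical model)."""
    name: str
    owning_domain: str   # the team accountable for the product's quality and availability
    schema: dict         # published column names -> data types (the consumer-facing contract)
    description: str = ""


@dataclass
class DataProductRegistry:
    """A shared catalog where domains publish and discover data products."""
    products: dict = field(default_factory=dict)

    def publish(self, product: DataProduct, publishing_domain: str) -> None:
        # Decentralized ownership: a domain may only publish, or replace, its own products
        existing = self.products.get(product.name)
        if product.owning_domain != publishing_domain or (
            existing is not None and existing.owning_domain != publishing_domain
        ):
            raise PermissionError(f"{publishing_domain!r} does not own {product.name!r}")
        self.products[product.name] = product

    def discover(self, domain: Optional[str] = None) -> list:
        # Any domain can browse the catalog, optionally filtered by owning domain
        return sorted(
            p.name for p in self.products.values()
            if domain is None or p.owning_domain == domain
        )


registry = DataProductRegistry()
registry.publish(
    DataProduct("orders_daily", "sales", {"order_id": "STRING", "order_date": "DATE", "amount": "NUMBER"}),
    publishing_domain="sales",
)
```

In this sketch, a marketing team can discover and read `orders_daily`, but an attempt by marketing to publish a product under that name raises `PermissionError`, reflecting the idea that each domain curates and shares its own data.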

DM consists of four principles that aim to enable effective communication and collaboration between domains: domain-driven design, self-service, and...

Summary

Data Vault 2.0 is designed to address the challenges of managing large, complex, and rapidly changing data environments. It is a hybrid approach that combines elements of 3NF and star schema and uses a standardized, repeatable design pattern that can be applied to any dataset, regardless of size or complexity.

Data Vault design begins by defining the business model and constructing the base layer, known as the Raw Vault. The Raw Vault contains the following elements:

Hubs – store the natural keys that identify business entities
Links – store the interactions between business entities
Satellites – store the descriptions and attributes of business entities
Reference tables – include descriptive information and metadata
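Because satellites carry the changing descriptions of business entities, a common Data Vault 2.0 loading pattern compares a hash of the incoming attributes (often called a hash diff) with the latest stored version and inserts a new row only when something has changed. The Python sketch below assumes that pattern, ISO-8601 date strings as load timestamps, and hypothetical names such as `load_satellite`; the loading logic in the book's SQL scripts may differ.

```python
import hashlib


def hash_diff(attributes: dict) -> str:
    """Hash the descriptive payload in a fixed (sorted) column order.

    Assumption: 'name=value' pairs are joined with '||' and MD5-hashed, so a
    change to any attribute value yields a different hash.
    """
    payload = "||".join(f"{k}={attributes[k]}" for k in sorted(attributes))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def load_satellite(satellite: list, hub_key: str, attributes: dict, load_dts: str) -> bool:
    """Insert-only satellite load: append a row only when the attributes changed.

    Returns True if a new row was appended, False if the record was unchanged.
    """
    new_diff = hash_diff(attributes)
    history = [row for row in satellite if row["hub_key"] == hub_key]
    if history:
        # Compare against the most recent version stored for this hub key
        latest = max(history, key=lambda row: row["load_dts"])
        if latest["hash_diff"] == new_diff:
            return False
    # Existing rows are never updated; history accumulates as new versions
    satellite.append({"hub_key": hub_key, "load_dts": load_dts, "hash_diff": new_diff, **attributes})
    return True


sat_customer = []
load_satellite(sat_customer, "hk1", {"name": "Acme", "segment": "SMB"}, "2023-05-01")
load_satellite(sat_customer, "hk1", {"name": "Acme", "segment": "SMB"}, "2023-05-02")         # unchanged: skipped
load_satellite(sat_customer, "hk1", {"name": "Acme", "segment": "Enterprise"}, "2023-06-01")  # changed: new version
```

Keeping the satellite insert-only is what preserves a full, auditable history of every attribute change.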

On top of the Raw Vault, a Business Vault is constructed to meet changing business needs and requirements without disrupting the overall data architecture. Next, domain-oriented information marts are built to meet...
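One simple way to picture an information mart on top of the vault is a current-state dimension that joins a hub to the latest version of its satellite. The sqlite3 sketch below uses hypothetical tables and data and skips the Business Vault layer entirely; it only illustrates how a flat, user-friendly structure can be derived from Raw Vault tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hub_customer (customer_hk TEXT PRIMARY KEY, customer_id TEXT, load_dts TEXT, record_source TEXT);
    CREATE TABLE sat_customer_details (
        customer_hk TEXT, load_dts TEXT, customer_name TEXT, segment TEXT,
        PRIMARY KEY (customer_hk, load_dts)
    );

    INSERT INTO hub_customer VALUES ('hk1', 'C-001', '2023-05-01', 'crm'), ('hk2', 'C-002', '2023-05-01', 'crm');
    INSERT INTO sat_customer_details VALUES
        ('hk1', '2023-05-01', 'Acme', 'SMB'),
        ('hk1', '2023-06-01', 'Acme', 'Enterprise'),
        ('hk2', '2023-05-01', 'Globex', 'SMB');

    -- Information mart: a current-state customer dimension derived from the Raw Vault.
    -- The correlated subquery keeps only the latest satellite version per hub key.
    CREATE VIEW dim_customer AS
        SELECT h.customer_id, s.customer_name, s.segment
        FROM hub_customer h
        JOIN sat_customer_details s ON s.customer_hk = h.customer_hk
        WHERE s.load_dts = (
            SELECT MAX(s2.load_dts) FROM sat_customer_details s2 WHERE s2.customer_hk = h.customer_hk
        );
""")

dim_rows = conn.execute("SELECT * FROM dim_customer ORDER BY customer_id").fetchall()
print(dim_rows)  # [('C-001', 'Acme', 'Enterprise'), ('C-002', 'Globex', 'SMB')]
```

Because the mart is derived rather than loaded directly, it can be reshaped for a new audience without touching the underlying vault tables.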


About the author

Serge Gershkovich is a seasoned data architect with decades of experience designing and maintaining enterprise-scale data warehouse platforms and reporting solutions. He is a leading subject matter expert, speaker, content creator, and Snowflake Data Superhero. Serge earned a bachelor of science degree in information systems from the State University of New York (SUNY) Stony Brook. Throughout his career, Serge has worked in model-driven development from SAP BW/HANA to dashboard design to cost-effective cloud analytics with Snowflake. He currently serves as product success lead at SqlDBM, an online database modeling tool.
