Fundamentals of Analytics Engineering

Product type: Book

Published in: Mar 2024

Publisher: Packt

ISBN-13: 9781837636457

Edition: 1st Edition

Concepts: Data Analysis

Authors (7):

Dumky De Wilde

Fanny Kassapian

Jovan Gligorevic

Juan Manuel Perafan

Lasse Benninga

Ricardo Angel Granados Lopez

Taís Laurindo Pereira


The Modern Data Stack

Data’s exponential growth and the need to incorporate analytics-driven decision-making have brought new challenges and requirements to organizations. Information must be reliable, scalable, and delivered quickly for an organization to remain competitive. In this scenario, a new set of tools, technologies, and processes has started to gain traction: the Modern Data Stack (MDS).

In this chapter, you will learn what the MDS is, the principles that differentiate it from legacy stacks, and its advantages and disadvantages. Although this set of tools brought significant technological advancements over legacy, tightly coupled systems, it is important to be aware of its pitfalls and of recent considerations when choosing this type of stack.

The main topics that are going to be covered are as follows:

Understanding the Modern Data Stack
Explaining three key differentiators versus legacy stacks
Discussing the advantages and disadvantages of the MDS

...

Understanding the Modern Data Stack

As the name suggests, the MDS represents a technological evolution compared to the systems widely used in previous decades. From the development of the business data warehouse in the 1980s to the rise of cloud technology with Amazon Web Services (AWS) in the early 2000s, on-premises legacy data stacks dominated the landscape. These systems relied on monolithic IT infrastructure, which made maintenance complex. The MDS transformed this scenario by bringing modularity and cloud-native tools. However, before we dive into the details, let’s first define what a data stack is.

A data stack is a collection of tools and services that forms part of an extensive technology infrastructure designed to ingest, store, transform, and serve data. It makes data accessible across an organization and is fundamental to delivering business insights through reporting and dashboards, advanced analytics, and Machine Learning (ML) applications. Figure 2.1 illustrates...
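To make the four stages in this definition concrete, here is a minimal Python sketch. It is a toy built on loud assumptions: the order records, the table names, and the `ingest`, `store`, `transform`, and `serve` functions are all hypothetical, and Python's built-in `sqlite3` in-memory database stands in for a cloud data warehouse. Keeping each stage in its own function also hints at the modularity idea: one stage could be replaced without rewriting the others.

```python
import sqlite3

# Hypothetical raw records standing in for data pulled from a source system
# (e.g. an application database or a SaaS API). Not taken from the book.
RAW_ORDERS = [
    {"order_id": 1, "customer": "alice", "amount": 120.0, "status": "paid"},
    {"order_id": 2, "customer": "bob", "amount": 80.0, "status": "refunded"},
    {"order_id": 3, "customer": "alice", "amount": 45.5, "status": "paid"},
]


def ingest() -> list[dict]:
    """Ingest: extract raw records from a (hypothetical) source."""
    return list(RAW_ORDERS)


def store(conn: sqlite3.Connection, records: list[dict]) -> None:
    """Store: load the raw records unchanged into a raw table in the 'warehouse'."""
    conn.execute(
        "CREATE TABLE raw_orders (order_id INTEGER, customer TEXT, amount REAL, status TEXT)"
    )
    # Named placeholders are filled from each dict in the list.
    conn.executemany(
        "INSERT INTO raw_orders VALUES (:order_id, :customer, :amount, :status)",
        records,
    )


def transform(conn: sqlite3.Connection) -> None:
    """Transform: build an analytics-ready table with SQL inside the warehouse."""
    conn.execute(
        """
        CREATE TABLE customer_revenue AS
        SELECT customer, SUM(amount) AS revenue, COUNT(*) AS paid_orders
        FROM raw_orders
        WHERE status = 'paid'
        GROUP BY customer
        """
    )


def serve(conn: sqlite3.Connection) -> list[tuple]:
    """Serve: expose the modeled table to a consumer such as a dashboard."""
    return conn.execute(
        "SELECT customer, revenue, paid_orders FROM customer_revenue ORDER BY revenue DESC"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for a cloud warehouse
    store(conn, ingest())
    transform(conn)
    print(serve(conn))  # [('alice', 165.5, 2)]
```

In a real MDS, each of these stages would usually be handled by a separate managed tool rather than by functions in a single script.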

Explaining three key differentiators versus legacy stacks

The MDS brought a series of technological advancements compared with the pre-cloud systems that preceded it. As demonstrated in Figure 2.1, its main differentiators are its SQL-first approach, cloud-native tools, and managed and modular solutions. In this section, we take a closer look at each of these characteristics.

Lowering technical barriers with a SQL-first approach

Developed in the 1970s by IBM researchers, SQL (originally called SEQUEL) was designed to give both professional programmers and more occasional database users access to data in an integrated relational database. The full paper by Donald D. Chamberlin and Raymond F. Boyce (1974) is available at https://doi.org/10.1145/800296.811515.

Decades later, SQL remains indispensable for this purpose, and its popularity has increased sharply. In the Stack Overflow Developer Survey 2022, it was the...
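One way to see why a SQL-first approach lowers technical barriers is to compare imperative code, which spells out how to compute a result, with a declarative SQL query, which only states what result is wanted. The sketch below is illustrative only: the sales data and function names are hypothetical, and Python's built-in `sqlite3` stands in for a warehouse engine.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data (not from the book): (region, product, units)
SALES = [
    ("north", "widget", 10),
    ("south", "widget", 4),
    ("north", "gadget", 7),
    ("south", "gadget", 12),
    ("north", "widget", 3),
]


def units_per_region_imperative(rows):
    """Imperative style: spell out how to loop, group, and sum."""
    totals = defaultdict(int)
    for region, _product, units in rows:
        totals[region] += units
    return sorted(totals.items())


def units_per_region_sql(rows):
    """Declarative style: state what result is wanted; the engine decides how."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, units INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT region, SUM(units) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(units_per_region_imperative(SALES))  # [('north', 20), ('south', 16)]
    print(units_per_region_sql(SALES))         # [('north', 20), ('south', 16)]
```

Both functions return the same totals, but the SQL version expresses the business question in a single statement that an analyst can read without following any loop logic.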

Discussing the advantages and disadvantages of the MDS

Although the MDS simplifies the data life cycle significantly, it is important to highlight and discuss its pitfalls.

The advantages can be summarized as follows:

Speed of delivery: Setting up an MDS can take anywhere from a few hours to a couple of days, depending on the specific use case. Additionally, its self-service nature brings data products and solutions closer to where the business questions arise.
Cost: Because the MDS is cloud-based and uses out-of-the-box tools, costs drop significantly thanks to a less complex architecture and reduced hardware needs. In addition, cloud-native data warehouses provide cheaper data processing and storage than on-premises systems.
Democratization: Here, we see two dimensions of democratization – lowered technical barriers and the dissemination of information across an organization. The combination of managed ingestion and SQL-first tools lowers the...

Summary

In this chapter, you learned what the Modern Data Stack is and how it can transform the work of data teams by lowering technical barriers, improving the speed of information delivery, and bringing modular and managed solutions. With its combination of a SQL-first approach, cloud-based systems, and managed and modular tools, the MDS brings improvements compared to legacy, on-premises, and monolithic stacks. Still, it is important to be aware of its pitfalls, such as the large number of tools and considerations about siloed information.

With the MDS, analytics engineers can leverage managed ingestion and data transformation tools to deliver end-to-end analytics.

In the following chapter, you will deepen your knowledge of one of the MDS’s components: data ingestion. You will learn common approaches to it and the steps to set up an efficient and performant ingestion pipeline.


About the authors

Dumky De Wilde

Dumky is an award-winning analytics engineer with close to 10 years of experience in setting up data pipelines, data models and cloud infrastructure. Dumky has worked with a multitude of clients from government to fintech and retail. His background is in marketing analytics and web tracking implementations, but he has since branched out to include other areas and deliver value from data and analytics across the entire organization.

Fanny Kassapian

Fanny has a multidisciplinary background across various industries, giving her a unique perspective on analytics workflows, from engineering pipelines to driving value for the business. As a consultant, Fanny helps companies translate opportunities and business needs into technical solutions, implement analytics engineering best practices to streamline their pipelines, and treat data as a product. She is an avid promoter of data democratization through technology and literacy.

Jovan Gligorevic

Jovan, an Analytics Engineer, specializes in data modeling and building analytical dashboards. Passionate about delivering end-to-end analytics solutions and enabling self-service analytics, he has a background in business and data science. With skills ranging from machine learning to dashboarding, Jovan has democratized data across diverse industries. Proficient in various tools and programming languages, he has extensive experience with the modern data stack. Jovan enjoys providing training in dbt and Power BI, sharing his knowledge generously.

Juan Manuel Perafan

Juan Manuel Perafan has 8 years of experience in the realm of analytics (5 years as a consultant). Juan was the first analytics engineer hired by Xebia, back in 2020, making him one of the earliest adopters of this way of working. Besides helping his clients realize the value of their data, Juan is also very active in the data community. He has spoken at dozens of conferences and meetups around the world (including Coalesce 2023). Additionally, he is the founder of the Analytics Engineering meetup in the Netherlands as well as the Dutch dbt meetup.

Lasse Benninga

Lasse has been working in the data space since 2018, starting out as a Data Engineer at a large airline, then switching to Cloud Engineering for a consultancy and working for different clients in the retail and healthcare space. Since 2021, he has been an Analytics Engineer at Xebia Data, merging software/platform engineering with a passion for analytics. As a consultant, Lasse has worked with many different clients in retail, healthcare, ridesharing, and trading. He has implemented multiple data platforms and worked in all three major clouds, leveraging his knowledge of data and analytics to provide value.

Ricardo Angel Granados Lopez

Ricardo, an Analytics Engineer with a strong background in data engineering and analysis, is a quick learner and tech enthusiast. With a Master's in IT Management specializing in Data Science, he excels in using various programming languages and tools to deliver valuable insights. Ricardo, experienced in diverse industries like energy, transport, and fintech, is adept at finding alternative solutions for optimal results. As an Analytics Engineer, he focuses on driving value from data through efficient data modeling, using best practices, automating tasks, and improving data quality.

Taís Laurindo Pereira

Taís is a versatile data professional with experience in a diverse range of organizations, from big corporations to scale-ups. Before her move to Xebia, she had the chance to develop distinct data products, such as dashboards and machine learning implementations. She currently focuses on end-to-end analytics as an Analytics Engineer. With a mixed background in engineering and business, her mission is to contribute to data democratization in organizations by helping them overcome challenges when working with data at scale.
