Data Engineering with dbt
Book · Published June 2023 · Packt · ISBN-13: 9781803246284 · 1st Edition

Author: Roberto Zagni
Roberto Zagni is a senior leader with extensive hands-on experience in data architecture, software development, and agile methodologies. Roberto is an Electronic Engineer by training, with a special interest in bringing software engineering best practices to cloud data platforms and in growing great teams that enjoy what they do. He has been helping companies make better use of their data and, more recently, transition to cloud-based data automation with an agile mindset and proper software engineering tools and processes, also known as DataOps. Roberto also coaches data teams hands-on in practical data architecture and the use of patterns, testing, version control, and agile collaboration. Since 2019, his go-to tools have been dbt, dbt Cloud, and Snowflake or BigQuery.

Analytics Engineering as the New Core of Data Engineering

In this chapter, we are going to understand the full life cycle of data, from its creation during operations to its consumption by business users. Along this journey, we will analyze the most important details of each phase in the data transformation process from raw data to information.

This will help us understand why analytics engineering, the part of the data journey that we focus on when working with dbt, has become a crucial part of data engineering projects, while remaining the most creative and interesting part of the full data life cycle.

Analytics engineering transforms raw data from disparate company data sources into information ready for the tools that analysts and businesspeople use to derive insights and support data-driven decisions.

We will then discuss the modern data stack and the way data teams can work better, defining the modern analytics engineering discipline and the roles in a data team...

Technical requirements

This chapter does not require any previous knowledge, but familiarity with data infrastructure and software engineering will help you understand the topics more quickly and deeply.

The data life cycle and its evolution

Data engineering is the discipline of taking data that is born elsewhere, generally in many disparate places, and putting it together to make more sense to business users than the individual pieces of information in the systems they came from.

To put it another way, data engineers do not create data; they manage and integrate existing data.

As Francesco Puppini, the inventor of the Unified Star Schema, likes to say, the word data comes from the Latin “datum”, meaning “given”. The information we work on is given to us; we must be the best possible stewards of it.

The art of data engineering is to store data and make it available for analysis, ultimately distilling it into information, without losing the original information or adding noise.

In this section, we will look at how the data flows from where it is created to where it is consumed, introducing the most important topics to consider at each step. In the next section...

Understanding the modern data stack

When we talk about data engineering, we encompass all the skill sets, tooling, and practices that cover the data life cycle from end to end, as presented in the previous section: from data extraction to data consumption by users, and possibly the writing back of data.

This is a huge set of competencies and tools, ranging from security to scripting and programming, from infrastructure operation to data visualization.

Beyond very simple cases, it is quite uncommon for a single person to cover all of that with a thorough understanding and good skills in every area involved, let alone have the time to develop and manage it all.

The traditional data stack

The traditional data stack used to be built by data engineers developing ad hoc ETL processes to extract data from the source systems and transform it locally before loading it in a refined form into a traditional database used to power reporting. This is called an ETL pipeline.
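
To make the pattern concrete, here is a minimal, hypothetical sketch of such an ETL step in Python. The file, table, and column names and the cleaning rules are invented for illustration, and SQLite stands in for the traditional reporting database.

```python
# A minimal, hypothetical ETL step: extract from a source system's CSV
# export, transform locally in Python, load into a reporting database.
# File names, table names, and cleaning rules are illustrative only.
import csv
import sqlite3

def extract(path: str) -> list[dict]:
    # Extract: read the raw rows from the source system export.
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows: list[dict]) -> list[tuple]:
    # Transform: clean and reshape the data before it reaches the database.
    return [
        (row["order_id"], row["customer_id"], float(row["amount"]))
        for row in rows
        if row["amount"]  # drop rows with a missing amount
    ]

def load(rows: list[tuple], db_path: str) -> None:
    # Load: write the refined rows into the reporting database.
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS orders "
            "(order_id TEXT, customer_id TEXT, amount REAL)"
        )
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

if __name__ == "__main__":
    load(transform(extract("orders_export.csv")), "reporting.db")
```

Note that the transformation happens in the pipeline code, outside the database; as we will see, the modern approach moves this step into the data platform itself.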

The...

Defining analytics engineering

We have seen in the previous section that, with the advent of the modern data stack, data movement has become easier, and the focus has therefore shifted to managing raw data and transforming it into the refined data used in reports by business users. There are still plenty of cases where ad hoc integrations and ETL pipelines are needed, but they are no longer the main focus of the data team, as they were in the past.
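
To contrast this with the ETL sketch from the previous section, here is a minimal, hypothetical ELT-style sketch: the raw data is loaded unchanged, and the transformation is expressed as SQL that runs inside the data platform, which is essentially the step that dbt organizes and automates. SQLite again stands in for a cloud warehouse, and all names are illustrative.

```python
# A minimal, hypothetical ELT sketch: load the raw data unchanged,
# then transform it with SQL running inside the data platform.
# SQLite stands in for a cloud warehouse; all names are made up.
import csv
import sqlite3

def load_raw(path: str, conn: sqlite3.Connection) -> None:
    # Load: land the source rows as-is, with no cleaning applied.
    with open(path, newline="") as f:
        rows = [(r["order_id"], r["customer_id"], r["amount"])
                for r in csv.DictReader(f)]
    conn.execute("CREATE TABLE IF NOT EXISTS raw_orders "
                 "(order_id TEXT, customer_id TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)

def transform_in_platform(conn: sqlite3.Connection) -> None:
    # Transform: the refined model is just SQL run in the platform,
    # much like a dbt model materialized as a table.
    conn.executescript("""
        DROP TABLE IF EXISTS orders;
        CREATE TABLE orders AS
        SELECT order_id,
               customer_id,
               CAST(amount AS REAL) AS amount
        FROM raw_orders
        WHERE amount IS NOT NULL AND amount <> '';
    """)

with sqlite3.connect("warehouse.db") as conn:
    load_raw("orders_export.csv", conn)
    transform_in_platform(conn)
```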

The other Copernican revolution is that the new data stack enables data professionals to work as a team, instead of perpetuating the isolated way of working that is common with the legacy data stack. The focus is now on applying software engineering best practices to make data transformation development as reliable as building software. You might have heard of DevOps and DataOps.

With this shift of focus, the term analytics engineering has emerged to identify the central part of the data life cycle, going from access to the raw data up...

DataOps – software engineering best practices for data

The fact is that many data teams were, and still are, not staffed with people from software engineering backgrounds, and for this reason they have missed out on adopting the modern software engineering techniques that fall under DevOps.

Living up to the hype, the DevOps movement has brought great improvements to software development, helping teams become more productive and more satisfied with their jobs.

In short, the core idea of DevOps is to give the team the tools it needs, as well as the authority and the responsibility for the whole development cycle: from coding to Quality Assurance (QA), to releasing, and then to running production operations.

The cornerstones of achieving this are the use of automation to avoid doing repetitive tasks manually, such as releases and testing; an emphasis on automated testing; reliance on proactive automated monitoring; and, most importantly...
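
As a small illustration of the automated testing just mentioned, here is a hypothetical data test runnable with pytest: it asserts that a key column contains no duplicates, the kind of check that dbt lets you declare as a test and that a DataOps pipeline would run automatically on every change. The database, table, and column names are made up for this sketch.

```python
# A hypothetical automated data test, runnable with pytest.
# It asserts that the orders table has no duplicate primary keys,
# the kind of check a DataOps pipeline runs on every change.
import sqlite3

def count_duplicate_keys(db_path: str, table: str, key: str) -> int:
    # Count key values that appear more than once in the table.
    with sqlite3.connect(db_path) as conn:
        row = conn.execute(
            f"SELECT COUNT(*) FROM ("
            f"  SELECT {key} FROM {table}"
            f"  GROUP BY {key} HAVING COUNT(*) > 1)"
        ).fetchone()
    return row[0]

def test_order_id_is_unique():
    # Fails the build if any order_id appears twice.
    assert count_duplicate_keys("warehouse.db", "orders", "order_id") == 0
```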

Summary

In this chapter, you have learned about the full data life cycle and become familiar with DataOps and the modern data platform concepts that make it possible to develop data projects with a similar way of working, and the same level of satisfaction, as software projects developed using a DevOps approach.

Well done!

We introduced the figure of the analytics engineer, who takes the central role of building the core of a modern data platform, and we saw the best practices and principles that we can adopt from software engineering to make our work on data projects more reliable and satisfying for us and other stakeholders.

With this chapter, we close the first part of this book, which has introduced you to the key elements of data engineering and will enable you to better understand how we work with dbt.

In the next chapter, Agile Data Engineering with dbt, you will start to learn about the core functionalities of dbt and begin building your first models...

Further reading

In this chapter, we have talked about the data life cycle, software engineering principles, DevOps, and DataOps. There are many books on these subjects, but none that we are aware of on the modern data stack, so we present here a few classic references that were written with software application development in mind but cover topics that are useful in every programming context:

  • My personal favorite, as it makes clear the benefits of clean code:

Robert C. Martin, aka “Uncle Bob”, Clean Code: A Handbook of Agile Software Craftsmanship, Prentice Hall, 2008, ISBN 978-0132350884

If you like this one, there are a few other books about clean code and architecture written by Uncle Bob. You can also refer to his site: http://cleancoder.com/.

  • The classic book about keeping your code in good shape, from an author whom I deeply respect and who produced my favorite quote, which could be the title of this chapter: “...