
Fundamentals of Analytics Engineering

Product type: Book
Published in: Mar 2024
Publisher: Packt
ISBN-13: 9781837636457
Edition: 1st Edition
Authors (7):
Dumky De Wilde

Dumky is an award-winning analytics engineer with close to 10 years of experience in setting up data pipelines, data models and cloud infrastructure. Dumky has worked with a multitude of clients from government to fintech and retail. His background is in marketing analytics and web tracking implementations, but he has since branched out to include other areas and deliver value from data and analytics across the entire organization.

Fanny Kassapian

Fanny has a multidisciplinary background across various industries, giving her a unique perspective on analytics workflows, from engineering pipelines to driving value for the business. As a consultant, Fanny helps companies translate opportunities and business needs into technical solutions, implement analytics engineering best practices to streamline their pipelines, and treat data as a product. She is an avid promoter of data democratization through technology and literacy.

Jovan Gligorevic

Jovan, an Analytics Engineer, specializes in data modeling and building analytical dashboards. Passionate about delivering end-to-end analytics solutions and enabling self-service analytics, he has a background in business and data science. With skills ranging from machine learning to dashboarding, Jovan has democratized data across diverse industries. Proficient in various tools and programming languages, he has extensive experience with the modern data stack. Jovan enjoys providing training in dbt and Power BI, sharing his knowledge generously.

Juan Manuel Perafan

Juan Manuel Perafan has 8 years of experience in the realm of analytics (5 years as a consultant). Juan was the first analytics engineer hired by Xebia back in 2020, making him one of the earliest adopters of this way of working. Besides helping his clients realize the value of their data, Juan is also very active in the data community. He has spoken at dozens of conferences and meetups around the world (including Coalesce 2023). Additionally, he is the founder of the Analytics Engineering meetup in the Netherlands as well as the Dutch dbt meetup.

Lasse Benninga

Lasse has been working in the data space since 2018, starting out as a Data Engineer at a large airline, then switching to Cloud Engineering for a consultancy and working for different clients in the retail and healthcare space. Since 2021, he has been an Analytics Engineer at Xebia Data, merging software/platform engineering with a passion for analytics. As a consultant, Lasse has worked with many different clients in industries ranging from retail and healthcare to ridesharing and trading. He has implemented multiple data platforms and worked in all three major clouds, leveraging his knowledge of data and analytics to provide value.

Ricardo Angel Granados Lopez

Ricardo, an Analytics Engineer with a strong background in data engineering and analysis, is a quick learner and tech enthusiast. With a Master's in IT Management specializing in Data Science, he excels in using various programming languages and tools to deliver valuable insights. Ricardo, experienced in diverse industries like energy, transport, and fintech, is adept at finding alternative solutions for optimal results. As an Analytics Engineer, he focuses on driving value from data through efficient data modeling, using best practices, automating tasks, and improving data quality.

Taís Laurindo Pereira

Taís is a versatile data professional with experience in a diverse range of organizations, from big corporations to scale-ups. Before her move to Xebia, she had the chance to develop distinct data products, such as dashboards and machine learning implementations. Currently, she focuses on end-to-end analytics as an Analytics Engineer. With a mixed background in engineering and business, her mission is to contribute to data democratization in organizations by helping them overcome challenges when working with data at scale.


Preface

This book was written by a team of seven analytics engineers from Xebia, an international consultancy company in the Netherlands. As Xebians, we consider knowledge sharing a key part of our culture. In addition to helping our clients, we organize biweekly knowledge exchange sessions and monthly innovation days. As a result, our colleagues are always actively contributing to open-source software, organizing meetups, speaking at data conferences, and writing books.

While we are colleagues at the same company, each of us brings unique expertise and skills to the table. Analytics engineering is a diverse field, and mastering every aspect is not expected. In this book, each author contributed two or three chapters based on their interests and expertise. The same applies to you as a reader. Depending on your organization’s needs, certain chapters may hold greater relevance. Our goal is to provide a comprehensive overview of essential concepts, tailored to the varied demands of this profession.

Since its inception in 2019, analytics engineering has experienced remarkable growth. As its importance continues to be recognized by organizations worldwide, we firmly believe that analytics engineers will become indispensable members of every data team. Through this book, you have the opportunity to become a frontrunner in this evolving field, contributing to its advancement and shaping its future alongside us.

Who this book is for

This book is for data engineers and analysts considering transitioning to analytics engineering, as well as for analytics engineers looking to improve their skills and identify areas for growth.

While this book is introductory in nature, you will benefit from prior knowledge in key areas such as data analysis processes (including data ingestion, transformation, and visualization) and database fundamentals (such as data modeling, querying, database management, and architecture).

What this book covers

Chapter 1, What Is Analytics Engineering?, traces the history of analytics engineering and its surge in popularity. It examines why the role exists and what it involves so that you thoroughly understand its responsibilities.

Chapter 2, The Modern Data Stack, explores the modern data stack, demystifying the impact of SQL and the cloud, and follows the industry’s shift to purpose-built tools for contemporary data management.

Chapter 3, Data Ingestion, explores fundamental techniques, common issues, and strategies for moving data between systems. We break down data ingestion into eight steps and elaborate on common considerations for data quality and scalability.

Chapter 4, Data Warehousing, delves into the core concepts and history of data warehouses. You will gain insights into the evolution of data storage solutions and the impact of cloud technologies on this space.

Chapter 5, Data Modeling, details the proactive design process of establishing relationships between data within information systems. This critical process can reduce costs, enhance computational speed, and elevate user experience.

Chapter 6, Transforming Data, unravels the shift from ETL to ELT pipeline paradigms, the importance of data cleaning and transformation, reusability in query processes, and optimizing SQL queries for modularity.

Chapter 7, Serving Data, discusses presenting data to end users. You will gain insights into exposing data through different means, understanding data as a product, and examining the motivations and challenges associated with achieving self-service analytics in companies.

Chapter 8, Hands-On Analytics Engineering, describes tools such as Airbyte Cloud for managed ingestion, Google BigQuery for warehousing, dbt Cloud for transformations, and Tableau for visualization.

Chapter 9, Data Quality and Observability, helps you ensure data quality and establish observability in analytics processes. Delving into strategies and tools, this chapter equips you with the skills to maintain data integrity and transparency.

Chapter 10, Writing Code in a Team, focuses on collaborative coding practices within a team setting. With an emphasis on best practices, version control, and effective communication, this chapter promotes teamwork and efficiency in analytics engineering projects.

Chapter 11, Automating Workflows, concludes the DataOps section by exploring the implementation of continuous workflows. You will be introduced to practices that streamline analytics workflows, optimizing efficiency and productivity.

Chapter 12, Driving Business Adoption, highlights the critical process of collecting and interpreting business requirements. You will be guided through the steps to understand and align analytics initiatives with the unique needs and objectives of the business.

Chapter 13, Data Governance, delves into the principles and practices of data governance. You will gain insights into establishing robust data governance frameworks, ensuring the reliability of organizational data, and aligning analytics strategies with overarching business goals.

Chapter 14, Epilogue, summarizes what you have learned in this book and gives you extra tips to take your analytics engineering career even further.

To get the most out of this book

Familiarity with Python and SQL will enhance comprehension of Chapters 8, 10, and 11. Additionally, basic knowledge of command-line usage (with tools such as Git and dbt) and familiarity with cloud computing concepts will be beneficial, as this book serves as an introduction to these topics.

The following software/hardware is covered in the book:

| Software/hardware covered in the book | Operating system requirements |
|---|---|
| Airbyte Cloud | Windows, macOS, or Linux |
| dbt Cloud | |
| Google Cloud, Google BigQuery, Google Sheets | |
| Tableau Desktop | |
| Git and GitHub | |

Some of these tools have paid versions. To follow the instructions in this book, you can use the free versions or make use of a free trial.

If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

Several chapters in the book feature code snippets designed to showcase best practices. However, executing these snippets may require extra setup that is not covered in this book. You should view these snippets as illustrative examples and adapt the underlying best practices to your unique scenarios.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Fundamentals-of-Analytics-Engineering. Any updates to the code will be made in the GitHub repository.

Several hands-on guides are mentioned in the book; they can be found at https://github.com/PacktPublishing/Fundamentals-of-Analytics-Engineering/tree/main/chapter_8/guides.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: “Notice how the next CTE, called employees, selects from the raw_source CTE.”

A block of code is set as follows:

def add_numbers(a, b):
    c = a + b
    return c

Any command-line input or output is written as follows:

on-run-end: "{{ dbt_project_evaluator.print_dbt_project_evaluator_issues() }}"

Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: “A common application for the ETL process is when organizations have strict requirements regarding Personally Identifiable Information (PII).”

Tips or important notes

Appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, email us at customercare@packtpub.com and mention the book title in the subject of your message.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packtpub.com with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Share Your Thoughts

Once you’ve read Fundamentals of Analytics Engineering, we’d love to hear your thoughts! Please visit the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.

Download a free PDF copy of this book

Thanks for purchasing this book!

Do you like to read on the go but are unable to carry your print books everywhere?

Is your eBook purchase not compatible with the device of your choice?

Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.

Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.

The perks don’t stop there: you can get exclusive access to discounts, newsletters, and great free content in your inbox daily.

Follow these simple steps to get the benefits:

  1. Scan the QR code or visit the link below:

     https://packt.link/free-ebook/9781837636457

  2. Submit your proof of purchase.
  3. That’s it! We’ll send your free PDF and other benefits directly to your email.