You're reading from Fundamentals of Analytics Engineering

Product type: Book
Published in: Mar 2024
Publisher: Packt
ISBN-13: 9781837636457
Edition: 1st Edition
Authors (7):
Dumky De Wilde

Dumky is an award-winning analytics engineer with close to 10 years of experience in setting up data pipelines, data models and cloud infrastructure. Dumky has worked with a multitude of clients from government to fintech and retail. His background is in marketing analytics and web tracking implementations, but he has since branched out to include other areas and deliver value from data and analytics across the entire organization.

Fanny Kassapian

Fanny has a multidisciplinary background across various industries, giving her a unique perspective on analytics workflows, from engineering pipelines to driving value for the business. As a consultant, Fanny helps companies translate opportunities and business needs into technical solutions, implement analytics engineering best practices to streamline their pipelines, and treat data as a product. She is an avid promoter of data democratization through technology and literacy.

Jovan Gligorevic

Jovan, an Analytics Engineer, specializes in data modeling and building analytical dashboards. Passionate about delivering end-to-end analytics solutions and enabling self-service analytics, he has a background in business and data science. With skills ranging from machine learning to dashboarding, Jovan has democratized data across diverse industries. Proficient in various tools and programming languages, he has extensive experience with the modern data stack. Jovan enjoys providing training in dbt and Power BI, sharing his knowledge generously.

Juan Manuel Perafan

Juan Manuel Perafan has 8 years of experience in the realm of analytics (5 years as a consultant). Juan was the first analytics engineer hired by Xebia back in 2020, making him one of the earliest adopters of this way of working. Besides helping his clients realize the value of their data, Juan is also very active in the data community. He has spoken at dozens of conferences and meetups around the world (including Coalesce 2023). Additionally, he is the founder of the Analytics Engineering meetup in the Netherlands as well as the Dutch dbt meetup.

Lasse Benninga

Lasse has been working in the data space since 2018, starting out as a Data Engineer at a large airline, then switching to Cloud Engineering for a consultancy and working for different clients in the retail and healthcare space. Since 2021, he has been an Analytics Engineer at Xebia Data, merging software/platform engineering with a passion for analytics. As a consultant, Lasse has seen many different clients, ranging from retail and healthcare to ridesharing and trading companies. He has implemented multiple data platforms and worked in all three major clouds, leveraging his knowledge of data and analytics to provide value.

Ricardo Angel Granados Lopez

Ricardo, an Analytics Engineer with a strong background in data engineering and analysis, is a quick learner and tech enthusiast. With a Master's in IT Management specializing in Data Science, he excels in using various programming languages and tools to deliver valuable insights. Ricardo, experienced in diverse industries like energy, transport, and fintech, is adept at finding alternative solutions for optimal results. As an Analytics Engineer, he focuses on driving value from data through efficient data modeling, using best practices, automating tasks, and improving data quality.

Taís Laurindo Pereira

Taís is a versatile data professional with experience in a diverse range of organizations - from big corporations to scale-ups. Before her move to Xebia, she had the chance to develop distinct data products, such as dashboards and machine learning implementations. Currently, she is focusing on end-to-end analytics as an Analytics Engineer. With a mixed background in engineering and business, her mission is to contribute to data democratization in organizations by helping them overcome challenges when working with data at scale.

Writing Code in a Team

Writing code individually and writing code in a team can be very different experiences. As your team grows, new challenges will arise from working together on the same code base. The most apparent challenges are about code quality, ownership, and accountability. Who works on what? How can you keep consistent coding styles and quality? How can you review code? How can you easily onboard new colleagues?

The clear separation of duties, version control, tracking issues and tasks, agreeing on and enforcing a coding standard, CI/CD, code reviews, and documentation are all ways to streamline maintaining a code base within a team.

Version control and CI/CD are tools and techniques that can help data professionals work more efficiently, collaborate more effectively, and maintain the integrity and reliability of their code and data assets over time. As analytics engineers, we should adopt these principles to make collaboration on the same code base more efficient...

Identifying the responsibilities of team members

Working in a data team requires having a clear separation of duties to streamline development work. This is like a musical orchestra, where each member, even within a wind, string, brass, or percussion section, can have a slightly different role to play. Similarly, in a data team, there will be members with different, and sometimes overlapping, sets of skills.

As analytics engineers, we might have to collaborate with both data engineers and data analysts or the consumers of data, either within the same team or across departments. It is crucial to define team responsibilities to prevent conflicts over tasks and ensure clear accountability.

The responsibility boundaries help create focus on a specific domain of work, which allows for the following:

  • Mitigating the risks of overlapping work and redundant efforts, optimizing the team’s productivity
  • Fostering a sense of accountability and ownership among team...

Tracking tasks and issues

Once roles have been defined and assigned within a team, using issue and task-tracking tools together with an Agile methodology, such as Scrum, can be an effective way to increase the team's productivity. We are not going to cover Agile methodologies here, but we encourage reading up on the topic, as task tracking and working in an agile way are closely related.

With the help of task and issue tracking tools, developers and project managers can collaborate more effectively, identify and address issues early on, and ultimately deliver high-quality data products that meet the needs of their users.

This sounds evident; however, in our experience as consultants, we have noticed that teams across different industries have implemented this practice to varying degrees. We have seen it all, from teams who do not do any issue and task tracking to teams who use the latest tools but have no coherent strategy for using these tools, and, thus, the tool quickly becomes...

Managing versions with version control

In analytics engineering, we take best practices from software engineering. You might be familiar with version control already. You have probably heard of Git, GitHub, Bitbucket, and similar tools or have been using such tools for a while now. They are absolutely essential to developing code in a team, and we would also encourage version control in cases where you are the only person working on a code base.

There are several reasons why version control is important:

  • Version control helps developers keep track of changes made to code over time, enabling them to revert to earlier versions if necessary
  • It allows multiple developers to work on the same code base simultaneously without interfering with each other’s work
  • It makes collaboration and code review easier, which helps to catch errors and improve code quality
  • By providing a history of changes, version control can help with debugging and troubleshooting
  • It aids...
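To make the first point concrete, the following is a minimal sketch of what "keeping track of changes" means at the level of a single file. It uses Python's standard `difflib` module to compare two versions of a function. The file names `old.py` and `new.py` and the helper `describe_change` are hypothetical, and Git's own diffing is implemented differently; the sketch only illustrates the idea of a line-based change record.

```python
import difflib

# Two hypothetical versions of the same file, as they might look
# in two consecutive commits.
old_version = """def add_numbers(a, b):
    return a + b
""".splitlines(keepends=True)

new_version = """def add_numbers(a, b):
    c = a + b
    return c
""".splitlines(keepends=True)


def describe_change(old, new):
    """Return a unified diff (as a list of lines) between two versions."""
    return list(
        difflib.unified_diff(old, new, fromfile="old.py", tofile="new.py")
    )


diff_lines = describe_change(old_version, new_version)
print("".join(diff_lines))
```

Lines starting with `-` were removed and lines starting with `+` were added. A history of such records is what makes it possible to review, debug, or revert a change later.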

Working with coding standards

Coding standards are a set of guidelines that help ensure that code is readable, maintainable, and consistent across a team or organization. By adhering to coding standards, analytics engineers can create more efficient, scalable, and reliable code, ultimately leading to a better understanding of the code and faster development times.

Note

This chapter features code snippets designed to showcase best practices. However, executing these snippets may need extra setup, not covered in this book. Readers should view these snippets as illustrative examples and adapt the underlying best practices to their unique scenarios.

Have a look at the following Python function definition:

def add_numbers(a,b) :
    c = a+b;return c

Here is the same code with different formatting:

def add_numbers(a, b):
    c = a + b
    return c

What do you notice about the format of the code and how it is written? You probably...
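One way to convince yourself that the two snippets differ only in formatting is to compare their syntax trees. The sketch below is an illustration using Python's standard `ast` module; the helper name `same_logic` is our own, and the first snippet is assumed to have an indented function body so that it parses.

```python
import ast

# The two snippets from the text, stored as strings.
messy = "def add_numbers(a,b) :\n    c = a+b;return c\n"
tidy = "def add_numbers(a, b):\n    c = a + b\n    return c\n"


def same_logic(source_a, source_b):
    """Compare two code strings by their syntax trees, ignoring layout."""
    return ast.dump(ast.parse(source_a)) == ast.dump(ast.parse(source_b))


print(same_logic(messy, tidy))
```

Because the interpreter treats both versions identically, a coding standard is purely for the benefit of the people reading and maintaining the code.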

Reviewing code

Code reviews provide a critical safeguard to ensure code quality. They are essential for data teams to collaborate effectively and ensure high-quality coding practices. They provide a framework for team members to evaluate code together, identify potential issues, and share knowledge.

By fostering a culture of learning, code reviews enable individuals to learn from one another, ultimately improving the collective expertise of the team. Additionally, they reinforce accountability by ensuring compliance with coding standards, data privacy protocols, and best practices. Overall, code reviews are crucial for creating robust, reliable, and high-performing data solutions.

Pull requests – The four eyes principle

A pull request (PR) or merge request is a feature of Git hosting platforms, such as GitHub and Bitbucket, that facilitates collaborative development and code review. The purpose of a PR is to propose changes made in one branch (typically, a feature branch) to be merged into another branch (often, the...
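The four eyes principle itself can be expressed as a simple rule: a change may only be merged once someone other than its author has approved it. The following sketch models that rule with a hypothetical `PullRequest` class. It is not the API of any real Git hosting platform, which would typically enforce this through branch protection settings.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    """Hypothetical, simplified stand-in for a PR on a hosting platform."""
    author: str
    approvals: set = field(default_factory=set)


def satisfies_four_eyes(pr: PullRequest) -> bool:
    """True when at least one approval comes from someone other than the author."""
    return any(reviewer != pr.author for reviewer in pr.approvals)


self_approved = PullRequest(author="alice", approvals={"alice"})
peer_approved = PullRequest(author="alice", approvals={"bob"})
print(satisfies_four_eyes(self_approved), satisfies_four_eyes(peer_approved))
```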

Continuous integration/continuous deployment

Perhaps you have come across this term before. Continuous integration and continuous deployment, also known as CI/CD, is a practice used in software development to enhance the speed and quality of the software delivery process. It is an automated approach in which code changes are integrated and tested continuously, so that issues are detected early and software updates are delivered faster.

The benefits of CI/CD are multifold. It helps to reduce the likelihood of introducing new bugs into the code base, as well as ensuring that the code changes are always in a releasable state. This means that code updates can be deployed more frequently, which, in turn, leads to faster feedback loops, and ultimately, better customer satisfaction.

From an analytics engineering perspective, an example of CI/CD in action is the development of models in dbt. In this scenario, as soon as a developer pushes a code change to a shared repository, a series of automated...
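Conceptually, a CI pipeline is an ordered list of automated checks that stops as soon as one of them fails. The sketch below shows that fail-fast behavior in plain Python. The function `run_pipeline` and the step names are placeholders of our own; a real pipeline would be defined in the configuration of a CI service and would run actual linting, build, and test commands.

```python
from typing import Callable, List, Tuple

# A check is a (name, function) pair; the function returns True on success.
Check = Tuple[str, Callable[[], bool]]


def run_pipeline(checks: List[Check]) -> Tuple[bool, List[str]]:
    """Run checks in order and stop at the first failure (fail fast)."""
    log = []
    for name, check in checks:
        passed = check()
        log.append(f"{name}: {'passed' if passed else 'failed'}")
        if not passed:
            return False, log
    return True, log


# Placeholder steps; the second one simulates a failing test run.
checks = [
    ("lint", lambda: True),
    ("unit tests", lambda: False),
    ("deploy", lambda: True),
]
succeeded, log = run_pipeline(checks)
print(succeeded, log)
```

Because the test step fails, the deploy step never runs, which is how CI keeps broken changes from reaching a releasable branch.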

Documenting code

Documentation is the holy grail when maintaining a code base. This should sound familiar; how often have you found yourself reviewing or having to work with a new code base, and there is little to no documentation on how things work? It is every engineer’s nightmare in this field of work. When we talk about documentation, we not only talk about code comments or technical documentation but also conceptual documentation that describes what certain code does on a higher level.

In fact, as mentioned in the book Docs for Developers: An Engineer’s Field Guide to Technical Writing (https://docsfordevelopers.com/), we can distinguish between several types of documentation:

  • Code comments
  • READMEs
  • Getting started
  • Conceptual documentation

Let’s talk about each of them in detail.
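As a small illustration of the first type, the sketch below revisits the `add_numbers` function from earlier in this chapter and adds both a docstring and an inline comment. The wording of the docstring and comment is our own and only serves as an example.

```python
def add_numbers(a, b):
    """Return the sum of a and b.

    A docstring like this one travels with the code and can be read
    interactively via help(add_numbers).
    """
    # A code comment explains a local decision: the intermediate variable
    # is kept only to mirror the earlier formatting example.
    c = a + b
    return c


print(add_numbers.__doc__.splitlines()[0])
```

Comments explain why a line looks the way it does, while the docstring describes what the function does for anyone calling it.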

Documenting code in dbt

For analytics engineers, we recommend several handy tools for documenting code. Tools such as dbt allow you to place...

Working with containers

In the rapidly evolving field of analytics engineering, the ability to maintain consistency, efficiency, and collaboration across development teams is very important. A key technology enabling this is containerization, exemplified by tools such as Docker and the use of development containers or devcontainers in Visual Studio Code (VS Code). Containers are lightweight, standalone packages that contain everything needed to run a piece of software, including the code, runtime, system tools, libraries, and settings. Docker, a popular containerization platform, allows developers to package applications into containers, ensuring consistency across environments.

We will not explain Docker in detail here; instead, we will explain (on a higher level) how this technology allows us to have the same development environment for each developer. Docker uses a Dockerfile that contains the instructions to build images, which are used to execute code in a Docker container...

Summary

Working on code together in a team can be challenging; you have seen the complexity of topics such as project organization, task tracking, version control, code formatting, and review and documentation. Your code base is like a living organism, and therefore, maintaining the health of your code base to ensure continuity and consistency in quality can be challenging.

In this chapter, we discussed team member responsibilities, issue and task tracking tools, version control, coding standards, code review, and documentation. You now have a better understanding of the importance of these concepts to ensure proper code quality and make collaborating with team members a lot easier and less error-prone. You also got to know certain tools that can help us to better manage team resources, time, code quality, and the easy onboarding of new users.

In the next chapter, we will have a more detailed look into continuous integration/continuous deployment and delivery, which helps to...
