You're reading from Fundamentals of Analytics Engineering

Product type: Book

Published in: Mar 2024

Publisher: Packt

ISBN-13: 9781837636457

Edition: 1st Edition

Concepts

Data Analysis

Authors (7):

Dumky De Wilde

Fanny Kassapian

Jovan Gligorevic

Juan Manuel Perafan

Lasse Benninga

Ricardo Angel Granados Lopez

Taís Laurindo Pereira


Writing Code in a Team

Writing code individually and writing code in a team can be very different experiences. As your team grows, new challenges will arise from working together on the same code base. The most apparent challenges are about code quality, ownership, and accountability. Who works on what? How can you keep consistent coding styles and quality? How can you review code? How can you easily onboard new colleagues?

The clear separation of duties, version control, tracking issues and tasks, agreeing on and enforcing a coding standard, CI/CD, code reviews, and documentation are all ways to streamline maintaining a code base within a team.

Version control and CI/CD are tools and techniques that can help data professionals work more efficiently, collaborate more effectively, and maintain the integrity and reliability of their code and data assets over time. As analytics engineers, we should adopt these principles to make collaboration on the same code base more efficient...

Identifying the responsibilities of team members

Working in a data team requires a clear separation of duties to streamline development work. This is like a musical orchestra, where each member (even within a wind, string, brass, or percussion section) can have a slightly different role to play. Similarly, a data team will have members with different and overlapping sets of skills.

As analytics engineers, we might have to collaborate with data engineers, data analysts, and other consumers of data, either within the same team or across departments. It is crucial to define team responsibilities to prevent conflicts over tasks and ensure clear accountability.

The responsibility boundaries help create focus on a specific domain of work, which allows for the following:

Mitigating the risks of overlapping work and redundant efforts, optimizing the team’s productivity
Fostering a sense of accountability and ownership among team...

Tracking tasks and issues

Once roles have been defined and assigned within a team, using issue and task-tracking tools with an Agile methodology, such as Scrum, can be an effective way to increase the team's productivity. We are not going to cover Agile methodologies here, but we encourage reading up on the topic, as task tracking and working in an agile way are closely related.

With the help of task and issue tracking tools, developers and project managers can collaborate more effectively, identify and address issues early on, and ultimately deliver high-quality data products that meet the needs of their users.

This sounds evident; however, in our experience as consultants, we have noticed that teams across different industries have implemented this practice to varying degrees. We have seen it all, from teams who do not do any issue and task tracking to teams who use the latest tools but have no coherent strategy for using these tools, and, thus, the tool quickly becomes...

Managing versions with version control

In analytics engineering, we take best practices from software engineering. You might be familiar with version control already. You have probably heard of Git, GitHub, Bitbucket, and similar tools or have been using such tools for a while now. They are absolutely essential to developing code in a team, and we would also encourage version control in cases where you are the only person working on a code base.

There are several reasons why version control is important:

Version control helps developers keep track of changes made to code over time, enabling them to revert to earlier versions if necessary
It allows multiple developers to work on the same code base simultaneously without interfering with each other’s work
It makes collaboration and code review easier, which helps catch errors and improve code quality
By providing a history of changes, version control can help with debugging and troubleshooting
It aids...

Working with coding standards

Coding standards are a set of guidelines that help ensure that code is readable, maintainable, and consistent across a team or organization. By adhering to coding standards, analytics engineers can create more efficient, scalable, and reliable code, ultimately leading to a better understanding of the code and faster development times.

Note

This chapter features code snippets designed to showcase best practices. However, executing these snippets may need extra setup, not covered in this book. Readers should view these snippets as illustrative examples and adapt the underlying best practices to their unique scenarios.

Have a look at the following Python function definition:

def add_numbers(a,b) :
    c = a+b;return c

Here is the same code with different formatting:

def add_numbers(a, b):
    c = a + b
    return c

What do you notice about the format of the code and how it is written? You probably...
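To confirm that the two versions above differ only in layout, a small sketch can compare their syntax trees with Python's standard `ast` module. It assumes the body of the first snippet is indented as Python requires; the names `messy`, `tidy`, `structure`, and `run` are illustrative and not from the book.

```python
import ast

# The two variants of the function, stored as source text.
# Assumption: the body of the first variant is indented, as Python requires.
messy = "def add_numbers(a,b) :\n    c = a+b;return c\n"
tidy = "def add_numbers(a, b):\n    c = a + b\n    return c\n"


def structure(source: str) -> str:
    """Return a text dump of the syntax tree, which ignores whitespace and layout."""
    return ast.dump(ast.parse(source))


def run(source: str, a, b):
    """Define add_numbers from the given source and call it."""
    namespace = {}
    exec(source, namespace)
    return namespace["add_numbers"](a, b)


# Both variants parse to the same tree and return the same value.
same_structure = structure(messy) == structure(tidy)
same_result = run(messy, 2, 3) == run(tidy, 2, 3)
print(same_structure, same_result)
```

Because the interpreter treats both variants identically, the difference between them matters only to the people reading the code, which is exactly what a coding standard is for.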

Reviewing code

Code reviews provide a critical safeguard to ensure code quality. They are essential for data teams to collaborate effectively and ensure high-quality coding practices. They provide a framework for team members to evaluate code together, identify potential issues, and share knowledge.

By fostering a culture of learning, code reviews enable individuals to learn from one another, ultimately improving the collective expertise of the team. Additionally, they reinforce accountability by ensuring compliance with coding standards, data privacy protocols, and best practices. Overall, code reviews are crucial for creating robust, reliable, and high-performing data solutions.

Pull requests – The four eyes principle

A pull request (PR), called a merge request on some platforms, is a feature of Git hosting platforms such as GitHub, GitLab, and Bitbucket that facilitates collaborative development and code review. The purpose of a PR is to propose changes made in one branch (typically, a feature branch) to be merged into another branch (often, the...

Continuous integration/continuous deployment

Perhaps you have come across this term before. Continuous integration and continuous deployment, also known as CI/CD, is a practice used in software development to enhance the speed and quality of the software delivery process. It is an automated approach in which code changes are continuously integrated and tested, so that issues are detected early and software updates are delivered faster.

The benefits of CI/CD are multifold. It helps to reduce the likelihood of introducing new bugs into the code base, as well as ensuring that the code changes are always in a releasable state. This means that code updates can be deployed more frequently, which, in turn, leads to faster feedback loops, and ultimately, better customer satisfaction.

From an analytics engineering perspective, an example of CI/CD in action is the development of models in dbt. In this scenario, as soon as a developer pushes a code change to a shared repository, a series of automated...
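The idea of automated checks that run on every push can be sketched in plain Python. This is only a toy model under stated assumptions: real CI/CD is configured in a CI platform rather than written like this, and the step names and the `run_pipeline` function are hypothetical stand-ins for steps such as linting, testing, and deploying.

```python
from typing import Callable, List, Tuple

# A check is a name plus a function that returns True when the step succeeds.
Check = Tuple[str, Callable[[], bool]]


def run_pipeline(checks: List[Check]) -> Tuple[bool, List[str]]:
    """Run checks in order and stop at the first failure."""
    log: List[str] = []
    for name, check in checks:
        passed = check()
        log.append(f"{name}: {'passed' if passed else 'failed'}")
        if not passed:
            # Later steps (for example, deployment) never run after a failure.
            return False, log
    return True, log


# Hypothetical stand-ins for real pipeline steps.
def lint_ok() -> bool:
    return True


def tests_ok() -> bool:
    return False  # simulate a failing test suite


def deploy_ok() -> bool:
    return True


ok, log = run_pipeline([("lint", lint_ok), ("tests", tests_ok), ("deploy", deploy_ok)])
print(ok, log)
```

In this run, the failing test step stops the pipeline before the deploy step is reached, which is how automated checks keep broken changes from being released.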

Documenting code

Documentation is the holy grail when maintaining a code base. This should sound familiar; how often have you found yourself reviewing or having to work with a new code base, and there is little to no documentation on how things work? It is every engineer’s nightmare in this field of work. When we talk about documentation, we not only talk about code comments or technical documentation but also conceptual documentation that describes what certain code does on a higher level.

In fact, as mentioned in the book Docs for Developers: An Engineer’s Field Guide to Technical Writing (https://docsfordevelopers.com/), we can distinguish between several types of documentation:

Code comments
READMEs
Getting started
Conceptual documentation

Let’s talk about each of them in detail.
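As a brief illustration of the first type in the list above, here is a minimal sketch of documentation that lives in the code itself. It reuses the `add_numbers` function from the coding standards section; the docstring layout is one common convention and an assumption here, not a style the book prescribes.

```python
def add_numbers(a: float, b: float) -> float:
    """Return the sum of two numbers.

    Args:
        a: The first number to add.
        b: The second number to add.

    Returns:
        The sum of a and b.
    """
    return a + b


# The docstring travels with the function and can be read at runtime,
# for example through help(add_numbers) or the __doc__ attribute.
summary_line = add_numbers.__doc__.strip().splitlines()[0]
print(summary_line)
```

Keeping this kind of documentation next to the code makes it more likely to be updated when the code changes.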

Documenting code in dbt

For analytics engineers, we recommend several handy tools for documenting your code. Tools such as dbt allow you to place...

Working with containers

In the rapidly evolving field of analytics engineering, the ability to maintain consistency, efficiency, and collaboration across development teams is very important. A key technology enabling this is containerization, exemplified by tools such as Docker and the use of development containers or devcontainers in Visual Studio Code (VS Code). Containers are lightweight, standalone packages that contain everything needed to run a piece of software, including the code, runtime, system tools, libraries, and settings. Docker, a popular containerization platform, allows developers to package applications into containers, ensuring consistency across environments.

We will not explain Docker in detail here; instead, we will explain (on a higher level) how this technology allows us to have the same development environment for each developer. Docker uses a Dockerfile that contains the instructions to build images, which are used to execute code in a Docker container...

Summary

Working on code together in a team can be challenging; you have seen the complexity of topics such as project organization, task tracking, version control, code formatting, and review and documentation. Your code base is like a living organism, and therefore, maintaining the health of your code base to ensure continuity and consistency in quality can be challenging.

In this chapter, we discussed team member responsibilities, issue and task tracking tools, version control, coding standards, code review, and documentation. You now have a better understanding of the importance of these concepts to ensure proper code quality and make collaborating with team members a lot easier and less error-prone. You also got to know certain tools that can help us to better manage team resources, time, code quality, and the easy onboarding of new users.

In the next chapter, we will have a more detailed look into continuous integration/continuous deployment and delivery, which helps to...


Authors (7)

Dumky De Wilde

Dumky is an award-winning analytics engineer with close to 10 years of experience in setting up data pipelines, data models and cloud infrastructure. Dumky has worked with a multitude of clients from government to fintech and retail. His background is in marketing analytics and web tracking implementations, but he has since branched out to include other areas and deliver value from data and analytics across the entire organization.

Fanny Kassapian

Fanny has a multidisciplinary background across various industries, giving her a unique perspective on analytics workflows, from engineering pipelines to driving value for the business. As a consultant, Fanny helps companies translate opportunities and business needs into technical solutions, implement analytics engineering best practices to streamline their pipelines, and treat data as a product. She is an avid promoter of data democratization through technology and literacy.

Jovan Gligorevic

Jovan, an Analytics Engineer, specializes in data modeling and building analytical dashboards. Passionate about delivering end-to-end analytics solutions and enabling self-service analytics, he has a background in business and data science. With skills ranging from machine learning to dashboarding, Jovan has democratized data across diverse industries. Proficient in various tools and programming languages, he has extensive experience with the modern data stack. Jovan enjoys providing trainings in dbt and Power BI, sharing his knowledge generously.

Juan Manuel Perafan

Juan Manuel Perafan has 8 years of experience in the realm of analytics (5 years as a consultant). Juan was the first analytics engineer hired by Xebia back in 2020, making him one of the earliest adopters of this way of working. Besides helping his clients realize the value of their data, Juan is also very active in the data community. He has spoken at dozens of conferences and meetups around the world (including Coalesce 2023). Additionally, he is the founder of the Analytics Engineering meetup in the Netherlands as well as the Dutch dbt meetup.

Lasse Benninga

Lasse has been working in the data space since 2018, starting out as a Data Engineer at a large airline, then switching to Cloud Engineering for a consultancy and working for different clients in the retail and healthcare space. Since 2021, he has been an Analytics Engineer at Xebia Data, merging software/platform engineering with a passion for analytics. As a consultant, Lasse has worked with many different clients in retail, healthcare, ridesharing, and trading. He has implemented multiple data platforms and worked in all three major clouds, leveraging his knowledge of data and analytics to provide value.

Ricardo Angel Granados Lopez

Ricardo, an Analytics Engineer with a strong background in data engineering and analysis, is a quick learner and tech enthusiast. With a Master's in IT Management specializing in Data Science, he excels in using various programming languages and tools to deliver valuable insights. Ricardo, experienced in diverse industries like energy, transport, and fintech, is adept at finding alternative solutions for optimal results. As an Analytics Engineer, he focuses on driving value from data through efficient data modeling, using best practices, automating tasks, and improving data quality.

Taís Laurindo Pereira

Taís is a versatile data professional with experience in a diverse range of organizations, from big corporations to scale-ups. Before her move to Xebia, she had the chance to develop distinct data products, such as dashboards and machine learning implementations. Currently, she is focusing on end-to-end analytics as an Analytics Engineer. With a mixed background in engineering and business, her mission is to contribute to data democratization in organizations by helping them to overcome challenges when working with data at scale.
