Beyond Code Debugging
Artificial intelligence (AI), like human intelligence, is a capability and tool that can be used for decision-making and task accomplishment. As humans, we use our intelligence to make our daily decisions and to think about the challenges and problems we deal with. We use our brains and central nervous systems to receive information from our surroundings and process it for decision-making and reactions.
Machine learning models are the AI techniques most widely used today to tackle problems in fields ranging from healthcare to finance. Machine learning models have been used in robotic systems in manufacturing facilities to package products or identify products that might have been damaged. They have been used in our smartphones to identify our faces for security purposes, by e-commerce companies to suggest the most suitable products or movies to us, and even for improving healthcare and drug development to bring new, more effective drugs onto the market for severe diseases.
In this chapter, we will provide a quick review of different types of machine learning modeling. You will learn about different techniques and challenges in debugging your machine learning code. We will also discuss why debugging machine learning modeling goes far beyond just code debugging.
We will cover the following topics in this chapter:
- Machine learning at a glance
- Types of machine learning modeling
- Debugging in software development
- Flaws in data used for modeling
- Model and prediction-centric debugging
This chapter is an introduction to this book to prepare you for more advanced concepts that will be presented later. This will help you improve your models and move toward becoming an expert in the machine learning era.
Technical requirements
You can find the code files for this chapter on GitHub at https://github.com/PacktPublishing/Debugging-Machine-Learning-Models-with-Python/tree/main/Chapter01.
Machine learning at a glance
You need three fundamental elements to build a machine learning model: an algorithm, data, and computing power (Figure 1.1). A machine learning algorithm needs to be fed with the right data and trained using the necessary computing power. Once trained, it can be used to make the kind of predictions it was trained for on unseen data:

Figure 1.1 – The three elements in the machine learning triangle
Machine learning applications can be generally categorized as automation and discovery. In the automation category, the goal of the machine learning model and the software and hardware systems built around it is to do the tasks that are possible and usually easy but tedious, repetitive, boring, or dangerous for human beings. Some examples of this include recognizing damaged products in manufacturing lines or recognizing employees’ faces at entrances in high-security facilities. Sometimes, it is not possible to use human beings for some of these tasks, although the task would be easy. For example, for face recognition on your phone, if your phone was stolen, you would not be there to recognize that the person who is trying to log into your phone is not you and your phone should be able to do it by itself. But we cannot come up with a generalizable mathematical formulation for these tasks to tell the machine what to do in each situation. So, the machine learning model learns how to come up with its prediction, for example, in terms of recognizing faces, according to the identified patterns in the data.
On the other hand, in the discovery category of machine learning modeling, we want the models to provide information and insight about unknowns that have not been fully discovered, or that are difficult or even impossible for human experts or non-experts to extract. For example, discovering new drugs for cancer patients is not a task where you can learn all aspects of it by going through a couple of courses and books. In such cases, machine learning can help us come up with new insights to help discover new drugs.
For both discovery and automation, different types of machine learning modeling can help us achieve our goals. We will explore this in the next section.
Types of machine learning modeling
Machine learning modeling types differ in whether they rely on output data, in the variable type of the model output, and in whether they learn from prerecorded data or from experience. Although the examples in this book focus on supervised learning, we will review other types of modeling, including unsupervised learning, self-supervised learning, semi-supervised learning, reinforcement learning (RL), and generative machine learning, to cover the six major categories of machine learning modeling (Figure 1.2). We will also talk about techniques in machine learning modeling that cut across these categories rather than sitting alongside them, such as active learning, transfer learning, ensemble learning, and deep learning, and provide code examples for them:

Figure 1.2 – Types of machine learning modeling
Self-supervised and semi-supervised learning are sometimes considered sub-categories of supervised learning. However, we will separate them here so that we can establish the differences between the usual supervised learning models you are familiar with and these two types of modeling.
Supervised learning
Supervised learning is about identifying the relationship between inputs/features and the output for each data point. But what do input and output mean?
Imagine that we want to build a machine learning model to predict whether a person is likely to get breast cancer or not. The output of the model could be 1 for getting breast cancer and 0 for not getting breast cancer and the inputs could be the characteristics of the people, such as their age, weight, and whether they smoke or not. There could even be inputs that are measured using advanced technologies, such as the genetic information of each person. In this case, we want to use our machine learning model to predict which patient will get cancer in the future.
You can also design a machine learning model to estimate the price of houses in a city. Here, your model could use characteristics of houses, such as the number of bedrooms and size of the house, the neighborhood, and access to schools, to estimate house prices.
In both of these examples, we have models trying to identify patterns within input features, such as a high number of bedrooms but only one bathroom, and associate those with the output. Depending on the output variable type, your model can be categorized as a classification model, in which the output is categorical, such as getting or not getting cancer, or a regression model, in which the output is continuous, such as house prices.
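To make the distinction concrete, here is a minimal sketch in plain Python. The toy data, the helper names (fit_line, predict_price, predict_label), and the nearest-neighbor rule are illustrative assumptions, not methods prescribed in this chapter. The first part fits a straight line to estimate a continuous house price (regression); the second assigns a categorical 0/1 label by copying the label of the closest training example (classification):

```python
# Regression: continuous output (price) from one feature (size).
sizes = [50.0, 80.0, 100.0, 120.0]      # hypothetical house sizes
prices = [150.0, 240.0, 300.0, 360.0]   # hypothetical prices (exactly 3 * size)

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(sizes, prices)

def predict_price(size):
    return slope * size + intercept

# Classification: categorical output (1 or 0) from two features.
# Each row is (age, smoker flag); the labels are made up for illustration.
people = [(25, 0), (30, 0), (60, 1), (65, 1)]
labels = [0, 0, 1, 1]

def predict_label(person):
    """1-nearest-neighbor: copy the label of the closest training row."""
    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    nearest = min(range(len(people)),
                  key=lambda i: squared_distance(people[i], person))
    return labels[nearest]

print(round(predict_price(90.0), 2))  # a continuous value
print(predict_label((58, 1)))         # a category: 0 or 1
```

In practice, you would typically rely on a library such as scikit-learn for both tasks; the point here is only the difference in output type.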
Unsupervised learning
The majority of our life, at least in childhood, has been spent using our five senses (eyesight, hearing, taste, touch, and smell) to collect information about our surroundings, food, and so on, without us trying to find supervised learning style relationships such as whether a banana is ripe or not based on its color and shape. Similarly, in unsupervised learning, we are not seeking to identify the relationship between the features (input) and the output. Instead, the goal is to identify relationships between data points, as in clustering, extract new features (that is, embeddings or representations), and, if needed, reduce the dimensionality (that is, the number of features) of our data without using any output for the data points.
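As a rough illustration of clustering, the following sketch groups one-dimensional values into two clusters without using any output values. It is a deliberately tiny version of k-means; the function name two_means, the toy values, and the choice to start the centers at the minimum and maximum are assumptions made for this example:

```python
def two_means(values, iterations=10):
    """Tiny 1-D k-means with k=2; centers start at the min and max value."""
    centers = [min(values), max(values)]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            # Assign each point to the nearest center (ties go to the first).
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

values = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]   # no output/label is provided
centers, groups = two_means(values)
print(groups)
```

The two groups emerge only from how close the data points are to each other, which is the defining property of this category.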
Self-supervised learning
The third category of machine learning modeling is called self-supervised learning. In this category, the goal is to identify the relationship between inputs and outputs, but the difference with supervised learning is the source of outputs. For example, if the goal of the supervised machine learning model is to translate from English to French, the inputs come from English words and sentences and the outputs come from French words and sentences. However, we can have a self-supervised learning model within English sentences to try to predict the next word or a missing word in a sentence. For example, let’s say we aim to recognize that “talking” is a good candidate to fill the gap in “Jack is ____ with Julie.” Self-supervised learning models have been used in recent years across different fields to identify new features. This is commonly called representation learning. We will talk about some examples of self-supervised learning in Chapter 14, Introduction to Recent Advancements in Machine Learning.
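The key idea, that the outputs come from the inputs themselves, can be sketched in a few lines. The helper masked_word_pairs below is a hypothetical name used only for illustration; it turns one unlabeled sentence into training pairs by hiding one word at a time:

```python
def masked_word_pairs(sentence, mask_token="____"):
    """Turn one unlabeled sentence into (masked sentence, missing word) pairs."""
    words = sentence.split()
    pairs = []
    for i, target in enumerate(words):
        masked = words[:i] + [mask_token] + words[i + 1:]
        pairs.append((" ".join(masked), target))
    return pairs

pairs = masked_word_pairs("Jack is talking with Julie")
for masked_sentence, target in pairs:
    print(f"{masked_sentence!r} -> {target!r}")
```

No human labeling is needed here: every sentence in a text corpus yields its own input and output pairs, which a model could then be trained on.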
Semi-supervised learning
Semi-supervised learning can help us benefit from supervised learning without throwing out the data points that don’t have output values. Sometimes, we have data points for which we don’t have the output values and only their feature values are available. In such cases, semi-supervised learning helps us use data points with or without output. One simple process to do so is to group data points that are similar to each other and use known outputs of the data points in each group to assign output for other data points of the same group that don’t have output value.
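The simple process described above could look like the following sketch. The function propagate_labels, the toy data, and the threshold-based grouping rule are assumptions for illustration; in a real project, a clustering algorithm would usually provide the groups:

```python
from collections import Counter

def propagate_labels(points, labels, group_of):
    """Fill missing labels (None) with the majority known label of each group."""
    votes = {}
    for point, label in zip(points, labels):
        if label is not None:
            votes.setdefault(group_of(point), Counter())[label] += 1
    filled = []
    for point, label in zip(points, labels):
        group_votes = votes.get(group_of(point))
        if label is None and group_votes:
            label = group_votes.most_common(1)[0][0]
        filled.append(label)
    return filled

points = [1.0, 1.1, 0.9, 8.0, 8.2, 7.9]
labels = ["low", None, None, "high", None, "high"]
# Hypothetical grouping rule: split the points on a threshold.
filled = propagate_labels(points, labels, group_of=lambda p: p > 5.0)
print(filled)
```

Data points in a group with no known output stay unlabeled (None), which is safer than guessing.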
Reinforcement learning
In RL, a model is rewarded according to its experience in an environment (real or virtual). In other words, RL is about identifying relationships from examples that are added piece by piece as the model gains experience, rather than from a dataset that is fully available before training. In RL, data is not considered part of the model and is independent of the model itself. We will go through some details of RL in Chapter 14, Introduction to Recent Advancements in Machine Learning.
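The following sketch shows the reward-driven loop in its simplest form. It uses a toy two-action problem (often called a multi-armed bandit) with fixed rewards and an epsilon-greedy rule; the function run_bandit, the reward values, and all parameter choices are assumptions for illustration and are far simpler than the RL methods discussed later in the book:

```python
import random

def run_bandit(rewards, steps=100, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a toy problem with a fixed reward per action."""
    rng = random.Random(seed)
    estimates = [0.0] * len(rewards)   # the agent's running reward estimates
    counts = [0] * len(rewards)
    for step in range(steps):
        if step < len(rewards):
            action = step                                # try each action once
        elif rng.random() < epsilon:
            action = rng.randrange(len(rewards))         # explore
        else:
            action = max(range(len(rewards)),
                         key=lambda a: estimates[a])     # exploit the best estimate
        reward = rewards[action]                         # feedback from the environment
        counts[action] += 1
        # Incremental mean: update the estimate with one new experience.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

estimates, counts = run_bandit(rewards=[0.2, 1.0])
print(estimates, counts)
```

Each pass through the loop adds one piece of experience, and the estimates improve from rewards alone, without any prerecorded output values.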
Generative machine learning
Generative machine learning modeling helps us develop models that can generate images, text, or any data point that is close to the probability distribution of data provided in the training process. ChatGPT is one of the most famous tools that’s built on top of a generative model to generate realistic and meaningful text in response to user questions and answers (https://openai.com/blog/chatgpt). We will go through more details about generative modeling and the available tools built on top of it in Chapter 14, Introduction to Recent Advancements in Machine Learning.
In this section, we provided a brief review of the basic components for building machine learning models and different types of modeling. But if you want to develop machine learning models for automation or discovery, for healthcare or any other application, with a low or high number of data points, on your laptop or the cloud, using a central processing unit (CPU) or graphics processing unit (GPU), you need to develop high-quality code that works as expected. Although this book is not a software debugging book, an overview of software debugging challenges and techniques could help you in developing your machine learning models.
Debugging in software development
If you want to use Python and its libraries to build machine learning and deep learning models, you need to make sure your code works as expected. Let’s consider the following examples of the same function for returning the multiplication of two variables:
- Correct code:
def multiply(x, y):
    z = x * y
    return z
- Code with a typo:
def multiply(x, y):
    z = x * y
    retunr z
- Code with an indentation issue:
def multiply(x, y):
z = x * y
return z
- Incorrect use of ** for multiplication:
def multiply(x, y):
    z = x ** y
    return z
As you can see, there could be typos in the code and issues with indentation that prevent the code from running. You might also face issues because of an incorrect operator being used, such as ** for multiplication instead of *. In this case, your code will run, but the result will differ from what the function is supposed to return, which is the product of the input variables.
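Because the wrong operator produces no error message, the only way to catch it is to compare results with known answers. The following sketch does this for a handful of inputs; the function names and test cases are illustrative assumptions:

```python
def multiply(x, y):
    return x * y

def multiply_with_wrong_operator(x, y):
    return x ** y   # runs without error, but computes a power

# 2 and 2 would hide the bug because 2 * 2 == 2 ** 2, so use several input pairs.
cases = [((2, 2), 4), ((3, 4), 12), ((5, 0), 0)]

def failing_cases(func):
    """Return the cases for which func does not give the expected product."""
    return [(args, expected, func(*args))
            for args, expected in cases
            if func(*args) != expected]

print(failing_cases(multiply))                      # []
print(failing_cases(multiply_with_wrong_operator))  # [((3, 4), 12, 81), ((5, 0), 0, 1)]
```

Note that a single check with the inputs 2 and 2 would have passed for both functions, which is why it helps to test more than one input pair.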
Error messages in Python
Sometimes, there are issues with our code that don’t let it continue running. These issues could result in different error messages in Python. Here are some examples of error messages you might face when you’re running your Python code:
- SyntaxError: This is the type of error you’ll get when the syntax you used in your code is not correct Python syntax. It could be caused by a typo, such as having retunr instead of return, as shown previously, or using a command that doesn’t exist, such as using giveme instead of return.
- TypeError: This error is raised when your code tries to perform an operation on an object or variable that cannot be done in Python. For example, your code might try to multiply two numbers while the variables are in string format instead of float or integer format.
- AttributeError: This type of error is raised when an attribute is used for an object that it is not defined for. For example, isnull is not defined for a list. So, my_list.isnull() results in AttributeError.
- NameError: This error is raised when you try to use a function, class, module, or other name that is not defined in your code. For example, if you haven’t defined a neural_network class in your code but call it as neural_network(), you will get a NameError message.
- IndentationError: Python is a programming language that relies on correct indentation, that is, the necessary spaces at the beginning of each line of code, to understand relationships between the lines. It also helps with code readability. IndentationError is the result of the wrong type of indentation being used in your code. But not all wrong indentation, based on the objective you have in mind, results in IndentationError. For example, the following code examples work without any error, but only the first one meets the objective of counting the number of odd numbers in a list. The bottom function returns the length of the input list instead. As a result, if you run the top part of the code, you will get 3 as the output, which is the total number of odd numbers in the input list, while the bottom part of the code returns 5, which is the length of the list. These types of errors, which don’t stop the code from running but generate an incorrect output, are called logical errors.
Here is the version with the intended indentation, which counts the odd numbers correctly:
def odd_counter(num_list: list):
    """
    :param num_list: list of integers to be checked for identifying odd numbers
    :return: return an integer as the number of odd numbers in the input list
    """
    odd_count = 0
    for num in num_list:
        if (num % 2) == 0:
            print("{} is even".format(num))
        else:
            print("{} is odd".format(num))
            odd_count += 1
    return odd_count

num_list = [1, 2, 5, 8, 9]
print(f'Total number of odd numbers in the list: {odd_counter(num_list)}')
The following code runs but generates unintended results because odd_count += 1 is indented one level too little, so it runs for every number in the list and not only for the odd ones:
def odd_counter(num_list: list):
    """
    :param num_list: list of integers to be checked for identifying odd numbers
    :return: return an integer as the number of odd numbers in the input list
    """
    odd_count = 0
    for num in num_list:
        if (num % 2) == 0:
            print("{} is even".format(num))
        else:
            print("{} is odd".format(num))
        odd_count += 1
    return odd_count

num_list = [1, 2, 5, 8, 9]
print(f'Total number of odd numbers in the list: {odd_counter(num_list)}')
There are other errors whose meanings are clear from their names, such as ZeroDivisionError when your code tries to divide by zero, IndexError if your code tries to get a value using an index that is outside the range of a list, or ImportError when you’re trying to import a function or class that cannot be found.
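To see several of these errors side by side, the following sketch runs a few deliberately broken one-liners and reports the name of the exception each one raises. The helper error_name and the example descriptions are assumptions made for illustration. Note that importing a missing module raises ModuleNotFoundError, which is a subclass of ImportError:

```python
def error_name(action):
    """Run a zero-argument callable and return the name of the error it raises."""
    try:
        action()
    except Exception as error:
        return type(error).__name__
    return "no error"

examples = {
    "multiply two strings": lambda: "2" * "3",
    "call isnull on a list": lambda: [1, 2].isnull(),
    "use an undefined name": lambda: neural_network(),
    "divide by zero": lambda: 1 / 0,
    "index past the end": lambda: [1, 2, 3][5],
    "import a missing module": lambda: __import__("module_that_does_not_exist"),
    "compile code with a typo": lambda: compile("retunr 1", "<example>", "exec"),
}

for description, action in examples.items():
    print(f"{description}: {error_name(action)}")
```

Catching Exception broadly is done here only to display the error names; in real code, it is usually better to catch the specific error types you expect.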
In the previous code examples, we used a docstring to specify the type of the input parameter (that is, a list) and the intended output. Having this information helps you and new users of your code to better understand the code and resolve any issues with it quickly.
These are simple examples of issues that can happen in your software and pipelines. In machine learning modeling, debugging could be much more challenging than in these examples, as you need to deal with hundreds or thousands of lines of code and tens or hundreds of functions and classes. It could be even more difficult if you need to start working on a piece of code that you have not written yourself when, for example, you’re joining a new team in industry or academia. You need to use techniques and tools that help you debug your code with minimum effort and time. Although this book is not designed for code debugging, reviewing some debugging techniques could help you in developing high-quality code that runs as planned.
Debugging techniques
There are techniques to help you in the process of debugging a piece of code or software. You might have used one or more of these techniques, even without remembering or knowing their names. We will review four of them here.
Traceback
When you get an error message in Python, it usually provides you with the necessary information to find the issue. This information creates a report-like message about the lines of your code that the error occurred in, as well as the types of error and function or class calls that resulted in such errors. This report-like message is called a traceback in Python.
Consider the following code, in which the reverse_multiply function is supposed to return a list of element-wise multiplication of an input list and its reverse. Here, reverse_multiply uses the multiply function to multiply the two lists. Since multiply is designed for multiplying two float numbers, not two lists, the code raises an error and returns a traceback message with the necessary information for finding the issue, starting from the bottom operation. It specifies that TypeError occurred on line 8 within multiply, which is the bottom operation, and then lets us know that this issue results in an error occurring on line 21, in reverse_multiply, and eventually on line 27 in the whole code module. Both the PyCharm IDE and Jupyter return this information. The following code examples show you how to use traceback to find the necessary information so that you can debug a small and simple piece of Python code in both PyCharm and Jupyter Notebook:
def multiply(x: float, y: float):
    """
    :param x: input variable of type float
    :param y: input variable of type float

    return: returning multiplications of the input variables
    """
    z = x * y
    return z


def reverse_multiply(num_list: list):
    """
    :param num_list: list of numbers to be multiplied by its reverse
    :return: return a list containing element-wise multiplication of the
        input list and its reverse
    """
    rev_list = num_list.copy()
    rev_list.reverse()

    mult_list = multiply(num_list, rev_list)

    return mult_list


num_list = [1, 2, 5, 8, 9]
print(reverse_multiply(num_list))
The following lines show you the traceback error message when you run the previous code in Jupyter Notebook:
TypeError                                 Traceback (most recent call last)
<ipython-input-1-4ceb9b77c7b5> in <module>()
     25
     26 num_list = [1, 2, 5, 8, 9]
---> 27 print(reverse_multiply(num_list))

<ipython-input-1-4ceb9b77c7b5> in reverse_multiply(num_list)
     19     rev_list.reverse()
     20
---> 21     mult_list = multiply(num_list, rev_list)
     22
     23     return mult_list

<ipython-input-1-4ceb9b77c7b5> in multiply(x, y)
      6     return: returning multiplications of the input variables
      7     """
----> 8     z = x * y
      9     return z
     10

TypeError: can't multiply sequence by non-int of type 'list'

The following lines show the traceback error message in PyCharm:

Traceback (most recent call last):
  File "<input>", line 27, in <module>
  File "<input>", line 21, in reverse_multiply
  File "<input>", line 8, in multiply
TypeError: can't multiply sequence by non-int of type 'list'
Python traceback messages are very useful for debugging our code. However, they are not enough for debugging large code bases that contain many functions and classes. You need to use complementary techniques to help you in the debugging process.
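Traceback information can also be inspected from within your code using the traceback module of the Python standard library. The sketch below reuses simplified versions of multiply and reverse_multiply and extracts the chain of function calls that led to the error; the variable names are chosen for illustration:

```python
import traceback

def multiply(x, y):
    return x * y

def reverse_multiply(num_list):
    rev_list = list(reversed(num_list))
    return multiply(num_list, rev_list)   # bug: multiplies two lists

try:
    reverse_multiply([1, 2, 5, 8, 9])
except TypeError as error:
    # Walk the stack from the outermost call to the line that failed.
    frames = traceback.extract_tb(error.__traceback__)
    call_chain = [frame.name for frame in frames]
    print(" -> ".join(call_chain))
    print(f"{type(error).__name__}: {error}")
```

Each frame also carries the filename and line number, so the same approach can be used to write the full traceback to a log file when the code runs unattended.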
Induction and deduction
When you have found an error in your code, you can either start by collecting as much information as you can and try to find potential issues using the information, or you can jump into checking your suspicions. These two approaches differentiate induction from the deduction process in terms of code debugging:
- Induction: In the induction process, you start by collecting information and data about the problem in your code, which helps you come up with a list of potential issues that could have caused the error. Then, you can narrow the list down and, if necessary, collect more information and data from the process until you fix the error.
- Deduction: In the deduction process, you start with a short list of your points of suspicion regarding the issues in your code and check whether any one of them is the actual source of the problem. If none of them is, you gather more information and come up with new potential sources of the problem. You continue this process until you fix the problem.
In both approaches, you go through an iterative process of coming up with potential sources of issues, building hypotheses, and collecting the necessary information until you fix the error in your code. If a piece of code or software is new to you, this process could take time. In such cases, try to get help from teammates who have more experience with the code to collect more data and come up with more relevant hypotheses.
Bug clustering
As stated in the Pareto principle, named after Vilfredo Pareto, a famous Italian sociologist and economist, 80% of the results originate from 20% of the causes. The exact number is not the point here. This principle helps us better understand that the majority of the problems and errors in our code are caused by a minority of its modules. By grouping bugs, we can kill multiple birds with one stone, as resolving one issue in a group of bugs could potentially resolve most of the others within the same group.
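A simple way to look for such clusters is to count bugs by where they occur and by their type. The bug reports below are made up for illustration, and the module names are hypothetical:

```python
from collections import Counter

# Hypothetical bug reports: (module where the failure surfaced, error type).
bug_reports = [
    ("data_loader", "KeyError"),
    ("data_loader", "KeyError"),
    ("data_loader", "ValueError"),
    ("trainer", "TypeError"),
    ("data_loader", "KeyError"),
    ("plots", "IndexError"),
]

by_module = Counter(module for module, _ in bug_reports)
by_signature = Counter(bug_reports)

print(by_module.most_common(1))     # the module to inspect first
print(by_signature.most_common(1))  # the most frequent (module, error) pair
```

In this toy example, four of the six reports point to one module, and three of them share the same error type, so a single fix there is likely to remove several bugs at once.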
Problem simplification
The idea here is to simplify the code so that you can identify the cause of the error and fix it. You could replace big data objects with smaller and even synthetic ones or limit function calling in a big module. This process could help you quickly eliminate the options for identifying the causes of the issues in your code, or even in the data format you have used as inputs of functions or classes in your code. Especially in a machine learning setting, where you might deal with complex data processes, big data files, or streams of data, this simplification process for debugging could be very useful.
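One way to simplify a failing input is to keep halving it until a single problematic record remains. The sketch below assumes that the failure is caused by one record on its own; find_failing_record, total_length, and the toy data are hypothetical names and values used only for illustration:

```python
def find_failing_record(records, process):
    """Narrow a failing batch down to a single record by repeated halving.

    Assumes the failure is caused by one record on its own, so that
    whenever a batch fails, at least one of its halves fails too.
    """
    def fails(batch):
        try:
            process(batch)
        except Exception:
            return True
        return False

    if not fails(records):
        return None
    while len(records) > 1:
        middle = len(records) // 2
        first_half = records[:middle]
        records = first_half if fails(first_half) else records[middle:]
    return records[0]

def total_length(batch):
    # Hypothetical processing step that breaks on a non-string record.
    return sum(len(item) for item in batch)

data = ["a", "bb", "ccc", 42, "dddd", "ee"]
print(find_failing_record(data, total_length))   # 42
```

With a large data file, this kind of search needs far fewer runs than checking the records one by one.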
Debuggers
The IDEs you might use, such as PyCharm, as well as Jupyter Notebook, if you use it to experiment with your ideas in Python, have built-in features for debugging. There are also free or paid tools you can benefit from to facilitate your debugging processes. For example, in PyCharm and most other IDEs, you can use breakpoints as pausing places when running a big piece of code so that you can follow the operations in your code (Figure 1.3) and eventually find the cause of the issue:

Figure 1.3 – Using breakpoints in PyCharm for code debugging
The breakpoint capabilities in different IDEs are not the same. For example, you can use PyCharm’s conditional breakpoints to speed up your debugging process: execution pauses only when a condition you specify is true, so you don’t have to step through every iteration of a loop or every repeated function call manually. Read more about the debugging features of the IDE you use and consider them as another tool in your toolbox for better and easier Python programming and machine learning modeling.
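Outside an IDE, a similar effect can be approximated with Python’s built-in breakpoint() function (available since Python 3.7), which opens the pdb debugger at the line where it is called. In the sketch below, the pause_when parameter is a hypothetical hook added for illustration; it is not a feature of PyCharm or pdb:

```python
def running_mean(values, pause_when=None):
    """Compute running means; optionally pause in the debugger on a condition.

    pause_when is a hypothetical hook: a function that receives the current
    value and returns True when execution should stop at a breakpoint.
    """
    means = []
    total = 0.0
    for count, value in enumerate(values, start=1):
        if pause_when is not None and pause_when(value):
            breakpoint()   # opens pdb here, only for matching values
        total += value
        means.append(total / count)
    return means

# Normal run: no condition is given, so the debugger is never started.
print(running_mean([2.0, 4.0, 6.0]))
# Interactive debugging run, stopping only at suspicious values:
# running_mean(data, pause_when=lambda v: v < 0)
```

Remember to remove such calls, or disable them by setting the PYTHONBREAKPOINT environment variable to 0, before the code runs unattended.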
The debugging techniques and tools we’ve briefly explained here, or those you already know about, could help you develop a piece of code that runs and provides the intended results. You could also follow some best practices for high-quality Python programming and building your machine learning models.
Best practices for high-quality Python programming
Prevention is better than a cure. There are practices you can follow to prevent or decrease the chance of bugs occurring in your code. In this section, we will talk about three of those practices: incremental programming, logging, and defensive programming. Let’s look at each in detail.
Incremental programming
Machine learning modeling in practice, in academia or industry, goes beyond writing a few lines of code to train a simple model, such as a logistic regression model, using datasets that already exist in scikit-learn. It requires many modules for processing data, training and testing models, and postprocessing inferences or predictions to assess the reliability of the models. Writing the code for each small component and then testing it, for example by writing test code with pytest, could help you avoid issues with each function or class you write. It also helps you make sure that the outputs of one module that feed another module as its input are compatible. This process is what is called incremental programming. When you write a piece of software or pipeline, try to write and test it piece by piece.
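As a small illustration of writing and testing piece by piece, the following sketch defines two tiny components and pytest-style tests for each of them, plus one test that checks they work together. The functions scale_to_unit_range and mean are hypothetical examples, not part of any library:

```python
def scale_to_unit_range(values):
    """Rescale numbers linearly so the smallest is 0.0 and the largest is 1.0."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]

def mean(values):
    return sum(values) / len(values)

# Tests for one component at a time; pytest collects functions named test_*.
def test_scale_to_unit_range_bounds():
    scaled = scale_to_unit_range([10, 20, 30])
    assert scaled == [0.0, 0.5, 1.0]

def test_scale_handles_constant_input():
    assert scale_to_unit_range([7, 7]) == [0.0, 0.0]

# Test that the output of one module is a valid input for the next.
def test_scaled_values_feed_mean():
    assert mean(scale_to_unit_range([0, 5, 10])) == 0.5
```

If this code is saved in a file whose name starts with test_, running pytest in that folder executes the three tests. Only plain assert statements are needed, so the file does not have to import pytest.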
Logging
Every car has a series of dashboard lights, such as the low gas or engine oil change lights, that get turned on when there is a problem with the car. These problems could stop the car from running or cause serious damage if they’re not acted upon. Now, imagine there was no light or warning, and all of a sudden, the car you are driving stops or makes a terrible sound, and you don’t know what to do. When you develop functions and classes in Python, you can benefit from logging to log information, errors, and other kinds of messages that help you in identifying potential sources of issues when you get an error message. The following example showcases how to use error and info, two functions of the logging module. You can use the different logging functions in the functions and classes you write to improve data and information gathering while running your code. You can also export the log information to a file using basicConfig(), which does the basic configuration for the logging system:
import logging

def multiply(x: float, y: float):
    """
    :param x: input variable of type float
    :param y: input variable of type float
    return: returning multiplications of the input variables
    """
    # Log an error if either input is not a number
    if not isinstance(x, (int, float)) or not isinstance(y, (int, float)):
        logging.error('Input variables are not of type float or integer!')
    z = x * y
    return z

def reverse_multiply(num_list: list):
    """
    :param num_list: list of numbers to be multiplied by its reverse
    :return: return a list containing element-wise multiplication of the input list and its reverse
    """
    logging.info("Length of {num_list} is {list_len}".format(num_list=num_list, list_len=len(num_list)))
    rev_list = num_list.copy()
    rev_list.reverse()
    mult_list = [multiply(num_list[i], rev_list[i]) for i in range(0, len(num_list))]
    return mult_list

num_list = [1, 'no', 5, 8, 9]
print(reverse_multiply(num_list))
When you run the previous code, you will get the following messages and output:
ERROR:root:Input variables are not of type float or integer!
ERROR:root:Input variables are not of type float or integer!
[9, 'nononononononono', 25, 'nononononononono', 9]
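The logged error messages are the result of attempting to multiply 'no', which is a string, by a number. Because Python allows a string to be multiplied by an integer, the code repeats the string instead of failing, which is why logging the error is useful here. The info message does not appear in the output because the default logging level is WARNING.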
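To also record info messages and to export the log to a file, basicConfig() can be given a filename and a level. The following sketch writes to a temporary folder; the file name pipeline.log and the message texts are assumptions for illustration:

```python
import logging
import os
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "pipeline.log")  # hypothetical location

# level=logging.INFO lets info messages through (the default is WARNING);
# force=True (Python 3.8+) replaces any handlers configured earlier.
logging.basicConfig(
    filename=log_path,
    level=logging.INFO,
    format="%(levelname)s:%(name)s:%(message)s",
    force=True,
)

logging.info("Length of input list is %d", 5)
logging.error("Input variables are not of type float or integer!")
logging.shutdown()   # flush and close the log file

with open(log_path) as log_file:
    log_lines = log_file.read().splitlines()
print(log_lines)
```

The format string used here mirrors the default layout seen in the previous output; adding %(asctime)s to it would include a timestamp in each line.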
Defensive programming
Defensive programming is about preparing yourself for mistakes that can be made by you, your teammates, and your collaborators. There are tools, techniques, and Python features to defend the code against such mistakes, such as assertions. For example, the following line stops your code if the condition is not met, that is, if num is not of type float, and raises an error with the message AssertionError: Variable should be of type float:
assert isinstance(num, float), 'Variable should be of type float'
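Because assert statements are skipped when Python runs with the -O option, checks that must always run are often written as explicit exceptions instead. The following sketch applies this idea to the multiplication example; safe_multiply is a hypothetical name, and excluding bool values is a design choice made for this illustration:

```python
def safe_multiply(x, y):
    """Multiply two numbers, rejecting anything that is not an int or float."""
    for name, value in (("x", x), ("y", y)):
        # bool is a subclass of int, so it is excluded explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"{name} should be int or float, got {type(value).__name__}")
    return x * y

print(safe_multiply(3, 2.5))        # 7.5
try:
    safe_multiply(8, "no")
except TypeError as error:
    print(error)                    # y should be int or float, got str
```

With this check in place, the 'no' value from the logging example would stop the code with a clear message instead of silently producing a repeated string.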
Version control
The tools and practices we covered here are just examples of how you can improve the quality of your programming and decrease the amount of time needed to eliminate issues and errors in your code. Another important tool in improving your machine learning modeling is versioning. We will talk about data and model versioning in Chapter 10, Versioning and Reproducible Machine Learning Modeling, but let’s briefly talk about code versioning here.
Version control systems allow you to manage changes in the code and files that exist in a code base. They help you track those changes, gain access to the history of changes, and collaborate in developing different components of a machine learning pipeline. You can use version control systems such as Git and its associated hosting services, such as GitHub, GitLab, and Bitbucket, for your projects. These tools let you and your teammates and collaborators work on different branches of code without disrupting each other’s work. They also let you easily go back through the history of changes and find out when a change happened in the code.
If you have not used version control systems, don’t consider them as a new complicated tool or programming language you need to start learning. There are a couple of core concepts and terms you need to learn first, such as commit, push, pull, and merge, when using Git. Using these functionalities could be as simple as a few clicks in an IDE such as PyCharm if you don’t want to use, or don’t know how to use, the command-line interface (CLI).
We reviewed some commonly used techniques and tools to help you in debugging your code and in high-quality Python programming. However, there are more advanced tools built on top of models such as GPT, including ChatGPT (https://openai.com/blog/chatgpt) and GitHub Copilot (https://github.com/features/copilot), that you can use to develop your code faster, increase its quality, and even support your code debugging efforts. We will talk about some of these tools in Chapter 14, Introduction to Recent Advancements in Machine Learning.
Although using the preceding debugging techniques or best practices to avoid issues in your Python code helps you have a low-bug code base, it doesn’t prevent all the problems with machine learning models. This book is about going beyond Python programming for machine learning to help you identify problems with your machine learning models and develop high-quality models.
Debugging beyond Python
Eliminating code issues doesn’t resolve all the issues that may exist in a machine learning model or a pipeline for data preparation and modeling. There could be issues that don’t result in any error message, such as problems that originate from data used for modeling, and differences between test data and production data (that is, data that the model needs to be used for eventually).
Production versus development environments
The development environment is where we develop our models, such as our computers or cloud environments we use for development. It is where we develop our code, debug it, process data, train models, and validate them. But what we do in this stage doesn’t affect users directly.
The production environment is where the model is ready to be used by end users or could affect them. For example, a model could get into production in the Amazon platform for recommending products, be delivered to other teams in a banking system for fraud detection, or even be used in hospitals to help clinicians in diagnosing patients’ conditions better.
Flaws in data used for modeling
Data is one of the core components of machine learning modeling (Figure 1.1). Applications of machine learning across different industries such as healthcare, finance, automotive, retail, and marketing are made possible by getting access to the necessary data for training and testing machine learning models. As the data gets fed into machine learning models for training (that is, identifying optimal model parameters) and testing, flaws in data could result in problems in models, such as low performance in training (for example, high bias), low generalizability (for example, high variance), or socioeconomic biases. Here, we will discuss examples of flaws and properties of data that need to be considered when designing a machine learning model.
Data format and structure
There could be issues with how data is stored, read, and moved through different functions and classes in your code or pipeline. You might need to work with structured or tabular data or unstructured data such as videos and text documents. This data could be stored in relational databases such as MySQL or NoSQL (that is, non-relational) databases, data warehouses, and data lakes, or even stored locally in different file formats, such as CSV. Either way, the expected and existing file data structure and formats need to match. For example, if your code is expecting a tab-separated file format but instead the input file of the corresponding function is comma-separated, then all the columns could be lumped together. Luckily, most of the time, these kinds of issues result in errors in the code.
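The delimiter mismatch is easy to reproduce with the csv module of the Python standard library. In the sketch below, the in-memory file content and the helper check_column_count are assumptions for illustration. Note that in this small case, reading with the wrong delimiter raises no error by itself; the columns are silently lumped together, and the explicit check is what turns the mismatch into an error:

```python
import csv
import io

# A small tab-separated "file" held in memory for illustration.
tsv_text = "age\tweight\tsmoker\n52\t70.5\t1\n"

rows_with_wrong_delimiter = list(csv.reader(io.StringIO(tsv_text)))  # default is ","
rows_with_right_delimiter = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))

print(rows_with_wrong_delimiter[0])   # ['age\tweight\tsmoker']
print(rows_with_right_delimiter[0])   # ['age', 'weight', 'smoker']

def check_column_count(rows, expected):
    """Fail early if any row does not have the expected number of columns."""
    for line_number, row in enumerate(rows, start=1):
        if len(row) != expected:
            raise ValueError(
                f"line {line_number}: expected {expected} columns, found {len(row)}"
            )

check_column_count(rows_with_right_delimiter, expected=3)
```

Calling check_column_count on rows_with_wrong_delimiter raises a ValueError right after reading, long before the data reaches a model.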
There could also be mismatches between the provided and expected data that cause no errors at all if the code does not guard against them and not enough information is logged. For example, imagine a scikit-learn fit function that expects training data with 100 features and, at the same time, you have 100 data points. In this case, your code will not return any error whether the features are in the rows or in the columns of the input DataFrame. Your code therefore needs to check whether each row of the input DataFrame contains the values of one feature across all data points or the feature values of one data point. The following figure shows how switching features with data points, such as by transposing a DataFrame so that rows become columns, could provide the wrong input but result in no error. In this figure, we have considered four columns and rows for simplicity. Here, F and D are used as abbreviations for feature and data point, respectively:

Figure 1.4 – Simplified example showcasing how the transpose of a DataFrame can be used by mistake in a scikit-learn fit function that expects four features
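The situation in Figure 1.4 can be sketched as follows. This is only an illustration under stated assumptions: the data is random, `LinearRegression` is an arbitrary choice of scikit-learn estimator, and `check_feature_columns` is a hypothetical helper rather than part of scikit-learn:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Four data points (D1..D4) by four features (F1..F4), as in the simplified figure.
X = pd.DataFrame(
    rng.normal(size=(4, 4)),
    index=["D1", "D2", "D3", "D4"],
    columns=["F1", "F2", "F3", "F4"],
)
y = rng.normal(size=4)

# Both calls run without error because the two inputs have identical shapes.
LinearRegression().fit(X, y)
LinearRegression().fit(X.T, y)


def check_feature_columns(df, expected_features):
    """Fail loudly if the columns are not exactly the expected feature names."""
    if list(df.columns) != list(expected_features):
        raise ValueError(
            f"Expected feature columns {list(expected_features)}, "
            f"got {list(df.columns)}"
        )


expected = ["F1", "F2", "F3", "F4"]
check_feature_columns(X, expected)  # passes
try:
    check_feature_columns(X.T, expected)
except ValueError as err:
    print(f"Caught transposed input: {err}")
```

A check on column names only works when the DataFrame carries meaningful labels. For unlabeled arrays, logging the shape together with the expected number of features is a weaker but still useful safeguard.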
Data flaws are not restricted to structure and format issues. Some data characteristics need to be considered when you’re trying to build and improve a machine learning model.
Data quantity and quality
Machine learning is a concept more than half a century old, but the rise of excitement around it started in 2012. Although there were algorithmic advancements for image classification between 2010 and 2015, it was the availability of 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest and the necessary computing power that played a crucial role in the development of the first high-performance image classification models, such as AlexNet (Krizhevsky et al., 2012) and VGG (Simonyan and Zisserman, 2014).
In addition to data quantity, the quality of the data also plays a very important role. In some applications, such as clinical cancer settings, a high quantity of high-quality data is not accessible. Benefitting from both quantity and quality could also become a tradeoff, as we could have access to more data but with lower quality. We can choose to stick to high-quality data or low-quality data, or try to benefit from both if possible. Selecting the right approach is domain-specific and depends on the data and algorithm used for modeling.
Data biases
Machine learning models can have different kinds of biases, depending on the data we feed them. Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) is a famous example of a machine learning model with reported biases. COMPAS is designed to estimate the likelihood that a defendant will re-offend based on their responses to more than 100 survey questions, including questions such as whether one of the prisoner’s parents was ever in prison. A summary of the responses results in a risk score. Although the tool has been successful in many examples, when its predictions have been wrong, the errors were not the same for white and black offenders. The developer company of COMPAS presented data that supports its algorithm’s findings. You can find articles and blog posts to read more about its current status and whether it is still used or still has biases or not.
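One way to look for this kind of bias is to compare the types of errors a model makes across groups rather than only its overall accuracy. The following is a minimal sketch on made-up toy records (not real COMPAS data), with generic group labels and a hypothetical `error_rates` helper. It is not a description of how COMPAS was actually evaluated:

```python
import pandas as pd

# Made-up toy records, not real COMPAS data: group label, true outcome
# (1 = re-offended), and the model's prediction (1 = predicted high risk).
records = pd.DataFrame(
    {
        "group": ["A", "A", "A", "A", "B", "B", "B", "B"],
        "y_true": [0, 0, 1, 1, 0, 0, 1, 1],
        "y_pred": [1, 0, 1, 1, 0, 0, 1, 0],
    }
)


def error_rates(df):
    """Return false positive and false negative rates for one group."""
    negatives = df[df["y_true"] == 0]
    positives = df[df["y_true"] == 1]
    return pd.Series(
        {
            "false_positive_rate": (negatives["y_pred"] == 1).mean(),
            "false_negative_rate": (positives["y_pred"] == 0).mean(),
        }
    )


rates_by_group = pd.DataFrame(
    {name: error_rates(group_df) for name, group_df in records.groupby("group")}
).T
print(rates_by_group)
```

In this toy data, both groups have three correct predictions out of four, yet group A receives only false positives and group B only false negatives. The overall accuracy alone would hide that difference.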
These were some examples of issues in data and their consequences in the resulting machine learning models. But there are other problems in models that do not originate from data.
Model and prediction-centric debugging
The predictions of a model in the training, testing, and production stages could help us detect issues with the model and find opportunities to improve it. Here, we will briefly review some aspects of model- and prediction-centric debugging. In future chapters of this book, you can read more about these problems and other considerations in achieving a reliable model, how to identify the source of the issues, and how to resolve them.
Underfitting and overfitting
When we train a model, such as a supervised learning model, the goal is to have high performance not just in training but also in testing. When a model has low performance even on the training set, we are dealing with underfitting. We can develop more complex models, such as a random forest or deep learning model, instead of linear and logistic regression models. More complex models might reduce underfitting, but they might cause overfitting and lower the generalizability of the predictions to test or production data (Figure 1.5):

Figure 1.5 – Schematic illustration of underfitting and overfitting
Algorithm and hyperparameter selection determine the level of complexity and the chance of underfitting or overfitting when training and testing a machine learning model. For example, by choosing a model that can learn nonlinear patterns instead of a linear model, you lower the chance of underfitting, as the model can identify more complex patterns in the training data. But at the same time, you increase the chance of overfitting, as some of the complex patterns in the training data might not generalize to the test data (Figure 1.5). There are approaches to assess underfitting and overfitting that will help you develop a high-performance and generalizable model. We will discuss these in future chapters.
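A simple way to see underfitting and overfitting side by side is to compare training and test scores as model complexity grows. The following is a minimal sketch, assuming scikit-learn is available. The synthetic data and the use of decision tree depth as the complexity knob are our own choices for illustration, not the approaches covered later in this book:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)

# Synthetic nonlinear data with noise (an assumption for illustration only).
X = rng.uniform(0, 6, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

results = {}
# max_depth=None lets the tree grow until it memorizes the training set.
for depth in (1, 4, None):
    model = DecisionTreeRegressor(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    results[depth] = {
        "train_r2": model.score(X_train, y_train),
        "test_r2": model.score(X_test, y_test),
    }
    print(depth, results[depth])
```

A low score on both sets (the shallow tree) points toward underfitting, while a near-perfect training score combined with a clearly lower test score (the unrestricted tree) points toward overfitting.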
Model hyperparameters
Some parameters that affect the performance of a machine learning model usually do not get optimized automatically in the training process. These are called hyperparameters. We will go through examples of such hyperparameters, such as the number of trees in a random forest model or the size of hidden layers in neural network models, in future chapters.
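The distinction between hyperparameters and learned parameters can be seen directly in scikit-learn. The following is a minimal sketch on a small synthetic dataset, where the specific values of `n_estimators` and `max_depth` are arbitrary choices for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic dataset, used only to have something to fit.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Hyperparameters are chosen by us before training.
model = RandomForestClassifier(n_estimators=25, max_depth=3, random_state=0)
print(model.get_params()["n_estimators"])  # 25

# Training learns the split rules of each tree but leaves the hyperparameters as set.
model.fit(X, y)
print(len(model.estimators_))  # 25 fitted trees
print(model.get_params()["max_depth"])  # still 3
```

Changing the number of trees or their depth therefore requires a separate search around the training step, since calling `fit` will not adjust them.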
Inference in model testing and production
The eventual goal of machine learning modeling is to have a highly effective model in production. When we test the model, we are assessing its generalizability, but we cannot be sure about its performance on the data it has not seen. The data that’s used for training machine learning models could become out of date. For example, the changes in the trends of the clothing market could make predictions of a model for clothing recommendation unreliable.
There are different concepts in this topic, such as data variance, data drift, and model drift, all of which we will cover in the next few chapters.
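As a first intuition for data drift, we can compare how a feature is distributed at training time and in production. The following is a minimal sketch using only NumPy. The `mean_shift_in_std_units` helper, the simulated feature values, and the `THRESHOLD` cutoff are all hypothetical, and this simple heuristic is not a substitute for the drift detection techniques covered later:

```python
import numpy as np


def mean_shift_in_std_units(train_values, new_values):
    """Return the distance between the two means, in training std units."""
    train_values = np.asarray(train_values, dtype=float)
    new_values = np.asarray(new_values, dtype=float)
    train_std = train_values.std()
    if train_std == 0:
        raise ValueError("Training feature is constant; shift is undefined.")
    return abs(new_values.mean() - train_values.mean()) / train_std


rng = np.random.default_rng(1)
# Simulated values of one feature: at training time, and in two production scenarios.
train_feature = rng.normal(loc=50.0, scale=5.0, size=1000)
similar_feature = rng.normal(loc=50.0, scale=5.0, size=1000)  # no drift
drifted_feature = rng.normal(loc=60.0, scale=5.0, size=1000)  # shifted trend

THRESHOLD = 0.5  # hypothetical cutoff; a real one would be tuned per feature
shifts = {
    "similar": mean_shift_in_std_units(train_feature, similar_feature),
    "drifted": mean_shift_in_std_units(train_feature, drifted_feature),
}
for name, shift in shifts.items():
    status = "drift suspected" if shift > THRESHOLD else "ok"
    print(name, round(float(shift), 2), status)
```

A mean comparison only catches shifts in location. Changes in spread or shape would need a richer comparison of the two distributions.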
Data or hyperparameters for changing landscapes
When we train a machine learning model with specific training data and a set of hyperparameters, the values of model parameters get changed so that they’re as close to an optimum point as possible for a defined objective or loss function. Beyond this optimization of model parameters, two other tools for achieving a better model are providing better data for training and selecting better hyperparameters. Each algorithm has a capacity for performance improvement. By playing with model hyperparameters alone, you cannot develop the best possible model. In the same way, by increasing the quality and quantity of your data while keeping your model hyperparameters the same, you might also fail to achieve the best performance possible. So, data and hyperparameters go hand in hand. Before you read the next chapters, remember that spending more time and money on hyperparameter optimization alone will not necessarily give you a better model. We will look at this in more detail later in this book.
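To explore how data and hyperparameters interact, we can vary both at once and compare the resulting test scores. The following is a minimal sketch on synthetic data. The training set sizes, the `max_depth` values, and the choice of a decision tree are arbitrary assumptions, and which combination scores best will depend on the dataset:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic classification data (an assumption for illustration only).
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=8, random_state=0
)
X_pool, X_test, y_pool, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

scores = {}
for n_train in (100, 1500):  # data quantity
    for max_depth in (2, 8):  # one hyperparameter
        model = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
        model.fit(X_pool[:n_train], y_pool[:n_train])
        scores[(n_train, max_depth)] = model.score(X_test, y_test)

for (n_train, max_depth), accuracy in scores.items():
    print(f"n_train={n_train:>4}  max_depth={max_depth}  test accuracy={accuracy:.3f}")
```

Reading the scores as a grid, rather than along a single axis, shows whether more data, a different hyperparameter value, or both together move the test performance.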
Summary
In this chapter, we reviewed important concepts and approaches for debugging in software development and how they differ from machine learning model debugging. You learned that debugging in machine learning modeling goes beyond software debugging and that data and algorithms, in addition to code, could cause flawed or low-performance models and unreliable predictions. You can benefit from this understanding and the tools and techniques you will learn about throughout this book to develop reliable machine learning models.
In the next chapter, you will learn about the different components of the machine learning life cycle. You will also learn how modularizing machine learning modeling with these components helps us in identifying opportunities for improving our models before and after training and testing.
Questions
- Could your code have unintended indentation but not return any error message?
- What is the difference between AttributeError and NameError in Python?
- How does data dimensionality affect model performance?
- What information do traceback messages in Python provide you about the errors in your code?
- Could you explain two best practices for high-quality Python programming?
- Could you explain why you might have features or data points with different levels of confidence?
- Could you provide suggestions on how to reduce underfitting or overfitting when building a model for a given dataset?
- Could we have a model with significantly lower performance in production than testing?
- Is it a good idea to focus on hyperparameter optimization when we can also improve the quality or quantity of the training data?
References
- Widyasari, Ratnadira, et al. BugsInPy: A database of existing bugs in Python programs to enable controlled testing and debugging studies. Proceedings of the 28th ACM joint meeting on European software engineering conference and symposium on the foundations of software engineering. 2020.
- Myers, Glenford J., Corey Sandler, Tom Badgett, and Todd M. Thomas. The Art of Software Testing, Second Edition.
- Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. Advances in neural information processing systems 25 (2012).
- Simonyan, Karen, and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014). https://arxiv.org/abs/1409.1556.