
Machine Learning Infrastructure and Best Practices for Software Engineers

By Miroslaw Staron
About this book
Although creating a machine learning pipeline or developing a working prototype of a software system from that pipeline is easy and straightforward nowadays, the journey toward a professional software system is still extensive. This book will help you get to grips with various best practices and recipes that will help software engineers transform prototype pipelines into complete software products. The book begins by introducing the main concepts of professional software systems that leverage machine learning at their core. As you progress, you’ll explore the differences between traditional, non-ML software, and machine learning software. The initial best practices will guide you in determining the type of software you need for your product. Subsequently, you will delve into algorithms, covering their selection, development, and testing before exploring the intricacies of the infrastructure for machine learning systems by defining best practices for identifying the right data source and ensuring its quality. Towards the end, you’ll address the most challenging aspect of large-scale machine learning systems – ethics. By exploring and defining best practices for assessing ethical risks and strategies for mitigation, you will conclude the book where it all began – large-scale machine learning software.
Publication date:
January 2024
Publisher
Packt
Pages
346
ISBN
9781837634064

 

Machine Learning Compared to Traditional Software

Machine learning software is a special kind of software that finds patterns in data, learns from them, and even recreates these patterns on new data. Developing machine learning software is therefore focused on finding the right data, matching it with the appropriate algorithm, and evaluating its performance. Traditional software, by contrast, is developed with the algorithm in mind. Based on software requirements, programmers develop algorithms that solve specific tasks and then test them. Data is secondary, although not completely unimportant. Both types of software can co-exist in the same software system, but the programmer must ensure compatibility between them.

In this chapter, we’ll explore where these two types of software systems are most appropriate. We’ll learn about the software development processes that programmers use to create both types of software. We’ll also learn about the four classical types of machine learning software – rule-based learning, supervised learning, unsupervised learning, and reinforcement learning. Finally, we’ll learn about the different roles of data in traditional and machine learning software – as input to pre-programmed algorithms in traditional software and input to training models in machine learning software.

The best practices introduced in this chapter provide practical guidance on when to choose each type of software and how to assess the advantages and disadvantages of these types. By exploring a few modern examples, we’ll understand how to create an entire software system with machine learning algorithms at the center.

In this chapter, we’re going to cover the following main topics:

  • Machine learning is not traditional software
  • Probability and software – how well do they go together?
  • Testing and validation – the same but different
 

Machine learning is not traditional software

Although machine learning and artificial intelligence have been around since the 1950s, when Alan Turing introduced the field, they first gained popularity with early expert systems such as MYCIN, and our understanding of machine learning systems has changed over time. It was not until the 2010s that we started to perceive, design, and develop machine learning in the same way as we do today (in 2023). In my view, two pivotal moments shaped the landscape of machine learning as we see it today.

The first pivotal moment was the focus on big data in the late 2000s and early 2010s. With the introduction of smartphones, companies started to collect and process increasingly large quantities of data, mostly about our behavior online. One of the companies that perfected this was Google, which collected data about our searches, online behavior, and usage of Google’s operating system, Android. As the volume of the collected data increased (along with its velocity), so did its value and the need for its veracity. These five Vs – volume, velocity, value, veracity, and variety – required a new approach to working with data. The classical approach of relational (SQL) databases was no longer sufficient. Relational databases became too slow at handling high-velocity data streams, which gave way to map-reduce algorithms, distributed databases, and in-memory databases. The classical approach of relational schemas became too constraining for the variety of data, which gave way to NoSQL databases, which store documents.

The second pivotal moment was the rise of modern machine learning algorithms – deep learning. Deep learning algorithms are designed to handle unstructured data such as text, images, or music (as opposed to structured data in the form of tables and matrices). Classical machine learning algorithms, such as regression, decision trees, or random forests, require data in a tabular form. Each row is a data point, and each column is one characteristic of it – a feature. The classical models are designed to handle relatively small datasets. Deep learning algorithms, on the other hand, can handle large datasets and find more complex patterns in the data because of the power of large neural networks and their complex architectures.

Machine learning is sometimes called statistical learning because it is based on statistical methods. These methods calculate properties of the data (such as mean values, standard deviations, and coefficients) and thus find patterns in it. The core characteristic of machine learning is that it uses data to find patterns, learns from them, and then repeats these patterns on new data. We call learning the patterns training, and applying them to new data inference or, in machine learning language, predicting. The main benefit of machine learning software comes from the fact that we do not need to design the algorithms ourselves – we focus on the problem to be solved and the data that we use to solve it. Figure 1.1 shows an example of how such a flowchart of machine learning software can be realized.

First, we import a generic machine learning model from a library. This generic model has all elements that are specific to it, but it is not trained to solve any tasks. An example of such a model is a decision tree model, which is designed to learn dependencies in data in the form of decisions (or data splits), which it uses later for new data. To make this model somewhat useful, we need to train it. For that, we need data, which we call the training data.

Second, we evaluate the trained model on new data, which we call the test data. The evaluation process uses the trained model and applies it to check whether its inferences are correct. To be precise, it checks to which degree the inferences are correct. The training data is in the same format as the test data, but the content of these datasets is different. No data point should be present in both.
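The rule that no data point should appear in both datasets is usually enforced by splitting one dataset into a training part and a test part. The following is a minimal sketch of such a split, assuming scikit-learn is available; the ten data points and their labels are made up for illustration.

```python
# A minimal sketch of a train/test split with scikit-learn.
# The dataset below is hypothetical: 10 points, 1 feature, binary labels.
from sklearn.model_selection import train_test_split

X = [[i] for i in range(10)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Hold out 30% of the data for evaluation; by construction,
# no data point is present in both the training and the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

print(len(X_train), len(X_test))  # 7 3
```

The fixed `random_state` makes the split reproducible, which matters when we want to compare evaluation results across experiments.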

In the third step, we use the model as part of a software system. We develop other, non-machine learning components and connect them to the trained model. The entire software system usually consists of data procurement components, real-time validation components, data cleaning components, user interfaces, and business logic components. All these components, including the machine learning model, provide a specific functionality for the end user. Once the software system has been developed, it needs to be tested, which is where the input data comes into play. The input data is something that the end user enters into the system, such as by filling in a form. This test input is designed so that it comes with the expected output, allowing us to test whether the software system works correctly.

Finally, the last step is to deploy the entire system. The deployment can be very different, but most modern machine learning systems are organized into two parts – the onboard/edge algorithms for non-machine learning components and the user interface, and the offboard/cloud algorithms for machine learning inferences. Although it is possible to deploy all parts of the system on the target device (both machine learning and non-machine learning components), complex machine learning models require significant computational power for good performance and seamless user experience. The principle is simple – more data/complex data means more complex models, which means that more computational power is needed:

Figure 1.1 – Typical flow of machine learning software development


As shown in Figure 1.1, one of the crucial elements of the machine learning software is the model, which is one of the generic machine learning models, such as a neural network, that’s been trained on specific data. Such a model is used to make predictions and inferences. In most systems, this kind of component – the model – is often prototyped and developed in Python.

Models are trained for different datasets and, therefore, the core characteristic of machine learning software is its dependence on that dataset. An example of such a model is a vision system, where we train a machine learning algorithm such as a convolutional neural network (CNN) to classify images of cats and dogs.

Since the models are trained on specific datasets, they perform best on similar datasets when making inferences. For example, if we train a model to recognize cats and dogs in 160 x 160-pixel grayscale images, the model can recognize cats and dogs in such images. However, the same model will perform very poorly (if at all!) if it needs to recognize cats and dogs in colorful images instead of grayscale images – the accuracy of the classification will be low (close to 0).

On the other hand, when we develop and design traditional software systems, we do not rely on data that much, as shown in Figure 1.2. This figure provides an overview of a software development process for traditional, non-machine learning software. Although it is depicted as a flow, it is usually an iterative process where Steps 1 to 3 are done in cycles, each one ending with new functionality added to the product.

The first step is developing the software system. This includes the development of all its components – user interface, business logic (processing), handling of data, and communication. The step does not involve much data unless the software engineer creates data for testing purposes.

The second step is system testing, where we use input data to validate the software system. In essence, this step is almost identical to testing machine learning software. The input data is complemented with the expected outcome data, which allows software testers to assess whether the software works correctly.

The third step is to deploy the software. The deployment can be done in many ways. However, traditional software that is similar in function to machine learning software is usually simpler to deploy – unlike machine learning models, it typically does not require cloud deployment:

Figure 1.2 - Typical flow of traditional software development


The main difference between traditional software and machine learning-based software is that we need to design, develop, and test all the elements of the traditional software. In machine learning-based software, we take an empty model, which contains all the necessary elements, and we use the data to train it. We do not need to develop the individual components of the machine learning model from scratch.

One of the main parts of traditional software is the algorithm, which is developed by software engineers from scratch, based on the requirements or user stories. The algorithm is usually written as a sequential set of steps that are implemented in a programming language. Naturally, all algorithms operate on data, but they do so differently than machine learning systems – based on the software engineer’s design, such as if x, then y.

We usually consider these traditional algorithms as deterministic, explainable, and traceable. This means that the software engineer’s design decisions are documented in the algorithm and the algorithm can be analyzed afterward. They are deterministic because they are programmed based on rules; there is no training from data or identifying patterns from data. They are explainable because they are designed by programmers and each line of the program has a predefined meaning. Finally, they are traceable as we can debug every step of these programs.

However, there is a drawback – the software engineer needs to thoroughly consider all corner cases and understand the problem very well. The data that the software engineer uses is only to support them in analyzing the algorithm, not training it.

An example of a system that can be implemented using both machine learning algorithms and traditional ones is one for reading passport information. Instead of using machine learning for image recognition, the software uses specific marks in the passport (usually the <<< sequence of characters) to mark the beginning of the line or the beginning of the sequence of characters denoting a surname. These marks can be recognized quite quickly using rule-based optical character recognition (OCR) algorithms without the need for deep learning or CNNs.
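To make the rule-based side of this example concrete, here is a hedged sketch of extracting a surname from a passport-style machine-readable line using plain string rules, with no machine learning involved. The sample line and the `parse_surname` helper are illustrative only, not a complete machine-readable zone parser.

```python
# Hedged sketch: rule-based parsing of a passport-style machine-readable
# line. In such lines, "<" pads fields and "<<" separates the surname from
# the given names, so plain string rules are enough to locate the surname.
def parse_surname(mrz_line):
    # illustrative layout: "P<" + 3-letter country code, then the names
    names_field = mrz_line[5:]            # drop "P<" and the country code
    surname = names_field.split("<<")[0]  # surname ends at the "<<" mark
    return surname.replace("<", " ").strip()

sample = "P<UTOSTARON<<MIROSLAW<<<<<<<<<<<<<<<<<<<<<<<<"
print(parse_surname(sample))  # STARON
```

Every step of this parser is deterministic and traceable, which is exactly the property the chapter attributes to traditional software.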

Therefore, I would like to introduce the first best practice.

Best practice #1

Use machine learning algorithms when your problem is focused on data, not on the algorithm.

When selecting the right technology, we need to understand whether it is based on the classical approach, where the design of the algorithm is in focus, or whether we need to focus on handling data and finding patterns in it. It is usually beneficial to start with the following guidelines.

If the problem requires processing large quantities of data in raw format, use the machine learning approach. Examples of such systems are conversational bots, image recognition tools, text processing tools, or even prediction systems.

However, if the problem requires traceability and control, use the traditional approach. Examples of such systems are control software in cars (anti-lock braking, engine control, and so on) and embedded systems.

If the problem requires new data to be generated based on the existing data, a process known as data manipulation, use the machine learning approach. Examples of such systems are image manipulation programs (DALL-E), text generation programs, deep fake programs, and source code generation programs (GitHub Copilot).

If the problem requires adaptation over time and optimization, use machine learning software. Examples of such systems are power grid optimization software, non-playable character behavior components in computer games, playlist recommendation systems, and even GPS navigation systems in modern cars.

However, if the problem requires stability and traceability, use the traditional approach. Examples of such systems are systems to make diagnoses and recommendation systems in medicine, safety-critical systems in cars, planes, and trains, and infrastructure controlling and monitoring systems.

Supervised, unsupervised, and reinforcement learning – it is just the beginning

Now is a good time to mention that the field of machine learning is huge, and it is organized into three main areas – supervised learning, unsupervised learning, and reinforcement learning. Each of these areas has hundreds of different algorithms. For example, the area of supervised learning has over 1,000 algorithms, all of which can be automatically selected by meta-heuristic algorithms such as AutoML:

  • Supervised learning: This is a group of algorithms that are trained based on annotated data. The data that’s used in these algorithms needs to have a target or a label. The label is used to tell the algorithm which pattern to look for. For example, such a label can be cat or dog for each image that the supervised learning model needs to recognize. Historically, supervised learning algorithms are the oldest ones as they come directly from statistical methods such as linear regression and multinomial regression. Modern algorithms are advanced and include methods such as deep learning neural networks, which can recognize objects in 3D images and segment them accordingly. The most advanced algorithms in this area are deep learning and multimodal models, which can process text and images at the same time.

    A sub-group of supervised learning algorithms is self-supervised models, which are often based on transformer architectures. These models do not require labels in the data, but they use the data itself as labels. The most prominent examples of these algorithms are translation models for natural languages and generative models for images or texts. Such algorithms are trained by masking words in the original texts and predicting them. For the generative models, these algorithms are trained by masking parts of their output to predict it.

  • Unsupervised learning: This is a group of models that are applied to find patterns in data without any labels. These models are not trained, but they use statistical properties of the input data to find patterns. Examples of such algorithms are clustering algorithms and semantic map algorithms. The input data for these algorithms is not labeled and the goal of applying these algorithms is to find structure in the dataset according to similarities; these structures can then be used to add labels to this data. We encounter these algorithms daily when we get recommendations for products to buy, books to read, music to listen to, or films to watch.
  • Reinforcement learning: This is a group of models that are applied to data to solve a particular task given a goal. For these models, we need to provide this goal in addition to the data. It is called the reward function – an expression that defines when we achieve the goal. The model is trained based on this reward function. Examples of such models are algorithms that play Go, chess, or StarCraft. These algorithms are also used to solve hard programming problems (AlphaCode) or optimize energy consumption.
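To make the contrast between the first two groups concrete, here is a minimal sketch that applies a supervised and an unsupervised algorithm to the same toy data, assuming scikit-learn is installed. The six data points are made up for illustration.

```python
# Supervised vs unsupervised learning on the same toy data.
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

X = [[1.0], [1.2], [0.9], [8.0], [8.3], [7.9]]  # one feature, two clumps
y = [0, 0, 0, 1, 1, 1]                           # labels, needed only for supervised

# Supervised: the labels tell the algorithm which pattern to look for
clf = DecisionTreeClassifier().fit(X, y)
print(clf.predict([[1.1], [8.1]]))  # [0 1]

# Unsupervised: KMeans groups the points by similarity, with no labels at all
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)
```

The cluster labels that KMeans assigns are arbitrary (cluster 0 versus cluster 1), which illustrates the chapter's point that unsupervised structures can later be used to attach labels to data.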

So, let me introduce the second best practice.

Best practice #2

Before you start developing a machine learning system, do due diligence and identify the right group of algorithms to use.

As each of these groups of models has different characteristics, solves different problems, and requires different data, a mistake in selecting the right group of algorithms can be costly. Supervised models are very good at solving problems related to predictions and classifications. The most powerful models in this area can compete with humans in selected areas – for example, GitHub Copilot can create programs that can pass as human-written. Unsupervised models are very powerful if we want to group entities and make recommendations. Finally, reinforcement learning models are the best when we want continuous optimization, where the model is retrained every time the data or the environment changes.

Although all these models are based on statistical learning, they are all components of larger systems to make them useful. Therefore, we need to understand how this probabilistic and statistical nature of machine learning goes with traditional, digital software products.

An example of traditional and machine learning software

To illustrate the difference between traditional software and machine learning software, let’s implement the same program using these two paradigms. We’ll implement a program that calculates the Fibonacci sequence using the traditional approach, which we have seen a million times in computer science courses. Then, we’ll implement the same program using machine learning models – or one model to be exact – that is, linear regression.

The traditional implementation is presented here. It is based on one recursive function and a loop that tests it:

# a recursive function to calculate the fibonacci number
# this is a standard solution that is used in almost all
# of computer science examples
def fibRec(n):
  if n < 2:
      return n
  else:
      return fibRec(n-1) + fibRec(n-2)
# a short loop that uses the above function
for i in range(23):
  print(fibRec(i))

The implementation is very simple and is based on the algorithm – in our case, the fibRec function. It is simplistic, but it has its limitations. The first is its recursive implementation, which is expensive – the number of recursive calls grows exponentially with n. Although it can be written as an iterative one, it still suffers from the second problem – it is focused on the calculations and not on the data.
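The iterative variant mentioned above can be sketched as follows; it avoids the cost of the recursive calls, but it remains algorithm-centric rather than data-centric.

```python
# Iterative Fibonacci: linear time instead of exponential recursion,
# but the algorithm is still hand-designed rather than learned from data
def fibIter(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b  # each step advances the pair (F(i), F(i+1))
    return a

print([fibIter(i) for i in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```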

Now, let’s see how the machine learning implementation is done. I’ll explain this by dividing it into two parts – data preparation and model training/inference:

#predicting fibonacci with linear regression
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
# training data for the algorithm
# the first two columns are the numbers and the third column is the result
dfTrain = pd.DataFrame([[1, 1, 2],
                        [2, 1, 3],
                        [3, 2, 5],
                        [5, 3, 8],
                        [8, 5, 13]
])
# we add the names of the columns to make the data more readable
dfTrain.columns = ['first number','second number','result']
# now, let's prepare for making some predictions
# we start the sequence as a list with the first two numbers
lstSequence = [0,1]

In the case of machine learning software, we prepare data to train the algorithm. In our case, this is the dfTrain DataFrame. It is a table that contains the numbers that the machine learning algorithm needs to find the pattern.

Please note that we prepared two datasets – dfTrain, which contains the numbers to train the algorithm, and lstSequence, which is the sequence of Fibonacci numbers that we’ll find later.

Now, let’s start training the algorithm:

# algorithm to train
# here, we use linear regression
model = LinearRegression()
# now, the actual process of training the model
model.fit(dfTrain[['first number', 'second number']],
                               dfTrain['result'])
# printing the score of the model, i.e. how good the model is when trained
print(model.score(dfTrain[['first number', 'second number']], dfTrain['result']))

The magic of the entire code fragment is in the model.fit method call. This method trains the linear regression model based on the data we prepared for it. The model itself is created one line above, in the model = LinearRegression() line.

Now, we can make inferences or create new Fibonacci numbers using the following code fragment:

# and loop through the newly predicted numbers
for k in range(23):
  # the line below is where the magic happens
  # it takes two numbers from the list,
  # formats them into an array,
  # and makes the prediction;
  # since the model returns an array of floats,
  # we take its first element and convert it to an integer
  intFibonacci = int(model.predict(np.array([[lstSequence[k],lstSequence[k+1]]]))[0])
  # add this new number to the list for the next iteration
  lstSequence.append(intFibonacci)
  # and print it
  print(intFibonacci)

This code fragment contains a similar line to the previous one – model.predict(). This line uses the previously created model to make an inference. Since the Fibonacci sequence is recursive, we need to add the newly created number to the list before we can make the new inference, which is done in the lstSequence.append() line.

Now, it is very important to emphasize the difference between these two ways of solving the same problem. The traditional implementation exposes the algorithm used to create the numbers. We do not see the Fibonacci sequence there, but we can see how it is calculated. The machine learning implementation exposes the data used to create the numbers. We see the first sequence as training data, but we never see how the model creates that sequence. We do not know whether that model is always correct – we would need to test it against the real sequence – simply because we do not know how the algorithm works. This takes us to the next part, which is about just that – probabilities.
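As noted above, we would need to test the model against the real sequence to trust it. The following is a minimal sketch of such a check, assuming scikit-learn; it retrains the same linear regression setup and compares its output with the recursively computed sequence. Note that it rounds the prediction instead of truncating it, to guard against tiny floating-point error.

```python
# Testing the learned model against the real Fibonacci sequence.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# the traditional implementation serves as the ground truth
def fibRec(n):
    return n if n < 2 else fibRec(n - 1) + fibRec(n - 2)

# retrain the same model as in the chapter
dfTrain = pd.DataFrame([[1, 1, 2], [2, 1, 3], [3, 2, 5], [5, 3, 8], [8, 5, 13]],
                       columns=['first number', 'second number', 'result'])
model = LinearRegression()
model.fit(dfTrain[['first number', 'second number']], dfTrain['result'])

# generate ten new numbers; rounding guards against floating-point error
lstSequence = [0, 1]
for k in range(10):
    pred = model.predict(np.array([[lstSequence[k], lstSequence[k + 1]]]))
    lstSequence.append(int(round(pred[0])))

# compare with the real sequence
print(lstSequence == [fibRec(i) for i in range(12)])  # True
```

The check passes here because the training data lies exactly on the plane result = first + second, so linear regression recovers the relationship; with noisier data, the learned model could drift from the true sequence.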

 

Probability and software – how well do they go together?

The fundamental characteristic that makes machine learning software different from traditional software is the fact that the core of machine learning models is statistics. This statistical learning means that the output of the machine learning model is a probability and, as such, it is not as clear as in traditional software systems.

The output of the model is a probability, which means that the answer we receive expresses how likely something is. For example, if we classify an image to check whether it contains a dog or a cat, the result of this classification is a probability – for example, there is a 93% probability that the image contains a dog and a 7% probability that it contains a cat. This is illustrated in Figure 1.3:

Figure 1.3 – Probabilistic nature of machine learning software


To use these probabilistic results in other parts of the software, or in other systems, machine learning software usually applies thresholds (for example, if x > 0.5) to produce a single result. Such thresholds specify which probability is high enough for the result to be assigned to a specific class. For our example of image classification, this threshold would be 50% – if the probability of identifying a dog in the image is larger than 50%, then the model states that the image contains a dog (without reporting the probability).
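A minimal sketch of this thresholding step, with made-up probabilities for the dog/cat example:

```python
def classify(prob_dog, threshold=0.5):
    """Collapse a probability into a single crisp label using a threshold."""
    return "dog" if prob_dog > threshold else "cat"

print(classify(0.93))   # well above the threshold -> "dog"
print(classify(0.07))   # well below the threshold -> "cat"
```

The rest of the system only ever sees the label, not the 93% or 7% behind it, which is exactly what makes the corner cases discussed next problematic.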

Changing these probabilistic results into discrete ones, as we did in the previous example, is often correct, but not always. Especially in corner cases, such as when the probability is close to the threshold, the classification can lead to errors and thus to software failures. Such failures are often negligible, but not always. In safety-critical systems, there should be no mistakes, as they can lead to hazards with potentially catastrophic consequences.

In contexts where the probabilistic nature of machine learning software is problematic, but we still need machine learning for its other benefits, we can construct mechanisms that mitigate the consequences of mispredictions, misclassifications, and sub-optimizations. These mechanisms can guard the machine learning models and prevent them from suggesting wrong recommendations. For example, when we use machine learning image classification in the safety system of a car, we construct a so-called safety cage around the model. This safety cage is a non-machine learning component that uses rules to check whether a specific recommendation, classification, or prediction is plausible in the specific context. It can, for instance, prevent a car from suddenly stopping for a non-existent traffic light signal on a highway, which is a consequence of a misclassification of a camera feed from the front camera.
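The safety-cage idea can be sketched as a rule-based gatekeeper that checks whether a classification is plausible in the current context. The rule and the context fields below are illustrative assumptions, not the book's actual implementation:

```python
def gatekeeper(detected_class, context):
    """Return the detection only if it is plausible in this context."""
    # Rule: traffic lights do not occur on highways, so such a detection
    # is more likely a misclassification of the front-camera feed
    if detected_class == "traffic_light" and context["road_type"] == "highway":
        return None  # veto the implausible detection
    return detected_class

print(gatekeeper("traffic_light", {"road_type": "highway"}))  # vetoed -> None
print(gatekeeper("traffic_light", {"road_type": "urban"}))    # passed through
```

The important design point is that the gatekeeper itself contains no machine learning – it is a deterministic rule set whose behavior can be inspected and verified.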

Therefore, let’s look at another best practice that encourages the use of machine learning software even in safety-critical systems.

Best practice #3

If your software is safety-critical, make sure that you can design mechanisms to prevent hazards caused by the probabilistic nature of machine learning.

Although this best practice is formulated toward safety-critical systems, it is more general than that. Even for mission-critical or business-critical systems, we can construct mechanisms that can gatekeep the machine learning models and prevent erroneous behavior of the entire software system. An example of how such a cage can be constructed is shown in Figure 1.4, where the gatekeeper component provides an additional signal that the model’s prediction cannot be trusted/used:

Figure 1.4 – Gatekeeping of machine learning models


In this figure, the additional component is placed as the last one in this processing pipeline to ensure that the result is always binary (for this case). In other cases, such a gatekeeper can be placed in parallel to the machine learning model and can act as a parallel processing flow, where data quality is checked rather than the classification model.

Such gatekeepers are used quite frequently – for example, when detecting objects in perception systems, the model detects objects in individual images, while the gatekeeper checks that the same object is identified consistently over sequences of consecutive images. Gatekeepers can form redundant processing channels and pipelines, act as feasibility-checking components, or correct out-of-bounds results into proper values. Finally, they can also disconnect machine learning components from the pipeline and adapt these pipelines to other components of the software, usually algorithms that make decisions – thus forming self-adaptive or self-healing software systems.
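The consistency check over consecutive frames can be sketched as follows. The per-frame labels and the window size are illustrative assumptions; the point is that the detector labels each frame independently, while the gatekeeper only confirms an object once it persists:

```python
def confirm_object(frame_labels, min_consecutive=3):
    """Confirm a detection only if it persists over consecutive frames."""
    run = 0
    for label in frame_labels:
        # extend the run on a matching detection, reset it otherwise
        run = run + 1 if label == "pedestrian" else 0
        if run >= min_consecutive:
            return True
    return False

print(confirm_object(["pedestrian", "none", "pedestrian", "none"]))        # flickering detection
print(confirm_object(["pedestrian", "pedestrian", "pedestrian", "none"]))  # stable detection
```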

This probabilistic nature of machine learning software means that pre-deployment activities are different from those of traditional software. In particular, the process of testing machine learning software differs from testing traditional software.

 

Testing and evaluation – the same but different

Every machine learning model needs to be validated, which means that the model needs to be able to provide correct inferences for a dataset that it did not see before. The goal is to assess whether the model has learned the patterns in the data, merely memorized the data itself, or neither. The typical measures of correctness in classification problems are accuracy (the quotient of correctly inferred instances to all classified instances), the Area Under the Receiver Operating Characteristic curve (AUROC), and the true positive rate (TPR) and false positive rate (FPR).
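These classification measures can be computed by hand on made-up labels (1 = dog, 0 = cat), which makes their definitions concrete:

```python
y_true = [1, 1, 1, 0, 0, 0, 1, 0]   # ground-truth labels
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]   # the model's inferences

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

accuracy = (tp + tn) / len(y_true)   # correctly inferred / all instances
tpr = tp / (tp + fn)                 # true positive rate
fpr = fp / (fp + tn)                 # false positive rate

print(accuracy, tpr, fpr)
```

In practice, libraries such as scikit-learn provide these metrics ready-made, but the arithmetic is exactly this simple.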

For prediction problems, the quality of the model is measured by the size of its mispredictions, using metrics such as the mean squared error (MSE). These measures quantify the errors in predictions – the smaller the values, the better the model. Figure 1.5 shows the process for the most common form of supervised learning:

Figure 1.5 – Model evaluation process for supervised learning


In this process, the model is subjected to different data in every iteration of training, after which it is used to make inferences (classifications or predictions) on the same test data. The test data is set aside before training, and it is used as input to the model only when validating, never during training.
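Setting the test data aside can be sketched as follows; the dataset and the 80/20 split ratio are illustrative assumptions:

```python
import random

data = list(range(100))            # stand-in for labelled examples
random.seed(42)                    # fixed seed for reproducibility
random.shuffle(data)

split = int(0.8 * len(data))       # 80% for training, 20% held out
train_set, test_set = data[:split], data[split:]

# the model only ever sees train_set during fitting; test_set is used
# exclusively for the final evaluation
print(len(train_set), len(test_set))
```

The crucial property is that the two sets are disjoint – any leakage of test examples into training makes the evaluation overly optimistic.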

Finally, some models are reinforcement learning models, where the quality is assessed by the model's ability to optimize its output according to a predefined function (the reward function). These measures allow the algorithm to optimize its operations and find the optimal solution – for example, in genetic algorithms, self-driving cars, or energy grid operations. The challenge with these models is that there is no single metric that can measure performance – it depends on the scenario, the reward function, and the amount of training that the model received. One famous example of such training is the algorithm from the WarGames movie (from 1983), where the main supercomputer plays millions of tic-tac-toe games to understand that there is no strategy to win – the game has no winner.
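Reward-driven learning can be sketched with a tiny epsilon-greedy agent that learns which of two actions yields the higher reward. The reward function, exploration rate, and number of episodes are all illustrative assumptions:

```python
import random

random.seed(0)
values = [0.0, 0.0]                 # estimated value of each action
counts = [0, 0]                     # how often each action was taken

def reward(action):
    """Predefined reward function: action 1 pays off more."""
    return 1.0 if action == 1 else 0.2

for _ in range(200):
    # explore with probability 0.1, otherwise exploit the best estimate
    if random.random() < 0.1:
        action = random.randrange(2)
    else:
        action = values.index(max(values))
    r = reward(action)
    counts[action] += 1
    # incremental average update of the action's value estimate
    values[action] += (r - values[action]) / counts[action]

print(values.index(max(values)))    # the learned best action
```

There is no labelled "correct answer" anywhere in this loop – quality is judged purely by the accumulated reward, which is why reinforcement learning lacks a single universal evaluation metric.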

Figure 1.6 presents the process of training a reinforcement system graphically:

Figure 1.6 – Reinforcement learning training process


We could get the impression that training, testing, and validating machine learning models are all we need when developing machine learning software. This is far from being true. The models are parts of larger systems, which means that they need to be integrated with other components; these components are not validated in the process of validation described in Figure 1.5 and Figure 1.6.

Every software system needs to undergo rigorous testing before it can be released. The goal of this testing is to find and remove as many defects as possible so that the users of the software experience the best possible quality. Typically, testing comprises multiple phases that follow, and align with, the phases of software development. In the beginning, software engineers (or testers) use unit tests to verify the correctness of their components; later phases address the integration of components and the system as a whole.

Figure 1.7 presents how these three types of testing are related to one another. In unit testing, the focus is on algorithms. Often, this means that the software engineers must test individual functions and modules. Integration testing focuses on the connections between modules and how they can conduct tasks together. Finally, system testing and acceptance testing focus on the entire software product. The testers imitate real users to check that the software fulfills the requirements of the users:

Figure 1.7 – Three types of software testing – unit testing (left), integration testing (middle), and system and acceptance testing (right)


The software testing process is very different from the process of model validation. Although we could mistake model validation for unit testing, they are not the same. The output from the model validation process is a metric (for example, accuracy), whereas the output from a unit test is true/false – whether the software produces the expected output or not. No known defects (equivalent to failed test results) are acceptable for a software company.

In traditional software testing, software engineers prepare a set of test cases to check whether their software works according to the specification. In machine learning software, the process of testing is based on setting aside part of the dataset (the test set) and checking how well the model, trained on the training set, performs on that data.
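The contrast can be made concrete in a few lines. The function, data, and expected values below are illustrative; the point is that the unit test yields a binary verdict, while model validation yields a score that someone must then judge:

```python
def add_vat(price, rate=0.25):
    """Ordinary software component: its behaviour is fully specified."""
    return round(price * (1 + rate), 2)

# traditional unit test: the outcome is strictly pass/fail
assert add_vat(100.0) == 125.0

# model validation: the outcome is a metric, not a verdict
y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)   # we must then decide whether this score is acceptable
```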

Therefore, here is my fourth best practice for testing machine learning systems.

Best practice #4

Test the machine learning software as an addition to the typical train-validation-evaluation process of machine learning model development.

Testing the entire system is very important, as the complete software system contains mechanisms to cope with the probabilistic nature of the machine learning components. One such mechanism is the safety cage, where we monitor the behavior of the machine learning components and prevent them from providing low-quality signals to the rest of the system (for example, in corner cases, where inferences fall close to the decision boundaries).

When we test the software, we also learn about the limitations of the machine learning components and our ability to handle the corner cases. Such knowledge is important for deploying the system when we need to specify the operational environment for the software. We need to understand the limitations related to the requirements and the specification of the software – the use cases for our software. Even more importantly, we need to understand the implications of the use of the software in terms of ethics and trustworthiness.

We’ll discuss ethics in Chapter 15 and Chapter 16, but it is important to understand that we need to consider ethics from the very beginning. If we don’t, we risk our system making potentially harmful mistakes, such as those made by large artificial intelligence hiring systems, face recognition systems, or self-driving vehicles. These harmful mistakes entail monetary costs but, more importantly, they entail loss of trust in the product and even missed opportunities.

 

Summary

Machine learning and traditional software are often perceived as two alternatives. However, they are more like siblings – one cannot function without the other. Machine learning models are very good at solving constrained problems, but they require traditional software for data collection, preparation, and presentation.

The probabilistic nature of machine learning models requires additional elements to make them useful in the context of complete software products. Therefore, we need to embrace this nature and use it to our advantage. Even for safety-critical systems, we could (and should) use machine learning when we know how to design safety mechanisms to prevent hazardous consequences.

In this chapter, we explored the differences between machine learning software and traditional software while focusing on how to design software that can contain both parts. We also showed that there is much more to machine learning software than just training, testing, and evaluating the model – we showed that rigorous testing makes sense and is necessary for deploying reliable software.

Now, it is time to move on to the next chapter, where we’ll open up the black box of machine learning software and explore what we need to develop a complete machine learning software product – starting from data acquisition and ending with user interaction.

 

References

  • Shortliffe, E.H., et al., Computer-based consultations in clinical therapeutics: explanation and rule acquisition capabilities of the MYCIN system. Computers and biomedical research, 1975. 8(4): p. 303-320.
  • James, G., et al., An introduction to statistical learning. Vol. 112. 2013: Springer.
  • Saleh, H., Machine Learning Fundamentals: Use Python and scikit-learn to get up and running with the hottest developments in machine learning. 2018: Packt Publishing Ltd.
  • Raschka, S. and V. Mirjalili, Python machine learning: Machine learning and deep learning with Python, scikit-learn, and TensorFlow 2. 2019: Packt Publishing Ltd.
  • Sommerville, I., Software Engineering. 10th ed. 2015: Pearson.
  • Houpis, C.H., G.B. Lamont, and B. Lamont, Digital control systems: theory, hardware, software. 1985: McGraw-Hill New York.
  • Sawhney, R., Can artificial intelligence make software development more productive? LSE Business Review, 2021.
  • He, X., K. Zhao, and X. Chu, AutoML: A survey of the state-of-the-art. Knowledge-Based Systems. 2021. 212: p. 106622.
  • Reed, S., et al., A generalist agent. arXiv preprint arXiv:2205.06175, 2022.
  • Floridi, L. and M. Chiriatti, GPT-3: Its nature, scope, limits, and consequences. Minds and Machines, 2020. 30(4): p. 681-694.
  • Creswell, A., et al., Generative adversarial networks: An overview. IEEE signal processing magazine, 2018. 35(1): p. 53-65.
  • Celebi, M.E. and K. Aydin, Unsupervised learning algorithms. 2016: Springer.
  • Chen, J.X., The evolution of computing: AlphaGo. Computing in Science & Engineering, 2016. 18(4): p. 4-7.
  • Ko, J.-S., J.-H. Huh, and J.-C. Kim, Improvement of energy efficiency and control performance of cooling system fan applied to Industry 4.0 data center. Electronics, 2019. 8(5): p. 582.
  • Dastin, J., Amazon scraps secret AI recruiting tool that showed bias against women. In Ethics of Data and Analytics. 2018, Auerbach Publications. p. 296-299.
  • Castelvecchi, D., Is facial recognition too biased to be let loose? Nature, 2020. 587(7834): p. 347-350.
  • Siddiqui, F., R. Lerman, and J.B. Merrill, Teslas running Autopilot involved in 273 crashes reported since last year. In The Washington Post. 2022.
About the Author
  • Miroslaw Staron

    Miroslaw Staron is a professor of Applied IT at the University of Gothenburg in Sweden with a focus on empirical software engineering, measurement, and machine learning. He is currently editor-in-chief of Information and Software Technology and co-editor of the regular Practitioner's Digest column of IEEE Software. He has authored books on automotive software architectures, software measurement, and action research. He also leads several projects in AI for software engineering and leads an AI and digitalization theme at Software Center. He has written over 200 journal and conference articles.
