You're reading from Machine Learning Engineering with Python - Second Edition

Published in Aug 2023 by Packt. ISBN-13: 9781837631964. Reading level: Intermediate.

Author: Andrew P. McMahon

Andrew P. McMahon has spent years building high-impact ML products across a variety of industries. He is currently Head of MLOps for NatWest Group in the UK and has a PhD in theoretical condensed matter physics from Imperial College London. He is an active blogger, speaker, podcast guest, and leading voice in the MLOps community. He is co-host of the AI Right podcast and was named 'Rising Star of the Year' at the 2022 British Data Awards and 'Data Scientist of the Year' by the Data Science Foundation in 2019.
Building an Example ML Microservice

This chapter is all about bringing together some of what we have learned in the book so far in a realistic example. It is based on one of the scenarios introduced in Chapter 1, Introduction to ML Engineering, where we were required to build a forecasting service for store item sales. We will discuss the scenario in some detail and outline the key decisions required to make a solution a reality, before showing how we can employ the processes, tools, and techniques we have learned throughout this book to solve key parts of the problem from an ML engineering perspective. By the end of this chapter, you should come away with a clear view of how to build your own ML microservices for solving a variety of business problems.

In this chapter, we will cover the following topics:

  • Understanding the forecasting problem
  • Designing our forecasting service
  • Selecting the tools
  • Training at scale
  • ...

Technical requirements

The code examples in this chapter will be simpler to follow if you have the following installed and running on your machine:

  • Postman or another API development tool
  • A local Kubernetes cluster manager like minikube or kind
  • The Kubernetes CLI tool, kubectl

There are several different conda environment .yml files contained in the Chapter08 folder in the book’s GitHub repo for the technical examples, as there are a few different sub-components. These are:

  • mlewp-chapter08-train: This specifies the environment for running the training scripts.
  • mlewp-chapter08-serve: This specifies the environment for the local FastAPI web service build.
  • mlewp-chapter08-register: This gives the environment specification for running the MLflow tracking server.

In each case, create the Conda environment, as usual, with:

conda env create -f <ENVIRONMENT_NAME>.yml

The Kubernetes examples in this...

Understanding the forecasting problem

In Chapter 1, Introduction to ML Engineering, we considered the example of an ML team that has been tasked with providing forecasts of items at the level of individual stores in a retail business. The fictional business users had the following requirements:

  • The forecasts should be rendered and accessible via a web-based dashboard.
  • The user should be able to request updated forecasts if necessary.
  • The forecasts should be carried out at the level of individual stores.
  • Users will be interested in their own regions/stores in any one session and not be concerned with global trends.
  • The number of requests for updated forecasts in any one session will be small.

Given these requirements, we can work with the business to create the following user stories, which we can put into a tool such as Jira, as explained in Chapter 2, The Machine Learning Development Process. Some examples of user stories covering these...

Designing our forecasting service

The requirements in the Understanding the forecasting problem section are the definitions of the targets we need to hit, but they are not the method for getting there. Drawing on our understanding of design and architecture from Chapter 5, Deployment Patterns and Tools, we can start building out our design.

First, we should confirm what kind of design we should be working on. Since we need dynamic requests, it makes sense that we follow the microservice architecture discussed in Chapter 5, Deployment Patterns and Tools. This will allow us to build a service that has the sole focus of retrieving the right model from our model store and performing the requested inference. The prediction service should therefore have interfaces available between the dashboard and the model store.
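To make this interface concrete, the contract between the dashboard and the prediction service can be sketched in plain Python. Note that the `ForecastRequest`/`ForecastResult` names and the in-memory stand-in for the model store are illustrative assumptions, not the code built later in the chapter:

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class ForecastRequest:
    """What the dashboard sends: which store, and how far ahead to forecast."""
    store_id: str
    horizon_days: int

@dataclass
class ForecastResult:
    """What the service returns: one predicted value per future day."""
    store_id: str
    predictions: list  # [(date, predicted_sales), ...]

# Stand-in for a real model store (e.g. an MLflow registry): maps a
# store_id to a trained model. Here a "model" is just a constant
# forecaster for illustration.
MODEL_STORE = {"store_001": lambda day: 100.0}

def predict(request: ForecastRequest, start: date) -> ForecastResult:
    """Retrieve the right model for the store and perform the requested inference."""
    model = MODEL_STORE[request.store_id]
    preds = [
        (start + timedelta(days=i), model(start + timedelta(days=i)))
        for i in range(1, request.horizon_days + 1)
    ]
    return ForecastResult(store_id=request.store_id, predictions=preds)
```

The key design point is that the service exposes a narrow, typed request/response boundary, so the dashboard never needs to know where models live or how they are trained.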

Furthermore, since a user may want to work with a few different store combinations in any one session and maybe switch back and forth between the forecasts of these...

Selecting the tools

Now that we have a high-level design in mind and we have written down some clear technical requirements, we can begin to select the toolset we will use to implement our solution.

One of the most important considerations on this front will be what framework we use for modeling our data and building our forecasting functionality. Given that the problem is a time-series modeling problem with a need for fast retraining and prediction, we can consider the pros and cons of a few options that may fit the bill before proceeding.

The results of this exercise are shown in Table 8.2:

Tool/Framework: Scikit-learn

Pros:

  • Already understood by almost all data scientists.
  • Very easy-to-use syntax.
  • Lots of great community support.

Cons:

  • ...

Training at scale

When we introduced Ray in Chapter 6, Scaling Up, we mentioned use cases where the data or processing time requirements were such that using a very scalable parallel computing framework made sense. What was not made explicit is that sometimes these requirements come from the fact that we actually want to train many models, not just one model on a large amount of data or one model more quickly. This is what we will do here.

The retail forecasting example we described in Chapter 1, Introduction to ML Engineering, uses a dataset containing several different retail stores. Rather than creating one model that takes a store number or identifier as a feature, a better strategy may be to train a forecasting model for each individual store. This is likely to give better accuracy, since store-level patterns in the data that carry predictive power will not be averaged out by a model trained on all the stores combined. This...
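As a minimal sketch of this one-model-per-store pattern, the following uses the standard library's thread pool in place of Ray (which the chapter itself uses), with a naive mean forecaster standing in for a real time-series model; both the data and the "training" function are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

def train_store_model(store_id, sales_history):
    """Illustrative 'training': a naive mean forecaster for one store.
    A real pipeline would fit a proper time-series model here."""
    baseline = mean(sales_history)
    return store_id, (lambda horizon: [baseline] * horizon)

def train_all(sales_by_store, max_workers=4):
    """Train one model per store in parallel; return a registry dict."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(
            lambda item: train_store_model(*item), sales_by_store.items()
        )
    return dict(results)

sales_by_store = {
    "store_001": [100, 110, 90, 105],
    "store_002": [50, 55, 45, 60],
}
models = train_all(sales_by_store)
print(models["store_002"](2))  # -> [52.5, 52.5]
```

With Ray, the structure is the same but each per-store training call becomes a remote task, so the fan-out can span a whole cluster rather than one machine.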

Serving the models with FastAPI

The simplest and potentially most flexible approach to serving ML models in a microservice with Python is to wrap the serving logic inside a lightweight web application. Flask has been a popular option among Python users for many years, but the FastAPI web framework now offers many advantages and should be seriously considered as an alternative.

Some of the features of FastAPI that make it an excellent choice for a lightweight microservice are:

  • Data validation: FastAPI is built on the Pydantic library, which allows you to enforce type hints at runtime. This makes it very easy to implement data validation steps that make your system far more robust and help avoid edge-case behaviors.
  • Built-in async workflows: FastAPI gives you asynchronous task management out of the box with async and await keywords, so you can build the logic you will need in many cases relatively seamlessly without...

Containerizing and deploying to Kubernetes

When we introduced Docker in Chapter 5, Deployment Patterns and Tools, we showed how you can use it to encapsulate your code and then run it across many different platforms consistently.

Here we will do this again, but with a different goal in mind: rather than running the application as a singleton on another piece of infrastructure, we want many replicas of the microservice running simultaneously, with requests routed effectively between them by a load balancer. This means we can take what works and make it work at almost arbitrarily large scale.

We will do this by executing several steps:

  1. Containerize the application using Docker.
  2. Push this Docker container to Docker Hub to act as our container storage location (you could use another container management solution like AWS Elastic Container Registry or similar solutions on another cloud provider for this step).
  3. ...
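Step 1 might be sketched with a Dockerfile along the following lines; the file layout, `requirements.txt`, and the `app.main:app` module path are assumptions about the project structure rather than the chapter's actual files:

```dockerfile
# Sketch of a Dockerfile for the FastAPI forecasting service.
FROM python:3.10-slim

WORKDIR /app

# Install dependencies first so this layer is cached between code changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code (the FastAPI app is assumed to live in app/).
COPY app/ ./app/

EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

For step 2, the image could then be built and pushed with `docker build -t <DOCKER_HUB_USER>/forecast-service:latest .` followed by `docker push <DOCKER_HUB_USER>/forecast-service:latest`, substituting your own Docker Hub namespace.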

Summary

In this chapter, we walked through an example of how to take the tools and techniques from the first seven chapters of this book and apply them together to solve a realistic business problem. We discussed in detail how the need for a dynamically triggered forecasting algorithm can lead very quickly to a design that requires several small services to interact seamlessly. In particular, we created a design with components responsible for handling events, training models, storing models, and performing predictions. We then walked through how we would choose our toolset to build to this design in a real-world scenario, by considering things such as appropriateness for the task at hand, as well as likely developer familiarity. Finally, we carefully defined the key pieces of code that would be required to build the solution to solve the problem repeatedly and robustly.

In the next, and final, chapter, we will build out an example of a batch ML process. We will name the pattern...
