
You're reading from  Machine Learning Engineering with Python - Second Edition

Product type: Book
Published in: Aug 2023
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781837631964
Edition: 2nd Edition
Author (1)
Andrew P. McMahon

Andrew P. McMahon has spent years building high-impact ML products across a variety of industries. He is currently Head of MLOps for NatWest Group in the UK and has a PhD in theoretical condensed matter physics from Imperial College London. He is an active blogger, speaker, podcast guest, and leading voice in the MLOps community. He is co-host of the AI Right podcast and was named 'Rising Star of the Year' at the 2022 British Data Awards and 'Data Scientist of the Year' by the Data Science Foundation in 2019.

Deployment Patterns and Tools

In this chapter, we will dive into some important concepts around the deployment of your machine learning (ML) solution. We will begin to close the circle of the ML development lifecycle and lay the groundwork for getting your solutions out into the world.

The act of deploying software, of taking it from a demo you can show off to a few stakeholders to a service that will ultimately impact customers or colleagues, is a very exhilarating but often challenging exercise. It also remains one of the most difficult aspects of any ML project and getting it right can ultimately make the difference between generating value or just hype.

We are going to explore some of the main concepts that will help your ML engineering team cross the chasm from a fun proof-of-concept to solutions that can run on scalable infrastructure in an automated way. This will require us to first cover questions of how to design and architect your ML systems, particularly if...

Join our book community on Discord

https://packt.link/EarlyAccessCommunity

In previous chapters, we introduced a lot of the tools and techniques you will need to use to successfully build working Machine Learning (ML) products. We also introduced a lot of example pieces of code that helped us to understand how to implement these tools and techniques. So far, this has all been about what we need to program, but this chapter will focus on how to program. In particular, we will introduce and work with a lot of the techniques, methodologies, and standards that are prevalent in the wider Python software development community and apply them to ML use cases. The conversation will be centered around the concept of developing user-defined libraries and packages, reusable pieces of code that you can use for deploying your ML solutions or for developing new ones. It is important to note that everything we discuss here can be applied to all of your Python development activities across your ML...

Technical requirements

In order to run the examples in this chapter you will need to make sure you have installed:

  • Scikit-learn
  • the Unix make utility
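
As a quick, optional sanity check before starting, the sketch below reports whether both prerequisites appear to be available on your machine. The function name check_requirements is a hypothetical helper introduced here for illustration; it only uses the standard library and does not install anything.

```python
import importlib.util
import shutil


def check_requirements():
    """Report whether the chapter's two prerequisites appear to be available."""
    return {
        # scikit-learn is imported under the module name "sklearn"
        "scikit-learn": importlib.util.find_spec("sklearn") is not None,
        # shutil.which returns None when no "make" executable is on the PATH
        "make": shutil.which("make") is not None,
    }


if __name__ == "__main__":
    for name, available in check_requirements().items():
        print(f"{name}: {'found' if available else 'missing'}")
```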

Writing good Python

As discussed throughout this book, Python is an extremely popular and very versatile programming language. Some of the most widely used software products in the world, and some of the most widely used ML engineering solutions in the world, use Python as a core language. Given this scope and scale, it is clear that if we are to write similarly amazing pieces of ML-driven software, we should once again follow the best practices and standards already adopted by these solutions. In the following sections, we will explore what packaging up means in practice, and start to really level up our ML code in terms of quality and consistency.

Recapping the basics

Before we get stuck into some more advanced concepts, let's make sure we are all on the same page and go over some of the basic terminology of the Python world. This will ensure that we apply the right thought processes to the right things and that we can feel confident when writing our code.

In Python, we have the...

Choosing a style

This section will provide a summary of two coding styles or paradigms, which make use of different organizational principles and capabilities of Python. Whether you write your code in an object-orientated or functional style could just be an aesthetic choice. This choice, however, can also provide other benefits, such as code that is more aligned with the logical elements of your problem, code that is easier to understand, or even more performant code.

In the following sections, we will outline the main principles of each paradigm and allow you to choose for yourself based on your use case.
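
To make the contrast concrete before going into detail, here is a minimal sketch of the same task written in both styles. The task (flagging values that lie far from the mean), the class name ZScoreDetector, and the function name detect_outliers are assumptions chosen for illustration, not code from the book.

```python
from statistics import mean, stdev


# Object-oriented style: state (the threshold) and behaviour live on one object.
class ZScoreDetector:
    def __init__(self, threshold=2.0):
        self.threshold = threshold

    def detect(self, values):
        """Return values more than `threshold` standard deviations from the mean."""
        mu, sigma = mean(values), stdev(values)
        return [v for v in values if abs(v - mu) > self.threshold * sigma]


# Functional style: a function with no stored state; the threshold is an argument.
def detect_outliers(values, threshold=2.0):
    mu, sigma = mean(values), stdev(values)
    return list(filter(lambda v: abs(v - mu) > threshold * sigma, values))


data = [10, 11, 9, 10, 12, 11, 10, 50]
# Both styles compute the same answer; they differ in how the code is organized.
assert ZScoreDetector().detect(data) == detect_outliers(data)
```

The object-oriented version is convenient when the same configuration is reused across many calls, while the functional version keeps every input visible at the call site.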

Object-oriented programming

Object-Oriented Programming (OOP) is a style where the code is organized around, you guessed it, abstract objects with relevant attributes and data instead of around the logical flow of your solution. The subject of OOP is worth a book (or several books!) in itself, so we will focus on the key points that are relevant for our ML engineering journey.

First...

Packaging your code

In some ways, it is interesting that Python has taken the world by storm. It is dynamically typed and non-compiled, so it can be quite different to work with compared to Java or C++. This particularly comes to the fore when we think about packaging our Python solutions. For a compiled language, the main target is to produce a compiled artifact that can run on the chosen environment, a Java jar for example. Python requires that the environment you run in has an appropriate Python interpreter and the ability to install the libraries and packages you need. There is also no single compiled artifact created, so you often need to deploy your whole code base as is.

Despite this, Python has indeed taken the world by storm, especially for ML. As we are ML engineers thinking about taking models to production, we would be remiss to not understand how to package and share Python code in a way that helps others to avoid repetition, to trust in the solution, and to be able to easily...

Building your package

In our example, we can package up our solution using the setuptools library. In order to do this, you must create a file called setup.py that contains the important metadata for your solution, including the location of the relevant packages it requires. An example of setup.py is shown in the following code block. This shows how to do this for a simple package that wraps some of the outlier detection functionality we have been mentioning in this chapter:

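from setuptools import setup

setup(
    name='outliers',            # name used when installing the package
    version='0.1',
    description='A simple package to wrap some outlier detection functionality',
    author='Andrew McMahon',
    license='MIT',
    packages=['outliers'],      # importable packages to include in the distribution
    zip_safe=False,             # install as a directory rather than a zipped egg
)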

We can see that setuptools allows you to supply metadata such as the name of the package, the version number, and the software license. Once you have this file in the root directory of your project, you can...

Testing, logging, securing and error handling

Building code that performs an ML task may seem like the end goal, but it is only one piece of the puzzle. We also want to be confident that this code will work and if it doesn't, we will be able to fix it. This is where the concepts of testing, logging, and error handling come in, which the next few sections cover at a high level.
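
As a small illustration of how logging and error handling fit together, the sketch below uses only the standard library logging module. The function safe_mean is a hypothetical example, not code from the book: it logs a failure with context and raises a clearer exception for the caller to handle.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def safe_mean(values):
    """Return the mean of `values`; log and raise ValueError on empty input."""
    try:
        result = sum(values) / len(values)
    except ZeroDivisionError:
        # Log the failure, then raise an error that describes the real problem.
        logger.error("safe_mean received an empty sequence")
        raise ValueError("values must contain at least one number") from None
    logger.info("Computed mean of %d values", len(values))
    return result
```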

Testing

One of the most important features that sets your ML engineered code apart from typical research scripts is the presence of robust testing. It is critical that any system you are designing for deployment can be trusted not to fall down all the time and that you can catch issues during the development process.

Luckily, since Python is a general-purpose programming language, it is replete with tools for performing tests on your software. In this chapter, we will use pytest, which is one of the most popular, powerful, and easy-to-use testing toolsets for Python code available. pytest is particularly useful...
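
As a minimal sketch of what such tests can look like, the file below defines a small function and two tests for it. The file name test_outliers.py and the function clip_values are hypothetical, chosen for illustration; the tests rely only on plain assert statements, which is the style pytest is designed around.

```python
# test_outliers.py (hypothetical file name; pytest collects files named test_*.py)


def clip_values(values, lower, upper):
    """Hypothetical function under test: clamp each value into [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


def test_values_inside_range_are_unchanged():
    assert clip_values([1, 2, 3], lower=0, upper=5) == [1, 2, 3]


def test_values_outside_range_are_clipped():
    assert clip_values([-4, 2, 9], lower=0, upper=5) == [0, 2, 5]
```

Running the pytest command from the directory containing this file should discover and run both functions whose names start with test_. In a real project, the function under test would usually live in your package and be imported into the test file rather than defined alongside the tests.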

Not reinventing the wheel

You will already have noticed through this chapter (or I hope you have!) that a lot of the functionality that you need for your ML and Python project has already been built. One of the most important things you can learn as an ML engineer is that you are not supposed to build everything from scratch. You can avoid doing so in a variety of ways, the most obvious of which is to use other packages in your own solution and then build functionality that enriches what is already there. As an example, you do not need to build basic regression modeling capabilities since they exist in a variety of packages, but you might have to add a new type of regressor or use some specific domain knowledge or trick you have developed. In this case, you would be justified in writing your own code on top of the existing solution. You can also use a variety of concepts from Python, such as wrapper classes or decorators. The key message is that although there is a lot of work for you...
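
To illustrate the decorator idea, the sketch below adds timing to an existing function without reimplementing it. The decorator name timed and its last_duration attribute are assumptions made for this example; statistics.mean stands in for any functionality you would reuse rather than rebuild.

```python
import functools
import statistics
import time


def timed(func):
    """Decorator that records how long the most recent call to `func` took."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        # Store the elapsed time on the wrapper so callers can inspect it.
        wrapper.last_duration = time.perf_counter() - start
        return result

    wrapper.last_duration = None
    return wrapper


# Enrich an existing function instead of writing a new mean from scratch.
timed_mean = timed(statistics.mean)

value = timed_mean([1, 2, 3])
print(f"mean={value}, took {timed_mean.last_duration:.6f}s")
```

The wrapped function keeps its original behaviour and name (thanks to functools.wraps), so it can be dropped in wherever the original was used.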

Summary

This chapter has been all about best practices for when you write your own Python packages for your ML solutions. We went over some of the basic concepts of Python programming as a refresher before covering some tips and tricks and good techniques to bear in mind. We covered the importance of coding standards in Python and PySpark. We then performed a comparison between object-oriented and functional programming paradigms for writing your code. We moved on to the details of taking the high-quality code you have written and packaging it up into something you can distribute across multiple platforms and use cases. To do this, we looked into different tools, designs, and setups you could use to make this a reality. This included a brief discussion of how to find good use cases for packaging up. We continued with a summary of some housekeeping tips for your code, including how to test, log, and monitor your solution. We finished with a brief philosophical point on the importance...
