You're reading from Machine Learning Engineering with Python - Second Edition

Product type Book

Published in Aug 2023

Publisher Packt

ISBN-13 9781837631964

Pages 462 pages

Edition 2nd Edition

Languages

Python

Concepts

Machine Learning

Author (1):

Andrew P. McMahon

Table of Contents (12) Chapters

Preface

1. Introduction to ML Engineering

2. The Machine Learning Development Process

3. From Model to Model Factory

4. Packaging Up

5. Deployment Patterns and Tools

6. Scaling Up

7. Deep Learning, Generative AI, and LLMOps

8. Building an Example ML Microservice

9. Building an Extract, Transform, Machine Learning Use Case

10. Other Books You May Enjoy

11. Index

Preface

”Software is eating the world, but AI is going to eat software.”

— Jensen Huang, CEO of Nvidia

Machine learning (ML), part of the wider field of Artificial Intelligence (AI), is rightfully recognized as one of the most powerful tools available for organizations to extract value from their data. As the capabilities of ML algorithms have grown over the years, it has become increasingly obvious that implementing such algorithms in a scalable, fault-tolerant, and automated way requires the creation of new disciplines. These disciplines, ML Engineering (MLE) and ML Operations (MLOps), are the focus of this book.

The book covers a wide variety of topics in order to help you understand the tools, techniques, and processes you can apply to engineer your ML solutions, with an emphasis on introducing the key concepts so that you can build on them in your future work. The aim is to develop fundamentals and a broad understanding that will stand the test of time, rather than just provide a series of introductions to the latest tools, although we do cover a lot of the latest tools!

All of the code examples are given in Python, the most popular programming language in the world (at the time of writing) and the lingua franca for data applications. Python is a high-level and object-oriented language with a rich ecosystem of tools focused on data science and ML. For example, packages such as scikit-learn and pandas have have become part of the standard lexicon for data science teams across the world. The central tenet of this book is that knowing how to use packages like these is not enough. In this book, we will use these tools, and many, many more, but focus on how to wrap them up in production-grade pipelines and deploy them using appropriate cloud and open-source tools.

We will cover everything from how to organize your ML team, to software development methodologies and best practices, to automating model building through to packaging your ML code, how to deploy your ML pipelines to a variety of different targets and then on to how to scale your workloads for large batch runs. We will also discuss, in an entirely new chapter for this second edition, the exciting world of applying ML engineering and MLOps to deep learning and generative AI, including how to start building solutions using Large Language Models (LLMs) and the new field of LLM Operations (LLMOps).

The second edition of Machine Learning Engineering with Python goes into a lot more depth than the first edition in almost every chapter, with updated examples and more discussion of core concepts. There is also a far wider selection of tools covered and a lot more focus on open-source tools and development. The ethos of focusing on core concepts remains, but it is my hope that this wider view means that the second edition will be an excellent resource for those looking to gain practical knowledge of machine learning engineering.

Although a far greater emphasis has been placed on using open-source tooling, many examples will also leverage services and solutions from Amazon Web Services (AWS). I believe that the accompanying explanations and discussions will, however, mean that you can still apply everything you learn here to any cloud provider or even in an on-premises setting.

Machine Learning Engineering with Python, Second Edition will help you to navigate the challenges of taking ML to production and give you the confidence to start applying MLOps in your projects. I hope you enjoy it!

Who this book is for

This book is for machine learning engineers, data scientists, and software developers who want to build robust software solutions with ML components. It is also relevant to anyone who manages or wants to understand the production life cycle of these systems. The book assumes intermediate-level knowledge of Python and some basic exposure to the concepts of machine learning. Some basic knowledge of AWS and the use of Unix tools like bash or zsh will also be beneficial.

What this book covers

Chapter 1, Introduction to ML Engineering, explains the core concepts of machine learning engineering and machine learning operations. Roles within ML teams are discussed in detail and the challenges of ML engineering and MLOps are laid out.

Chapter 2, The Machine Learning Development Process, explores how to organize and successfully execute an ML engineering project. This includes a discussion of development methodologies like Agile, Scrum and CRISP-DM, before sharing a project methodology developed by the author that is referenced to throughout the book. This chapter also introduces continuous integration/continouous deployment (CI/CD) and developer tools.

Chapter 3, From Model to Model Factory, shows how to standardize, systematize and automate the process of training and deploying machine learning models. This is done using the author’s concept of the model factory, a methodology for repeatable model creation and validation. This chapter also discusses key theoretical concepts important for understanding machine learning models and covers different types of drift detection and model retrain triggering criteria.

Chapter 4, Packaging Up, discusses best practices for coding in Python and how this relates to building your own packages, libraries and components for reuse in multiple projects. This chapter covers fundamental Python programming concepts before moving onto more advanced concepts, and then discusses package and environment management, testing, logging and error handling and security.

Chapter 5, Deployment Patterns and Tools, teaches you some of the standard ways you can design your ML system and then get it into production. This chapter focusses on architectures, system design and deployment patterns first before moving onto using more advanced tools to deploy microservices, including containerization and AWS Lambda. The popular ZenML and Kubeflow pipelining and deployment platforms are then reviewed in detail with examples.

Chapter 6, Scaling Up, is all about developing with large datasets in mind. For this, the Apache Spark and Ray frameworks are discussed in detail with worked examples. The focus for this chapter is on scaling up batch workloads where massive compute is required.

Chapter 7, Deep Learning, Generative AI and LLMOps, covers the latest concepts and techniques for training and deploying deep learning models for production use cases. This chapter includes material discussing the new wave of generative models, with a particular focus on Large Language Models (LLMs) and the challenges for ML engineers looking to productionize these models. This leads us onto define the core elements of LLM Operations (LLMOps).

Chapter 8, Building an Example ML Microservice, walks through the building of a machine learning microservice that serves a forecasting solution using FastAPI, Docker and Kubernetes. This pulls together many of the previous concepts developed throughout the book.

Chapter 9, Building an Extract, Transform, Machine Learning Use Case, builds out an example of a batch processing ML system that leverages standard ML algorithms and augments these with the use of LLMs. This shows a concrete application of LLMs and LLMOps, as well as providing a more advanced discussion of Airflow DAGs.

To get the most out of this book

In this book, some previous exposure to Python development is assumed. Many introductory concepts are covered for completeness but in general it will be easier to get through the examples if you have already written at least some Python programs before. The book also assumes some exposure to the main concepts from machine learning, such as what a model is, what training and inference refer to and an understanding of similar concepts. Several of these are recapped in the text but again it will be a smoother ride if you have previously been acquainted with the main ideas behind building a machine learning model, even at a rudimentary level.
On the technical side, to get the most out of the examples in the book, you will need access to a computer or server where you have privileges to install and run Python and other software packages and applications. For many of the examples, access to a UNIX type terminal, such as bash or zsh, is assumed. The examples in this book were written and tested on both a Linux machine running Ubuntu LTS and an M2 Macbook Pro running macOS. If you use a different setup, for example Windows, the examples may require some translation in order to work for your system. Note that the use of the M2 Macbook Pro means several examples show some additional information to get the examples working on Apple Silicon devices. These sections can comfortably be skipped if your system does not require this extra setup.
Many of the Cloud based examples leverage Amazon Web Services (AWS) and so require an AWS account with billing setup. Most of the examples will use the free-tier services available from AWS but this is not always possible. Caution is advised in order to avoid large bills. If in doubt, it is recommended you consult the AWS documentation for more information. As a concrete example of this, In Chapter 5, Deployment Patterns and Tools, we use the Managed Workflows with Apache Spark (MWAA) service from AWS. There is no free tier option for MWAA so as soon as you spin up the example, you will be charged for the environment and any instances. Ensure you are happy to do this before proceeding and I recommend tearing down your MWAA instances when finished.
Conda and Pip are used for package and environment management throughout this book, but Poetry is also used in many cases. To facilitate easy reproduction of development environments for each chapters in the book’s GitHub repository, (https://github.com/PacktPublishing/Machine-Learning-Engineering-with-Python-Second-Edition), each chapter of the book has a corresponding folder and within that folder are requirements.txt and Conda environment.yml files, as well as helpful README files. The commands for replicating the environments and any other requirements are given at the beginning of each chapter within the book.
If you are using the digital version of this book, I still adviseyou to type the code yourself or access the code from the book’s GitHub repository (https://github.com/PacktPublishing/Machine-Learning-Engineering-with-Python-Second-Edition). Doing so will help you avoid any potential errors related to the copying and pasting of code.

Download the example code files

As mentioned above, the code bundle for the book is hosted on GitHub at https://github.com/PacktPublishing/Machine-Learning-Engineering-with-Python-Second-Edition. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://packt.link/LMqir.

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. For example: “First, we must import the TabularDrift detector from the alibi-detect package, as well as the relevant packages for loading and splitting the data.”

A block of code is set as follows:

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
import alibi
from alibi_detect.cd import TabularDrift

Any command-line input or output is written as follows and are indicated as command-line commands in the main body of the text:

pip install tensorflow-macos

Bold: Indicates a new term, an important word, or words that you see on the screen. For instance, words in menus or dialog boxes appear in the text like this. For example: “Select the Deploy button. This will provide a dropdown where you can select Create service.”

References to additional resources or background information appear like this.

Helpful tips and important caveats appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: Email feedback@packtpub.com and mention the book’s title in the subject of your message. If you have questions about any aspect of this book, please email us at questions@packtpub.com.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you reported this to us. Please visit http://www.packtpub.com/submit-errata, click Submit Errata, and fill in the form.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packtpub.com with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit http://authors.packtpub.com.

Share your thoughts

Once you’ve read Machine Learning Engineering with Python - Second Edition, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.

Download a free PDF copy of this book

Thanks for purchasing this book!

Do you like to read on the go but are unable to carry your print books everywhere?Is your eBook purchase not compatible with the device of your choice?

Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.

Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.

The perks don’t stop there, you can get exclusive access to discounts, newsletters, and great free content in your inbox daily

Follow these simple steps to get the benefits:

Scan the QR code or visit the link below

https://packt.link/free-ebook/9781837631964
Submit your proof of purchase
That’s it! We’ll send your free PDF and other benefits to your email directly