Deploying Deep Learning Models to Production

In the previous chapters, we delved into the intricacies of data preparation, deep learning (DL) model development, and how to deliver insightful outcomes from our DL models. Through meticulous data analysis, feature engineering, model optimization, and model analysis, we have learned techniques to ensure our DL models perform well and behave as desired. As we transition into the next phase of our journey, the focus now shifts toward deploying these DL models in production environments.

Reaching the stage of deploying a DL model to production is a significant accomplishment, considering that most models don’t make it that far. If your project has reached this milestone, it signifies that you have successfully satisfied stakeholders, presented valuable insights, and performed thorough value and metric analysis. Congratulations, as you are now one step closer to joining the small percentage of successful projects amidst countless...

Technical requirements

The last section of this chapter is a practical tutorial. It requires a Linux machine (ideally running Ubuntu) with an NVIDIA GPU, Python 3.10, and the nvidia-docker tool installed. Additionally, the following Python libraries need to be installed:

  • numpy
  • transformers==4.21.3
  • nvidia-tensorrt==8.4.1.5
  • torch==1.12.0
  • transformer-deploy
  • tritonclient

The code files are available on GitHub: https://github.com/PacktPublishing/The-Deep-Learning-Architect-Handbook/tree/main/CHAPTER_15.
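
Before moving on, it may help to confirm that the environment is set up correctly. The following minimal check is not part of the chapter's code; note that the nvidia-tensorrt package is imported as tensorrt:

import numpy
import torch
import tensorrt  # the nvidia-tensorrt package is imported as tensorrt
import transformers

# Confirm the versions pinned above and that a GPU is visible to PyTorch
print("numpy", numpy.__version__)
print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("transformers", transformers.__version__)
print("tensorrt", tensorrt.__version__)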

Exploring the crucial components for DL model deployment

So, what does it take to deploy a DL model? It starts with having a holistic view of each required component and defining clear requirements that guide decision-making for every aspect. Keeping each decision aligned with the business goals, combined with careful planning and diligent execution, maximizes the chances of a successful deployment and of unlocking the model's value for users. We will start by examining the components required to deploy a DL model.

Deploying a DL model to production involves more than just the trained model itself. It requires various components working together seamlessly to enable users to extract value from the model’s predictions effectively. These components are as follows:

  • Architectural choices: The overall design and structure of...

Identifying key DL model deployment requirements

To determine the most suitable deployment strategy from a variety of options, it is essential to identify and define seven key requirements. These are latency and availability, cost, scalability, model hardware, data privacy, safety, and trust and reliability requirements. Let’s dive into each of these requirements in detail:

  • Latency and availability requirements: These two are closely connected and should be defined together. Availability requirements refer to the desired level of uptime and accessibility of the model’s predictions. Latency requirements refer to the maximum acceptable delay or response time that the model must meet to provide timely predictions or results. A deployment with a low availability requirement can usually tolerate high-latency predictions, and vice versa. One reason is that a low-latency-capable infrastructure can’t ensure low latency if it is not available when model...
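
To make a latency requirement measurable, it is common practice to track percentile latencies (p50/p95/p99) rather than averages, since tail latency is what users experience at the worst moments. The sketch below is illustrative only: the predict callable and the 200 ms p95 target are assumptions, not values from this chapter.

import time
import numpy as np

def measure_latency_percentiles(predict, sample, n_requests=100):
    # Time repeated calls to a prediction callable and report percentiles in ms
    latencies_ms = []
    for _ in range(n_requests):
        start = time.perf_counter()
        predict(sample)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return {p: float(np.percentile(latencies_ms, p)) for p in (50, 95, 99)}

# Hypothetical stand-in for a real model's predict function
percentiles = measure_latency_percentiles(lambda x: sum(x), list(range(1000)))
print(percentiles)
assert percentiles[95] < 200.0, "assumed 200 ms p95 latency requirement violated"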

Choosing the right DL model deployment options

Selecting the right deployment options for your DL model is a crucial step in ensuring optimal performance, scalability, and cost-effectiveness. To assist you in making an informed decision, we will explore recommended options based on different requirements. These recommendations encompass various aspects, such as hardware and physical infrastructure, monitoring and logging components, and deployment strategies. By carefully evaluating your model’s characteristics, resource constraints, and desired outcomes against this guide, you should be able to identify the deployment solution that best aligns with your objectives while maximizing efficiency and return on investment. The tangible deployment components we will explore here are architectural decisions, computing hardware, model packaging and frameworks, communication protocols, and user interfaces. Let’s dive into each component one by one, starting with architectural...

Exploring deployment decisions based on practical use cases

In this section, we will explore practical deployment decisions for DL models in production, focusing on two distinct use cases: a sentiment analysis application for an e-commerce company and a face detection and recognition system for security cameras. By examining these real-world scenarios, we will gain valuable insights into establishing robust deployment strategies tailored to specific needs and objectives.

Exploring deployment decisions for a sentiment analysis application

Suppose you are developing a sentiment analysis application to be used by an e-commerce company to analyze customer reviews in real time. The system needs to process a large number of reviews every day, and low latency is essential to provide immediate insights for the company. In this case, your choices could be as follows:

  • Architectural choice: Deploy as an independent service, as this would allow better scalability and easier updates to handle...
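
To make the independent service option concrete, here is a minimal sketch of a sentiment model wrapped in its own HTTP service. FastAPI, uvicorn, and the endpoint name are illustrative assumptions rather than this chapter's requirements, and the default transformers sentiment-analysis pipeline stands in for the production model:

from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
# Default sentiment-analysis pipeline as a stand-in for the production model
sentiment = pipeline("sentiment-analysis")

class Review(BaseModel):
    text: str

@app.post("/sentiment")
def predict(review: Review):
    # Returns, for example, {"label": "POSITIVE", "score": 0.999}
    return sentiment(review.text)[0]

# Run with: uvicorn service:app --host 0.0.0.0 --port 8000

Because the service owns its own process and dependencies, it can be scaled and updated independently of the e-commerce application that calls it.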

Discovering general recommendations for DL deployment

Here, we will discover DL deployment recommendations related to three verticals: model safety, trust, and reliability assurance; model latency optimization; and tools that help abstract model deployment-related decisions and ease the model deployment process. We will dive into these three verticals one by one.

Model safety, trust, and reliability assurance

Ensuring model safety, trust, and reliability is a crucial aspect of deploying DL systems. In this section, we will explore various recommendations and best practices to help you establish a robust framework for maintaining the integrity of your models. This includes compliance with regulations, implementing guardrails, prediction consistency, comprehensive testing, staging and production deployment strategies, usability tests, retraining and updating deployed models, human-in-the-loop decision-making, and model governance. By adopting these measures, you can effectively...
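
As one concrete example of a guardrail, every prediction can be validated before it is served, with low-confidence outputs routed to a human reviewer in a human-in-the-loop setup. The label set, confidence threshold, and fallback behavior below are illustrative assumptions:

ALLOWED_LABELS = {"POSITIVE", "NEGATIVE", "NEUTRAL"}
MIN_CONFIDENCE = 0.6  # assumed threshold for demonstration

def guarded_prediction(raw_output: dict) -> dict:
    label = raw_output.get("label")
    score = raw_output.get("score", 0.0)
    if label not in ALLOWED_LABELS or not (0.0 <= score <= 1.0):
        # Malformed output: serve a safe fallback instead of propagating bad data
        return {"label": "NEUTRAL", "score": 0.0, "guardrail": "invalid_output"}
    if score < MIN_CONFIDENCE:
        # Low confidence: flag for human-in-the-loop review
        return {"label": label, "score": score, "guardrail": "needs_review"}
    return {"label": label, "score": score, "guardrail": "passed"}

print(guarded_prediction({"label": "POSITIVE", "score": 0.97}))
print(guarded_prediction({"label": "POSITIVE", "score": 0.41}))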

Deploying a language model with ONNX, TensorRT, and NVIDIA Triton Server

The three tools we will use are ONNX, TensorRT, and NVIDIA Triton Server. ONNX and TensorRT serve GPU-based inference acceleration, while NVIDIA Triton Server hosts the model behind HTTP or gRPC APIs. We will explore these three tools practically in this section. TensorRT is known to perform the best GPU-targeted model optimization to speed up inference, while NVIDIA Triton Server is a battle-tested tool for hosting DL models with native TensorRT compatibility. ONNX, on the other hand, is the intermediate format in this setup, which we will use primarily to hold the model weights in a form that TensorRT supports directly.
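
To preview how a client consumes a model hosted on Triton, the sketch below sends an HTTP inference request with the tritonclient library from the requirements list. The server address, model name, and tensor names ("my_language_model", "input_ids", "output") are illustrative assumptions that must match the model configuration on the server:

import numpy as np
import tritonclient.http as httpclient  # requires tritonclient with HTTP support

client = httpclient.InferenceServerClient(url="localhost:8000")

# Dummy token IDs; shape and dtype must match the server's model configuration
input_ids = np.zeros((1, 16), dtype=np.int32)
infer_input = httpclient.InferInput("input_ids", list(input_ids.shape), "INT32")
infer_input.set_data_from_numpy(input_ids)

response = client.infer(model_name="my_language_model", inputs=[infer_input])
print(response.as_numpy("output"))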

In this practical tutorial, we will deploy a Hugging Face-sourced language model that is supported on most NVIDIA GPU devices. We will convert our PyTorch-based language model from Hugging Face into ONNX weights, which will allow TensorRT to load the Hugging Face...
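
As a sketch of that conversion step, the following exports a Hugging Face PyTorch model to ONNX with torch.onnx.export. The model shown, distilbert-base-uncased, is an illustrative stand-in since this excerpt does not name the chapter's model, and the tensor names, dynamic axes, and opset version are assumptions:

import torch
from transformers import AutoModel, AutoTokenizer

model_name = "distilbert-base-uncased"  # illustrative stand-in model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name).eval()

inputs = tokenizer("an example customer review", return_tensors="pt")

torch.onnx.export(
    model,
    args=(inputs["input_ids"], inputs["attention_mask"]),
    f="model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["last_hidden_state"],
    # Dynamic axes let downstream tools handle variable batch and sequence sizes
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "last_hidden_state": {0: "batch", 1: "sequence"},
    },
    opset_version=13,
)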

Summary

In this chapter, we explored the various aspects of deploying DL models in production environments, focusing on key components, requirements, and strategies. We discussed architectural choices, hardware infrastructure, model packaging, safety, trust, reliability, security, authentication, communication protocols, user interfaces, monitoring, and logging components, along with continuous integration and deployment.

This chapter also provided a step-by-step guide for choosing the right deployment options based on specific needs, such as latency, availability, scalability, cost, model hardware, data privacy, and safety requirements. We also explored general recommendations for ensuring model safety, trust, and reliability, optimizing model latency, and utilizing tools that simplify the deployment process.

A practical tutorial on deploying a language model with ONNX, TensorRT, and NVIDIA Triton Server was presented, showcasing a minimal workflow needed for accelerated deployment...
