The Deep Learning Architect's Handbook

Product type Book
Published in Dec 2023
Publisher Packt
ISBN-13 9781803243795
Pages 516 pages
Edition 1st Edition
Author: Ee Kin Chin

Table of Contents (25 chapters)

Preface
Part 1 – Foundational Methods
Chapter 1: Deep Learning Life Cycle
Chapter 2: Designing Deep Learning Architectures
Chapter 3: Understanding Convolutional Neural Networks
Chapter 4: Understanding Recurrent Neural Networks
Chapter 5: Understanding Autoencoders
Chapter 6: Understanding Neural Network Transformers
Chapter 7: Deep Neural Architecture Search
Chapter 8: Exploring Supervised Deep Learning
Chapter 9: Exploring Unsupervised Deep Learning
Part 2 – Multimodal Model Insights
Chapter 10: Exploring Model Evaluation Methods
Chapter 11: Explaining Neural Network Predictions
Chapter 12: Interpreting Neural Networks
Chapter 13: Exploring Bias and Fairness
Chapter 14: Analyzing Adversarial Performance
Part 3 – DLOps
Chapter 15: Deploying Deep Learning Models to Production
Chapter 16: Governing Deep Learning Models
Chapter 17: Managing Drift Effectively in a Dynamic Environment
Chapter 18: Exploring the DataRobot AI Platform
Chapter 19: Architecting LLM Solutions
Index
Other Books You May Enjoy

Deploying a language model with ONNX, TensorRT, and NVIDIA Triton Server

This section uses three tools: ONNX, TensorRT, and NVIDIA Triton Server. ONNX and TensorRT are used for GPU-based inference acceleration, while NVIDIA Triton Server hosts models behind HTTP or gRPC APIs. We will explore all three practically in this section. TensorRT is known for performing the strongest GPU-specific model optimization to speed up inference, and NVIDIA Triton Server is a battle-tested tool for hosting DL models that supports TensorRT natively. ONNX, on the other hand, serves as the intermediate format in this setup: we will use it primarily to hold the model weights in a form that TensorRT supports directly.
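To make the hosting role of NVIDIA Triton Server more concrete, the following is a minimal sketch of how a client might assemble an HTTP inference request. It assumes Triton's HTTP endpoint, which follows the KServe v2 inference protocol, on its default HTTP port 8000. The model name `language_model`, the input tensor name `input_ids`, and the token IDs are hypothetical placeholders, not values taken from the book. The sketch only builds the URL and JSON body; it does not contact a server.

```python
import json


def build_infer_request(host, model_name, token_ids):
    """Build the URL and JSON body for a KServe v2 style inference request.

    The tensor name "input_ids" and the INT64 datatype are assumptions; they
    must match whatever the deployed model's configuration declares.
    """
    url = f"http://{host}/v2/models/{model_name}/infer"
    payload = {
        "inputs": [
            {
                "name": "input_ids",            # hypothetical input tensor name
                "shape": [1, len(token_ids)],   # one sequence in the batch
                "datatype": "INT64",
                "data": list(token_ids),        # flattened, row-major values
            }
        ]
    }
    return url, json.dumps(payload)


if __name__ == "__main__":
    # Hypothetical host, model name, and token IDs, for illustration only.
    url, body = build_infer_request("localhost:8000", "language_model", [101, 2023, 102])
    print(url)
    print(body)
```

The returned body could then be sent with any HTTP client as a POST request with a JSON content type; a gRPC client would carry the same tensor name, shape, and datatype in its own message format.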

In this practical tutorial, we will be deploying a Hugging Face-sourced language model that can be supported on most NVIDIA GPU devices. We will be converting our PyTorch-based language model from Hugging Face into ONNX weights, which will allow TensorRT to load the Hugging Face...
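The PyTorch-to-ONNX conversion step can be sketched roughly as follows. This is not the book's own code, which is not visible in this excerpt; it is a minimal illustration that assumes the `torch` and `transformers` packages are installed and that the model is a causal language model loadable through `AutoModelForCausalLM`. The tensor names `input_ids` and `logits`, the sample prompt, and the opset version 17 are all assumptions chosen for illustration.

```python
def build_dynamic_axes(input_names, output_names):
    """Mark the batch (dim 0) and sequence (dim 1) axes as variable-length."""
    return {
        name: {0: "batch", 1: "sequence"}
        for name in [*input_names, *output_names]
    }


def export_to_onnx(model_id, onnx_path, opset_version=17):
    """Export a Hugging Face causal LM to an ONNX file (illustrative sketch)."""
    # Heavy dependencies are imported lazily so the helper above can be used
    # on its own without them.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    model.eval()
    # Return a plain tuple containing only the logits, so the exported graph
    # has a single, simple output.
    model.config.use_cache = False
    model.config.return_dict = False

    # A sample input is needed to trace the model; the prompt is arbitrary.
    input_ids = tokenizer("Hello, world", return_tensors="pt")["input_ids"]

    with torch.no_grad():
        torch.onnx.export(
            model,
            (input_ids,),
            onnx_path,
            input_names=["input_ids"],
            output_names=["logits"],
            dynamic_axes=build_dynamic_axes(["input_ids"], ["logits"]),
            opset_version=opset_version,
        )
```

A call such as `export_to_onnx("gpt2", "model.onnx")` would write the ONNX file, where `gpt2` is only a placeholder model ID. Declaring the batch and sequence axes as dynamic lets the exported graph accept inputs of varying batch size and length rather than only the shape of the sample input.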
