Deploying Models to Accelerators for Inference

In Chapter 3, Training Networks, we learned how to train deep neural networks using Caffe2. In this chapter, we will focus on inference: deploying a trained model in the field to infer results on new data. For efficient inference, the trained model is typically optimized for the accelerator on which it is deployed. We will focus on two popular types of accelerator, GPUs and CPUs, and on the inference engines TensorRT and OpenVINO, which can be used to deploy Caffe2 models on them.

In this chapter, we will look at the following topics:

  • Inference engines
  • NVIDIA TensorRT
  • Intel OpenVINO

Inference engines

Popular DL frameworks, such as TensorFlow, PyTorch, and Caffe, are designed primarily for training deep neural networks. They focus on features that help researchers experiment easily with different network structures, training regimens, and techniques to achieve the best training accuracy for a particular real-world problem. After a neural network model has been trained successfully, practitioners can continue to use the same DL framework to deploy the trained model for inference. However, there are more efficient deployment solutions for inference: inference engines, software that compiles a trained model into a computation engine that delivers the best latency or throughput on the accelerator hardware used for deployment.

Much like a C or C++ compiler, inference engines take the trained model...
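For a Caffe2 model, the common first step toward most inference engines is exporting the trained protobuf nets to the framework-neutral ONNX format, which both TensorRT and OpenVINO can import. The following is a minimal sketch of that export using Caffe2's ONNX frontend; the file names, the input blob name (data), and the input shape are placeholders for your own model.

```python
# Minimal sketch: export a trained Caffe2 model (predict_net + init_net)
# to ONNX. File names, input name, and shape are placeholders.
import onnx
from caffe2.proto import caffe2_pb2
from caffe2.python.onnx import frontend

# Load the trained Caffe2 protobuf files.
predict_net = caffe2_pb2.NetDef()
with open("predict_net.pb", "rb") as f:
    predict_net.ParseFromString(f.read())

init_net = caffe2_pb2.NetDef()
with open("init_net.pb", "rb") as f:
    init_net.ParseFromString(f.read())

# Describe the network input: blob name -> (ONNX element type, shape).
value_info = {"data": (onnx.TensorProto.FLOAT, (1, 3, 224, 224))}

# Convert the two Caffe2 nets into a single ONNX model and save it.
onnx_model = frontend.caffe2_net_to_onnx_model(predict_net, init_net, value_info)
onnx.save(onnx_model, "model.onnx")
```

The resulting model.onnx file is what we will feed to the inference engines discussed in the rest of this chapter.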

NVIDIA TensorRT

TensorRT is the most popular inference engine for deploying trained models on NVIDIA GPUs. Not surprisingly, this library and its set of tools are developed by NVIDIA, and they are available free to download and use. A new version of TensorRT typically accompanies the release of each new NVIDIA GPU architecture, adding optimizations for that architecture along with support for new types of layers, operators, and DL frameworks.

Installing TensorRT

TensorRT installers can be downloaded from https://developer.nvidia.com/tensorrt. Installation packages are available for x86-64 (Intel or AMD 64-bit CPUs), PowerPC, embedded hardware such as the NVIDIA TX1/TX2, and NVIDIA...
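Once TensorRT is installed, a trained model exported to ONNX (as in the earlier sketch) can be parsed and compiled into an optimized engine. The following is a minimal sketch using the TensorRT Python API; it assumes a TensorRT 7/8-era release with the ONNX parser (the exact builder calls differ slightly in other versions), and the file names and workspace size are placeholders.

```python
# Minimal sketch: build a TensorRT engine from an ONNX model.
# Calls vary slightly between TensorRT versions; this follows the
# TensorRT 7/8-era Python API.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# Parse the ONNX file exported from Caffe2 earlier.
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("Failed to parse the ONNX model")

config = builder.create_builder_config()
config.max_workspace_size = 1 << 30  # 1 GiB of scratch memory for tactic selection

# Build the optimized engine and serialize it for deployment.
engine = builder.build_engine(network, config)
with open("model.engine", "wb") as f:
    f.write(engine.serialize())
```

The serialized model.engine file can later be deserialized at deployment time with trt.Runtime, which avoids repeating the (potentially slow) optimization step on every start-up.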

Intel OpenVINO

OpenVINO consists of libraries and tools created by Intel that enable you to optimize a trained DL model from a supported framework and then deploy it using an inference engine on Intel hardware. Supported hardware includes Intel CPUs, the integrated graphics in Intel CPUs, Intel's Movidius Neural Compute Stick, and FPGAs. OpenVINO is available for free from Intel.

OpenVINO includes the following components:

  • Model optimizer: A tool that imports trained DL models from other DL frameworks, converts them, and then optimizes them. Supported DL frameworks include Caffe, TensorFlow, MXNet, and ONNX. Note the absence of support for Caffe2 or PyTorch.
  • Inference engine: These are libraries that load the optimized model produced by the model optimizer and provide your applications with the ability to run the model on Intel hardware (see the sketch after this list).
  • Demos and samples: These simple applications...
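
Because the model optimizer does not read Caffe2 models directly, a Caffe2 model is typically exported to ONNX first (as sketched earlier) and then converted by the model optimizer into OpenVINO's IR files (.xml and .bin). The sketch below shows how the inference engine Python API could then load and run that IR; it assumes an OpenVINO release from roughly 2020-2021 that provides the IECore API (newer 2022+ releases use a different API), and the file names, input name, and input shape are placeholders.

```python
# Minimal sketch: run an OpenVINO IR model on the CPU with the
# pre-2022 IECore Python API. File names, input name, and shape
# are placeholders.
import numpy as np
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model="model.xml", weights="model.bin")
exec_net = ie.load_network(network=net, device_name="CPU")

# Run inference on a dummy NCHW image batch.
image = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = exec_net.infer(inputs={"data": image})
print({name: blob.shape for name, blob in outputs.items()})
```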

Summary

In this chapter, we learned about inference engines and how they are an essential tool for the final deployment of a trained Caffe2 model on accelerators. We focused on two types of popular accelerators: NVIDIA GPUs and Intel CPUs. We looked at how to install and use TensorRT for deploying our Caffe2 model on NVIDIA GPUs. We also looked at the installation and use of OpenVINO for deploying our Caffe2 model on Intel CPUs and accelerators.

Many other companies, such as Google, Facebook, and Amazon, and start-ups such as Habana and Graphcore, are developing new accelerator hardware for the inference of DL models. There are also efforts, such as ONNX Runtime, that bring together the inference engines from multiple vendors under one umbrella. Please evaluate these options and choose the accelerator hardware and software that work best for deploying your Caffe2 model.

In...
