
You're reading from  Hands-On Neural Networks with TensorFlow 2.0

Product type: Book
Published in: Sep 2019
Reading level: Expert
Publisher: Packt
ISBN-13: 9781789615555
Edition: 1st
Author: Paolo Galeone

Paolo Galeone is a computer engineer with strong practical experience. After getting his MSc degree, he joined the Computer Vision Laboratory at the University of Bologna, Italy, as a research fellow, where he improved his computer vision and machine learning knowledge working on a broad range of research topics. Currently, he leads the Computer Vision and Machine Learning laboratory at ZURU Tech, Italy. In 2019, Google recognized his expertise by awarding him the title of Google Developer Expert (GDE) in Machine Learning. As a GDE, he shares his passion for machine learning and the TensorFlow framework by blogging, speaking at conferences, contributing to open-source projects, and answering questions on Stack Overflow.

Bringing a Model to Production

In this chapter, we present the ultimate goal of any real-life machine learning application: the deployment and inference of a trained model. As we saw in the previous chapters, TensorFlow allows us to train models and save their parameters in checkpoint files, making it possible to restore the model's state and continue training, as well as run inference from Python.

The checkpoint files, however, are not the right file format when the goal is to use a trained machine learning model with low latency and a low memory footprint. In fact, checkpoint files only contain the model's parameter values, without any description of the computation; this forces the program to define the model structure first and then restore the model parameters. Moreover, the checkpoint files contain variable values that...
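As a concrete illustration of this limitation, the following sketch (the class name and paths are hypothetical) saves and restores a parameter with tf.train.Checkpoint; note how the model structure must be defined again in code before the values can be restored:

```python
import tensorflow as tf

# Minimal sketch: a checkpoint stores parameter values only, so the
# model structure must be re-defined in code before restoring.
class Model(tf.Module):
    def __init__(self):
        self.w = tf.Variable(3.0)

    def __call__(self, x):
        return self.w * x

model = Model()
model.w.assign(5.0)  # pretend training changed the parameter
path = tf.train.Checkpoint(model=model).save("/tmp/ckpt_demo/model")

restored = Model()  # same structure, freshly initialized (w == 3.0)
tf.train.Checkpoint(model=restored).restore(path)
# restored.w now holds the saved value, 5.0
```

Without the `Model` class definition, the checkpoint alone is useless: it carries no description of the computation.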

The SavedModel serialization format

As we explained in Chapter 3, TensorFlow Graph Architecture, representing computations using DataFlow graphs has several advantages in terms of model portability since a graph is a language-agnostic representation of the computation.

SavedModel is a universal serialization format for TensorFlow models that extends the standard graph representation by creating a language-agnostic representation of the computation that is recoverable and hermetic. It not only carries the graph description and values (like the standard graph) but also offers additional features that simplify the use of trained models in heterogeneous production environments.

TensorFlow 2.0 has been designed with simplicity in mind. This design choice is visible in the following diagram, where it is possible...
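This simplicity extends to the export step itself. A hedged sketch (the module, signature, and output path below are illustrative): decorating a method with tf.function and an input_signature is enough for tf.saved_model.save to produce a complete SavedModel directory:

```python
import tensorflow as tf

# Illustrative module: a computation with one trainable variable.
class Scale(tf.Module):
    def __init__(self):
        self.factor = tf.Variable(2.0)

    @tf.function(input_signature=[tf.TensorSpec([None], tf.float32)])
    def scale(self, x):
        return self.factor * x

module = Scale()
# The resulting directory contains the graph definition, the variable
# values, and the signatures, and can be consumed by any tool in the
# TensorFlow ecosystem (Serving, Lite, TensorFlow.js, other languages).
tf.saved_model.save(module, "/tmp/scale_model")
```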

Python deployment

Using Python, it is straightforward to load the computational graphs stored inside a SavedModel and use them as native Python functions. This is all thanks to the TensorFlow Python API. The tf.saved_model.load(path) function deserializes the SavedModel located at path and returns a trackable object whose signatures attribute maps signature keys to Python functions that are ready to use.

The load method is capable of deserializing the following:

  • Generic computational graphs, such as the ones we created in the previous section
  • Keras models
  • SavedModels created using TensorFlow 1.x or the Estimator API
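A minimal, self-contained sketch of the load workflow (the module and path are illustrative): we export a tiny module and then invoke it through the signatures attribute of the loaded object. The signature function returns a dictionary mapping output names to tensors:

```python
import tensorflow as tf

# Build and save a tiny SavedModel so the example is self-contained.
class Doubler(tf.Module):
    @tf.function(input_signature=[tf.TensorSpec([None], tf.float32)])
    def double(self, x):
        return 2.0 * x

tf.saved_model.save(Doubler(), "/tmp/saved_doubler")

loaded = tf.saved_model.load("/tmp/saved_doubler")
fn = loaded.signatures["serving_default"]
result = fn(x=tf.constant([1.0, 3.0]))
# result is a dict mapping output names to tensors, e.g. {"output_0": ...}
```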

Generic computational graph

Let's say we are interested in loading the...
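Since the section is abridged here, the following is a hedged sketch of the general pattern (names and paths are illustrative): a non-Keras computation exported with an input signature can be restored and called like a native Python function:

```python
import tensorflow as tf

# Sketch: export a generic (non-Keras) computation and call the
# restored trackable object directly.
class MatMul(tf.Module):
    @tf.function(input_signature=[
        tf.TensorSpec([None, 2], tf.float32),
        tf.TensorSpec([2, None], tf.float32),
    ])
    def __call__(self, a, b):
        return tf.matmul(a, b)

tf.saved_model.save(MatMul(), "/tmp/generic_graph")

restored = tf.saved_model.load("/tmp/generic_graph")
# The restored object is callable, just like the original Python object.
y = restored(tf.eye(2), tf.ones((2, 2)))
```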

Supported deployment platforms

As shown in the diagram at the beginning of this chapter, SavedModel is the input for a vast ecosystem of deployment platforms, with each one being created to satisfy a different range of use cases:

  • TensorFlow Serving: This is the official Google solution for serving machine learning models. It supports model versioning, multiple models can be deployed in parallel, and it ensures that concurrent models achieve high throughput with low latency thanks to its complete support for hardware accelerators (GPUs and TPUs). TensorFlow Serving is not merely a deployment platform, but an entire ecosystem built around TensorFlow and written in highly efficient C++ code. Currently, this is the solution Google itself uses to run tens of millions of inferences per second on Google Cloud's ML platform.
  • TensorFlow Lite: This is the deployment platform of choice...
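To make the TensorFlow Serving workflow concrete: once a model is served (say, a hypothetical model named adder), TensorFlow Serving exposes a REST endpoint on port 8501 by default, and a prediction request is a JSON payload with an instances key. A sketch using only the standard library (the endpoint and model name are assumptions):

```python
import json
from urllib import request

# Hypothetical endpoint: a model named "adder" served locally by
# TensorFlow Serving on its default REST port (8501).
url = "http://localhost:8501/v1/models/adder:predict"
payload = json.dumps({"instances": [[1.0, 2.0, 3.0]]})

req = request.Request(
    url,
    data=payload.encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# With a server running, this returns a JSON body like {"predictions": [...]}:
# response = request.urlopen(req)
```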

Summary

In this chapter, we looked at the SavedModel serialization format. This standardized serialization format was designed with the goal of simplifying the deployment of machine learning models on many different platforms.

SavedModel is a language-agnostic, self-contained representation of the computation, and the whole TensorFlow ecosystem supports it. Deploying a trained machine learning model on embedded devices, smartphones, browsers, or using many different languages is possible thanks to the conversion tools based on the SavedModel format or the native support offered by the TensorFlow bindings for other languages.

The easiest way to deploy a model is by using Python since the TensorFlow 2.0 API has complete support for the creation, restoration, and manipulation of SavedModel objects. Moreover, the Python API offers additional features and integrations between the Keras...

Exercises

The following exercises mix knowledge questions with programming challenges that combine the expressive power of the TensorFlow Python API with the advantages brought by other programming languages:

  1. What is a checkpoint file?
  2. What is a SavedModel file?
  3. What are the differences between a checkpoint and a SavedModel?
  4. What is a SignatureDef?
  5. Can a checkpoint have a SignatureDef?
  6. Can a SavedModel have more than one SignatureDef?
  7. Export a computational graph as a SavedModel that computes the batch matrix multiplication; the returned dictionary must have a meaningful key value.
  8. Convert the SavedModel defined in the previous exercise into its TensorFlow.js representation.
  9. Use the model.json file we created in the previous exercise to develop a simple web page that computes the multiplication of matrices chosen by the user.
  10. Restore the semantic segmentation model defined in Chapter 8, Semantic Segmentation...