
You're reading from Deep Learning with TensorFlow and Keras, Third Edition

Product type: Book
Published in: Oct 2022
Publisher: Packt
ISBN-13: 9781803232911
Edition: Third
Authors (3):
Amita Kapoor

Amita Kapoor is an accomplished AI consultant and educator, with over 25 years of experience. She has received international recognition for her work, including the DAAD fellowship and the Intel Developer Mesh AI Innovator Award. She is a highly respected scholar in her field, with over 100 research papers and several best-selling books on deep learning and AI. After teaching for 25 years at the University of Delhi, Amita took early retirement and turned her focus to democratizing AI education. She currently serves as a member of the Board of Directors for the non-profit Neuromatch Academy, fostering greater accessibility to knowledge and resources in the field. Following her retirement, Amita also founded NePeur, a company that provides data analytics and AI consultancy services. In addition, she shares her expertise with a global audience by teaching online classes on data science and AI at the University of Oxford.
Read more about Amita Kapoor

Antonio Gulli

Antonio Gulli has a passion for establishing and managing global technological talent for innovation and execution. His core expertise is in cloud computing, deep learning, and search engines. Currently, Antonio works for Google in the Cloud Office of the CTO in Zurich, working on Search, Cloud Infra, Sovereignty, and Conversational AI.
Read more about Antonio Gulli

Sujit Pal

Sujit Pal is a Technology Research Director at Elsevier Labs, an advanced technology group within the Reed-Elsevier Group of companies. His interests include semantic search, natural language processing, machine learning, and deep learning. At Elsevier, he has worked on several initiatives involving search quality measurement and improvement, image classification and duplicate detection, and annotation and ontology development for medical and scientific corpora.
Read more about Sujit Pal


Tensor Processing Unit

This chapter introduces the Tensor Processing Unit (TPU), a special chip developed at Google for ultra-fast execution of neural network mathematical operations. As with Graphics Processing Units (GPUs), the idea is to have a special processor that focuses only on very fast matrix operations, with no support for all the other operations normally handled by Central Processing Units (CPUs). The additional improvement with TPUs, however, is to remove from the chip any hardware support for the graphics operations normally present in GPUs (rasterization, texture mapping, frame buffer operations, and so on). Think of a TPU as a special-purpose co-processor specialized for deep learning, focused on matrix or tensor operations. In this chapter, we will compare CPUs and GPUs with the four generations of TPUs and with Edge TPUs. All these accelerators are available as of April 2022. The chapter will include code examples of using TPUs.

In this chapter, you will...

C/G/T processing units

In this section, we discuss CPUs, GPUs, and TPUs. Before turning to TPUs, it will be useful to briefly review CPUs and GPUs.

CPUs and GPUs

You are probably somewhat familiar with the concept of a CPU, a general-purpose chip sitting in each computer, tablet, and smartphone. CPUs are in charge of all the computations: logical control, arithmetic, register operations, memory operations, and many others. CPUs are subject to the well-known Moore's law [1], which states that the number of transistors in a dense integrated circuit doubles about every two years.

Many people believe that we are currently in an era where this trend cannot be sustained for long, and indeed it has already declined during the past decade. Therefore, we need some additional technology if we want to support the demand for faster and faster computation to process the ever-growing amount of data that is available out there.

One improvement came from...

Four generations of TPUs, plus Edge TPU

As discussed, TPUs are domain-specific processors expressly optimized for matrix operations. Now, you might remember that the basic operation of matrix multiplication is a dot product between a row of one matrix and a column of the other. For instance, given a matrix multiplication Y = WX, computing Y[i, 0] is:

Y[i, 0] = W[i, 0]*X[0, 0] + W[i, 1]*X[1, 0] + ... + W[i, n-1]*X[n-1, 0]

The sequential implementation of this operation is time-consuming for large matrices. A brute-force computation has a time complexity of O(n^3) for n × n matrices, so it's not feasible for running large computations.
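To make the cost concrete, here is a minimal pure-Python sketch of the brute-force algorithm: three nested loops, where the innermost loop computes exactly the row-by-column dot product shown above (the function name and test matrices are illustrative, not from the book):

```python
def matmul(A, B):
    """Naive O(n^3) matrix multiplication: each Y[i][j] is the dot
    product of row i of A with column j of B."""
    n, m, p = len(A), len(B), len(B[0])
    Y = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                Y[i][j] += A[i][k] * B[k][j]
    return Y

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # → [[19, 22], [43, 50]]
```

Each of the n^2 output entries requires n multiply-accumulate steps, which is exactly the workload a TPU parallelizes in hardware.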

First generation TPU

The first-generation TPU (TPU v1) was announced in May 2016 at Google I/O. TPU v1 [3] supports matrix multiplication using 8-bit arithmetic. TPU v1 is specialized for deep learning inference, but it does not work for training. Training requires floating-point operations, as discussed in the following paragraphs.

A key function of TPU is the “...

TPU performance

Discussing performance is always difficult because it is important to first define the metrics that we are going to measure, and the set of workloads that we are going to use as benchmarks. For instance, Google reported an impressive linear scaling for TPU v2 used with ResNet-50 [4] (see Figure 15.9 and Figure 15.10):


Figure 15.9: Linear scalability in the number of TPUs v2 when increasing the number of images

In addition, you can find online a comparison of ResNet-50 [4] where a full Cloud TPU v2 Pod is >200x faster than a V100 NVIDIA Tesla GPU for ResNet-50 training:


Figure 15.10: A full Cloud TPU v2 Pod is >200x faster than a V100 NVIDIA Tesla GPU for training a ResNet-50 model

According to Google, TPU v4 gives top-line results for MLPerf 1.0 [5] when compared with NVIDIA A100 GPUs (see Figure 15.11). Indeed, these accelerators are designed with the latest large models in mind, encompassing billions and sometimes trillions of...

How to use TPUs with Colab

In this section, we show how to use TPUs with Colab. Just point your browser to https://colab.research.google.com/ and enable TPUs for the notebook: navigate to Edit → Notebook settings, and select TPU from the Hardware accelerator drop-down box, as shown in Figure 15.12:


Figure 15.12: Setting TPU as the hardware accelerator

Checking whether TPUs are available

First of all, let’s check if there is a TPU available, by using this simple code fragment that returns the IP address assigned to the TPU. Communication between the CPU and TPU happens via gRPC (gRPC Remote Procedure Call), which is a modern, open-source, high-performance Remote Procedure Call (RPC) framework that can run in any environment:

%tensorflow_version 2.x
import tensorflow as tf
print("TensorFlow version " + tf.__version__)
try:
  tpu = tf.distribute.cluster_resolver.TPUClusterResolver()  # TPU detection
  print('Running on TPU ', tpu.cluster_spec().as_dict()['worker'])
except ValueError:
  raise BaseException('ERROR: Not connected to a TPU runtime')
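Once a TPU is reachable, the usual next step is to build a distribution strategy and create the model inside its scope, so that it is replicated across the TPU cores. The sketch below follows this standard TensorFlow pattern and adds a fallback to the default strategy so the same code also runs on CPU/GPU; the tiny Dense model is purely illustrative:

```python
import tensorflow as tf

# Build a TPUStrategy if a TPU is available; otherwise fall back to the
# default (CPU/GPU) strategy so the same code runs everywhere.
try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
    tf.config.experimental_connect_to_cluster(tpu)
    tf.tpu.experimental.initialize_tpu_system(tpu)
    strategy = tf.distribute.TPUStrategy(tpu)
except ValueError:
    strategy = tf.distribute.get_strategy()

print("Number of replicas:", strategy.num_replicas_in_sync)

# Any model built inside strategy.scope() is replicated across the
# available accelerator cores.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(10, activation="softmax", input_shape=(784,))
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

On a Colab TPU runtime, `num_replicas_in_sync` typically reports 8 cores; on CPU it reports 1.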

Using pretrained TPU models

Google offers a collection of models pretrained with TPUs, available in the GitHub tensorflow/tpu repository (https://github.com/tensorflow/tpu). Models include image recognition, object detection, low-resource models, machine translation and language models, speech recognition, and image generation. Whenever possible, my suggestion is to start with a pretrained model [6], and then fine-tune it or apply some form of transfer learning. As of April 2022, the following models are available:

Image Recognition, Segmentation, and More:

- Image Recognition: AmoebaNet-D, ResNet-50/101/152/200, Inception v2/v3/v4
- Object Detection: RetinaNet...

Machine Translation and Language Models

Speech Recognition

Image Generation
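The fine-tuning recipe suggested above can be sketched in a few lines of Keras: take a pretrained backbone, freeze it, and train only a new task-specific head. This sketch is not taken from the tensorflow/tpu repository; the 5-class head, the input size, and `weights=None` (used here only to avoid downloading weights — in practice you would pass `weights="imagenet"`) are illustrative assumptions:

```python
import tensorflow as tf

# Transfer-learning sketch: a ResNet-50 backbone with a new head.
base = tf.keras.applications.ResNet50(
    include_top=False,            # drop the original classification head
    weights=None,                 # use "imagenet" for pretrained weights
    input_shape=(224, 224, 3),
    pooling="avg")
base.trainable = False            # freeze the backbone

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(5, activation="softmax"),  # new task head
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

Only the final Dense layer is trained at first; once it converges, the backbone can be unfrozen for a few low-learning-rate epochs of fine-tuning.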

Summary

TPUs are very special ASIC chips developed at Google for executing neural network mathematical operations in an ultra-fast manner. The core of the computation is a systolic multiplier that computes multiple dot products (row * column) in parallel, thus accelerating the computation of basic deep learning operations. Think of a TPU as a special-purpose co-processor for deep learning, focused on matrix or tensor operations. Google has announced four generations of TPUs so far, plus an additional Edge TPU for IoT. Cloud TPU v1 is a PCI-based specialized co-processor, delivering 92 teraops, for inference only. Cloud TPU v2 achieves 180 teraflops and supports both training and inference. Cloud TPU v2 Pods, released in alpha in 2018, can achieve 11.5 petaflops. Cloud TPU v3 achieves 420 teraflops, with both training and inference support. Cloud TPU v3 Pods can deliver more than 100 petaflops of computing power. Each TPU v4 chip provides more than 2x the compute power of a TPU v3 chip...
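The systolic multiplier at the heart of the TPU can be illustrated with a toy software simulation. This is not Google's actual hardware design (the function and the square-matrix restriction are illustrative assumptions), but it captures the key idea of an output-stationary systolic array: operands flow through a grid of processing elements, and each element multiplies and accumulates the pair passing through it at each clock cycle:

```python
def systolic_matmul(A, B):
    """Toy simulation of an n x n output-stationary systolic array.
    Values of A flow right and values of B flow down; each processing
    element (i, j) accumulates Y[i][j]. With inputs skewed by one cycle
    per row/column, PE (i, j) sees A[i][k] and B[k][j] at cycle i + j + k,
    so all PEs work in parallel on different dot-product terms."""
    n = len(A)
    Y = [[0] * n for _ in range(n)]
    for cycle in range(3 * n - 2):          # cycles needed to drain the array
        for i in range(n):
            for j in range(n):
                k = cycle - i - j           # which operand pair arrives now
                if 0 <= k < n:
                    Y[i][j] += A[i][k] * B[k][j]
    return Y

print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # → [[19, 22], [43, 50]]
```

In hardware, all n^2 multiply-accumulates of a given cycle happen simultaneously, so an n × n product completes in O(n) cycles instead of O(n^3) sequential steps.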

References

  1. Moore’s law: https://en.wikipedia.org/wiki/Moore%27s_law
  2. Milovanović, I. Ž. et al. (May 2010). Forty-three ways of systolic matrix multiplication. International Journal of Computer Mathematics, 87(6):1264–1276.
  3. Jouppi, N. P. et al. (June 2017). In-Datacenter Performance Analysis of a Tensor Processing Unit. 44th International Symposium on Computer Architecture (ISCA).
  4. Google TPU v2 performance: https://storage.googleapis.com/nexttpu/index.html
  5. MLPerf site: https://mlperf.org/
  6. A collection of models pretrained with TPU: https://cloud.google.com/tpu

Join our book’s Discord space

Join our Discord community to meet like-minded people and learn alongside more than 2000 members at: https://packt.link/keras
