
Chapter 5
Principled Approaches for Bayesian Deep Learning

Now that we’ve introduced the concept of Bayesian Neural Networks (BNNs), we’re ready to explore the various ways in which they can be implemented. As we discussed previously, ideal BNNs are computationally intensive, becoming intractable with more sophisticated architectures or larger amounts of data. In recent years, researchers have developed a range of methods that make BNNs tractable, allowing them to be implemented with larger and more sophisticated neural network architectures.

In this chapter, we’ll explore two particularly popular methods: Probabilistic Backpropagation (PBP) and Bayes by Backprop (BBB). Both methods can be referred to as probabilistic neural network models: neural networks designed to learn probabilities over their weights, rather than simply learning point estimates (a fundamental defining feature of BNNs, as we learned in Chapter 4, Introducing Bayesian Deep Learning)...

5.1 Technical requirements

To complete the practical tasks in this chapter, you will need a Python 3.8 environment with the Python SciPy stack and the following additional Python packages installed:

  • TensorFlow 2.0

  • TensorFlow Probability

All of the code for this book can be found on the GitHub repository for the book: https://github.com/PacktPublishing/Enhancing-Deep-Learning-with-Bayesian-Inference.

5.2 Explaining notation

While we’ve introduced much of the notation used throughout the book in the previous chapters, we’ll be introducing more notation associated with BDL in the following chapters. As such, we’ve provided an overview of the notation here for reference:

  • μ: The mean. To make it easy to cross-reference our chapter with the original Probabilistic Backpropagation paper, this is represented as m when discussing PBP.

  • σ: The standard deviation.

  • σ²: The variance (the square of the standard deviation). To make it easy to cross-reference our chapter with the paper, this is represented as v when discussing PBP.

  • x: A single vector input to our model. If considering multiple inputs, we’ll use X to represent a matrix comprising multiple vector inputs.

  • x̂: An approximation of our input x.

  • y: A single scalar target. When considering multiple targets, we’ll use a bold 𝐲 to represent a vector of multiple scalar targets.

  • ŷ:...

5.3 Familiar probabilistic concepts from deep learning

While this book introduces many concepts that may be unfamiliar, you may find that some ideas discussed here are familiar. In particular, Variational Inference (VI) is something you may be familiar with due to its use in Variational Autoencoders (VAEs).

As a quick refresher, VAEs are generative models that learn encodings that can be used to generate plausible data. Much like standard autoencoders, VAEs comprise an encoder-decoder architecture.


Figure 5.1: Illustration of autoencoder architecture

With a standard autoencoder, the model learns a mapping from the input to the latent space via the encoder, and then from the latent space back to a reconstruction of the input via the decoder.

As we see here, our output is defined as x̂ = fd(z), where our encoding is z = fe(x), and fe() and fd() are our encoder and decoder functions, respectively. If we want to generate new data using values in our latent space, we could simply inject some random values into the...
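To make the mapping concrete, the following is a minimal sketch (not the book’s code) of fe() and fd() as small Keras models; the layer sizes, latent dimension, and input dimension are illustrative assumptions:

import tensorflow as tf

latent_dim = 2  # illustrative size for the latent space z

# fe: maps an input x to its encoding z
encoder = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(latent_dim),
])

# fd: maps an encoding z back to a reconstruction of x
decoder = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(784),  # assuming flattened 28x28 inputs
])

x = tf.random.normal((1, 784))  # dummy input standing in for real data
z = encoder(x)       # z = fe(x)
x_hat = decoder(z)   # x̂ = fd(z)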

5.4 Bayesian inference by backpropagation

In their 2015 paper, Weight Uncertainty in Neural Networks, Charles Blundell and his colleagues at DeepMind introduced a method for using variational learning for Bayesian inference with neural networks. Their method, which learned the BNN parameters via standard backpropagation, was appropriately named Bayes by Backprop (BBB).

In the previous section, we saw how we can use variational learning to estimate the posterior distribution of our encoding, z, learning P(z|x). For BBB, we’re going to be doing very much the same thing, except this time it’s not just the encoding we care about. This time we want to learn the posterior distribution over all of the parameters (or weights) of our network: P(𝜃|D).
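As a rough sketch of the mechanics (assuming a Gaussian variational posterior over each weight, as in the original paper), every weight is sampled via the reparameterization trick, and training minimizes the variational free energy; the variable names below are illustrative:

import tensorflow as tf

mu = tf.Variable(0.0)           # variational mean for a single weight
rho = tf.Variable(-3.0)         # unconstrained parameter for its standard deviation
sigma = tf.math.softplus(rho)   # sigma = log(1 + exp(rho)) > 0
epsilon = tf.random.normal(())  # epsilon ~ N(0, 1)
w = mu + sigma * epsilon        # a sample from q(w | mu, sigma)

# Training minimizes (a Monte Carlo estimate of) the variational free energy
#   F = KL[q(theta) || P(theta)] - E_q[log P(D | theta)],
# with gradients flowing back to mu and rho through the sampled weights.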

You can think of this as having an entire network made up of VAE encoding layers, looking something like this:


Figure 5.5: Illustration of BBB

As such, it’s logical that the learning strategy is also similar...

5.5 Implementing BBB with TensorFlow

In this section, we’ll see how to implement BBB in TensorFlow. Much of the code will be familiar: the core concepts of layers, loss functions, and optimizers are very similar to what we covered in Chapter 3, Fundamentals of Deep Learning. Unlike the examples in that chapter, however, we’ll see how we can create neural networks capable of probabilistic inference.

Step 1: Importing packages

We start by importing the relevant packages. Importantly, we will import tensorflow-probability, which provides the network layers that replace point-estimate weights with distributions and implement the reparameterization trick. We also set the global parameter for the number of inferences, which determines how often we sample from the network later:

 
import tensorflow as tf  
import numpy as np  
import matplotlib.pyplot as plt  
import tensorflow_probability as tfp

# Placeholder name and value for the global number of inference samples
NUM_INFERENCES = 10
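As a preview of the building blocks we’ll use (and relying on the imports above), here is a minimal sketch, not the book’s exact code, of a small regression model built with tfp.layers.DenseReparameterization, which swaps point-estimate weights for learned weight distributions; the layer sizes and KL scaling are illustrative assumptions:

def build_bbb_model(n_features, kl_weight):
    # Each DenseReparameterization layer learns a Gaussian posterior over its
    # weights and adds its (scaled) KL term to the model's losses.
    # kl_weight is typically 1 / (number of training examples).
    scaled_kl = lambda q, p, _: tfp.distributions.kl_divergence(q, p) * kl_weight
    model = tf.keras.Sequential([
        tf.keras.layers.InputLayer(input_shape=(n_features,)),
        tfp.layers.DenseReparameterization(
            64, activation="relu", kernel_divergence_fn=scaled_kl),
        tfp.layers.DenseReparameterization(
            1, kernel_divergence_fn=scaled_kl),
    ])
    # Keras combines the layers' KL losses with the data-fit loss during training.
    model.compile(optimizer="adam", loss="mse")
    return model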

5.6 Scalable Bayesian Deep Learning with Probabilistic Backpropagation

BBB provided a great introduction to Bayesian inference with neural networks, but variational methods have one key drawback: their reliance on sampling at both training and inference time. Unlike in a standard neural network, we need to draw samples from the weight distributions, using a range of 𝜖 values, in order to produce the distributions necessary for probabilistic training and inference.

At around the same time that BBB was introduced, researchers at Harvard University were working on their own brand of Bayesian inference with neural networks: Probabilistic Backpropagation, or PBP. As in BBB, PBP’s weights form the parameters of a distribution, in this case mean and variance weights (using the variance, σ², rather than σ). In fact, the similarities don’t end there – we’re going to see quite a few similarities to BBB but, crucially, we’re going to end up with a different approach...
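To give a sense of how PBP avoids sampling, here is a sketch of the kind of moment matching it relies on (scaling factors from the original paper are omitted). For a pre-activation a_i = Σ_j w_ij x_j with independent weights w_ij ~ N(m_ij, v_ij) and inputs x_j with means μ_j and variances s_j, the output mean and variance can be computed in closed form rather than estimated by sampling:

E[a_i] = Σ_j m_ij μ_j
Var[a_i] = Σ_j ( v_ij (μ_j² + s_j) + m_ij² s_j )

In PBP, the nonlinearities are handled with analogous closed-form approximations, so an entire forward pass propagates means and variances directly.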

5.7 Implementing PBP

Because PBP is quite complex, we’ll implement it as a class. Doing so will keep our example code tidy and allow us to easily compartmentalize our various blocks of code. It will also make the code easier to experiment with; for example, you may want to explore changing the number of units or layers in your network.

Step 1: Importing libraries

We begin by importing various libraries. In this example, we will use scikit-learn’s California Housing dataset to predict house prices:

 
from typing import List, Union, Iterable  
import math  
from sklearn import datasets  
from sklearn.model_selection import train_test_split  
import tensorflow as tf  
import numpy as np  
from tensorflow.python.framework import tensor_shape  
import tensorflow_probability as tfp

To make sure we produce the same output every time, we initialize our seeds:

 
RANDOM_SEED = 0  # placeholder value; the value used in the book is not shown here
np.random.seed(RANDOM_SEED)
tf.random.set_seed(RANDOM_SEED)
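Since the imports above pull in the California Housing dataset and train_test_split, a minimal sketch of loading and splitting the data might look like this (the split proportion is an illustrative assumption, not necessarily the book’s setting):

# Load the California Housing regression data and create train/test splits.
housing = datasets.fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(
    housing.data.astype(np.float32),
    housing.target.astype(np.float32),
    test_size=0.2,
    random_state=RANDOM_SEED,
)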

5.8 Summary

In this chapter, we learned about two fundamental, well-principled Bayesian deep learning models. BBB showed us how we can make use of variational inference to efficiently sample from our weight space and produce output distributions, while PBP demonstrated that it’s possible to obtain predictive uncertainties without sampling. This makes PBP more computationally efficient than BBB, but each model has its pros and cons.

In BBB’s case, while it’s less computationally efficient than PBP, it’s also more adaptable (particularly with the tools available in TensorFlow for variational layers). We can apply this to a variety of different DNN architectures with relatively little difficulty. The price is incurred through the sampling required at both inference and training time: we need to do more than just a single forward pass to obtain our output distributions.

Conversely, PBP allows us to obtain our uncertainty estimates with a single pass, but –...

5.9 Further reading

  • Weight Uncertainty in Neural Networks, Charles Blundell et al.: This is the paper that introduced BBB, and is one of the key pieces of BDL literature.

  • Practical Variational Inference for Neural Networks, Alex Graves et al.: An influential paper on the use of variational inference for neural networks, this work introduces a straightforward stochastic variational method that can be applied to a variety of neural network architectures.

  • Probabilistic Backpropagation for Scalable Learning of Bayesian Neural Networks, José Miguel Hernández-Lobato et al.: Another important work in BDL literature, this work introduced PBP, demonstrating how Bayesian inference can be achieved via more scalable means.

  • Practical Considerations for Probabilistic Backpropagation, Matt Benatan et al.: In this work, the authors introduce methods for making PBP more practical for real-world applications.

  • Fully Bayesian Recurrent Neural Networks for Safe Reinforcement Learning, Matt Benatan...
