
Chapter 6
Using the Standard Toolbox for Bayesian Deep Learning

As we saw in previous chapters, vanilla NNs tend to make overconfident predictions and often produce poor uncertainty estimates; some aren't capable of producing uncertainty estimates at all. By contrast, probabilistic architectures offer principled means to obtain high-quality uncertainty estimates; however, they have a number of limitations when it comes to scaling and adaptability.

While both PBP and BBB can be implemented with popular ML frameworks (as shown in our previous TensorFlow examples), they are very complex. As we saw in the last chapter, implementing even a simple network isn’t straightforward. This means that adapting them to new architectures is awkward and time-consuming (particularly for PBP, although it is possible – see Fully Bayesian Recurrent Neural Networks for Safe Reinforcement Learning). For simple tasks, such as the examples from Chapter 5, Principled Approaches for...

6.1 Technical requirements

To complete the practical tasks in this chapter, you will need a Python 3.8 environment with the SciPy stack and the following additional Python packages installed:

  • TensorFlow 2.0

  • TensorFlow Probability

All of the code for this book can be found in the book's GitHub repository: https://github.com/PacktPublishing/Enhancing-Deep-Learning-with-Bayesian-Inference.

6.2 Introducing approximate Bayesian inference via dropout

Dropout is traditionally used to prevent an NN from overfitting. First introduced in 2012, it is now used in many common NN architectures and is one of the easiest and most widely used regularization methods. The idea of dropout is to randomly turn off (or drop) certain units of a neural network during training. Because of this, the model cannot rely solely on a particular small subset of neurons to solve the task it was given. Instead, the model is forced to find different ways to solve its task. This improves the robustness of the model and makes it less likely to overfit.

If we simplify a network to y = Wx, where y is the output of our network, x the input, and W our model weights, we can think of dropout as:

  ŵ_j = w_j,  with probability p
  ŵ_j = 0,    otherwise

where ŵ_j is the weight after applying dropout, w_j is the weight before applying dropout, and p is the probability of keeping a unit (that is, of not applying dropout).

The original dropout paper recommends randomly dropping 50% of the units in...
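To make this concrete, the following is a minimal sketch of Monte Carlo dropout in TensorFlow: dropout stays active at prediction time (via training=True), and the spread of repeated stochastic forward passes serves as an uncertainty estimate. The architecture, layer sizes, dropout rate, and toy data below are hypothetical placeholders, not the book's own example.

```python
# A minimal, hypothetical sketch of Monte Carlo dropout in TensorFlow.
import numpy as np
import tensorflow as tf

# Toy regression data, purely for illustration.
x_train = np.random.rand(256, 8).astype("float32")
y_train = np.random.rand(256, 1).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),  # dropout applied during training...
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(x_train, y_train, epochs=5, verbose=0)

def mc_dropout_predict(model, x, n_samples=50):
    # ...and kept active at prediction time via training=True, so each
    # forward pass samples a different sub-network. The spread across
    # passes approximates the model's predictive uncertainty.
    preds = np.stack([model(x, training=True).numpy() for _ in range(n_samples)])
    return preds.mean(axis=0), preds.std(axis=0)

mean, std = mc_dropout_predict(model, x_train[:10])
```

Increasing n_samples smooths the estimate at the cost of additional forward passes.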

6.3 Using ensembles for model uncertainty estimates

This section will introduce you to deep ensembles: a popular method for obtaining Bayesian uncertainty estimates using an ensemble of deep networks.

6.3.1 Introducing ensembling methods

A common strategy in machine learning is to combine several single models into a committee of models. The process of learning such a combination of models is called ensemble learning, and the resulting committee of models is called an ensemble. Ensemble learning involves two main components: first, the different single models need to be trained. There are various strategies to obtain different models from the same training data: the models can be trained on different subsets of data, we can train different model types or models with different architectures, or we can initialize the same model types with different hyperparameters. Second, the outputs of the different single models need to be combined. Common strategies for combining the predictions...
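As a rough illustration of these two components, the sketch below trains several members of a deep ensemble (here, copies of the same architecture with different random initializations) and combines their outputs by averaging, using the spread across members as an uncertainty estimate. The architecture, ensemble size, and toy data are hypothetical placeholders.

```python
# A minimal, hypothetical sketch of a deep ensemble for regression.
import numpy as np
import tensorflow as tf

# Toy regression data, purely for illustration.
x_train = np.random.rand(256, 8).astype("float32")
y_train = np.random.rand(256, 1).astype("float32")

def make_member():
    return tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1),
    ])

ensemble = []
for _ in range(5):
    member = make_member()  # a fresh random initialization per member
    member.compile(optimizer="adam", loss="mse")
    member.fit(x_train, y_train, epochs=5, verbose=0)
    ensemble.append(member)

# Combine the members' outputs: the mean is the ensemble prediction and
# the spread across members serves as an uncertainty estimate.
x_test = np.random.rand(10, 8).astype("float32")
preds = np.stack([m(x_test).numpy() for m in ensemble])
mean, std = preds.mean(axis=0), preds.std(axis=0)
```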

6.4 Exploring neural network augmentation with Bayesian last-layer methods

Through the course of Chapter 5, Principled Approaches for Bayesian Deep Learning and Chapter 6, Using the Standard Toolbox for Bayesian Deep Learning, we’ve explored a variety of methods for Bayesian inference with DNNs. These methods have incorporated some form of uncertainty information at every layer, whether through the use of explicitly probabilistic means or via ensemble-based or dropout-based approximations. These methods have certain advantages. Their Bayesian (or, more accurately, approximately Bayesian) mechanics are consistent: the same principles are applied at each layer, both in terms of network architecture and update rules. This makes them easier to justify from a theoretical standpoint, as we know that any theoretical guarantees apply at each layer. In addition, it means that we have the benefit of being able to access uncertainties at every...
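By way of contrast, the following is a rough sketch of the last-layer idea: freeze a pre-trained feature extractor and place a probabilistic layer from TensorFlow Probability on top, so that only the final layer carries uncertainty. The base network, the choice of tfp.layers.DenseFlipout, and the toy data are illustrative assumptions rather than the specific method the chapter develops.

```python
# A rough, hypothetical sketch of a Bayesian last layer with TensorFlow
# Probability. In practice the base would be loaded from a trained model.
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

# Toy data and a stand-in "pre-trained" feature extractor.
x_train = np.random.rand(256, 8).astype("float32")
y_train = np.random.rand(256, 1).astype("float32")

base = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
])
base.trainable = False  # keep the learned representation fixed

# DenseFlipout places distributions over the final layer's weights, so
# each forward pass draws a different set of last-layer weights.
model = tf.keras.Sequential([base, tfp.layers.DenseFlipout(1)])
model.compile(optimizer="adam", loss="mse")
model.fit(x_train, y_train, epochs=5, verbose=0)  # only the last layer's
# variational parameters are updated

# Repeated forward passes give a distribution over predictions.
preds = np.stack([model(x_train[:10]).numpy() for _ in range(50)])
mean, std = preds.mean(axis=0), preds.std(axis=0)
```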

6.5 Summary

In this chapter, we’ve seen how familiar machine learning and deep learning concepts can be used to develop models with predictive uncertainties. We’ve also seen how, with relatively minor modifications, we can add uncertainty estimates to pre-trained models. This means we can go beyond the point-estimate approach of standard NNs, using uncertainties to gain valuable insight into our models' performance and to develop more robust applications.

However, as with the methods introduced in Chapter 5, Principled Approaches for Bayesian Deep Learning, all techniques have advantages and disadvantages. For example, last-layer methods may give us the flexibility to add uncertainties to any model, but they’re limited by the representation that the model has already learned. This could result in very low variance outputs, resulting in an overconfident model. Similarly, while ensemble methods allow us to capture variance across every layer...
