5.6 Scalable Bayesian Deep Learning with Probabilistic Backpropagation
BBB provided a great introduction to Bayesian inference with neural networks, but such variational methods have one key drawback: their reliance on sampling at both training and inference time. Unlike a standard neural network, we need to draw repeated samples of the weights, using different values of 𝜖, to produce the distributions required for probabilistic training and inference.
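To make that cost concrete, here is a minimal sketch of sampling-based inference for a single variational linear layer. This is not BBB's training procedure; it's an illustration in NumPy with made-up placeholder parameters (mu, sigma) and an arbitrary choice of 50 samples, showing that every prediction requires repeated forward passes, each with a fresh draw of 𝜖.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder variational parameters for one linear layer:
# each weight has a learned mean (mu) and standard deviation (sigma).
mu = rng.normal(size=(3, 1))
sigma = np.abs(rng.normal(size=(3, 1)))

def predict(x, n_samples=50):
    """Monte Carlo prediction: draw a fresh epsilon per sample,
    reparameterize w = mu + sigma * eps, and average the outputs."""
    preds = []
    for _ in range(n_samples):
        eps = rng.normal(size=mu.shape)   # eps ~ N(0, 1)
        w = mu + sigma * eps              # one sampled weight matrix
        preds.append(x @ w)               # forward pass with sampled weights
    preds = np.stack(preds)
    # Predictive mean and spread come from the collection of samples.
    return preds.mean(axis=0), preds.std(axis=0)

x = rng.normal(size=(5, 3))               # a small batch of inputs
mean, spread = predict(x)
```

The point to notice is the loop: the predictive distribution only emerges from many sampled forward passes, which is exactly the overhead PBP sets out to avoid.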
At around the same time that BBB was introduced, researchers at Harvard University were working on their own brand of Bayesian inference with neural networks: Probabilistic Backpropagation, or PBP. Like BBB, PBP's weights form the parameters of a distribution, in this case a mean and a variance per weight (using the variance, σ², rather than the standard deviation, σ). In fact, the similarities don't end there – we're going to see quite a bit of overlap with BBB but, crucially, we're going to end up with a different approach...
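Written out (and simplifying away PBP's layer indices and noise hyperparameters), this means the approximate posterior factorizes into an independent Gaussian per weight, parameterized directly by a mean m and a variance v:

$$q(\mathcal{W}) = \prod_{i,j} \mathcal{N}\left(w_{ij} \mid m_{ij}, v_{ij}\right)$$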