Hyperparameter Optimization

One of the biggest drawbacks of deep neural networks is that they have many hyperparameters that must be tuned for the network to perform well. In each of the earlier chapters, we've encountered, but not covered, the challenge of hyperparameter estimation. Hyperparameter optimization is a very big topic; it is, for the most part, an unsolved problem, and while we can't cover it fully in this book, I think it still deserves its own chapter.

In this chapter, I'm going to offer what I believe is practical advice for choosing hyperparameters. To be sure, this chapter may be somewhat opinionated and biased, because it comes from my own experience. I hope that experience is useful, and that it also leads you to investigate the topic further.

We will cover the following topics in this chapter:

  • Should network architecture be considered a hyperparameter?
  • Which hyperparameters should we optimize?
  • Hyperparameter optimization strategies

Should network architecture be considered a hyperparameter?

In building even the simplest network, we have to make all sorts of choices about network architecture. Should we use 1 hidden layer or 1,000? How many neurons should each layer contain? Should they all use the relu activation function or tanh? Should we use dropout on every hidden layer, or just the first? There are many choices we have to make in designing a network architecture.
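
To make these choices concrete, here is a minimal sketch, using hypothetical argument names and a placeholder input size, of how each architecture decision becomes an explicit knob when we build a small Keras model:

    from keras.models import Sequential
    from keras.layers import Dense, Dropout

    def build_network(n_hidden_layers=2, n_neurons=64, activation='relu',
                      use_dropout=True, dropout_rate=0.5):
        # Every argument above is an architecture decision we have to make;
        # input_dim=10 and the sigmoid output are placeholders for a small
        # binary classification problem.
        model = Sequential()
        model.add(Dense(n_neurons, activation=activation, input_dim=10))
        if use_dropout:
            model.add(Dropout(dropout_rate))
        for _ in range(n_hidden_layers - 1):
            model.add(Dense(n_neurons, activation=activation))
            if use_dropout:
                model.add(Dropout(dropout_rate))
        model.add(Dense(1, activation='sigmoid'))
        model.compile(optimizer='adam', loss='binary_crossentropy')
        return model

Writing the architecture as a parameterized function like this makes it at least possible to search over it, even if, as discussed next, an exhaustive search is rarely feasible.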

In the most typical case, we search exhaustively for optimal values of each hyperparameter. It's not so easy to search exhaustively over network architectures, though. In practice, we probably don't have the time or computational power to do so. We rarely see researchers searching for the optimal architecture through exhaustive search, because the number of choices is so vast and because there is more than one correct answer...

Which hyperparameters should we optimize?

Even if you were to follow my advice above and settle on a good-enough architecture, you can and should still search for ideal hyperparameters within that architecture. Some of the hyperparameters we might want to search include the following (see the sketch below):

  • Our choice of optimizer. Thus far, I've been using Adam, but an rmsprop optimizer or a well-tuned SGD may do better.
  • Each of these optimizers has a set of hyperparameters that we might tune, such as learning rate, momentum, and decay.
  • Network weight initialization.
  • Neuron activation.
  • Regularization parameters such as dropout probability or the regularization parameter used in l2 regularization.
  • Batch size.

As implied above, this is not an exhaustive list. There are most certainly more options you could try, including introducing variable numbers of neurons in each hidden layer,...
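
To make the list above concrete, here is a minimal sketch of what such a search space might look like, with hypothetical value grids chosen only for illustration; counting the combinations also shows why a full grid search is rarely affordable:

    # Candidate values for each hyperparameter (illustrative only).
    search_space = {
        'optimizer': ['adam', 'rmsprop', 'sgd'],
        'learning_rate': [0.1, 0.01, 0.001, 0.0001],
        'weight_init': ['glorot_uniform', 'he_normal', 'lecun_uniform'],
        'activation': ['relu', 'tanh', 'elu'],
        'dropout_rate': [0.0, 0.2, 0.5],
        'batch_size': [32, 64, 128],
    }

    # Multiplying the grid sizes gives 3 * 4 * 3 * 3 * 3 * 3 = 972 models to
    # train; at even an hour per model, that is more than a month of
    # sequential compute.
    n_combinations = 1
    for values in search_space.values():
        n_combinations *= len(values)
    print(n_combinations)  # 972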

Hyperparameter optimization strategies

At this point in the chapter, we've suggested that it is, for the most part, computationally impractical, if not impossible, to try every combination of hyperparameters we might want to explore. Deep neural networks can certainly take a long time to train. While you can parallelize the search and throw computational resources at the problem, your greatest limiter in searching for hyperparameters is likely to remain time.

If time is our greatest constraint, and we can't reasonably explore every possibility in the time we have, then we need a strategy that extracts the most utility from the time we do have.

In the remainder of this section, I'll cover some common strategies for hyperparameter optimization and then I'll show you how to optimize hyperparameters in Keras with two of my favorite methods...
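
As a preview of what that wiring can look like, here is a minimal sketch of one common strategy, random search, using the scikit-learn wrapper that ships with Keras 2.x. The model, the synthetic data, and the value grids below are placeholders for illustration, not necessarily the exact methods used later in the chapter:

    import numpy as np
    from keras.models import Sequential
    from keras.layers import Dense, Dropout
    from keras.optimizers import Adam
    from keras.wrappers.scikit_learn import KerasClassifier
    from sklearn.model_selection import RandomizedSearchCV

    def build_model(hidden_units=32, dropout_rate=0.2, learning_rate=0.001):
        # A small placeholder binary classifier; each argument is a
        # hyperparameter the search will vary.
        model = Sequential()
        model.add(Dense(hidden_units, activation='relu', input_dim=20))
        model.add(Dropout(dropout_rate))
        model.add(Dense(1, activation='sigmoid'))
        model.compile(optimizer=Adam(lr=learning_rate),
                      loss='binary_crossentropy', metrics=['accuracy'])
        return model

    # Random data stands in for a real dataset.
    X = np.random.rand(1000, 20)
    y = np.random.randint(2, size=1000)

    param_distributions = {
        'hidden_units': [16, 32, 64, 128],
        'dropout_rate': [0.0, 0.2, 0.5],
        'learning_rate': [0.01, 0.001, 0.0001],
        'batch_size': [32, 64, 128],
    }

    # n_iter caps how many random combinations get trained, which is exactly
    # how we trade search thoroughness for wall-clock time.
    search = RandomizedSearchCV(
        KerasClassifier(build_fn=build_model, epochs=5, verbose=0),
        param_distributions, n_iter=10, cv=3)
    search.fit(X, y)
    print(search.best_params_, search.best_score_)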

Summary

Hyperparameter optimization is an important step in getting the very best from our deep neural networks. Finding the best way to search for hyperparameters is an open and active area of machine learning research. While you most certainly can apply the state of the art to your own deep learning problem, you will need to weigh the complexity of implementation against the search runtime in your decision.

There are decisions related to network architecture that most certainly can be searched exhaustively, but a set of heuristics and best practices, like the ones I offered above, might get you close enough, or at least reduce the number of parameters you need to search.

Ultimately, hyperparameter search is an economics problem, and the first part of any hyperparameter search should be a consideration of your budget of computation time, and of personal time, for isolating the best hyperparameter...
