Reader small image

You're reading from  Deep Learning Quick Reference

Product typeBook
Published inMar 2018
Reading LevelExpert
PublisherPackt
ISBN-139781788837996
Edition1st Edition
Languages
Right arrow
Author (1)
Mike Bernico
Mike Bernico
author image
Mike Bernico

Mike Bernico is a Lead Data Scientist at State Farm Mutual Insurance Companies. He also works as an adjunct for the University of Illinois at Springfield, where he teaches Essentials of Data Science, and Advanced Neural Networks and Deep Learning. Mike earned his MSCS from the University of Illinois at Springfield. He's an advocate for open source software and the good it can bring to the world. As a lifelong learner with umpteen hobbies, Mike also enjoys cycling, travel photography, and wine making.
Read more about Mike Bernico

Right arrow

Optimization algorithms for deep learning

The gradient descent algorithm is not the only optimization algorithm available to optimize our network weights, however it's the basis for most other algorithms. While understanding every optimization algorithm out there is likely a PhD worth of material, we will devote a few sentences to some of the most practical.

Using momentum with gradient descent

Using gradient descent with momentum speeds up gradient descent by increasing the speed of learning in directions the gradient has been constant in direction while slowing learning in directions the gradient fluctuates in direction. It allows the velocity of gradient descent to increase.

Momentum works by introducing a velocity term, and using a weighted moving average of that term in the update rule, as follows:

Most typically is set to 0.9 in the case of momentum, and usually this is not a hyper-parameter that needs to be changed.

The RMSProp algorithm

RMSProp is another algorithm that can speed up gradient descent by speeding up learning in some directions, and dampening oscillations in other directions, across the multidimensional space that the network weights represent:

This has the effect of reducing oscillations more in directions where is large.

The Adam optimizer

Adam is one of the best performing known optimizer and it's my first choice. It works well across a wide variety of problems. It combines the best parts of both momentum and RMSProp into a single update rule:

Where is some very small number to prevent division by 0.

Adam is often a great choice, and it's a great place to start when you're prototyping, so save yourself some time by starting with Adam.
Previous PageNext Page
You have been reading a chapter from
Deep Learning Quick Reference
Published in: Mar 2018Publisher: PacktISBN-13: 9781788837996
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Author (1)

author image
Mike Bernico

Mike Bernico is a Lead Data Scientist at State Farm Mutual Insurance Companies. He also works as an adjunct for the University of Illinois at Springfield, where he teaches Essentials of Data Science, and Advanced Neural Networks and Deep Learning. Mike earned his MSCS from the University of Illinois at Springfield. He's an advocate for open source software and the good it can bring to the world. As a lifelong learner with umpteen hobbies, Mike also enjoys cycling, travel photography, and wine making.
Read more about Mike Bernico