Chapter 7. Meta-SGD and Reptile

In the last chapter, we learned how MAML is used for finding an optimal parameter that's generalizable across several tasks. We saw how MAML computes this optimal parameter by calculating meta gradients and performing meta optimization. We also saw adversarial meta learning, which acts as an enhancement to MAML by adding adversarial samples and allowing MAML to wrestle between clean and adversarial samples to find the optimal parameter. We also saw CAML, or context adaptation for meta learning. In this chapter, we'll learn about Meta-SGD, another meta learning algorithm that enables learning quickly. Unlike MAML, Meta-SGD will not just find the optimal parameter; it will also find the optimal learning rate and an update direction. We'll see how to use Meta-SGD in supervised and reinforcement learning settings. We'll also see how to build Meta-SGD from scratch. Going ahead, we'll learn about the Reptile algorithm, which acts as an improvement to...

Meta-SGD


Let's say we have some task, T. We use a model, f, parameterized by some parameter, θ, and train the model to minimize the loss. We minimize the loss using gradient descent and find the optimal parameter θ' for the model.

Let's recall the update rule of gradient descent:

θ = θ - α∇L(θ)

So, what are the key elements that make up our gradient descent? Let's see:

  • Parameter, θ
  • Learning rate, α
  • Update direction, given by the gradient ∇L(θ)
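
To make these elements concrete, here is a minimal sketch of gradient descent in plain NumPy, using a hypothetical linear model and squared-error loss purely for illustration; the parameter, the learning rate, and the update direction each appear explicitly:

```python
import numpy as np

# Hypothetical toy data and a linear model, just to illustrate the update rule
x = np.random.randn(100, 3)
true_theta = np.array([2.0, -1.0, 0.5])
y = x @ true_theta

theta = np.random.randn(3)   # parameter: set to some random value
alpha = 0.01                 # learning rate: a small fixed number

for _ in range(100):
    # update direction: the gradient of the mean squared error loss
    grad = 2 * x.T @ (x @ theta - y) / len(x)
    # the gradient descent update rule
    theta = theta - alpha * grad
```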

We usually initialize the parameter θ to some random value and try to find its optimal value during training, set the learning rate α to a small number or decay it over time, and let the update direction follow the gradient. Can we learn all of these key elements of gradient descent by meta learning so that we can learn quickly from a few data points? We've already seen, in the last chapter, how MAML finds the optimal initial parameter θ that's generalizable across tasks. With the optimal initial parameter, we can take fewer gradient steps and learn quickly on a new task.
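
As a rough sketch of where this chapter is heading, the Meta-SGD adaptation update can be written in a few lines. The helper name loss_grad and its signature are assumptions for illustration; the key point is that the learning rate alpha is not a scalar hyperparameter but a learned vector with the same shape as theta, so the elementwise product alpha * grad encodes both a per-parameter step size and an update direction:

```python
def meta_sgd_adapt(theta, alpha, loss_grad, support_batch):
    """One Meta-SGD adaptation step: theta' = theta - alpha * grad (elementwise).

    theta         -- meta-learned initial parameters (array)
    alpha         -- meta-learned per-parameter learning rates, same shape as theta
    loss_grad     -- hypothetical function returning dL/dtheta on a batch
    support_batch -- the few data points of the new task
    """
    grad = loss_grad(theta, support_batch)
    # alpha scales (and can flip) each coordinate of the gradient individually
    return theta - alpha * grad
```

In the outer loop, both theta and alpha would be updated by minimizing the loss of the adapted parameters on each task's query set, which is how Meta-SGD learns the learning rate and update direction instead of hand-tuning them.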

So, now can...

Reptile


The Reptile algorithm was proposed by OpenAI as an improvement to MAML, and it is simpler and easier to implement. We know that, in MAML, we calculate second-order derivatives, that is, the gradient of gradients, which is computationally expensive. So, OpenAI came up with Reptile. The algorithm is very simple: sample n tasks, run Stochastic Gradient Descent (SGD) for a few iterations on each of the sampled tasks, and then update the model parameter in a direction that's common to all of the tasks. Because we run SGD for several iterations on each task rather than taking a single gradient step, the update implicitly captures second-order information about the loss. Unlike MAML, Reptile is computationally efficient, as we don't calculate the second-order derivative directly or unroll the computational graph, which makes it easier to implement.
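
As a rough illustration of that description, here is a minimal sketch of the Reptile outer loop. The helpers sample_tasks and run_sgd are assumptions for illustration; run_sgd stands for running a few steps of SGD on one task, starting from the current parameters, and returning the adapted parameters:

```python
import numpy as np

def reptile(theta, sample_tasks, run_sgd, meta_iterations=1000,
            n_tasks=5, inner_steps=5, epsilon=0.1):
    """Reptile meta-training loop (sketch).

    theta        -- randomly initialized model parameters (array)
    sample_tasks -- hypothetical helper returning n_tasks tasks from the task distribution
    run_sgd      -- hypothetical helper running `inner_steps` of SGD on one task
    epsilon      -- step size for the meta update
    """
    for _ in range(meta_iterations):
        tasks = sample_tasks(n_tasks)
        adapted = [run_sgd(theta.copy(), task, inner_steps) for task in tasks]
        # Move theta toward the task-adapted parameters, averaged over tasks:
        # this is the update in a direction common to all of the tasks
        theta = theta + epsilon * np.mean([phi - theta for phi in adapted], axis=0)
    return theta
```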

Let's say we sampled two tasks, T₁ and T₂, from the task distribution and we randomly initialize the...

Summary


In this chapter, we've learned about Meta-SGD and the Reptile algorithm. We saw how Meta-SGD differs from MAML and how Meta-SGD is used in supervised and reinforcement learning settings. We saw how Meta-SGD learns the model parameter along with learning rate and update direction. We also saw how to build Meta-SGD from scratch. Then, we learned about the Reptile algorithm. We saw how Reptile differs from MAML and how Reptile acts as an improvement over the MAML algorithm. We also learned how to use Reptile in a sine wave regression task.

In the next chapter, we'll learn how we can use gradient agreement as an optimization objective in meta learning.

Questions


  1. How does Meta-SGD differ from MAML?
  2. How does Meta-SGD find the optimal learning rate?
  3. What is the update equation of the learning rate in Meta-SGD?
  4. How does the Reptile algorithm work?
  5. What is the update equation of the Reptile algorithm?