Reinforcement Learning for Time-Series

Reinforcement learning is a widely successful paradigm for control problems and function optimization that doesn't require labeled data. It's a powerful framework for experience-driven, autonomous learning, in which an agent interacts directly with its environment by taking actions and improves its performance through trial and error. Reinforcement learning has been especially popular since the breakthroughs in complex games by DeepMind, the London-based AI company owned by Google.

In this chapter, we'll discuss a classification of reinforcement learning (RL) approaches to time-series, particularly in economics, and we'll deal with the accuracy and applicability of RL-based time-series models.

We'll start with core concepts and algorithms in RL relevant to time-series, and we'll talk about open issues and challenges in current deep RL models.

I am going to cover the following topics:

  • Introduction to Reinforcement...

Introduction to reinforcement learning

Reinforcement learning is one of the main paradigms in machine learning, alongside supervised and unsupervised methods. A major distinction is that supervised and unsupervised methods are passive, responding to the data they are given, whereas an RL agent actively changes its environment and seeks out new data. In fact, from a machine learning perspective, reinforcement learning algorithms can be viewed as alternating between finding good data and doing supervised learning on that data.

Computer programs based on reinforcement learning have been breaking through barriers. In a watershed moment for artificial intelligence, in March 2016, DeepMind's AlphaGo defeated the professional Go player Lee Sedol. Previously, the game of Go had been considered a hallmark of human creativity and intelligence, too complex to be learned by a machine.

It has been argued that reinforcement learning is edging us closer toward Artificial General Intelligence (AGI). For example...

Reinforcement Learning for Time-Series

Reinforcement Learning (RL) can be, and has been, applied to time-series; however, the problem has to be framed in a certain way. For reinforcement learning, we need significant feedback between the predictions and the ongoing actions of the system.

In order to apply RL to time-series forecasting or prediction, the prediction has to condition an action, so that the state evolution depends on the current state and the agent's action (plus randomness). Hypothetically, the reward could be a performance metric for the accuracy of the predictions. However, if the consequences of good or bad predictions do not affect the environment, this essentially corresponds to a supervised learning problem.

More meaningfully, if we want to frame our situation as an RL problem, the state of the system should be affected by the agent's decisions. For instance, in the case of interacting with the stock market, we would buy or sell based on predictions...
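To make this framing concrete, here is a minimal, hypothetical sketch of an environment in which the agent's actions (buy, sell, or hold) feed back into its state. The class name TradingEnv, the reward definition, and the gym-style reset/step interface are illustrative choices rather than code from this chapter:

import numpy as np

class TradingEnv:
    # Toy environment: the agent's action (0 = hold, 1 = buy, 2 = sell)
    # changes its position, so the state depends on past decisions
    # rather than on the price series alone.
    n_actions = 3

    def __init__(self, prices):
        self.prices = np.asarray(prices, dtype=float)

    def reset(self):
        self.t = 0
        self.position = 0   # units held
        self.cash = 1_000.0
        return self._state()

    def _state(self):
        # observation: current price and current position (hashable tuple)
        return (round(self.prices[self.t], 2), self.position)

    def step(self, action):
        price = self.prices[self.t]
        if action == 1 and self.cash >= price:    # buy one unit
            self.position += 1
            self.cash -= price
        elif action == 2 and self.position > 0:   # sell one unit
            self.position -= 1
            self.cash += price
        self.t += 1
        # reward: profit or loss on the held position over one step
        reward = self.position * (self.prices[self.t] - price)
        done = self.t == len(self.prices) - 1
        return self._state(), reward, done

# usage: a random agent on a synthetic random-walk price series
prices = 100 + np.cumsum(np.random.randn(250))
env = TradingEnv(prices)
state, done = env.reset(), False
while not done:
    state, reward, done = env.step(np.random.randint(3))

The key point is that step() changes the agent's position, so future states and rewards depend on past actions, unlike in pure forecasting.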

Bandit algorithms

A Multi-Armed Bandit (MAB) is a classic reinforcement learning problem, in which a player is faced with a slot machine (bandit) that has k levers (arms), each with a different reward distribution. The agent's goal is to maximize its cumulative reward on a trial-by-trial basis. Since MABs are a simple but powerful framework for algorithms that make decisions over time under uncertainty, a large number of research articles have been dedicated to them.

Bandit learning refers to algorithms that aim to optimize a single unknown, stationary objective function. At each round t, the agent chooses an action a_t from a set of actions A = {1, ..., k}. The environment then reveals the reward r_t of the chosen action. As information accumulates over multiple rounds, the agent can build a good representation of the value (or reward) distribution for each arm, Q(a).

Therefore, a good policy might converge so that the choice of arm becomes optimal. According to one policy, UCB1 (published by Peter Auer, Nicol...
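As a concrete illustration of such a policy, here is a minimal sketch of the UCB1 rule, which picks the arm maximizing the empirical mean reward plus an exploration bonus of sqrt(2 ln t / n_i); this implementation is my own rather than taken from a library:

import math
import random

class UCB1:
    # UCB1: after trying each arm once, pick the arm that maximizes the
    # empirical mean reward plus an exploration bonus sqrt(2 ln t / n_i).
    def __init__(self, n_arms):
        self.counts = [0] * n_arms    # number of pulls per arm
        self.values = [0.0] * n_arms  # empirical mean reward per arm

    def select_arm(self):
        for arm, count in enumerate(self.counts):
            if count == 0:            # initialization: play every arm once
                return arm
        t = sum(self.counts)
        scores = [self.values[a] + math.sqrt(2 * math.log(t) / self.counts[a])
                  for a in range(len(self.counts))]
        return scores.index(max(scores))

    def update(self, arm, reward):
        self.counts[arm] += 1
        # incremental update of the running mean
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# usage: three Bernoulli arms with unknown success probabilities
probs = [0.2, 0.5, 0.7]
bandit = UCB1(len(probs))
for _ in range(1000):
    arm = bandit.select_arm()
    bandit.update(arm, 1.0 if random.random() < probs[arm] else 0.0)
print(bandit.counts)  # most pulls should concentrate on the best arm

The exploration bonus shrinks as an arm is pulled more often, so the policy gradually shifts from exploration toward exploiting the best-looking arm.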

Deep Q-Learning

Q-learning, introduced by Chris Watkins in 1989, is an algorithm to learn the value of an action in a particular state. Q-learning revolves around representing the expected rewards for an action taken in a given state.

The expected reward of a state-action combination is approximated by the Q function:

Q: S × A → ℝ

Q is initialized to a fixed value, usually at random. At each time step t, the agent selects an action a_t, sees a new state of the environment s_{t+1} as a consequence, and receives a reward r_t.

The value function Q can then be updated according to the Bellman equation as the weighted average of the old value and the new information:

Q(s_t, a_t) ← (1 − α) · Q(s_t, a_t) + α · (r_t + γ · max_a Q(s_{t+1}, a))

The weighting is by α, the learning rate: the higher the learning rate, the more adaptive the Q function. The discount factor γ weights rewards by their immediacy: the lower γ, the more impatient (myopic) the agent becomes.

r_t represents the current reward. max_a Q(s_{t+1}, a) is the reward obtainable by the best action in the next state, weighted by...
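To connect the update formula to code, here is a minimal tabular Q-learning sketch. A deep Q-network would replace the table with a neural network; the gym-style env interface (reset, step, and an n_actions attribute) is an assumption for illustration:

import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    # Tabular Q-learning implementing the update above:
    # Q(s,a) <- (1 - alpha)*Q(s,a) + alpha*(r + gamma*max_a' Q(s',a'))
    # Assumes a gym-style env: reset() -> state,
    # step(a) -> (state, reward, done), an integer env.n_actions,
    # and hashable states.
    Q = defaultdict(float)  # (state, action) -> value, defaults to 0
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy: explore with probability epsilon
            if random.random() < epsilon:
                action = random.randrange(env.n_actions)
            else:
                action = max(range(env.n_actions),
                             key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            best_next = max(Q[(next_state, a)]
                            for a in range(env.n_actions))
            target = reward + (0.0 if done else gamma * best_next)
            # Bellman update: weighted average of old value and new info
            Q[(state, action)] = ((1 - alpha) * Q[(state, action)]
                                  + alpha * target)
            state = next_state
    return Q

Each inner-loop iteration applies exactly the weighted average described above, with α trading off old estimates against new information.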

Python Practice

Let's get into modeling. We'll start by generating recommendations for users with MABs.

Recommendations

For this example, we'll take users' joke preferences, and we'll use them to simulate feedback on recommended jokes on our website. We'll use this feedback to tune our recommendations. We want to select the 10 best jokes to present to people visiting our site. The recommendations will be produced by 10 MABs, each with as many arms as there are jokes.

This is adapted from an example in the mab-ranking library on GitHub by Kenza-AI.

It's a handy library that comes with implementations of different bandits. I've simplified the installation of this library in my fork of the library, so we'll be using my fork here:

pip install git+https://github.com/benman1/mab-ranking

After this is finished, we can get right to it!

We'll download the Jester dataset with joke preferences from...
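To illustrate the setup of 10 bandits, each with one arm per joke, here is a hypothetical, self-contained simulation. Randomly generated preferences stand in for the real Jester ratings, and this sketch is my own rather than the mab-ranking API:

import numpy as np

# One bandit per recommendation slot, one arm per joke. Simulated
# like/dislike feedback stands in for real user reactions.
rng = np.random.default_rng(0)
n_jokes, n_slots, n_visits = 100, 10, 5000
true_pref = rng.uniform(0, 1, n_jokes)   # probability a joke is liked

counts = np.ones((n_slots, n_jokes))     # pulls per (slot, joke)
means = np.zeros((n_slots, n_jokes))     # empirical like-rate

for t in range(1, n_visits + 1):
    chosen = []
    for slot in range(n_slots):
        # UCB score per joke, masking jokes already shown this visit
        ucb = means[slot] + np.sqrt(2 * np.log(t) / counts[slot])
        ucb[chosen] = -np.inf
        joke = int(np.argmax(ucb))
        chosen.append(joke)
        # simulated feedback: does the visitor like the joke?
        reward = float(rng.random() < true_pref[joke])
        counts[slot, joke] += 1
        means[slot, joke] += (reward - means[slot, joke]) / counts[slot, joke]

print("recommended:", sorted(chosen))
print("actually best:", sorted(np.argsort(true_pref)[-10:].tolist()))

After enough visits, the 10 recommended jokes should largely overlap with the 10 jokes that have the highest underlying preference.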

Summary

While online learning, which we talked about in Chapter 8, Online Learning for Time-Series, tackles traditional supervised learning on data that arrives sequentially, reinforcement learning deals with an agent that acts on its environment. In this chapter, I've introduced reinforcement learning concepts relevant to time-series, and we've discussed algorithms such as deep Q-learning and MABs.

Reinforcement learning algorithms are very useful in certain contexts like recommendations, trading, or – more generally – control scenarios. In the practice section, we implemented a recommender using MABs and a trading bot with a DQN.

In the next chapter, we'll look at case studies with time-series. Among other things, we'll look at multivariate forecasts of energy demand.
