Attention and Transformers for Time Series

In the previous chapter, we rolled up our sleeves and implemented a few deep learning (DL) systems for time series forecasting. We used the common building blocks we discussed in Chapter 12, Building Blocks of Deep Learning for Time Series, put them together in an encoder-decoder architecture, and trained them to produce the forecast we desired.

Now, let’s talk about another key concept in DL that has taken the field by storm over the past few years—attention. Attention has a long-standing history, which has culminated in it being one of the most sought-after tools in the DL toolkit. This chapter takes you on a journey to understand attention and transformer models from the ground up from a theoretical perspective and solidify that understanding with practical examples.

In this chapter, we will be covering these main topics:

  • What is attention?
  • Generalized attention model
  • Forecasting with sequence-to-sequence...

Technical requirements

If you have not set up the Anaconda environment following the instructions in the Preface, please do that in order to get a working environment with all the packages and datasets required for the code in this book.

You need to run the following notebooks for this chapter:

  • 02 - Preprocessing London Smart Meter Dataset.ipynb in Chapter02
  • 01-Setting up Experiment Harness.ipynb in Chapter04
  • 01-Feature Engineering.ipynb in Chapter06
  • 02-One-Step RNN.ipynb and 03-Seq2Seq RNN.ipynb in Chapter13 (for benchmarking)
  • 00-Single Step Backtesting Baselines.ipynb and 01-Forecasting with ML.ipynb in Chapter08

The associated code for the chapter can be found at https://github.com/PacktPublishing/Modern-Time-Series-Forecasting-with-Python-/tree/main/notebooks/Chapter14.

What is attention?

The idea of attention was inspired by human cognitive function. At any moment, the optic nerves in our eyes, the olfactory nerves in our noses, and the auditory nerves in our ears send a massive amount of sensory input to the brain. This is way too much information, definitely more than the brain can handle. But our brains have developed a mechanism that helps us pay attention only to the stimuli that matter—such as a sound or a smell that doesn’t belong. Years of evolution have trained our brains to pick out anomalous sounds or smells because that was key to surviving in the wild, where predators roamed free.

Apart from this kind of instinctive attention, we are also able to control our attention by what we call focusing on something. You are doing it right now by choosing to ignore all the other stimuli that you are getting and focusing your attention on the contents of this book. While you are reading, your mobile phone pings you, and the...

The generalized attention model

Over the years, researchers have come up with different ways of calculating attention weights and using attention in DL models. Sneha Chaudhari et al. published a survey paper on attention models that proposes a generalized attention model, trying to incorporate all these variations in a single framework. Let’s structure our discussion around this generalized framework.

We can think of an attention model as learning an attention distribution (α) over a set of keys, K, using a set of queries, q. In the example we discussed in the last section, the query would be the hidden state from the last timestep during decoding, and the keys would be all the hidden states generated from the input sequence. In some cases, the generated attention distribution is applied to another set of inputs called values, V. In many cases, K and V are the same, but to maintain the general form of the framework, we consider these separately...
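To make the framework concrete, here is a minimal sketch of the generalized attention computation in PyTorch. The function name and tensor shapes are illustrative (they are not taken from the book's src code), and scaled dot product is used as the scoring function purely as an example; additive or general attention would swap in a different score.

```python
import torch
import torch.nn.functional as F


def generalized_attention(q, K, V):
    """A sketch of the generalized attention model (names are illustrative).

    q: queries of shape (batch, d)            e.g., the decoder hidden state
    K: keys of shape (batch, seq_len, d)      e.g., the encoder hidden states
    V: values of shape (batch, seq_len, d)    often identical to K
    """
    d = K.size(-1)
    # Alignment scores between the query and each key. Scaled dot product is
    # used here only as an example scoring function.
    scores = torch.bmm(K, q.unsqueeze(-1)).squeeze(-1) / d ** 0.5  # (batch, seq_len)
    # Attention distribution (alpha) over the keys; softmax makes it sum to 1.
    alpha = F.softmax(scores, dim=-1)
    # Context vector: the attention-weighted sum of the values.
    context = torch.bmm(alpha.unsqueeze(1), V).squeeze(1)          # (batch, d)
    return context, alpha


# Usage with random tensors; passing K as the values reproduces the common
# case where K and V are the same.
q = torch.randn(32, 64)
K = torch.randn(32, 48, 64)
context, alpha = generalized_attention(q, K, K)
print(context.shape, alpha.shape)  # torch.Size([32, 64]) torch.Size([32, 48])
```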

Forecasting with sequence-to-sequence models and attention

Let’s pick up the thread from Chapter 13, Common Modeling Patterns for Time Series, where we used Seq2Seq models to forecast a sample household (if you have not read the previous chapter, I strongly suggest you do so now), and modify the Seq2SeqModel class to also include an attention mechanism.

Notebook alert

To follow along with the complete code, use the notebook named 01-Seq2Seq RNN with Attention.ipynb in the Chapter14 folder and the code in the src folder.

We are still going to inherit from the BaseModel class we defined in src/dl/models.py, and the overall structure is going to be very similar to the Seq2SeqModel class. The key difference is that in our new model, with attention, we do not accept a fully connected layer as the decoder. This is not because it is impossible, but for the convenience and brevity of the implementation. In fact, implementing a Seq2Seq model with a fully connected decoder can...
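To illustrate where attention slots into the decoding loop, here is a rough sketch of a single decoding step with attention. The class name AttnDecoderStep and its dimensions are hypothetical; this is not the book's Seq2Seq-with-attention class, just the general shape of the computation: the previous decoder hidden state acts as the query, the encoder outputs act as keys and values, and the resulting context vector is fed into the RNN cell alongside the current input.

```python
import torch
import torch.nn as nn


class AttnDecoderStep(nn.Module):
    """A hypothetical single decoding step with attention (not the book's class).

    The previous decoder hidden state is the query, the encoder outputs are the
    keys and values, and the context vector is concatenated with the current
    input before the RNN cell. A linear head emits the one-step forecast.
    """

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.rnn_cell = nn.GRUCell(input_size + hidden_size, hidden_size)
        self.out = nn.Linear(hidden_size, 1)

    def forward(self, x_t, h_prev, encoder_outputs):
        # x_t: (batch, input_size); h_prev: (batch, hidden);
        # encoder_outputs: (batch, seq_len, hidden)
        scores = torch.bmm(encoder_outputs, h_prev.unsqueeze(-1)).squeeze(-1)
        alpha = torch.softmax(scores, dim=-1)               # attention weights
        context = torch.bmm(alpha.unsqueeze(1), encoder_outputs).squeeze(1)
        h_t = self.rnn_cell(torch.cat([x_t, context], dim=-1), h_prev)
        return self.out(h_t), h_t
```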

Transformers – Attention is all you need

While the introduction of attention was a shot in the arm for RNNs and Seq2Seq models, they still had one problem: the RNNs were recurrent, which meant they had to process each word in a sentence sequentially. For popular Seq2Seq applications such as language translation, this made processing long sequences of words really time-consuming. In short, it was difficult to scale them to a large corpus of data. In 2017, Vaswani et al. authored a seminal paper titled Attention Is All You Need. Just as the title implies, they explored an architecture that used attention (scaled dot product attention) and threw away recurrent networks altogether. To the surprise of NLP researchers around the world, these models (which were dubbed Transformers) outperformed the then state-of-the-art Seq2Seq models in language translation.
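As a quick illustration of how self-attention removes the recurrence, the snippet below runs PyTorch's built-in multi-head attention (which uses scaled dot product attention internally) over an entire window of embedded timesteps at once. The dimensions are illustrative, not taken from the book's experiments.

```python
import torch
import torch.nn as nn

# Self-attention processes every timestep of a window in parallel, with no
# recurrence over the sequence.
batch, seq_len, d_model = 32, 48, 64
x = torch.randn(batch, seq_len, d_model)        # an embedded input window

self_attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=4, batch_first=True)
out, attn_weights = self_attn(x, x, x)          # queries, keys, and values are all x
print(out.shape, attn_weights.shape)            # (32, 48, 64) and (32, 48, 48)
```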

This spurred a flurry of research activity around this new class of models...

Forecasting with Transformers

For continuity, we will stick with the same household we forecasted with RNNs and RNNs with attention.

Notebook alert

To follow along with the complete code, use the notebook named 03-Transformers.ipynb in the Chapter14 folder and the code in the src folder.

Although we learned about the vanilla Transformer as a model with an encoder-decoder architecture, it was really designed for language translation tasks. In language translation, the source sequence and the target sequence are quite different, and therefore the encoder-decoder architecture makes sense. But soon after, researchers figured out that using only the decoder part of the Transformer works well on its own; this is called a decoder-only Transformer in the literature. The naming is a bit confusing because, if you think about it, the decoder differs from the encoder in only two things—masked self-attention and encoder-decoder attention. So, in a decoder-only Transformer, how do we have...
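One way to picture the masked self-attention part is with a causal (upper-triangular) mask: position t is blocked from attending to anything after t. The sketch below assumes we simply reuse PyTorch's TransformerEncoderLayer as the decoder-only block; the layer sizes are illustrative and not the book's configuration.

```python
import torch
import torch.nn as nn

# Causal mask: True marks positions that a timestep is NOT allowed to attend to,
# i.e., everything strictly after it.
seq_len, d_model = 48, 64
causal_mask = torch.triu(torch.ones(seq_len, seq_len), diagonal=1).bool()

layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
block = nn.TransformerEncoder(layer, num_layers=2)

x = torch.randn(8, seq_len, d_model)    # an embedded history window
y = block(x, mask=causal_mask)          # each position attends only to its past
print(y.shape)                          # torch.Size([8, 48, 64])
```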

Summary

We have been storming through the world of DL in the last few chapters. We started off with the basic premise of DL, what it is, and why it became so popular. Then, we saw a few common building blocks that are typically used in time series forecasting and got our hands dirty, learning how we can put what we have learned into practice using PyTorch. Although we talked about RNNs, LSTMs, GRUs, and so on, we purposefully left out attention and Transformers because they deserved a separate chapter.

We started the chapter by learning about the generalized attention model, which helps put a framework around all the different attention schemes out there, and then went into detail on a few common ones, such as scaled dot product, additive, and general attention. Right after incorporating attention into the Seq2Seq models we were playing with in Chapter 13, Common Modeling Patterns for Time Series, we moved on to the Transformer. We went into detail on all...

References

The following is a list of the references used in this chapter:

  1. Dzmitry Bahdanau, KyungHyun Cho, and Yoshua Bengio (2015). Neural Machine Translation by Jointly Learning to Align and Translate. In 3rd International Conference on Learning Representations. https://arxiv.org/pdf/1409.0473.pdf
  2. Thang Luong, Hieu Pham, and Christopher D. Manning (2015). Effective Approaches to Attention-based Neural Machine Translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. https://aclanthology.org/D15-1166/
  3. André F. T. Martins and Ramón Fernandez Astudillo (2016). From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification. In Proceedings of the 33rd International Conference on Machine Learning. http://proceedings.mlr.press/v48/martins16.html
  4. Ben Peters, Vlad Niculae, and André F. T. Martins (2019). Sparse Sequence-to-Sequence Models. In Proceedings of the 57th Annual Meeting of the Association...

Further reading

Here are a few resources for further reading:
