Chapter 5: Hybrid Advanced Models

In the previous two chapters, we learned extensively about the various convolutional and recurrent network architectures available, along with their implementations in PyTorch. In this chapter, we will take a look at some other deep learning model architectures that have proven to be successful on various machine learning tasks and are neither purely convolutional nor recurrent in nature. We will continue from where we left off in both Chapter 3, Deep CNN Architectures, and Chapter 4, Deep Recurrent Model Architectures.

First, we will explore transformers, which, as we learned toward the end of Chapter 4, Deep Recurrent Model Architectures, have outperformed recurrent architectures on various sequential tasks. Then, we will pick up from the EfficientNets discussion at the end of Chapter 3, Deep CNN Architectures, and explore the idea of generating randomly wired neural networks, also known as RandWireNNs.

With this chapter, we aim to conclude...

Technical requirements

We will be using Jupyter notebooks for all our exercises. The following Python libraries must be installed for this chapter using pip; for example, run pip install torch==1.4.0 on the command line, and do the same for each of the remaining versions (a quick version-check snippet follows the list):

jupyter==1.0.0
torch==1.4.0
tqdm==4.43.0
matplotlib==3.1.2
torchtext==0.5.0
torchvision==0.5.0
torchviz==0.0.1
networkx==2.4
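
As a quick sanity check (an optional snippet, not part of the book's exercises), you can verify the installed versions from within a notebook:

import torch
import torchtext
import torchvision
import networkx

print(torch.__version__)        # expected: 1.4.0
print(torchtext.__version__)    # expected: 0.5.0
print(torchvision.__version__)  # expected: 0.5.0
print(networkx.__version__)     # expected: 2.4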

All the code files that are relevant to this chapter are available at https://github.com/PacktPublishing/Mastering-PyTorch/tree/master/Chapter05.

Building a transformer model for language modeling

In this section, we will explore what transformers are and build one using PyTorch for the task of language modeling. We will also learn how to use some of its successors, such as BERT and GPT, via PyTorch's pretrained model repository. Before we start building a transformer model, let's quickly recap what language modeling is.
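
As a preview of the kind of model we will build in this section, here is a minimal, illustrative sketch that stacks PyTorch's built-in transformer encoder layers into a language model. This is a sketch only; the hyperparameter values are placeholders, and the full model we develop later differs in detail:

import math
import torch.nn as nn

class TransformerLM(nn.Module):
    def __init__(self, vocab_size, d_model=200, nhead=2, num_layers=2):
        super().__init__()
        self.d_model = d_model
        # (A positional encoding layer, omitted here for brevity, would
        # normally be added after the embedding.)
        self.embedding = nn.Embedding(vocab_size, d_model)
        encoder_layer = nn.TransformerEncoderLayer(d_model, nhead)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers)
        self.decoder = nn.Linear(d_model, vocab_size)  # scores over the vocabulary

    def forward(self, src, src_mask):
        # src: (seq_len, batch); src_mask hides future positions so that
        # each position can only attend to earlier ones (causal attention).
        x = self.embedding(src) * math.sqrt(self.d_model)
        x = self.encoder(x, src_mask)
        return self.decoder(x)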

Reviewing language modeling

Language modeling is the task of estimating the probability that a particular word, or sequence of words, will follow a given sequence of words. For example, given French is a beautiful _____ as our sequence of words, what is the probability that the next word will be language, or word, and so on? These probabilities are computed by modeling the language using various probabilistic and statistical techniques. The idea is to observe a text corpus and learn the grammar by learning which words occur together and which words never occur together...
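
As a toy illustration of this idea (using a hypothetical miniature corpus, not an example from the book), we can estimate next-word probabilities directly from bigram counts:

from collections import Counter

corpus = "french is a beautiful language and paris is a beautiful city".split()
bigram_counts = Counter(zip(corpus, corpus[1:]))

# Probability of each word observed to follow the context word 'beautiful'
context = "beautiful"
followers = {nxt: c for (prev, nxt), c in bigram_counts.items() if prev == context}
total = sum(followers.values())
print({word: count / total for word, count in followers.items()})
# {'language': 0.5, 'city': 0.5}

Neural language models such as transformers replace these raw counts with a learned probability distribution conditioned on the entire preceding context.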

Developing a RandWireNN model from scratch

We discussed EfficientNets in Chapter 3, Deep CNN Architectures, where we explored the idea of finding the best model architecture instead of specifying it manually. RandWireNNs, or randomly wired neural networks, as the name suggests, are built on a similar concept. In this section, we will study and build our own RandWireNN model using PyTorch.

Understanding RandWireNNs

First, a random graph generation algorithm is used to generate a random graph with a predefined number of nodes. This graph is then converted into a neural network by imposing a few definitions on it, such as the following (a code sketch follows this list):

  • Directed: The graph is restricted to be a directed graph, and the direction of an edge is taken to be the direction of data flow in the equivalent neural network.
  • Aggregation: Multiple incoming edges to a node (or neuron) are aggregated by weighted sum, where the weights are learnable.
  • Transformation: Inside each node of this graph...
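
To make the first two definitions concrete, here is a minimal sketch (not the book's exact code) that generates a random Watts-Strogatz graph with networkx, orients its edges to obtain an acyclic data-flow direction, and aggregates a node's incoming activations with a learnable weighted sum:

import networkx as nx
import torch
import torch.nn as nn

# Generate a random undirected graph with a predefined number of nodes.
num_nodes = 32
graph = nx.connected_watts_strogatz_graph(num_nodes, k=4, p=0.75)

# Directed: orient every edge from the lower- to the higher-numbered node,
# which yields a directed acyclic graph for the data flow.
dag = nx.DiGraph([(u, v) if u < v else (v, u) for u, v in graph.edges()])
assert nx.is_directed_acyclic_graph(dag)

# Aggregation: combine a node's incoming edges with a weighted sum, where
# the weights are learnable (gated through a sigmoid here).
class WeightedSum(nn.Module):
    def __init__(self, num_inputs):
        super().__init__()
        self.weights = nn.Parameter(torch.zeros(num_inputs))

    def forward(self, inputs):  # inputs: a list of same-shaped tensors
        gates = torch.sigmoid(self.weights)
        return sum(g * x for g, x in zip(gates, inputs))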

Summary

In this chapter, we looked at two distinct hybrid types of neural networks. First, we looked at the transformer model – a purely attention-based model with no recurrent connections that has outperformed recurrent models on multiple sequential tasks. We ran through an exercise where we built, trained, and evaluated a transformer model on a language modeling task with the WikiText-2 dataset using PyTorch. During this exercise, we explored the transformer architecture in detail, both through annotated architectural diagrams and the relevant PyTorch code.

We concluded the first section by briefly discussing the successors of transformers – models such as BERT, GPT, and so on. We demonstrated how PyTorch makes it possible to load pre-trained versions of most of these advanced models in fewer than five lines of code.
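
For instance, loading a pre-trained BERT model can look like this (assuming the Hugging Face transformers package, which is one common way to do this; the chapter's own exercise may differ):

from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")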

In the second and final section of this chapter, we took up from where we left off in Chapter 3, Deep CNN Architectures, where...
