Appendix I — Terminology of Transformer Models

The past decades have produced Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and other types of Artificial Neural Networks (ANNs). They all share a certain amount of vocabulary.

Transformer models introduced some new terms and use existing terms slightly differently. This appendix briefly describes transformer models to clarify how deep learning vocabulary applies to them.

The motivation behind the transformer architecture lies in an industrial approach to deep learning. The geometric nature of transformers boosts parallel processing. In addition, the architecture of transformers perfectly fits hardware optimization requirements. Google, for example, took advantage of the stack structure of transformers to design domain-specific optimized hardware that requires lower floating-point precision.

Designing transformer models implies taking hardware into account...

Stack

A stack is made of identically sized layers, which differs from classical deep learning models, as shown in Figure I.1. A stack runs from bottom to top. A stack can be an encoder or a decoder.

Figure I.1: Layers form a stack

The layers of a transformer learn and see more as they rise in the stack. Each layer transmits what it has learned to the next layer, just as our memory does.

Imagine that a stack is the Empire State Building in New York City. At the bottom, you cannot see much. But you will see more, and farther, as you ascend to the offices on higher floors and look out the windows. Finally, at the top, you have a fantastic view of Manhattan!
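
As a minimal sketch (not the book's own code), the following PyTorch snippet assembles a stack of identical encoder layers using the original Transformer's dimensions (6 layers, a model size of 512, 8 heads, and a 2,048-unit feedforward sublayer). The variable names are illustrative:

import torch
import torch.nn as nn

# One encoder layer with the original Transformer's dimensions
layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, dim_feedforward=2048)
# A stack of 6 identical copies of that layer, running from bottom to top
stack = nn.TransformerEncoder(layer, num_layers=6)

x = torch.rand(10, 2, 512)   # 10 tokens, a batch of 2, 512-dimensional embeddings
output = stack(x)            # each layer passes what it learned to the next layer
print(output.shape)          # torch.Size([10, 2, 512]); every layer keeps the same size

Because every layer has the same size and structure, the same optimized hardware kernels can be reused at every level of the stack.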

Sublayer

Each layer contains sublayers, as shown in Figure I.2. The sublayers have an identical structure across layers, which boosts hardware optimization.

The original Transformer contains two sublayers that run from bottom to top:

  • A self-attention sublayer, designed specifically for NLP and hardware optimization
  • A classical feedforward network with some tweaking

Figure I.2: A layer contains two sublayers
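
A minimal sketch of one such layer, assuming PyTorch and the original Transformer's dimensions (the class and variable names are illustrative, not the book's code):

import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    # One layer with its two sublayers: self-attention, then a feedforward network.
    # The original Transformer also wraps each sublayer in a residual connection
    # followed by layer normalization.
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.self_attention = nn.MultiheadAttention(d_model, n_heads)
        self.feedforward = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Sublayer 1: self-attention, with a residual connection and normalization
        attention_output, _ = self.self_attention(x, x, x)
        x = self.norm1(x + attention_output)
        # Sublayer 2: position-wise feedforward network, wrapped the same way
        return self.norm2(x + self.feedforward(x))

x = torch.rand(10, 2, 512)          # 10 tokens, a batch of 2, 512-dimensional embeddings
print(EncoderLayer()(x).shape)      # torch.Size([10, 2, 512])

Because the two sublayers keep the input and output sizes identical, layers can be stacked without any reshaping between them.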

Attention heads

A self-attention sublayer is divided into n independent, identical units called heads. For example, the original Transformer contains eight heads.

Figure I.3 represents heads as processors to show that transformers’ industrialized structure fits hardware design:


Figure I.3: A self-attention sublayer contains heads

Note that the attention heads are represented by microprocessors in Figure I.3 to stress the parallel processing power of transformer architectures.
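
The following sketch suggests why the heads map so naturally to parallel hardware. Assuming the original Transformer's 512-dimensional model split across 8 heads of 64 dimensions each (the variable names are illustrative), the heads become one extra tensor dimension that is processed simultaneously:

import torch

d_model, n_heads = 512, 8
d_head = d_model // n_heads           # 64 dimensions per head in the original Transformer

x = torch.rand(2, 10, d_model)        # a batch of 2, 10 tokens
# Split the 512-dimensional representation into 8 independent 64-dimensional heads
heads = x.view(2, 10, n_heads, d_head).transpose(1, 2)
print(heads.shape)                    # torch.Size([2, 8, 10, 64])

# In self-attention, Q, K, and V all come from the same input
q = k = v = heads
scores = torch.softmax(q @ k.transpose(-2, -1) / d_head ** 0.5, dim=-1)
context = scores @ v                  # every head attends independently and in parallel
print(context.shape)                  # torch.Size([2, 8, 10, 64])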

Transformer architectures fit both NLP and hardware-optimization requirements.

Join our book’s Discord space

Join the book’s Discord workspace for a monthly Ask me Anything session with the authors:

https://www.packt.link/Transformers

