Transformers and attention
Transformers are an architecture taking the machine learning world by storm, especially in the field of natural language processing. An improvement over classical recurrent neural networks (RNNs) for sequence modeling, transformers are built on the principle of attention. In this section, we will discuss the attention mechanism, transformers, and the BERT architecture.
Understanding attention
We will now take a look at attention, a recent deep learning paradigm that has driven great advances in natural language processing.
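As a preview of the mechanism discussed in this section, here is a minimal sketch of scaled dot-product attention, the core computation inside a transformer, written with NumPy. The matrix shapes and random inputs below are purely illustrative.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # similarity of each query to each key
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax: weights sum to 1
    return weights @ V                               # weighted average of the values

# Illustrative shapes: 3 query positions attending over 4 key/value positions,
# each represented by an 8-dimensional vector.
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))

out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 8): one context vector per query
```

Each output row is a convex combination of the value vectors, weighted by how strongly the corresponding query matches each key.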
Sequence-to-sequence models
Most natural language tasks rely heavily on sequence-to-sequence models. While traditional methods classify a single data point, sequence-to-sequence architectures map a sequence in one domain to a sequence in another. Language translation is an excellent example: an automatic machine translator takes in sequences of tokens (words and sentences) from the source...