Text Classification Reimagined: Delving Deep into Deep Learning Language Models

In this chapter, we delve into the realm of deep learning (DL) and its application in natural language processing (NLP), specifically focusing on the groundbreaking transformer-based models such as Bidirectional Encoder Representations from Transformers (BERT) and generative pretrained transformer (GPT). We begin by introducing the fundamentals of DL, elucidating its powerful capability to learn intricate patterns from large amounts of data, making it the cornerstone of state-of-the-art NLP systems.

Following this, we turn to transformers, a novel architecture that has revolutionized NLP by handling sequence data more effectively than traditional recurrent neural networks (RNNs) and convolutional neural networks (CNNs). We unpack the transformer’s unique characteristics, including its attention mechanisms, which allow it to focus on different parts of the input sequence...

Technical requirements

To successfully navigate through this chapter, certain technical prerequisites are necessary, as follows:

  • Programming knowledge: A strong understanding of Python is essential, as it’s the primary language used for most DL and NLP libraries.
  • Machine learning fundamentals: A good grasp of basic ML concepts such as training/testing data, overfitting, underfitting, accuracy, precision, recall, and F1 score will be valuable.
  • DL basics: Familiarity with DL concepts and architectures, including neural networks, backpropagation, activation functions, and loss functions, will be essential. Knowledge of RNNs and CNNs would be advantageous but not strictly necessary as we will focus more on transformer architectures.
  • NLP basics: Some understanding of basic NLP concepts such as tokenization, stemming, lemmatization, and word embeddings (such as Word2Vec or GloVe) would be beneficial.
  • Libraries and frameworks: Experience with libraries such...

Understanding deep learning basics

In this part, we explain what neural networks and deep neural networks are, the motivation for using them, and the different types (architectures) of deep learning models.

What is a neural network?

Neural networks belong to a subfield of artificial intelligence (AI) and ML that focuses on algorithms inspired by the structure and function of the brain. This approach is also known as “deep” learning because these networks often consist of many stacked layers, creating a deep architecture.

These DL models are capable of “learning” from large volumes of complex, high-dimensional, and unstructured data. The term “learning” refers to the ability of the model to automatically learn and improve from experience without being explicitly programmed for any one particular task.

DL can be supervised, semi-supervised, or unsupervised. It’s used in numerous applications...

The architecture of different neural networks

Neural networks come in various types, each with a specific architecture suited to a different kind of task. The following list contains general descriptions of some of the most common types:

  • Feedforward neural network (FNN): This is the most straightforward type of neural network. Information in this network moves in one direction only, from the input layer through any hidden layers to the output layer. There are no cycles or loops in the network; it’s a straight, “feedforward” path.
Figure 6.2 – Feedforward neural network

  • Multilayer perceptron (MLP): An MLP is a type of feedforward network that has at least one hidden layer in addition to its input and output layers. The layers are fully connected, meaning each neuron in a layer connects with every neuron in the next layer. MLPs can model complex patterns and are widely used for tasks such as image recognition (see the sketch after this list)...
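To make the MLP description concrete, here is a minimal PyTorch sketch of a fully connected feedforward network. It is an illustration only; the layer sizes, class count, and batch size are arbitrary assumptions, not values from the book:

import torch
from torch import nn

# A small feedforward MLP: input layer -> one hidden layer -> output layer.
mlp = nn.Sequential(
    nn.Linear(20, 64),   # 20 input features -> 64 hidden units
    nn.ReLU(),           # non-linear activation
    nn.Linear(64, 3),    # 64 hidden units -> 3 output classes
)

x = torch.randn(8, 20)   # a batch of 8 examples with 20 features each
logits = mlp(x)          # information flows forward only, with no cycles
print(logits.shape)      # torch.Size([8, 3])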

The challenges of training neural networks

Training neural networks is a complex task that comes with challenges such as local minima and vanishing/exploding gradients, as well as computational costs and limited interpretability. These challenges are explained in detail in the following points:

  • Local minima: The objective of training a neural network is to find the set of weights that minimizes the loss function. This is a high-dimensional optimization problem, and there are many points (sets of weights) where the loss function has local minima. A suboptimal local minimum is a point where the loss is lower than at nearby points but higher than the global minimum, which is the overall lowest possible loss. The training process can get stuck in such suboptimal local minima. It’s important to remember that the local minima problem exists even in convex loss functions due to the discrete representation that is a part of digital computation.
  • Vanishing/exploding...
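To illustrate the vanishing-gradient problem named in the last point, here is a small NumPy sketch (not the book’s code) that backpropagates a scalar signal through a chain of sigmoid activations; the weight scale is an arbitrary assumption. Because each sigmoid derivative is at most 0.25, the gradient magnitude shrinks rapidly as depth grows:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
x = 0.5          # scalar input signal
grad = 1.0       # accumulated gradient d(output)/d(input)
for depth in range(1, 51):
    w = rng.normal(scale=0.5)      # hypothetical small random weight
    a = sigmoid(w * x)
    grad *= a * (1 - a) * w        # chain rule through this "layer"
    x = a
    if depth in (1, 10, 25, 50):
        print(f"depth {depth:2d}: |gradient| = {abs(grad):.2e}")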

Language models

A language model is a statistical model in NLP that is designed to learn and understand the structure of human language. More specifically, it is a probabilistic model trained to estimate the likelihood of words given their surrounding context. For instance, a language model could be trained to predict the next word in a sentence, given the previous words.

Language models are fundamental to many NLP tasks. They are used in machine translation, speech recognition, part-of-speech tagging, and named entity recognition, among other things. More recently, they have been used to create conversational AI models such as chatbots and personal assistants and to generate human-like text.

Traditional language models were often based on explicit statistical methods, such as n-gram models, which condition only on a fixed window of the preceding words when predicting the next word, or hidden Markov models (HMMs).
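As a concrete illustration of the n-gram idea, here is a minimal bigram model in Python. It is a toy sketch: the corpus is made up, there is no smoothing, and real n-gram models are trained on far larger text collections:

from collections import Counter, defaultdict

# Count how often each word follows each other word in a toy corpus.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_word_probs(prev):
    # Maximum-likelihood estimate of P(next word | previous word).
    total = sum(counts[prev].values())
    return {w: c / total for w, c in counts[prev].items()}

print(next_word_probs("the"))   # {'cat': 0.25, 'mat': 0.25, 'dog': 0.25, 'rug': 0.25}
print(next_word_probs("sat"))   # {'on': 1.0}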

More recently, neural networks have become popular for creating...

Understanding transformers

Transformers are a type of neural network architecture that was introduced in the paper Attention Is All You Need by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin (Advances in Neural Information Processing Systems 30, 2017). They have been very influential in the field of NLP and have formed the basis for state-of-the-art models such as BERT and GPT.

The key innovation in transformers is the self-attention mechanism, which allows the model to weigh the relevance of each word in the input when producing an output, thereby considering the context of each word. This is unlike previous models such as RNNs, which process the input sequentially and, therefore, have a harder time capturing long-range dependencies between words.
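The following NumPy sketch shows the core of scaled dot-product self-attention. It is a deliberately simplified illustration with arbitrary dimensions; real transformer layers add multiple heads, masking, residual connections, and layer normalization:

import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # Project the token embeddings into queries, keys, and values.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # pairwise token relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over the sequence
    return weights @ V                                 # context-aware representations

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                            # 4 tokens, embedding size 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)             # (4, 8)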

Architecture of transformers

A transformer is composed of an encoder and a decoder, both of which are made up of several...

Learning more about large language models

Large language models are a class of ML models that have been trained on a broad range of internet text.

The term “large” in “large language models” refers to the number of parameters that these models have. For example, GPT-3 has 175 billion parameters. These models are trained using self-supervised learning on a large corpus of text, which means they predict the next word in a sentence (such as GPT) or a word based on surrounding words (such as BERT, which is also trained to predict whether a pair of sentences is sequential). Because they are exposed to such a large amount of text, these models learn grammar, facts about the world, reasoning abilities, and also biases in the data they’re trained on.
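As a brief illustration of the two pretraining styles just described, the following sketch uses the Hugging Face transformers library (assuming it is installed and the pretrained checkpoints can be downloaded); it is not code from the book’s notebook:

from transformers import pipeline

# BERT-style masked prediction: fill in a masked word using context from both sides.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
print(fill_mask("The doctor prescribed a new [MASK] for the patient.")[0]["token_str"])

# GPT-style next-word prediction: continue the text from left to right.
generator = pipeline("text-generation", model="gpt2")
print(generator("The doctor prescribed a new", max_new_tokens=5)[0]["generated_text"])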

These models are transformer-based, meaning they leverage the transformer architecture, which uses self-attention mechanisms to weigh the importance of words in input data. This architecture allows these...

The challenges of training language models

Training large language models is a complex and resource-intensive task that poses several challenges. Here are some of the key issues:

  • Computational resources: The training of large language models requires substantial computational resources. These models have billions of parameters that need to be updated during training, which involves performing a large amount of computation over an extensive dataset. This computation is usually carried out on high-performance GPUs or tensor processing units (TPUs), and the costs associated can be prohibitive.
  • Memory limitations: As the size of the model increases, the amount of memory required to store the model parameters, intermediate activations, and gradients during training also increases. This can lead to memory issues on even the most advanced hardware. Techniques such as model parallelism, gradient checkpointing, and offloading can be used to mitigate these issues, but they add complexity...
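As one concrete example of the memory-saving techniques mentioned in the last point, the following PyTorch sketch applies gradient checkpointing to a stack of layers. The layer sizes and segment count are illustrative assumptions; checkpointing stores only a subset of activations and recomputes the rest during the backward pass, trading extra compute for lower memory:

import torch
from torch.utils.checkpoint import checkpoint_sequential

# A deep stack of 16 small blocks stands in for a much larger model.
layers = torch.nn.Sequential(*[
    torch.nn.Sequential(torch.nn.Linear(1024, 1024), torch.nn.ReLU())
    for _ in range(16)
])
x = torch.randn(32, 1024, requires_grad=True)

# Split the stack into 4 segments; only segment boundaries keep their activations.
out = checkpoint_sequential(layers, 4, x)
out.sum().backward()   # activations inside segments are recomputed here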

Challenges of using GPT-3

Despite its impressive capabilities, GPT-3 also presents some challenges. Due to its large size, it requires substantial computational resources to train. It can sometimes generate incorrect or nonsensical responses, and it can reflect biases present in the training data. It also struggles with tasks that require a deep understanding of the world or common sense reasoning beyond what can be learned from text.

Reviewing our use case – ML/DL system design for NLP classification in a Jupyter Notebook

In this section, we are going to work on a real-world problem and see how we can use an NLP pipeline to solve it. The code for this part is shared as a Google Colab notebook at Ch6_Text_Classification_DL.ipynb.

The business objective

In this scenario, we are in the healthcare sector. Our objective is to develop a general medical knowledge engine that stays up to date with recent findings in the world of healthcare.
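Before turning to the technical objective, here is a rough, hypothetical sketch of the kind of transformer-based classifier that the Ch6_Text_Classification_DL.ipynb notebook builds. The model name, label count, and example texts below are placeholder assumptions, not the book’s actual data, and the classification head is untrained:

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "bert-base-uncased"                 # placeholder pretrained checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

texts = ["Patient reports persistent headaches after the new medication.",
         "The quarterly report shows strong revenue growth."]
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**batch).logits
print(logits.argmax(dim=-1))   # predicted class index per document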

The technical objective

...

Summary

In this enlightening chapter, we embarked on a comprehensive exploration of DL and its remarkable application to text classification tasks through language models. We began with an overview of DL, revealing its profound ability to learn complex patterns from vast amounts of data and its indisputable role in advancing state-of-the-art NLP systems.

We then delved into the transformative world of transformer models, which have revolutionized NLP by providing an effective alternative to traditional RNNs and CNNs for processing sequence data. By unpacking the attention mechanism—a key feature in transformers—we highlighted its capacity to focus on different parts of the input sequence, hence facilitating a better understanding of context.

Our journey continued with an in-depth exploration of the BERT model. We detailed its architecture, emphasizing its pioneering use of bidirectional training to generate contextually rich word embeddings, and we highlighted its...
