Understanding Sentiment in Natural Language with BiLSTMs
Natural Language Understanding (NLU) is a significant subfield of Natural Language Processing (NLP). In the last decade, there has been a resurgence of interest in the field with the dramatic success of voice assistants such as Amazon's Alexa and Apple's Siri. This chapter introduces the broad area of NLU and its main applications.
Specific model architectures called Recurrent Neural Networks (RNNs), with special units called Long Short-Term Memory (LSTM) units, have been developed to make the task of understanding natural language easier. LSTMs in NLP are analogous to convolution blocks in computer vision. We will take two examples to build models that can understand natural language. Our first example, and the focus of this chapter, is understanding the sentiment of movie reviews. The other example is one of the fundamental building blocks of NLU, Named Entity Recognition (NER), which will be covered in the next chapter.
Natural language understanding
NLU enables the processing of unstructured text to extract meaning and critical, actionable pieces of information. Enabling a computer to understand sentences of text is a very hard challenge. One aspect of NLU is understanding the meaning of sentences. Once a sentence is understood, sentiment analysis becomes possible. Another useful application is classifying sentences by topic. Topic classification can also help in disambiguating entities. Consider the following sentence: "A CNN helps improve the accuracy of object recognition." Without understanding that this sentence is about machine learning, an incorrect inference may be made about the entity CNN: it may be interpreted as the news organization rather than the deep learning architecture used in computer vision. Later in this chapter, we build a sentiment analysis model using a specific RNN architecture called the BiLSTM.
Bi-directional LSTMs – BiLSTMs
LSTMs are one of the styles of recurrent neural networks, or RNNs. RNNs are built to handle sequences and learn their structure. An RNN does this by using the output generated after processing the previous item in the sequence, along with the current item, to generate the next output.
Mathematically, this can be expressed like so:

$o_t = f(o_{t-1}, x_t; \theta)$

This equation says that to compute the output at time step $t$, the output at $t-1$ is used as an input along with the input data $x_t$ at the same time step. In addition, a set of parameters or learned weights, represented by $\theta$, is used in computing the output. The objective of training an RNN is to learn these weights. This formulation of an RNN is unique: in previous examples, we have not used the output of a batch to determine the output of a future batch. While we focus on applications of RNNs to language, where a sentence is modeled as a sequence of words appearing one after the other, RNNs can be applied to any sequential data, such as time series.
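To make the recurrence concrete, the following is a minimal sketch of a single vanilla RNN step in NumPy. The weight matrices `W_h` and `W_x`, the bias `b`, and the dimensions are illustrative assumptions, not the parameterization of any particular library's RNN cell:

```python
import numpy as np

def rnn_step(h_prev, x_t, W_h, W_x, b):
    """One step of a vanilla RNN: combine the previous output
    with the current input through learned weights (theta)."""
    return np.tanh(h_prev @ W_h + x_t @ W_x + b)

# Illustrative dimensions: 4-dimensional inputs, 8-dimensional state.
hidden_dim, input_dim = 8, 4
rng = np.random.default_rng(0)
W_h = rng.normal(size=(hidden_dim, hidden_dim))
W_x = rng.normal(size=(input_dim, hidden_dim))
b = np.zeros(hidden_dim)

# Process a sequence of 5 time steps, threading the state through,
# so each output depends on all of the inputs seen so far.
h = np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):
    h = rnn_step(h, x_t, W_h, W_x, b)
print(h.shape)  # (8,)
```

Note how the same weights are reused at every time step; only the state `h` changes as the sequence is consumed.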
This is a foundational chapter in our journey through advanced NLP problems, as many advanced models use building blocks such as BiRNNs. First, we used the TensorFlow Datasets package to load data; building a vocabulary, tokenizer, and encoder for vectorization was simplified through the use of this library. After understanding LSTMs and BiLSTMs, we built models to perform sentiment analysis. Our results showed promise but fell short of the state of the art, a gap we will address in future chapters. However, we now have the fundamental building blocks that will enable us to tackle more challenging problems.
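As a recap of that pipeline, here is a minimal sketch of a BiLSTM sentiment classifier trained on the `imdb_reviews` dataset from TensorFlow Datasets. The dataset name is real, but the vocabulary size, sequence length, layer widths, and the use of Keras's `TextVectorization` layer in place of the tokenizer/encoder discussed above are illustrative assumptions, not the exact configuration used in this chapter:

```python
import tensorflow as tf
import tensorflow_datasets as tfds

# Load the IMDb movie review dataset (binary sentiment labels).
train_ds, test_ds = tfds.load("imdb_reviews", split=["train", "test"],
                              as_supervised=True)

# Build a vocabulary from the training text; sizes are illustrative.
vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=20000, output_sequence_length=200)
vectorizer.adapt(train_ds.map(lambda text, label: text).batch(256))

# A small BiLSTM classifier: embed tokens, read the sequence in both
# directions, then predict a single sentiment probability.
model = tf.keras.Sequential([
    vectorizer,
    tf.keras.layers.Embedding(20000, 64, mask_zero=True),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam",
              metrics=["accuracy"])
model.fit(train_ds.batch(64), validation_data=test_ds.batch(64), epochs=3)
```

The `Bidirectional` wrapper runs one LSTM over the sequence left to right and a second one right to left, concatenating their outputs, which is what lets a BiLSTM use context from both sides of each word.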
Armed with this knowledge of LSTMs, we are ready to build our first NER model using BiLSTMs in the next chapter. Once this model is built, we will try to improve it using CRFs and Viterbi decoding.