You're reading from Advanced Deep Learning with Python

Product type: Book
Published in: Dec 2019
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781789956177
Edition: 1st Edition
Author (1)
Ivan Vasilev

Ivan Vasilev started working on the first open source Java deep learning library with GPU support in 2013. The library was acquired by a German company, with whom he continued its development. He has also worked as a machine learning engineer and researcher in medical image classification and segmentation with deep neural networks. Since 2017, he has focused on financial machine learning. He co-founded an algorithmic trading company, where he's the lead engineer. He holds an MSc in artificial intelligence from Sofia University St. Kliment Ohridski and has written two previous books on the same topic.

Language Modeling

This chapter is the first of several in which we'll discuss different neural network algorithms in the context of natural language processing (NLP). NLP teaches computers to process and analyze natural language data in order to perform tasks such as machine translation, sentiment analysis, natural language generation, and so on. But to successfully solve such complex problems, we have to represent the natural language in a way that the computer can understand, and this is not a trivial task.

To understand why, let's go back to image recognition. The neural network input is fairly intuitive—a 2D tensor with preprocessed pixel intensities, which preserves the spatial features of the image. Let's take a 28 x 28 MNIST image, which contains 784 pixels. All the information about the digit in the image is contained within these pixels only and...

Understanding n-grams

A word-based language model defines a probability distribution over sequences of words. Given a sequence of words of length m (for example, a sentence), it assigns a probability P(w1, ... , wm) to the full sequence of words. We can use these probabilities as follows:

  • To estimate the likelihood of different phrases in NLP applications.
  • As a generative model to create new text. A word-based language model can compute the likelihood of a given word following a sequence of words.

Inferring the probability of a long sequence, say w1, ..., wm, directly is typically infeasible. We can factor the joint probability P(w1, ... , wm) with the chain rule of joint probability (Chapter 1, The Nuts and Bolts of Neural Networks):

$$P(w_1, \dots, w_m) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1, w_2) \cdots P(w_m \mid w_1, \dots, w_{m-1})$$

The probability of the later words given the earlier words would be especially difficult to estimate from the data. That's why this...
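
To make the estimation problem concrete, here is a minimal sketch of a count-based bigram model. It assumes the usual n-gram approximation, in which each word is conditioned only on the previous n-1 words (here, n = 2). The toy corpus, the function names, and the `<s>` and `</s>` boundary markers are illustrative assumptions, not code from the chapter.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Count context words and bigrams over tokenized sentences."""
    unigram_counts = Counter()
    bigram_counts = Counter()
    for tokens in sentences:
        # Hypothetical boundary markers for sentence start and end.
        padded = ["<s>"] + tokens + ["</s>"]
        # Count every token that has a successor (it acts as a context).
        unigram_counts.update(padded[:-1])
        bigram_counts.update(zip(padded[:-1], padded[1:]))
    return unigram_counts, bigram_counts


def bigram_probability(prev_word, word, unigram_counts, bigram_counts):
    """Maximum-likelihood estimate of P(word | prev_word)."""
    if unigram_counts[prev_word] == 0:
        return 0.0
    return bigram_counts[(prev_word, word)] / unigram_counts[prev_word]


def sentence_probability(tokens, unigram_counts, bigram_counts):
    """Product of bigram probabilities over the padded sentence."""
    padded = ["<s>"] + tokens + ["</s>"]
    prob = 1.0
    for prev_word, word in zip(padded[:-1], padded[1:]):
        prob *= bigram_probability(prev_word, word, unigram_counts, bigram_counts)
    return prob


# Toy corpus (an assumption for illustration only).
corpus = [
    ["the", "cat", "sat"],
    ["the", "cat", "ran"],
    ["the", "dog", "sat"],
]
unigrams, bigrams = train_bigram_model(corpus)

p_cat_given_the = bigram_probability("the", "cat", unigrams, bigrams)
p_seen = sentence_probability(["the", "cat", "sat"], unigrams, bigrams)
p_unseen = sentence_probability(["the", "dog", "ran"], unigrams, bigrams)
```

Note that `p_unseen` is zero even though "the dog ran" is a perfectly plausible sentence: the bigram ("dog", "ran") never occurs in the corpus. Longer contexts make such gaps far more common, which is why estimating the later factors of the chain rule from counts is so difficult.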

Introducing neural language models

One way to overcome the curse of dimensionality is by learning a lower-dimensional, distributed representation of the words (A Neural Probabilistic Language Model, http://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf). This distributed representation is created by learning an embedding function that transforms the space of words into a lower-dimensional space of word embeddings as follows:

Words -> one-hot encoding -> word embedding vectors

Words from the vocabulary with size V are transformed into one-hot encoding vectors of size V (each word is encoded uniquely). Then, the embedding function transforms this V-dimensional space into a distributed representation of size D (here, D=4).
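
The transformation above can be sketched in a few lines of plain Python. In this sketch, the five-word vocabulary (V = 5), the helper names, and the random embedding matrix are assumptions for illustration; in a real model, the matrix values would be learned during training. Only D = 4 is taken from the text.

```python
import random

# Toy vocabulary (an assumption); V is its size, D the embedding size.
vocabulary = ["cat", "dog", "sat", "ran", "the"]
V = len(vocabulary)
D = 4
word_to_index = {word: i for i, word in enumerate(vocabulary)}


def one_hot(word):
    """Return a V-dimensional vector with a single 1 at the word's index."""
    vector = [0.0] * V
    vector[word_to_index[word]] = 1.0
    return vector


# V x D embedding matrix; random values stand in for learned weights.
random.seed(0)
embedding_matrix = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(V)]


def embed(word):
    """Multiply the one-hot row vector (1 x V) by the embedding matrix (V x D)."""
    x = one_hot(word)
    return [sum(x[i] * embedding_matrix[i][j] for i in range(V)) for j in range(D)]


dog_one_hot = one_hot("dog")
dog_vector = embed("dog")
```

Because the one-hot vector has a single non-zero entry, the multiplication simply selects one row of the matrix: `dog_vector` equals `embedding_matrix[word_to_index["dog"]]`. In practice, this is why an embedding layer is usually implemented as a table lookup rather than as a full matrix product.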

The idea is that the embedding function learns semantic information about the words. It associates each word in the vocabulary with a continuous-valued...

Implementing language models

In this section, we'll implement a short pipeline for preprocessing text sequences and training a word2vec model with the processed data. We'll also implement another example to visualize embedding vectors and check some of their interesting properties.

The code in this section requires the following Python packages:

  • Gensim (version 3.8.0, https://radimrehurek.com/gensim/) is an open source Python library for unsupervised topic modeling and NLP. It supports all three models that we have discussed so far (word2vec, GloVe, and fastText).
  • The Natural Language Toolkit (NLTK, version 3.4.4, https://www.nltk.org/) is a Python suite of libraries and programs for symbolic and statistical NLP.
  • Scikit-learn (version 0.19.1, https://scikit-learn.org/) is an open source Python ML library with various classification, regression, and clustering algorithms. More...
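
The preprocessing half of such a pipeline can be sketched with the standard library alone. The example below is only a stand-in: it uses regular expressions instead of NLTK's tokenizers, and the stop word set, function names, and sample text are all illustrative assumptions, not the chapter's code.

```python
import re

# A tiny illustrative stop word set (an assumption, not NLTK's list).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}


def split_sentences(text):
    """Naively split text into sentences on ., ! and ?."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and extract alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def preprocess(text):
    """Turn raw text into a list of token lists, one per sentence."""
    sentences = []
    for sentence in split_sentences(text):
        tokens = [t for t in tokenize(sentence) if t not in STOP_WORDS]
        if tokens:
            sentences.append(tokens)
    return sentences


text = "The cat sat in the garden. Is the dog in the house? The dog ran!"
processed = preprocess(text)
```

The result is a list of token lists, one per sentence, which is the input format that Gensim's Word2Vec class accepts for training.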

Summary

This was the first chapter devoted to NLP. Appropriately, we started with the basic building blocks of most NLP algorithms today—the words and their context-based vector representations. We started with n-grams and the need to represent words as vectors. Then, we discussed the word2vec, fastText, and GloVe models. Finally, we implemented a simple pipeline to train an embedding model and we visualized word vectors with t-SNE.

In the next chapter, we'll discuss RNNs—a neural network architecture that naturally lends itself to NLP tasks.
