Chapter 4. Advanced Word2vec

In Chapter 3, Word2vec – Learning Word Embeddings, we introduced you to Word2vec, the basics of learning word embeddings, and the two common Word2vec algorithms: skip-gram and CBOW. In this chapter, we will discuss several topics related to Word2vec, focusing on these two algorithms and their extensions.

First, we will explore how the original skip-gram algorithm was implemented and how it compares to its more modern variant, which we used in Chapter 3, Word2vec – Learning Word Embeddings. We will examine the differences between skip-gram and CBOW and look at how the loss behaves over time for the two approaches. We will also discuss which method works better, using both our observations and the available literature.

We will discuss several extensions to the existing Word2vec methods that boost performance. These extensions include using more effective sampling techniques to sample negative examples for negative sampling and ignoring uninformative words in the learning...

The original skip-gram algorithm


The skip-gram algorithm discussed up to this point in the book is actually an improvement over the original skip-gram algorithm proposed by Mikolov and others in their 2013 paper. In that paper, the algorithm did not use an intermediate hidden layer to learn the representations. Instead, the original algorithm used two different embedding (or projection) layers (the input and output embeddings in Figure 4.1) and defined a cost function derived from the embeddings themselves:

Figure 4.1: The original skip-gram algorithm without hidden layers

The original negative sampled loss was defined as follows:

$$J = \log \sigma\left({v'_{w_j}}^{\top} v_{w_i}\right) + \sum_{m=1}^{k} \mathbb{E}_{w_m \sim P_n(w)}\left[\log \sigma\left(-{v'_{w_m}}^{\top} v_{w_i}\right)\right]$$

Here, $v$ is the input embeddings layer, $v'$ is the output word embeddings layer, $v_{w_i}$ corresponds to the embedding vector for the word $w_i$ in the input embeddings layer, and $v'_{w_i}$ corresponds to the word vector for the word $w_i$ in the output embeddings layer.

$P_n(w)$ is the noise distribution, from which we sample $k$ noise samples (for example, it can be as simple...
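The following is a minimal TensorFlow sketch of this loss, assuming two separate embedding matrices and pre-sampled negative word IDs; the variable and function names are illustrative and not the book's implementation:

```python
import tensorflow as tf

# Sketch: original negative-sampled skip-gram loss with two embedding
# matrices (input/center words and output/context words). Illustrative only.
vocabulary_size, embedding_size = 50000, 128

input_embeddings = tf.Variable(
    tf.random.uniform([vocabulary_size, embedding_size], -1.0, 1.0))
output_embeddings = tf.Variable(
    tf.random.uniform([vocabulary_size, embedding_size], -1.0, 1.0))

def negative_sampled_loss(input_ids, output_ids, negative_ids):
    """input_ids, output_ids: [batch]; negative_ids: [batch, k]."""
    v_in = tf.nn.embedding_lookup(input_embeddings, input_ids)       # [batch, d]
    v_out = tf.nn.embedding_lookup(output_embeddings, output_ids)    # [batch, d]
    v_neg = tf.nn.embedding_lookup(output_embeddings, negative_ids)  # [batch, k, d]

    # log sigmoid(v'_out . v_in) for the true context word
    positive_term = tf.math.log_sigmoid(tf.reduce_sum(v_in * v_out, axis=1))

    # sum over the k noise words of log sigmoid(-v'_neg . v_in)
    negative_scores = tf.einsum('bd,bkd->bk', v_in, v_neg)
    negative_term = tf.reduce_sum(tf.math.log_sigmoid(-negative_scores), axis=1)

    # The objective is maximized, so the loss minimizes its negation
    return -tf.reduce_mean(positive_term + negative_term)
```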

Comparing skip-gram with CBOW


Before looking at the performance differences and investigating reasons, let's remind ourselves about the fundamental difference between the skip-gram and CBOW methods.

As shown in the following figures, given a context and a target word, skip-gram observes only the target word and a single word of the context in a single input-output tuple. However, CBOW observes the target word and all the words in the context in a single sample. For example, if we assume the phrase "dog barked at the mailman", skip-gram sees an input-output tuple such as ["dog", "at"] at a single time step, whereas CBOW sees an input-output tuple [["dog","barked","the","mailman"], "at"]. Therefore, in a given batch of data, CBOW receives more information than skip-gram about the context of a given word. Next, let's see how this difference affects the performance of the two algorithms.
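The following is a rough sketch of how the two schemes slice the same phrase into training samples; the window size, function names, and the ordering within each tuple are illustrative rather than the book's code:

```python
# Illustrative sketch: forming skip-gram and CBOW training samples
# from the phrase "dog barked at the mailman".
tokens = ["dog", "barked", "at", "the", "mailman"]
window = 2  # number of context words considered on each side of the target

def skip_gram_samples(tokens, window):
    samples = []
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                # one tuple per (target, context word) combination
                samples.append((target, tokens[j]))
    return samples

def cbow_samples(tokens, window):
    samples = []
    for i, target in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window), min(len(tokens), i + window + 1))
                   if j != i]
        # all context words appear together in a single sample
        samples.append((context, target))
    return samples

# Skip-gram pairs the target with one context word at a time,
# for example ('at', 'dog'), ('at', 'barked'), ('at', 'the'), ('at', 'mailman').
# CBOW pairs all context words with the target in one sample,
# for example (['dog', 'barked', 'the', 'mailman'], 'at').
```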

As shown in the preceding figures, the CBOW model has access to more information (inputs) at a given time compared...

Extensions to the word embeddings algorithms


The original paper by Mikolov and others, published in 2013, discusses several extensions that can improve the performance of the word embedding learning algorithms even further. Though they were initially introduced for skip-gram, they are extendable to CBOW as well. Also, since we already saw that CBOW outperforms the skip-gram algorithm in our example, we will use CBOW to understand all the extensions.

Using the unigram distribution for negative sampling

It has been found that the performance results of negative sampling are better when performed by sampling from certain distributions rather than from the uniform distribution. One such distribution is the unigram distribution. The unigram probability of a word $w_i$ is given by the following equation:

$$U(w_i) = \frac{\text{count}(w_i)}{\sum_{j} \text{count}(w_j)}$$

Here, count($w_i$) is the number of times $w_i$ appears in the document. When the unigram distribution is distorted as $U(w_i)^{3/4}/Z$ for some normalization constant $Z$, it has been shown to provide better performance than the...
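As a small illustration (the word counts and helper function below are hypothetical), the distorted unigram distribution and the sampling of negatives could look as follows:

```python
import numpy as np

# Hypothetical word-frequency counts indexed by word ID
word_counts = np.array([523, 310, 118, 67, 12], dtype=np.float64)

unigram_probs = word_counts / word_counts.sum()  # U(w_i) = count(w_i) / total count
distorted = unigram_probs ** 0.75                # raise to the 3/4 power
distorted /= distorted.sum()                     # Z renormalizes it to a distribution

def sample_negatives(num_samples, exclude_id, probs):
    """Draw negative word IDs from the distorted unigram distribution,
    skipping the true context word."""
    negatives = []
    while len(negatives) < num_samples:
        candidate = np.random.choice(len(probs), p=probs)
        if candidate != exclude_id:
            negatives.append(candidate)
    return negatives

print(sample_negatives(5, exclude_id=0, probs=distorted))
```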

More recent algorithms extending skip-gram and CBOW


We already saw that the Word2vec techniques are quite powerful in capturing the semantics of words. However, they are not without their limitations. For example, they do not pay attention to the distance between a context word and the target word. However, if the context word is farther away from the target word, its impact on the target word should be smaller. Therefore, we will discuss techniques that pay separate attention to different positions in the context. Another limitation of Word2vec is that it only pays attention to a very small window around a given word when computing the word vector. However, in reality, the way a word co-occurs with other words throughout the corpus should also be considered to compute good word vectors. So, we will look at a technique that not only looks at the context of a word, but also at the global co-occurrence information of the word.

A limitation of the skip-gram algorithm

The previously discussed skip-gram algorithm and all its...

GloVe – Global Vectors representation


Methods for learning word vectors fall into one of two categories: global matrix factorization-based methods or local context window-based methods. Latent Semantic Analysis (LSA) is an example of a global matrix factorization-based method, and skip-gram and CBOW are local context window-based methods. LSA is used as a document analysis technique that maps words in the documents to something known as a concept, a common pattern of words that appears in a document. Global matrix factorization-based methods efficiently exploit the global statistics of a corpus (for example, co-occurrence of words in a global scope), but have been shown to perform poorly at word analogy tasks. On the other hand, context window-based methods have been shown to perform well at word analogy tasks, but do not utilize the global statistics of the corpus, leaving space for improvement. GloVe attempts to get the best of both worlds: an approach that efficiently leverages global corpus...
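To make the idea concrete, the following is a minimal sketch of GloVe's core cost term, in which word vectors are fit so that their dot product approximates the logarithm of the global co-occurrence count, weighted by a damping function; the names and hyperparameter values are illustrative and not the book's implementation:

```python
import numpy as np

def glove_weight(x, x_max=100.0, alpha=0.75):
    """Weighting function that damps rare pairs and caps very frequent ones."""
    return (x / x_max) ** alpha if x < x_max else 1.0

def glove_loss_term(w_i, w_j_tilde, b_i, b_j_tilde, x_ij):
    """Weighted squared error for a single co-occurrence entry X_ij."""
    error = np.dot(w_i, w_j_tilde) + b_i + b_j_tilde - np.log(x_ij)
    return glove_weight(x_ij) * error ** 2

# The full GloVe objective sums this term over all nonzero entries of the
# global word-word co-occurrence matrix X.
```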

Document classification with Word2vec


Although Word2vec gives a very elegant way of learning numerical representations of words, as we saw quantitatively (loss value) and qualitatively (t-SNE embeddings), learning word representations alone is not enough to appreciate the power of word vectors in real-world applications. Word embeddings are used as the feature representation of words for many tasks, such as image caption generation and machine translation. However, these tasks involve combining different learning models (such as Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) models, or two LSTM models). These will be discussed in later chapters. To understand a real-world usage of word embeddings, let's stick to a simpler task: document classification.

Document classification is one of the most popular tasks in NLP and is extremely useful for anyone handling massive collections of data, such as news websites, publishers, and...
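One simple way to use the learned embeddings for this task is to represent each document by the average of its word vectors and feed that vector to a standard classifier; the following sketch assumes a hypothetical embedding matrix and word-to-ID mapping:

```python
import numpy as np

def document_vector(tokens, embeddings, word_to_id, embedding_size=128):
    """Average the embedding vectors of the known words in a document."""
    vectors = [embeddings[word_to_id[t]] for t in tokens if t in word_to_id]
    if not vectors:
        return np.zeros(embedding_size)
    return np.mean(vectors, axis=0)

# Each document vector, paired with a label, can then be fed to any simple
# classifier (for example, logistic regression or a small feed-forward
# network) to predict the document's category.
```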

Summary


In this chapter, we examined the performance difference between the skip-gram and CBOW algorithms. For the comparison, we used a popular two-dimensional visualization technique, t-SNE, which we also briefly introduced to you, touching on the fundamental intuition and mathematics behind the method.

Next, we introduced you to several extensions to the Word2vec algorithms that boost their performance, followed by several novel algorithms that are based on the skip-gram and CBOW algorithms. Structured skip-gram extends the skip-gram algorithm by preserving the position of the context word during optimization, allowing the algorithm to treat input-output pairs differently based on the distance between them. The same extension can be applied to the CBOW algorithm, and this results in the continuous window algorithm.

Then we discussed GloVe—another word embedding learning technique. GloVe takes the current Word2vec algorithms a step further by incorporating global statistics into the optimization, thus increasing...
