Search icon
Arrow left icon
All Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletters
Free Learning
Arrow right icon
Hands-On Natural Language Processing with Python
Hands-On Natural Language Processing with Python

Hands-On Natural Language Processing with Python: A practical guide to applying deep learning architectures to your NLP applications

By Rajesh Arumugam , Rajalingappaa Shanmugamani , Auguste Byiringiro , Chaitanya Joshi , Karthik Muthuswamy
$35.99 $24.99
Book Jul 2018 312 pages 1st Edition
eBook
$35.99 $24.99
Print
$43.99
Subscription
$15.99 Monthly
eBook
$35.99 $24.99
Print
$43.99
Subscription
$15.99 Monthly

What do you get with eBook?

Product feature icon Instant access to your Digital eBook purchase
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
Buy Now

Product Details


Publication date : Jul 18, 2018
Length 312 pages
Edition : 1st Edition
Language : English
ISBN-13 : 9781789139495
Category :
Table of content icon View table of contents Preview book icon Preview Book

Hands-On Natural Language Processing with Python

Getting Started

Natural language processing (NLP) is the field of understanding human language using computers. It involves the analysis and of large volumes of natural language data using computers to glean meaning and value for consumption in real-world applications. While NLP has been around since the 1950s, there has been tremendous growth in practical applications in the area, due to recent advances in machine learning (ML) and deep learning. The majority of this book will focus on various real-world applications of NLP, such as text classification, and sub-tasks of NLP, such as Named Entity Recognition (NER), with a particular emphasis on deep learning approaches. In this chapter, we will first introduce the basic concepts and terms in NLP. Following this, we will discuss some of the currently used applications that leverage NLP.

Basic concepts and terminologies in NLP

The following are some of the important terminologies and concepts in NLP mostly related to the language data. Getting familiar with these terms and concepts will help the reader in getting up to speed in understanding the contents in later chapters of the book:

  • Text corpus or corpora
  • Paragraph
  • Sentences
  • Phrases and words
  • N-grams
  • Bag-of-words

We will explain these in the following sections.

Text corpus or corpora

The language data that all NLP tasks depend upon is called the text corpus or simply corpus. A corpus is a large set of text data that can be in one of the languages like English, French, and so on. The corpus can consist of a single document or a bunch of documents. The source of the text corpus can be social network sites like Twitter, blog sites, open discussion forums like Stack Overflow, books, and several others. In some of the tasks like machine translation, we would require a multilingual corpus. For example we might need both the English and French translations of the same document content for developing a machine translation model. For speech tasks, we would also need human voice recordings and the corresponding transcribed corpus.

In most of the later chapters, we will be using text corpus and speech recordings available from the internet or open source data repositories. For many of the NLP task, the corpus is split into chunks for further analysis. These chunks could be at the paragraph, sentence, or word level. We will touch upon these in the following sections.

Paragraph

A paragraph is the largest unit of text handled by an NLP task. Paragraph level boundaries by itself may not be much use unless broken down into sentences. Though sometimes the paragraph may be considered as context boundaries. Tokenizers that can split a document into paragraphs are available in some of the Python libraries. We will look at such tokenizers in later chapters.

Sentences

Sentences are the next level of lexical unit of language data. A sentence encapsulates a complete meaning or thought and context. It is usually extracted from a paragraph based on boundaries determined by punctuations like period. The sentence may also convey opinion or sentiment expressed in it. In general, sentences consists of parts of speech (POS) entities like nouns, verbs, adjectives, and so on. There are tokenizers available to split paragraphs to sentences based on punctuations.

Phrases and words

Phrases are a group of consecutive words within a sentence that can convey a specific meaning. For example, in the sentence Tomorrow is going to be a rainy day the part going to be a rainy day expresses a specific thought. Some of the NLP tasks extract key phrases from sentences for search and retrieval applications. The next smallest unit of text is the word. The common tokenizers split sentences into text based on punctuations like spaces and comma. One of the problems with NLP is ambiguity in the meaning of same words used in different context. We will later see how this is handled well when we discuss word embeddings.

N-grams

A sequence of characters or words forms an N-gram. For example, character unigram consists of a single character, a bigram consists of a sequence of two characters and so on. Similarly word N-grams consists of a sequence of n words. In NLP, N-grams are used as features for tasks like text classification.

Bag-of-words

Bag-of-words in contrast to N-grams does not consider word order or sequence. It captures the word occurrence frequencies in the text corpus. Bag-of-words is also used as features in tasks like sentiment analysis and topic identification.

In the following sections, we will look at an overview of the following applications of NLP:

  • Analyzing sentiment
  • Recognizing named entities
  • Linking entities
  • Translating text
  • Natural language interfaces
  • Semantic Role Labeling
  • Relation extraction
  • SQL query generation, or semantic parsing
  • Machine Comprehension
  • Textual entailment
  • Coreference resolution
  • Searching
  • Question answering and chatbots
  • Converting text to voice
  • Converting voice to text
  • Speaker identification
  • Spoken dialog systems
  • Other applications

Applications of NLP

In this section, we will provide an overview of the major applications of NLP. While the topics listed here are not quite exhaustive, they will give the reader a sense of the wide range of applications where NLP is used.

Analyzing sentiment

The sentiment in a sentence or text reflects the overall positive, negative, or neutral opinion or thought of the person who produces or consumes it. It indicates whether a person is happy, unhappy, or neutral about the subject or context that describes the text. It can be quantified as a discrete value, such as 1 for happy, -1 for unhappy, and 0 for neutral, or it can be quantified on a continuous scale of values, from 0-1. Sentiment analysis, therefore, is the process of deriving this value from a piece of text that can be obtained from different data sources, such as social networks, product reviews, news articles, and so on. One real-world application of sentiment analysis is in social network data to derive actionable insights, such as customer satisfaction, product or brand popularity, fashion trends, and so on. The screenshot that follows shows one of the applications of sentiment analysis, in capturing the overall opinion of a particular news article about Google. The reader may refer to the application, or API, from Google Cloud at https://cloud.google.com/natural-language/:

The preceding screenshot indicates that sentiment data is captured for the whole document, as well as at the individual sentence level.

Recognizing named entities

NER is a type of text annotation task. In NER, words or tokens in a piece of text are labeled or annotated into categories, such as organizations, locations, people, and so on. In effect, NER converts unstructured text data into structured data that can later be used for further analysis. The following screenshot is a visualization from the Google Cloud API. The reader can try out the API with the link provided in the preceding subsection:

The output result in the preceding screenshot shows how the different entities, such as ORGANISATION (Google), PERSON (Sundar Pitchai), EVENT (CONSUMER ELECTRONICS SHOW), and so on, are automatically extracted from the unstructured raw text by NER. The output also gives the sentiment for each label or category, based on sentiment analysis. The reader can experiment with different text using the link provided earlier. When we click on the Categories tab, we can see the following:

The preceding screenshot shows how the system also classifies a particular piece of text into Computer & Electronics, News, and so on, using the recognized named entities in the text. Such a categorization, called topic modeling, is another important NLP task, used to identify the main theme or topic of a sentence or document.

Linking entities

Another practical application is entity linking. One good example of it can be found in the Microsoft Azure Text Analytics API, at https://azure.microsoft.com/is-is/services/cognitive-services/text-analytics/. The following screenshot shows output from a sample text:

The preceding screenshot shows how the system has automatically extracted the entity Seattle as a place. Interestingly, it has also correctly extracted the Space Needle as a landmark place, by linking it with Seattle. This shows how powerful named entity linking can be when extracting useful relationships between entities.

Translating text

Machine translation is the task of translating a given piece of text from one language to another target language. The language of the task is first identified, and then translated into the target language. The translation app from Google has proven very useful for traveling and has taken down language barriers. The latest techniques have improved the translation accuracy by a large margin.

Following is an example of translation from Chinese to English, using Google Translate at https://cloud.google.com/translate/:

The preceding screenshot also shows the JSON response of the translated text, when we use the Translation API service from Google (https://cdn-images-1.medium.com/max/1600/1*3f4l4lrLFFhgvVsvjhNzAQ.jpeg).

Natural Language Inference

Natural Language Inference (NLI) tasks classify the relationship between a premise and hypothesis. During inference, a premise and hypothesis are given as input to output whether a hypothesis is true based on a given premise.

Semantic Role Labeling

Semantic Role Labeling (SRL) determines the relationship between a given sentence and a predicate, such as a verb. Sometimes, the inference is provided as a question. An example of a role might be: where or when did something happen? The following is a visualization from http://demo.allennlp.org/semantic-role-labeling:

The preceding visualization shows semantic labeling, which created semantic associations between the different pieces of text, such as The keys being needed for the purpose to access the building. The reader may experiment with different examples using the URL link provided earlier.

Relation extraction

Relation extraction predicts a relationship when a text and type of relation are provided. There may be cases where the relationships can't be extracted. The following screenshot shows an example of relation extraction, based on predicates and objects:

Example of relation extraction

The preceding example shows relationship extraction from the sample text to a subject, a predicate, and objects.

SQL query generation, or semantic parsing

Semantic parsing helps to convert a natural language into SQL queries in order to query a database. The following screenshot shows an example of converting a free text query to a DBpedia database SPARQL query, which is quite similar to SQL:

The preceding visualization shows the query Who is Tom Cruise, converted into a SPARQL query at the bottom. You can experiment with other queries at http://quepy.machinalis.com/.

Machine Comprehension

Machine Comprehension (MC) answers questions from a paragraph. It is akin to school children doing comprehension tests. The following screenshot is a visualization from http://demo.allennlp.org/machine-comprehension, for the question Who stars in The Matrix? The answer is shown in the screenshot, along with the paragraph:

We can also see a visualization of how the model works, by highlighting certain words:

Textual Entailment

Textual Entailment (TE) predicts whether the facts in different texts are the same. The following is a visualization from http://demo.allennlp.org/textual-entailment:

The premise is: If you help the needy, God will reward you. The hypothesis is: Giving money to the poor has good consequences. The probabilities of Entailment, Contradiction, and Neutral are presented.

Coreference resolution

Searching

Question answering and chatbots

For question answering systems, a context is supplied with a question in order to generate an answer. The schema of a chatbot is shown in the following screenshot from https://aws.amazon.com/lex/details/:

Chatbots are application-specific, as integration varies among applications.

Converting text-to-voice

Sometimes, a text has to be converted into a voice. It can be useful for a personal bot to speak back to a user.

Let's look at how to use the AWS API for text-to-speech Amazon Polly. With the API, you can pass the text and convert it to speech. The audio file can be either streamed or downloaded.

The voice should be natural sounding, to connect with the user. Google can provide this in 30 different voices, in 12 different languages. The speed and pitch can be adjusted. Go to https://cloud.google.com/text-to-speech/ and try out a demo. The following screenshot shows an example; all of the parameters can be tuned:

Tuning of parameters

The requests can be sent from any connected device, such as a mobile, car, TV, and so on. It can be used for customer service, presenting educational text, or for animation content.

Converting voice-to-text

Sometimes, a voice has to be converted to text. This is a speech recognition problem. The Google speech recognition system works in 120 languages. The audio can be streamed, or a prerecorded video can be sent. Formatting can be done for different categories, such as proper nouns and punctuation. The following example is from https://cloud.google.com/speech-to-text/:

There are different models provided, for videos, phone calls, and search-based audio. This works even when there is background noise, and the system can filter inappropriate content.

Speaker identification

Spoken dialog systems

Home assistants, such as Google Voice, Apple's Siri, and Amazon's Alexa, are examples of spoken dialog systems. All of the applications, such as chatbots, voice-to-text, text-to-voice, speaker identification, and searching, can be combined to form the experience of spoken dialog systems.

Other applications

There are several other applications of NLP; the following is a list of some of them:

  • Detecting spam: The emails that we receive can be classified as spam or not spam.
  • News classification: It can be useful to classify a news item based on several categories.
  • Identifying the speakers, gender, or age: From a piece of text, the writer's gender and age can be detected. Similar attributes can be marked with voice data.
  • Discovering topics: The topics of an article can be identified.
  • Generating text: Text generated by machines has a lot of interesting applications.
  • Finding duplicates: Skype has launched a live translation feature that involves speech to text, machine translation, and text-to-speech.
  • Summarizing text: The summarization task takes a text as input and outputs a summary of that text. The summary is usually much shorter than the original text. For example, after a meeting, the transcript text can be summarized and sent to everyone.
  • Comprehending paragraphs: Paragraph comprehension is the high school task of answering questions, with respect to a given piece of prose.
  • Constituency parsing: Constituency parsing predicts a tree composition of a sentence into its constituents http://demo.allennlp.org/constituency-parsing.

Summary

In this chapter, you learned the basics of NLP and saw several applications where it can be useful. You saw some APIs provided by cloud providers that are used to access NLP applications. Having seen the cloud products, you will learn the science behind the applications in future chapters.

In the next chapter, we will look at the basics of the NLTK library. We will cover basic feature engineering and will program a simple task for text classification.

Left arrow icon Right arrow icon
Download code icon Download Code

Key benefits

  • • Weave neural networks into linguistic applications across various platforms
  • • Perform NLP tasks and train its models using NLTK and TensorFlow
  • • Boost your NLP models with strong deep learning architectures such as CNNs and RNNs

Description

Natural language processing (NLP) has found its application in various domains, such as web search, advertisements, and customer services, and with the help of deep learning, we can enhance its performances in these areas. Hands-On Natural Language Processing with Python teaches you how to leverage deep learning models for performing various NLP tasks, along with best practices in dealing with today’s NLP challenges. To begin with, you will understand the core concepts of NLP and deep learning, such as Convolutional Neural Networks (CNNs), recurrent neural networks (RNNs), semantic embedding, Word2vec, and more. You will learn how to perform each and every task of NLP using neural networks, in which you will train and deploy neural networks in your NLP applications. You will get accustomed to using RNNs and CNNs in various application areas, such as text classification and sequence labeling, which are essential in the application of sentiment analysis, customer service chatbots, and anomaly detection. You will be equipped with practical knowledge in order to implement deep learning in your linguistic applications using Python's popular deep learning library, TensorFlow. By the end of this book, you will be well versed in building deep learning-backed NLP applications, along with overcoming NLP challenges with best practices developed by domain experts.

What you will learn

•Implement semantic embedding of words to classify and find entities •Convert words to vectors by training in order to perform arithmetic operations •Train a deep learning model to detect classification of tweets and news •Implement a question-answer model with search and RNN models •Train models for various text classification datasets using CNN •Implement WaveNet a deep generative model for producing a natural-sounding voice •Convert voice-to-text and text-to-voice •Train a model to convert speech-to-text using DeepSpeech

What do you get with eBook?

Product feature icon Instant access to your Digital eBook purchase
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
Buy Now

Product Details


Publication date : Jul 18, 2018
Length 312 pages
Edition : 1st Edition
Language : English
ISBN-13 : 9781789139495
Category :

Table of Contents

15 Chapters
Preface Chevron down icon Chevron up icon
Getting Started Chevron down icon Chevron up icon
Text Classification and POS Tagging Using NLTK Chevron down icon Chevron up icon
Deep Learning and TensorFlow Chevron down icon Chevron up icon
Semantic Embedding Using Shallow Models Chevron down icon Chevron up icon
Text Classification Using LSTM Chevron down icon Chevron up icon
Searching and DeDuplicating Using CNNs Chevron down icon Chevron up icon
Named Entity Recognition Using Character LSTM Chevron down icon Chevron up icon
Text Generation and Summarization Using GRUs Chevron down icon Chevron up icon
Question-Answering and Chatbots Using Memory Networks Chevron down icon Chevron up icon
Machine Translation Using the Attention-Based Model Chevron down icon Chevron up icon
Speech Recognition Using DeepSpeech Chevron down icon Chevron up icon
Text-to-Speech Using Tacotron Chevron down icon Chevron up icon
Deploying Trained Models Chevron down icon Chevron up icon
Other Books You May Enjoy Chevron down icon Chevron up icon

Customer reviews

Filter icon Filter
Top Reviews
Rating distribution
Empty star icon Empty star icon Empty star icon Empty star icon Empty star icon 0
(0 Ratings)
5 star 0%
4 star 0%
3 star 0%
2 star 0%
1 star 0%

Filter reviews by


No reviews found
Get free access to Packt library with over 7500+ books and video courses for 7 days!
Start Free Trial

FAQs

How do I buy and download an eBook? Chevron down icon Chevron up icon

Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.

If you already have Adobe reader installed, then clicking on the link will download and open the PDF file directly. If you don't, then save the PDF file on your machine and download the Reader to view it.

Please Note: Packt eBooks are non-returnable and non-refundable.

Packt eBook and Licensing When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it we have tried to balance the need for the ebook to be usable for you the reader with our needs to protect the rights of us as Publishers and of our authors. In summary, the agreement says:

  • You may make copies of your eBook for your own use onto any machine
  • You may not pass copies of the eBook on to anyone else
How can I make a purchase on your website? Chevron down icon Chevron up icon

If you want to purchase a video course, eBook or Bundle (Print+eBook) please follow below steps:

  1. Register on our website using your email address and the password.
  2. Search for the title by name or ISBN using the search option.
  3. Select the title you want to purchase.
  4. Choose the format you wish to purchase the title in; if you order the Print Book, you get a free eBook copy of the same title. 
  5. Proceed with the checkout process (payment to be made using Credit Card, Debit Cart, or PayPal)
Where can I access support around an eBook? Chevron down icon Chevron up icon
  • If you experience a problem with using or installing Adobe Reader, the contact Adobe directly.
  • To view the errata for the book, see www.packtpub.com/support and view the pages for the title you have.
  • To view your account details or to download a new copy of the book go to www.packtpub.com/account
  • To contact us directly if a problem is not resolved, use www.packtpub.com/contact-us
What eBook formats do Packt support? Chevron down icon Chevron up icon

Our eBooks are currently available in a variety of formats such as PDF and ePubs. In the future, this may well change with trends and development in technology, but please note that our PDFs are not Adobe eBook Reader format, which has greater restrictions on security.

You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.

What are the benefits of eBooks? Chevron down icon Chevron up icon
  • You can get the information you need immediately
  • You can easily take them with you on a laptop
  • You can download them an unlimited number of times
  • You can print them out
  • They are copy-paste enabled
  • They are searchable
  • There is no password protection
  • They are lower price than print
  • They save resources and space
What is an eBook? Chevron down icon Chevron up icon

Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.

When you have purchased an eBook, simply login to your account and click on the link in Your Download Area. We recommend you saving the file to your hard drive before opening it.

For optimal viewing of our eBooks, we recommend you download and install the free Adobe Reader version 9.