Natural Language Processing: Python and NLTK

Learn to build expert NLP and machine learning projects using NLTK and other Python libraries

Nitin Hardeniya et al.

Book Details

ISBN 13: 9781787285101
Paperback: 702 pages

Book Description

Natural Language Processing is a field of computational linguistics and artificial intelligence that deals with human-computer interaction. It provides a seamless interaction between computers and human beings and gives computers the ability to understand human speech with the help of machine learning. The number of human-computer interaction instances is increasing, so it's becoming imperative that computers comprehend all major natural languages.

The first module, NLTK Essentials, is an introduction to building systems around NLP, with a focus on how to create a customized tokenizer and parser from scratch. You will learn essential concepts of NLP, gain practical insight into the open source tools and libraries available in Python, see how to analyze social media sites, and get tools for dealing with large-scale text. This module also shows how to put the capabilities of Python libraries such as NLTK, scikit-learn, pandas, and NumPy to work.
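To give a flavor of what "a customized tokenizer from scratch" can mean, here is a minimal sketch that uses only Python's standard `re` module. It is not taken from the book: the function names `split_sentences` and `tokenize_words` and both regular expressions are illustrative assumptions, and NLTK's own tokenizers handle many cases this sketch does not (abbreviations such as "Dr.", for example).

```python
import re

# Illustrative patterns (assumptions, not the book's code).
# A sentence boundary is whitespace that follows '.', '!' or '?'.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")
# A token is a word (optionally with one internal apostrophe part)
# or a single non-space, non-word character such as punctuation.
_TOKEN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def split_sentences(text):
    """Split text into sentences on terminal punctuation plus whitespace."""
    return [s for s in _SENTENCE_BOUNDARY.split(text.strip()) if s]


def tokenize_words(sentence):
    """Split a sentence into word and punctuation tokens."""
    return _TOKEN.findall(sentence)


if __name__ == "__main__":
    sample = "NLTK is a Python library. It's widely used for NLP!"
    for sentence in split_sentences(sample):
        print(tokenize_words(sentence))
```

A rule-based splitter like this is easy to adapt to a specific corpus, which is the usual reason for writing one by hand instead of relying on a general-purpose tokenizer.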

The second module, Python 3 Text Processing with NLTK 3 Cookbook, teaches you the essential techniques of text and language processing with simple, straightforward examples. This includes organizing text corpora, creating your own custom corpus, text classification with a focus on sentiment analysis, and distributed text processing methods.

The third module, Mastering Natural Language Processing with Python, will help you become an expert and assist you in creating your own NLP projects using NLTK. You will be guided through model development with machine learning tools, shown how to create training data, and given insight into the best practices for designing and building NLP-based applications using Python.

This Learning Path combines some of the best that Packt has to offer in one complete, curated package and is designed to help you quickly learn text processing with Python and NLTK. It includes content from the following Packt products:

  • NLTK Essentials by Nitin Hardeniya
  • Python 3 Text Processing with NLTK 3 Cookbook by Jacob Perkins
  • Mastering Natural Language Processing with Python by Deepti Chopra, Nisheeth Joshi, and Iti Mathur

Table of Contents

Chapter 1: Introduction to Natural Language Processing
Why learn NLP?
Let's start playing with Python!
Diving into NLTK
Your turn
Summary
Chapter 2: Text Wrangling and Cleansing
What is text wrangling?
Text cleansing
Sentence splitter
Tokenization
Stemming
Lemmatization
Stop word removal
Rare word removal
Spell correction
Your turn
Summary
Chapter 3: Part of Speech Tagging
What is Part of speech tagging
Named Entity Recognition (NER)
Your Turn
Summary
Chapter 4: Parsing Structure in Text
Shallow versus deep parsing
The two approaches in parsing
Why we need parsing
Different types of parsers
Dependency parsing
Chunking
Information extraction
Summary
Chapter 5: NLP Applications
Building your first NLP application
Other NLP applications
Summary
Chapter 6: Text Classification
Machine learning
Text classification
Sampling
The Random forest algorithm
Text clustering
Topic modeling in text
References
Summary
Chapter 7: Web Crawling
Web crawlers
Writing your first crawler
Data flow in Scrapy
The Sitemap spider
The item pipeline
External references
Summary
Chapter 8: Using NLTK with Other Python Libraries
NumPy
SciPy
pandas
matplotlib
External references
Summary
Chapter 9: Social Media Mining in Python
Data collection
Data extraction
Geovisualization
Summary
Chapter 10: Text Mining at Scale
Different ways of using Python on Hadoop
NLTK on Hadoop
Scikit-learn on Hadoop
PySpark
Summary
Chapter 11: Tokenizing Text and WordNet Basics
Introduction
Tokenizing text into sentences
Tokenizing sentences into words
Tokenizing sentences using regular expressions
Training a sentence tokenizer
Filtering stopwords in a tokenized sentence
Looking up Synsets for a word in WordNet
Looking up lemmas and synonyms in WordNet
Calculating WordNet Synset similarity
Discovering word collocations
Chapter 12: Replacing and Correcting Words
Introduction
Stemming words
Lemmatizing words with WordNet
Replacing words matching regular expressions
Removing repeating characters
Spelling correction with Enchant
Replacing synonyms
Replacing negations with antonyms
Chapter 13: Creating Custom Corpora
Introduction
Setting up a custom corpus
Creating a wordlist corpus
Creating a part-of-speech tagged word corpus
Creating a chunked phrase corpus
Creating a categorized text corpus
Creating a categorized chunk corpus reader
Lazy corpus loading
Creating a custom corpus view
Creating a MongoDB-backed corpus reader
Corpus editing with file locking
Chapter 14: Part-of-speech Tagging
Introduction
Default tagging
Training a unigram part-of-speech tagger
Combining taggers with backoff tagging
Training and combining ngram taggers
Creating a model of likely word tags
Tagging with regular expressions
Affix tagging
Training a Brill tagger
Training the TnT tagger
Using WordNet for tagging
Tagging proper names
Classifier-based tagging
Training a tagger with NLTK-Trainer
Chapter 15: Extracting Chunks
Introduction
Chunking and chinking with regular expressions
Merging and splitting chunks with regular expressions
Expanding and removing chunks with regular expressions
Partial parsing with regular expressions
Training a tagger-based chunker
Classification-based chunking
Extracting named entities
Extracting proper noun chunks
Extracting location chunks
Training a named entity chunker
Training a chunker with NLTK-Trainer
Chapter 16: Transforming Chunks and Trees
Introduction
Filtering insignificant words from a sentence
Correcting verb forms
Swapping verb phrases
Swapping noun cardinals
Swapping infinitive phrases
Singularizing plural nouns
Chaining chunk transformations
Converting a chunk tree to text
Flattening a deep tree
Creating a shallow tree
Converting tree labels
Chapter 17: Text Classification
Introduction
Bag of words feature extraction
Training a Naive Bayes classifier
Training a decision tree classifier
Training a maximum entropy classifier
Training scikit-learn classifiers
Measuring precision and recall of a classifier
Calculating high information words
Combining classifiers with voting
Classifying with multiple binary classifiers
Training a classifier with NLTK-Trainer
Chapter 18: Distributed Processing and Handling Large Datasets
Introduction
Distributed tagging with execnet
Distributed chunking with execnet
Parallel list processing with execnet
Storing a frequency distribution in Redis
Storing a conditional frequency distribution in Redis
Storing an ordered dictionary in Redis
Distributed word scoring with Redis and execnet
Chapter 19: Parsing Specific Data Types
Introduction
Parsing dates and times with dateutil
Timezone lookup and conversion
Extracting URLs from HTML with lxml
Cleaning and stripping HTML
Converting HTML entities with BeautifulSoup
Detecting and converting character encodings
Chapter 20: Working with Strings
Tokenization
Normalization
Substituting and correcting tokens
Applying Zipf's law to text
Similarity measures
Summary
Chapter 21: Statistical Language Modeling
Understanding word frequency
Applying smoothing on the MLE model
Develop a back-off mechanism for MLE
Applying interpolation on data to get mix and match
Evaluate a language model through perplexity
Applying metropolis hastings in modeling languages
Applying Gibbs sampling in language processing
Summary
Chapter 22: Morphology – Getting Our Feet Wet
Introducing morphology
Understanding stemmer
Understanding lemmatization
Developing a stemmer for non-English language
Morphological analyzer
Morphological generator
Search engine
Summary
Chapter 23: Parts-of-Speech Tagging – Identifying Words
Introducing parts-of-speech tagging
Creating POS-tagged corpora
Selecting a machine learning algorithm
Statistical modeling involving the n-gram approach
Developing a chunker using pos-tagged corpora
Summary
Chapter 24: Parsing – Analyzing Training Data
Introducing parsing
Treebank construction
Extracting Context Free Grammar (CFG) rules from Treebank
Creating a probabilistic Context Free Grammar from CFG
CYK chart parsing algorithm
Earley chart parsing algorithm
Summary
Chapter 25: Semantic Analysis – Meaning Matters
Introducing semantic analysis
Generation of the synset id from Wordnet
Disambiguating senses using Wordnet
Summary
Chapter 26: Sentiment Analysis – I Am Happy
Introducing sentiment analysis
Summary
Chapter 27: Information Retrieval – Accessing Information
Introducing information retrieval
Vector space scoring and query operator interaction
Developing an IR system using latent semantic indexing
Text summarization
Question-answering system
Summary
Chapter 28: Discourse Analysis – Knowing Is Believing
Introducing discourse analysis
Summary
Chapter 29: Evaluation of NLP Systems – Analyzing Performance
The need for evaluation of NLP systems
Evaluation of IR system
Metrics for error identification
Metrics based on lexical matching
Metrics based on syntactic matching
Metrics using shallow semantic matching
Summary

What You Will Learn

  • The scope of natural language complexity and how natural language is processed by machines
  • Clean and wrangle text using tokenization and chunking to help you process data better
  • Tokenize text into sentences and sentences into words
  • Classify text and perform sentiment analysis
  • Implement string matching algorithms and normalization techniques
  • Understand and implement the concepts of information retrieval and text summarization
  • Find out how to implement various NLP tasks in Python
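As a rough illustration of the "string matching algorithms and normalization techniques" item above, the sketch below pairs a simple text normalizer with Levenshtein edit distance, using only the Python standard library. It is an assumption about the kind of technique meant, not code from the book; the names `normalize` and `edit_distance` are hypothetical.

```python
import unicodedata


def normalize(text):
    """Lowercase, strip accents, and collapse runs of whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(no_accents.lower().split())


def edit_distance(a, b):
    """Levenshtein distance via dynamic programming, one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ch_a
                    current[j - 1] + 1,      # insert ch_b
                    previous[j - 1] + cost,  # substitute (or match)
                )
            )
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(normalize("  Café   AU lait "))
    print(edit_distance("kitten", "sitting"))
```

Normalizing both strings before comparing them keeps differences in case, accents, or spacing from inflating the distance.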
