Python Text Processing with NLTK 2.0 Cookbook

More Information
  • Learn text categorization and topic identification
  • Learn stemming and lemmatization, and how to go beyond the usual spell checker
  • Replace negations with antonyms in your text
  • Learn to tokenize text into lists of sentences and words, and gain insight into WordNet
  • Transform and manipulate chunks and trees
  • Learn advanced features of corpus readers and create your own custom corpora
  • Tag different parts of speech by creating, training, and using a part-of-speech tagger
  • Improve accuracy by combining multiple part-of-speech taggers
  • Learn how to do partial parsing to extract small chunks of text from a part-of-speech tagged sentence
  • Normalize parsed chunks to produce an alternative canonical form without changing their meaning
  • Learn how search engines use Natural Language Processing to process text
  • Make your site more discoverable by learning how to automatically replace words with more searched equivalents
  • Parse dates, times, and HTML
  • Train and manipulate different types of classifiers

Natural Language Processing is used everywhere – in search engines, spell checkers, mobile phones, computer games – even your washing machine. Python's Natural Language Toolkit (NLTK) suite of libraries has rapidly emerged as one of the most efficient tools for Natural Language Processing. You want to employ nothing less than the best techniques in Natural Language Processing – and this book is your answer.

Python Text Processing with NLTK 2.0 Cookbook is your handy and illustrative guide, which will walk you through all the Natural Language Processing techniques in a step-by-step manner. It will demystify the advanced features of text analysis and text mining using the comprehensive NLTK suite.

This book cuts short the preamble and dives right into the science of text processing with a practical, hands-on approach.

Get started by learning text tokenization. Get an overview of WordNet and how to use it. Learn the basics as well as the advanced features of stemming and lemmatization. Discover various ways to replace words with simpler and more common (read: more searched) variants. Create your own corpora and learn to write custom corpus readers for JSON files as well as for data stored in MongoDB. Use and manipulate POS taggers. Transform and normalize parsed chunks to produce a canonical form without changing their meaning. Dig into feature extraction and text classification. Learn how to handle huge amounts of data efficiently.

This book will teach you all that and beyond, in a hands-on learn-by-doing manner. Make yourself an expert in using the NLTK for Natural Language Processing with this handy companion.

  • Quickly get to grips with Natural Language Processing: text analysis, text mining, and beyond
  • Learn how machines and crawlers interpret and process natural languages
  • Easily work with huge amounts of data and learn how to handle distributed processing
  • Part of Packt's Cookbook series: Each recipe is a carefully organized sequence of instructions to complete the task as efficiently as possible

Page Count: 272
Course Length: 8 hours 9 minutes
ISBN: 9781849513609
Date of Publication: 9 Nov 2010


Jacob Perkins

Jacob Perkins is the cofounder and CTO of Weotta, a local search company. Weotta uses NLP and machine learning to create powerful and easy-to-use natural language search for what to do and where to go.

He is the author of Python Text Processing with NLTK 2.0 Cookbook, Packt Publishing, and has contributed a chapter to the Bad Data Handbook, O'Reilly Media. He writes about NLTK, Python, and other technology topics at

To demonstrate the capabilities of NLTK and natural language processing, he developed, which provides simple demos and NLP APIs for commercial use. He has contributed to various open source projects, including NLTK, and created NLTK-Trainer to simplify the process of training NLTK models. For more information, visit