Reader small image

You're reading from  Python 3 Text Processing with NLTK 3 Cookbook

Product typeBook
Published inAug 2014
Reading LevelBeginner
Publisher
ISBN-139781782167853
Edition1st Edition
Languages
Tools
Right arrow
Author (1)
Jacob Perkins
Jacob Perkins
author image
Jacob Perkins

Jacob Perkins is the cofounder and CTO of Weotta, a local search company. Weotta uses NLP and machine learning to create powerful and easy-to-use natural language search for what to do and where to go. He is the author of Python Text Processing with NLTK 2.0 Cookbook, Packt Publishing, and has contributed a chapter to the Bad Data Handbook, O'Reilly Media. He writes about NLTK, Python, and other technology topics at http://streamhacker.com. To demonstrate the capabilities of NLTK and natural language processing, he developed http://text-processing.com, which provides simple demos and NLP APIs for commercial use. He has contributed to various open source projects, including NLTK, and created NLTK-Trainer to simplify the process of training NLTK models. For more information, visit https://github.com/japerk/nltk-trainer.
Read more about Jacob Perkins

Right arrow

Using WordNet for tagging


If you remember from the Looking up Synsets for a word in WordNet recipe in Chapter 1, Tokenizing Text and WordNet Basics, WordNet Synsets specify a part-of-speech tag. It's a very restricted set of possible tags, and many words have multiple Synsets with different part-of-speech tags, but this information can be useful for tagging unknown words. WordNet is essentially a giant dictionary, and it's likely to contain many words that are not in your training data.

Getting ready

First, we need to decide how to map WordNet part-of-speech tags to the Penn Treebank part-of-speech tags we've been using. The following is a table mapping one to the other. See the Looking up Synsets for a word in WordNet recipe in Chapter 1, Tokenizing Text and WordNet Basics, for more details. The s, which was not shown before, is just another kind of adjective, at least for tagging purposes.

WordNet tag

Treebank tag

n

NN

a

JJ

s

JJ

r

RB

v

VB

How to do it...

Now we can create a class...

lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
Python 3 Text Processing with NLTK 3 Cookbook
Published in: Aug 2014Publisher: ISBN-13: 9781782167853

Author (1)

author image
Jacob Perkins

Jacob Perkins is the cofounder and CTO of Weotta, a local search company. Weotta uses NLP and machine learning to create powerful and easy-to-use natural language search for what to do and where to go. He is the author of Python Text Processing with NLTK 2.0 Cookbook, Packt Publishing, and has contributed a chapter to the Bad Data Handbook, O'Reilly Media. He writes about NLTK, Python, and other technology topics at http://streamhacker.com. To demonstrate the capabilities of NLTK and natural language processing, he developed http://text-processing.com, which provides simple demos and NLP APIs for commercial use. He has contributed to various open source projects, including NLTK, and created NLTK-Trainer to simplify the process of training NLTK models. For more information, visit https://github.com/japerk/nltk-trainer.
Read more about Jacob Perkins