Reader small image

You're reading from  Python 3 Text Processing with NLTK 3 Cookbook

Product typeBook
Published inAug 2014
Reading LevelBeginner
Publisher
ISBN-139781782167853
Edition1st Edition
Languages
Tools
Right arrow
Author (1)
Jacob Perkins
Jacob Perkins
author image
Jacob Perkins

Jacob Perkins is the cofounder and CTO of Weotta, a local search company. Weotta uses NLP and machine learning to create powerful and easy-to-use natural language search for what to do and where to go. He is the author of Python Text Processing with NLTK 2.0 Cookbook, Packt Publishing, and has contributed a chapter to the Bad Data Handbook, O'Reilly Media. He writes about NLTK, Python, and other technology topics at http://streamhacker.com. To demonstrate the capabilities of NLTK and natural language processing, he developed http://text-processing.com, which provides simple demos and NLP APIs for commercial use. He has contributed to various open source projects, including NLTK, and created NLTK-Trainer to simplify the process of training NLTK models. For more information, visit https://github.com/japerk/nltk-trainer.
Read more about Jacob Perkins

Right arrow

Singularizing plural nouns


As we saw in the previous recipe, the transformation process can result in phrases such as recipes book. This is a NNS followed by a NN, when a more proper version of the phrase would be recipe book, which is a NN followed by another NN. We can do another transform to correct these improper plural nouns.

How to do it...

The transforms.py script defines a function called singularize_plural_noun() which will depluralize a plural noun (tagged with NNS) that is followed by another noun:

def singularize_plural_noun(chunk):
  nnsidx = first_chunk_index(chunk, tag_equals('NNS'))

  if nnsidx is not None and nnsidx+1 < len(chunk) and chunk[nnsidx+1][1][:2] == 'NN':
    noun, nnstag = chunk[nnsidx]
    chunk[nnsidx] = (noun.rstrip('s'), nnstag.rstrip('S'))

  return chunk

And using it on recipes book, we get the more correct form, recipe book.

>>> singularize_plural_noun([('recipes', 'NNS'), ('book', 'NN')])
[('recipe', 'NN'), ('book', 'NN')]

How it works...

We start...

lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
Python 3 Text Processing with NLTK 3 Cookbook
Published in: Aug 2014Publisher: ISBN-13: 9781782167853

Author (1)

author image
Jacob Perkins

Jacob Perkins is the cofounder and CTO of Weotta, a local search company. Weotta uses NLP and machine learning to create powerful and easy-to-use natural language search for what to do and where to go. He is the author of Python Text Processing with NLTK 2.0 Cookbook, Packt Publishing, and has contributed a chapter to the Bad Data Handbook, O'Reilly Media. He writes about NLTK, Python, and other technology topics at http://streamhacker.com. To demonstrate the capabilities of NLTK and natural language processing, he developed http://text-processing.com, which provides simple demos and NLP APIs for commercial use. He has contributed to various open source projects, including NLTK, and created NLTK-Trainer to simplify the process of training NLTK models. For more information, visit https://github.com/japerk/nltk-trainer.
Read more about Jacob Perkins