Reader small image

You're reading from  Hands-On Natural Language Processing with PyTorch 1.x

Product typeBook
Published inJul 2020
Reading LevelBeginner
PublisherPackt
ISBN-139781789802740
Edition1st Edition
Languages
Tools
Right arrow
Author (1)
Thomas Dop
Thomas Dop
author image
Thomas Dop

Thomas Dop is a data scientist at MagicLab, a company that creates leading dating apps, including Bumble and Badoo. He works on a variety of areas within data science, including NLP, deep learning, computer vision, and predictive modeling. He holds an MSc in data science from the University of Amsterdam.
Read more about Thomas Dop

Right arrow

Stemming and lemmatization

In language, inflection is how different grammatical categories such as tense, mood, or gender can be expressed by modifying a common root word. This often involves changing the prefix or suffix of a word but can also involve modifying the entire word. For example, we can make modifications to a verb to change its tense:

Run -> Runs (Add "s" suffix to make it present tense)

Run -> Ran (Modify middle letter to "a" to make it past tense)

But in some cases, the whole word changes:

To be -> Is (Present tense)

To be -> Was (Past tense)

To be -> Will be (Future tense – addition of modal)

There can be lexical variations on nouns too:

Cat -> Cats (Plural)

Cat -> Cat's (Possessive)

Cat -> Cats' (Plural possessive)

All these words relate back to the root word cat. We can calculate the root of all the words in the sentence to reduce the whole sentence to its lexical roots:

...
lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
Hands-On Natural Language Processing with PyTorch 1.x
Published in: Jul 2020Publisher: PacktISBN-13: 9781789802740

Author (1)

author image
Thomas Dop

Thomas Dop is a data scientist at MagicLab, a company that creates leading dating apps, including Bumble and Badoo. He works on a variety of areas within data science, including NLP, deep learning, computer vision, and predictive modeling. He holds an MSc in data science from the University of Amsterdam.
Read more about Thomas Dop