
You're reading from Natural Language Processing with Python Quick Start Guide

Product type: Book
Published in: Nov 2018
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781789130386
Edition: 1st Edition
Author (1)
Nirant Kasliwal

Nirant Kasliwal maintains an awesome list of natural language processing (NLP) resources. GitHub's machine learning collection features this list as the go-to guide. Nobel Laureate Dr. Paul Romer found his programming notes on Jupyter Notebooks helpful. Nirant won the first ever NLP Google Kaggle Kernel Award. At Soroco, he works on image segmentation and intent categorization. His state-of-the-art language modeling results are available as Hindi2vec.

Correcting spelling

Correcting spelling errors is one of the most common challenges when working with text. It comes up all the more often when the data is typed in by casual users, for instance, in shipping addresses.

Let's look at an example. We want to correct Gujrat, Gujart, and other minor misspellings to Gujarat. There are several good ways to do this, depending on your dataset and level of expertise. We will discuss two or three popular approaches, along with their pros and cons.
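To make the Gujarat example concrete, here is a minimal sketch that uses only Python's standard library (`difflib.get_close_matches`). It is not one of the approaches the chapter goes on to recommend. The vocabulary `KNOWN_STATES`, the helper name `correct_state`, and the `0.8` cutoff are assumptions chosen for illustration.

```python
from difflib import get_close_matches

# Hypothetical vocabulary of valid names (an assumption for illustration).
KNOWN_STATES = ["Gujarat", "Goa", "Kerala", "Punjab", "Rajasthan"]


def correct_state(name, vocabulary=KNOWN_STATES, cutoff=0.8):
    """Return the closest known name, or the input unchanged if none is close enough."""
    # get_close_matches ranks candidates by difflib's similarity ratio and
    # keeps only those scoring at or above the cutoff.
    matches = get_close_matches(name, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else name


for raw in ["Gujrat", "Gujart", "Gujarat"]:
    print(raw, "->", correct_state(raw))
```

This works when the set of valid values is small and known in advance, as with state names. Note that the comparison is case-sensitive, so inputs may need normalizing first.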

Before we begin, we need to pay homage to the legendary Peter Norvig's Spell Correct. It is still worth reading for how he thinks through the problem and explores implementations. Even the way he refactors his code and writes functions is educational.
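For readers who have not seen it, the core idea can be sketched roughly as follows: generate every string one edit away from the input, keep the ones that are known words, and pick the most frequent. This is a simplified illustration in the spirit of that approach, not a reproduction of Norvig's code. It omits his two-edit candidates, and the tiny `WORD_COUNTS` table is an assumption standing in for counts taken from a large corpus.

```python
from collections import Counter
import string

# Toy word frequencies (an assumption for illustration); a real corrector
# would count words from a large text corpus.
WORD_COUNTS = Counter({"gujarat": 50, "goa": 20, "kerala": 30})


def edits1(word):
    """Return all strings one delete, transpose, replace, or insert away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def known(words):
    """Keep only the words that appear in the frequency table."""
    return {w for w in words if w in WORD_COUNTS}


def correct(word):
    """Return the most frequent known word within one edit, else the lowercased input."""
    word = word.lower()
    candidates = known([word]) or known(edits1(word)) or {word}
    return max(candidates, key=lambda w: WORD_COUNTS[w])


print(correct("Gujrat"), correct("Gujart"))
```

Both misspellings are a single insertion away from "gujarat", so one round of edits is enough here. Errors that are two or more edits away would need the candidate set extended.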

His spell-correction module is not the simplest or best way of doing this. I recommend two packages: one with a bias toward simplicity, one...
