
You're reading from  Natural Language Processing with Python Quick Start Guide

Product type: Book
Published in: Nov 2018
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781789130386
Edition: 1st Edition
Author (1)
Nirant Kasliwal

Nirant Kasliwal maintains an awesome list of natural language processing (NLP) resources. GitHub's machine learning collection features it as the go-to guide. Nobel Laureate Dr. Paul Romer found his programming notes on Jupyter Notebooks helpful. Nirant won the first ever NLP Google Kaggle Kernel Award. At Soroco, he works on image segmentation and intent categorization. His state-of-the-art language modeling results are available as Hindi2vec.

Bread and butter – most common tasks

There are several well-known text cleaning ideas, and all of them have made their way into today's most popular tools, such as NLTK, Stanford CoreNLP, and spaCy. I like spaCy for two main reasons:

  • It's an industry-grade NLP library, unlike NLTK, which is mainly meant for teaching.
  • It has a good speed-to-performance trade-off. spaCy is written in Cython, which gives it C-like performance with Python code.

spaCy is actively maintained and developed, and incorporates the best methods available for most challenges.

By the end of this section, you will be able to do the following:

  • Understand tokenization and do it yourself using spaCy
  • Understand why stop word removal and case standardization work, with spaCy examples
  • Differentiate between stemming and lemmatization, with spaCy lemmatization examples
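
Before turning to spaCy, the ideas in this list can be sketched in plain Python. This is only a toy illustration using the standard library, not how spaCy implements any of these steps. The stop word list, the suffix rules, and the lemma table are hypothetical and deliberately tiny; real libraries ship far more complete versions.

```python
import re

# Tiny hand-picked stop word list, for illustration only (hypothetical).
STOP_WORDS = {"the", "a", "an", "is", "are", "were", "and", "of", "to", "in"}

# Toy lemma lookup table (hypothetical); real lemmatizers rely on
# vocabularies and part-of-speech information.
LEMMA_TABLE = {"were": "be", "are": "be", "is": "be",
               "mice": "mouse", "running": "run"}


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def normalize(tokens):
    """Lowercase tokens, then drop punctuation and stop words."""
    lowered = [t.lower() for t in tokens]
    return [t for t in lowered if t.isalnum() and t not in STOP_WORDS]


def crude_stem(token):
    """Strip a few common suffixes; the result may not be a real word."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def lookup_lemma(token):
    """Return the dictionary form if the toy table knows the token."""
    return LEMMA_TABLE.get(token, token)


text = "The mice were running in the gardens."
tokens = tokenize(text)
content_words = normalize(tokens)
stems = [crude_stem(t) for t in content_words]
lemmas = [lookup_lemma(t) for t in content_words]

print(tokens)         # ['The', 'mice', 'were', 'running', 'in', 'the', 'gardens', '.']
print(content_words)  # ['mice', 'running', 'gardens']
print(stems)          # ['mice', 'runn', 'garden']
print(lemmas)         # ['mouse', 'run', 'gardens']
```

The last two lines show the difference between the two approaches. The rule-based stemmer turns "running" into "runn", which is not a word, while the lookup-based lemmatizer returns "run" but leaves "gardens" untouched because the toy table does not contain it.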
...