Natural Language Processing with Python Quick Start Guide


Product type Book
Published in Nov 2018
Publisher Packt
ISBN-13 9781789130386
Pages 182 pages
Edition 1st Edition
Author: Nirant Kasliwal

Modern Methods for Classification

We now know how to convert text strings into numerical vectors that capture some meaning. In this chapter, we will look at how to use those vectors, also called embeddings, for classification. Embedding is the more frequently used term for word vectors and other numerical representations of text.

In this chapter, we still follow the broad outline from our first chapter, that is, text → representations → models → evaluation → deployment.

We will continue working with text classification as our example task. This is mainly because it's a simple task for demonstration, but we can also extend almost all of the ideas in this book to solve other problems. The main focus ahead, however, is machine learning for text classification.

To sum up, in this chapter we will be looking at the following topics:

  • Sentiment analysis as a specific class and example of text classification...

Machine learning for text

At least 10 to 20 machine learning techniques are well known in the community, ranging from SVMs to several kinds of regression and gradient boosting machines. We will sample a few of these.

Source: https://www.kaggle.com/surveys/2017.

The preceding graph shows the most popular machine learning techniques used by Kagglers.

We met Logistic Regression in the first chapter while working with the 20 newsgroups dataset. We will revisit Logistic Regression and introduce Naive Bayes, SVM, Decision Trees, Random Forests, and XGBoost. XGBoost is a popular algorithm used by several Kaggle winners to achieve award-winning results. We will use the scikit-learn and XGBoost packages in Python to work with these classifiers in code.
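
As a minimal sketch of what this looks like with scikit-learn, the following trains several of the classifiers named above on the same TF-IDF features. The toy sentences, labels, and variable names are invented for illustration and are not taken from the book. The hyperparameters are arbitrary defaults, not recommendations.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Toy corpus invented for illustration; 1 = positive, 0 = negative.
train_texts = [
    "I loved this movie, it was wonderful",
    "What a great and enjoyable film",
    "An excellent story with brilliant acting",
    "I hated this movie, it was terrible",
    "What a boring and awful film",
    "A dreadful story with poor acting",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_texts = ["a wonderful and brilliant film", "a terrible and boring story"]

# Each classifier is paired with the same text-to-vector step.
classifiers = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "naive_bayes": MultinomialNB(),
    "linear_svm": LinearSVC(),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

predictions = {}
for name, clf in classifiers.items():
    # The pipeline converts raw text to TF-IDF vectors, then fits the classifier.
    model = make_pipeline(TfidfVectorizer(), clf)
    model.fit(train_texts, train_labels)
    predictions[name] = model.predict(test_texts).tolist()

for name, labels in predictions.items():
    print(name, labels)
```

Because every model sits behind the same fit and predict interface, swapping one classifier for another changes a single line. Assuming the xgboost package is installed, its XGBClassifier follows the same interface and could be added to the dictionary in the same way.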

Sentiment analysis as text classification...

Summary

In this chapter, we looked at several new ideas regarding machine learning. The intention here was to demonstrate some of the most common classifiers. We looked at how to use them with one thematic idea: translating text to a numerical representation and then feeding that to a classifier.

This chapter covered only a fraction of the available possibilities. Remember, you can try anything from better feature extraction using TF-IDF to tuning classifiers with grid search and randomized search (GridSearchCV and RandomizedSearchCV in scikit-learn), as well as ensembling several classifiers.
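
To make the tuning idea concrete, here is a small sketch of a grid search over both the feature extraction step and the classifier, using scikit-learn's GridSearchCV. The toy data, the step names "tfidf" and "clf", and the parameter values are assumptions chosen for illustration. A real search would use a much larger dataset and grid.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy corpus invented for illustration; 1 = positive, 0 = negative.
texts = [
    "I loved this movie", "a wonderful film", "great acting and story",
    "truly excellent work", "an enjoyable evening", "brilliant and moving",
    "I hated this movie", "a terrible film", "poor acting and story",
    "truly awful work", "a boring evening", "dull and tedious",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Named steps let the grid address each step's parameters as "<step>__<param>".
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],  # unigrams vs. unigrams + bigrams
    "clf__C": [0.1, 1.0, 10.0],              # inverse regularization strength
}

# cv=3 runs 3-fold cross-validation for each of the 2 x 3 = 6 combinations.
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(texts, labels)

print(search.best_params_)
print(search.best_score_)
```

RandomizedSearchCV accepts a similar setup but takes param_distributions and an n_iter budget instead of an exhaustive grid, which tends to be more practical when the search space is large.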

This chapter was mostly focused on pre-deep learning methods for both feature extraction and classification.

Note that deep learning methods also allow us to use a single model where the feature extraction and classification are both learned from the underlying data distribution. While a lot has been written about deep learning in computer vision...
