Modern Methods for Classification

We now know how to convert text strings to numerical vectors that capture some meaning. In this chapter, we will look at how to put those vectors to use. Embedding is the more frequently used term for word vectors and other numerical representations of text.

In this chapter, we still follow the broad outline from our first chapter, that is, text → representations → models → evaluation → deployment.

We will continue working with text classification as our example task. This is mainly because it's a simple task for demonstration, but we can also extend almost all of the ideas in this book to solve other problems. The main focus ahead, however, is machine learning for text classification.

To sum up, in this chapter we will be looking at the following topics:

Sentiment analysis as a specific class and example of text classification...

Machine learning for text

There are at least 10 to 20 machine learning techniques that are well known in the community, ranging from SVMs to several forms of regression and gradient boosting machines. We will sample a small selection of these.

Source: https://www.kaggle.com/surveys/2017.

The preceding graph shows the most popular machine learning techniques used by Kagglers.

We met Logistic Regression in the first chapter while working with the 20 newsgroups dataset. We will revisit Logistic Regression and introduce Naive Bayes, SVM, Decision Trees, Random Forests, and XGBoost. XGBoost is a popular algorithm used by several Kaggle winners to achieve award-winning results. We will use the scikit-learn and XGBoost packages in Python to see these classifiers in code.
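As a rough preview of how these classifiers share one workflow, here is a minimal sketch using scikit-learn. The toy corpus, its labels, and the names `classifiers`, `fit_all`, and `predictions` are illustrative assumptions, not the dataset or code used in this chapter. XGBoost is left out so that the sketch depends only on scikit-learn; its `XGBClassifier` follows the same fit/predict interface and could be added to the dictionary if the package is installed.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy corpus (an assumption for illustration): 1 = positive, 0 = negative.
train_texts = [
    "I loved this movie, it was wonderful",
    "What a great and enjoyable film",
    "An excellent story with brilliant acting",
    "Absolutely fantastic, I enjoyed every minute",
    "I hated this movie, it was terrible",
    "What a boring and awful film",
    "A dreadful story with poor acting",
    "Absolutely horrible, I regretted every minute",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]
test_texts = ["a wonderful and brilliant film", "a terrible and boring story"]

# The classifiers named in the text, with mostly default settings.
classifiers = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": MultinomialNB(),
    "Linear SVM": LinearSVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
}


def fit_all(texts, labels):
    """Fit one text -> TF-IDF -> classifier pipeline per entry in `classifiers`."""
    fitted = {}
    for name, clf in classifiers.items():
        model = make_pipeline(TfidfVectorizer(), clf)
        model.fit(texts, labels)
        fitted[name] = model
    return fitted


fitted = fit_all(train_texts, train_labels)
predictions = {name: model.predict(test_texts) for name, model in fitted.items()}

for name, preds in predictions.items():
    print(f"{name}: {preds.tolist()}")
```

Because every estimator exposes the same `fit` and `predict` methods, swapping one classifier for another changes a single line. With a corpus this small, the predictions themselves are not meaningful; the point is only the shared shape of the code.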

Sentiment analysis as text classification...

Summary

In this chapter, we looked at several new ideas regarding machine learning. The intention here was to demonstrate some of the most common classifiers. We looked at how to use them with one thematic idea: translating text to a numerical representation and then feeding that to a classifier.

This chapter covered only a fraction of the available possibilities. Remember, you can try anything from better feature extraction using TF-IDF to tuning classifiers with GridSearchCV and RandomizedSearchCV, as well as ensembling several classifiers.
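The sketch below shows one way these three ideas might fit together in scikit-learn: TF-IDF features, a small grid search over pipeline parameters, and a soft-voting ensemble. The toy corpus, the parameter grid, and the choice of Logistic Regression and Naive Bayes as ensemble members are assumptions made for illustration, not recommendations from this chapter.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus (an assumption for illustration): 1 = positive, 0 = negative.
texts = [
    "I loved this movie, it was wonderful",
    "What a great and enjoyable film",
    "An excellent story with brilliant acting",
    "Absolutely fantastic, I enjoyed every minute",
    "I hated this movie, it was terrible",
    "What a boring and awful film",
    "A dreadful story with poor acting",
    "Absolutely horrible, I regretted every minute",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Feature extraction and classification chained into one estimator.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Parameters are addressed as <step name>__<parameter name>.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1.0, 10.0],
}

# cv=2 only because the toy corpus is tiny; real data would use more folds.
search = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy")
search.fit(texts, labels)
print("Best parameters:", search.best_params_)

# Ensembling: average the predicted probabilities of two classifiers.
ensemble = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("vote", VotingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("nb", MultinomialNB()),
        ],
        voting="soft",
    )),
])
ensemble.fit(texts, labels)
ensemble_preds = ensemble.predict(["a wonderful film", "an awful story"])
print("Ensemble predictions:", ensemble_preds.tolist())
```

RandomizedSearchCV is used the same way, except that it takes `param_distributions` and an `n_iter` budget and samples that many combinations instead of trying every one, which tends to matter once the grid grows large.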

This chapter focused mostly on pre-deep-learning methods for both feature extraction and classification.

Note that deep learning methods also allow us to use a single model where the feature extraction and classification are both learned from the underlying data distribution. While a lot has been written about deep learning in computer vision...