Multiclass Classification Strategies
Multiclass classification, as we saw in Chapter 6, involves predicting a category when there are more than two classes. Strategies such as one-vs-rest (OvR), one-vs-one (OvO), and hierarchical classification help tackle these problems effectively. Using scikit-learn, we can implement and explore these approaches for text classification tasks. Here, we will apply techniques for cases where the model must choose among multiple labels or assign multiple labels in a single inference.
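As a rough illustration of how OvR and OvO differ, the sketch below wraps the same base classifier in scikit-learn's OneVsRestClassifier and OneVsOneClassifier and counts the binary models each one trains. The toy sentences, the four label names, and the choice of LogisticRegression with TF-IDF features are assumptions made for this example only; they are not the recipe's dataset or necessarily its model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: two short documents for each of four made-up classes.
texts = [
    "the browser crashes when I open a new tab",
    "bookmarks toolbar disappears after the update",
    "a crisp white wine with notes of citrus",
    "full bodied red wine with a long finish",
    "the pirate captain drew his sword on deck",
    "the ship sailed toward the cursed island",
    "single professional seeks partner for long walks",
    "friendly traveler seeks companion for weekend trips",
]
labels = [
    "browser", "browser",
    "wine", "wine",
    "script", "script",
    "personals", "personals",
]

# Same features and base estimator; only the multiclass strategy differs.
ovr = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
ovo = make_pipeline(
    TfidfVectorizer(),
    OneVsOneClassifier(LogisticRegression(max_iter=1000)),
)
ovr.fit(texts, labels)
ovo.fit(texts, labels)

# OvR trains one binary model per class; OvO trains one per pair of classes.
n_ovr = len(ovr.steps[-1][1].estimators_)
n_ovo = len(ovo.steps[-1][1].estimators_)
print("OvR binary models:", n_ovr)
print("OvO binary models:", n_ovo)

# Both pipelines return a single class label per input document.
query = ["a dry red wine with a smooth finish"]
ovr_pred = ovr.predict(query)
ovo_pred = ovo.predict(query)
print("OvR prediction:", ovr_pred[0])
print("OvO prediction:", ovo_pred[0])
```

With K classes, OvR fits K binary models and OvO fits K(K-1)/2, so OvO grows faster as classes are added, but each of its models sees only the data for two classes.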
Getting ready
We'll set up our environment and dataset to implement multiclass classification strategies. This time we’ll use the webtext corpus, which, according to NLTK, “includes content from a Firefox discussion forum, conversations overheard in New York, the movie script of Pirates of the Carribean, personal advertisements, and wine reviews.”
Load the libraries:
import nltk
from nltk.corpus import webtext
Download the NLTK webtext corpus...