Performing rule-based text classification using keywords
In this recipe, we will use keywords to classify the business and sport data. We will build a classifier from keywords that we choose ourselves, based on the frequency distributions from the previous recipe.
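To make the idea concrete before the recipe's own steps, here is a minimal sketch of a keyword classifier in plain Python. It is not the recipe's implementation: the keyword sets, the `tokenize` and `classify` helpers, the sample sentences, and the tie-breaking rule (ties go to business) are all assumptions made for illustration. In the recipe itself, the keywords come from the frequency distributions.

```python
import string

# Hypothetical keyword sets for illustration only; the recipe picks its own
# keywords from the frequency distributions of the previous recipe.
BUSINESS_KEYWORDS = {"market", "company", "shares", "bank", "profit"}
SPORT_KEYWORDS = {"match", "team", "season", "goal", "coach"}


def tokenize(text):
    """Lowercase the text, strip punctuation, and split on whitespace."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()


def classify(text):
    """Return the label whose keyword set matches more tokens.

    Assumption: ties (including no matches at all) default to 'business'.
    """
    tokens = tokenize(text)
    business_hits = sum(token in BUSINESS_KEYWORDS for token in tokens)
    sport_hits = sum(token in SPORT_KEYWORDS for token in tokens)
    return "sport" if sport_hits > business_hits else "business"


# Made-up sample sentences, not taken from the dataset.
samples = [
    "The company reported a record profit, and its shares rose.",
    "The team scored a late goal to win the match.",
]
predictions = [classify(sample) for sample in samples]
print(predictions)
```

Counting hits per class and picking the larger count is probably the simplest possible rule. It makes the classifier easy to inspect, but its quality depends entirely on how well the chosen keywords separate the two classes.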
Getting ready
We will continue using classes from the sklearn, numpy, and nltk packages that we used in the previous recipe.
How to do it…
We will build a keyword classifier from hand-picked business and sport vocabulary and evaluate it in the same way as the dummy classifier in the previous recipe. The steps for this recipe are as follows:
- Do the necessary imports:
import numpy as np
import string
from sklearn import preprocessing
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from itertools import repeat
from nltk.probability import FreqDist
...