Classifier accuracy
Now we need to test our classifier with a larger test set. In this case, we randomly select 100 subjects, 50 spam and 50 not spam, and count how many times the classifier chooses the correct category:
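import csv

with open("test.csv") as f:
    correct = 0
    tests = csv.reader(f)
    for subject in tests:
        # subject[0] is the subject text, subject[1] its expected category
        clase = classifier(subject[0], w, c, t, tw)
        # clase[1] holds the category chosen by the classifier
        if clase[1] == subject[1]:
            correct += 1
    print("Efficiency: {0} of 100".format(correct))

In this case, the classifier chooses the correct category for 82 of the 100 subjects, an efficiency of 82 percent: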
>>> Efficiency: 82 of 100
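The same counting loop works for a labelled test set of any size. The following is a minimal sketch under stated assumptions, not the classifier built in this chapter: toy_classifier is a hypothetical keyword-based stand-in, the label strings "spam" and "nospam" are assumed, and a small in-memory CSV replaces test.csv so that the example runs on its own. It reports the result relative to the actual number of rows instead of a hard-coded 100, and also keeps a count of correct answers per category.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for the chapter's classifier: a subject is flagged
# as spam when it contains any of a few hand-picked words.
SPAM_WORDS = {"free", "winner", "offer"}

def toy_classifier(subject):
    words = set(subject.lower().split())
    return "spam" if words & SPAM_WORDS else "nospam"

def evaluate(rows, classify):
    """Return (correct, total, correct-per-label) for (subject, label) rows."""
    correct = 0
    total = 0
    hits = Counter()
    for subject, label in rows:
        total += 1
        if classify(subject) == label:
            correct += 1
            hits[label] += 1
    return correct, total, hits

# Illustrative data only; the (subject, label) layout is assumed to match test.csv.
SAMPLE = """free offer inside,spam
meeting agenda for monday,nospam
you are a winner,spam
lunch offer from the team,nospam
"""

rows = list(csv.reader(io.StringIO(SAMPLE)))
correct, total, hits = evaluate(rows, toy_classifier)
print("Accuracy: {0} of {1} ({2:.0%})".format(correct, total, correct / total))
```

With this sample data, the last subject is misclassified because it contains the word "offer", so three of the four rows are counted as correct. Splitting the count by category in this way helps to show whether the errors are mostly false positives or missed spam.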
Tip
We can find out-of-the-box implementations of the Naïve Bayes classifier, such as the NaiveBayesClassifier class in the NLTK package for Python. NLTK is a powerful natural language toolkit, and we can download it from http://nltk.org/.
In Chapter 11, Sentiment Analysis of Twitter Data, we present a more sophisticated version of the Naïve Bayes classifier to perform sentiment analysis.
In this case, we will find an optimal-size threshold for the...