Chapter 3. Categorizing and Tagging Text
In corpus linguistics, text categorization or tagging into various word classes or lexical categories is considered to be the second step in NLP pipeline after tokenization. We have all studied parts of speech in our elementary classes; we were familiarized with nouns, pronouns, verbs, adjectives, and their utility in English grammar. These word classes are not just the salient pillars of grammar, but also quite pivotal in many language processing activities. The process of categorizing and labeling words into different parts of speeches is known as parts of speech tagging or simply tagging.
The goal of this chapter is to equip you with the tools and the associated knowledge about different tagging, chunking, and entailment approaches and their usage in natural language processing.
Earlier chapters focused on basic text processing; this chapter improvises on those concepts to explain the different approaches of tagging texts into lexical categories...