Text analytics has many practical applications and is one of the most important areas of application of machine learning. Automatic e-mail filters, news article clustering and categorization, and sentimental analysis on social media posts about products are some of the most widely implemented use cases of text analytics. One of the major challenges in text analytics is feature extraction. Representation of documents is the most critical part of a text analytics project. In the coming sections, we are going to discuss one of the most-used forms of representation of text.
The representation of a set of documents as vectors in a common vector space is known as the vector space model (VSM), and it is fundamental to a host of information retrieval operations, such as scoring documents on a query, document classification, and document clustering. The VSM is a common way of vectorizing text documents.
In the vector space model, each unique word present in the set...