Technical requirements
The code for this chapter is located at https://github.com/PacktPublishing/Python-Natural-Language-Processing-Cookbook-Second-Edition/tree/main/Chapter03. The packages required for this chapter should be installed automatically via the Poetry environment.
In addition, we will use a model and a dataset located at the following URLs. The Google word2vec model represents words as vectors, and the IMDB dataset contains movie titles, genres, and descriptions. Download them into the data folder inside the root directory:
- The Google word2vec model: https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit?resourcekey=0-wjGZdNAUop6WykTtMip30g
- The IMDB movie dataset: https://github.com/venusanvi/imdb-movies/blob/main/IMDB-Movie-Data.csv (also available in the book’s GitHub repo)
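Before running the recipes, you can check that both files ended up in the data folder. The following is a minimal sketch, not part of the book's code. It assumes the downloaded files keep the names `GoogleNews-vectors-negative300.bin.gz` (an assumption about the word2vec archive name) and `IMDB-Movie-Data.csv` (taken from the dataset URL). The helper `missing_data_files` is hypothetical.

```python
from pathlib import Path

# Assumed file names: the IMDB CSV name comes from the dataset URL; the
# word2vec archive name is an assumption, so adjust it to match your download.
EXPECTED_FILES = (
    "GoogleNews-vectors-negative300.bin.gz",
    "IMDB-Movie-Data.csv",
)


def missing_data_files(root_dir=".", filenames=EXPECTED_FILES):
    """Hypothetical helper: list the expected files absent from <root_dir>/data."""
    data_dir = Path(root_dir) / "data"
    return [name for name in filenames if not (data_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing from data/:", ", ".join(missing))
    else:
        print("All required files are in data/.")
```

Run it from the repository root. If you decompressed the word2vec archive or renamed either file, pass your own list of names through the `filenames` parameter.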
In addition to the preceding files, we will use various functions from a simple classifier that we will create in the first recipe. This...