Generating a synthetic dataset for text classification problems
In this recipe, we will generate a synthetic dataset for a binary text classification problem. The dataset has two fields: a text field containing a statement as a string, and a target label that specifies whether the statement is POSITIVE or NEGATIVE.
Figure 8.2 – Synthetic dataset for text classification problems
In Figure 8.2, we can see that sentences tagged POSITIVE carry the __label__positive label, while sentences tagged NEGATIVE carry the __label__negative label. We will use this dataset in the next recipes to train and deploy a BlazingText model for sentiment analysis.
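To make the target format concrete, here is a minimal sketch of how labeled lines of this kind could be assembled. The sample statements, the LABELS mapping, and the helper names to_labeled_line and build_dataset are hypothetical and are not the lists or functions used in this recipe. The sketch also assumes the label prefix is placed at the start of each line, followed by a space and the text, which is the layout BlazingText expects in supervised mode.

```python
import random

# Hypothetical sample statements, for illustration only.
POSITIVE_STATEMENTS = [
    "I love this product",
    "The service was excellent",
    "What a wonderful experience",
]
NEGATIVE_STATEMENTS = [
    "I hate this product",
    "The service was terrible",
    "What a disappointing experience",
]

# Maps each sentiment tag to the label prefix described in the text.
LABELS = {
    "POSITIVE": "__label__positive",
    "NEGATIVE": "__label__negative",
}


def to_labeled_line(text, sentiment):
    """Return one dataset line: the label prefix, a space, then the text."""
    return f"{LABELS[sentiment]} {text}"


def build_dataset(seed=42):
    """Combine both statement lists, shuffle them, and format each row."""
    rows = [(text, "POSITIVE") for text in POSITIVE_STATEMENTS]
    rows += [(text, "NEGATIVE") for text in NEGATIVE_STATEMENTS]
    # A seeded generator keeps the shuffle reproducible between runs.
    random.Random(seed).shuffle(rows)
    return [to_labeled_line(text, sentiment) for text, sentiment in rows]


lines = build_dataset()
for line in lines:
    print(line)
```

Shuffling before writing keeps the two classes from appearing in two contiguous blocks, which matters if the file is later split into training and validation sets by position.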
Getting ready
A SageMaker Studio notebook running the Python 3 (Data Science) kernel is the only prerequisite for this recipe.
How to do it…
The first steps in this recipe focus on generating a list of POSITIVE and NEGATIVE...