Creating binary variables through one-hot encoding
One-hot encoding is a method for representing categorical data in which each category gets its own binary variable. That variable takes the value of 1 if the observation belongs to the category and 0 otherwise.
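The definition above can be sketched in plain Python, with no libraries, to show the mechanics. The function name `one_hot_encode` and the sample values are hypothetical and used only for illustration. The sketch assumes that sorting the distinct categories is an acceptable way to fix the column order.

```python
def one_hot_encode(values):
    """Return the sorted categories and one 0/1 row per observation (illustrative helper)."""
    # Each distinct category becomes one binary variable (one column)
    categories = sorted(set(values))
    # For each observation: 1 in the column of its own category, 0 everywhere else
    rows = [[1 if value == category else 0 for category in categories] for value in values]
    return categories, rows


categories, rows = one_hot_encode(["A", "B", "A", "C"])
print(categories)  # ['A', 'B', 'C']
for row in rows:
    print(row)
```

Every row contains exactly one 1, which is where the name "one-hot" comes from. In practice, libraries such as pandas (`pandas.get_dummies`) provide this transformation directly.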
The following table shows the one-hot encoded representation of the Smoker variable, which has two categories: Smoker and Non-Smoker.
Figure 2.1 – One-hot encoded representation of the Smoker variable
As shown in Figure 2.1, from the Smoker variable we can derive either a binary variable for Smoker, which takes the value of 1 for smokers, or a binary variable for Non-Smoker, which takes the value of 1 for those who do not smoke.
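A minimal Python sketch of the two binary variables derived from Smoker follows. The sample observations and the variable names `is_smoker` and `is_non_smoker` are hypothetical. The sketch assumes the variable has no missing values, so every observation is exactly one of the two categories.

```python
# Hypothetical observations of the Smoker variable
smoker = ["Smoker", "Non-Smoker", "Non-Smoker", "Smoker"]

# Binary variable for Smoker: 1 for smokers, 0 otherwise
is_smoker = [1 if value == "Smoker" else 0 for value in smoker]

# Binary variable for Non-Smoker: 1 for those who do not smoke, 0 otherwise
is_non_smoker = [1 if value == "Non-Smoker" else 0 for value in smoker]

print(is_smoker)      # [1, 0, 0, 1]
print(is_non_smoker)  # [0, 1, 1, 0]

# With only two categories and no missing values, the columns are complements
assert all(s + n == 1 for s, n in zip(is_smoker, is_non_smoker))
```

Because the two columns always sum to 1 under this assumption, either one alone is enough to recover the original variable.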
For the Color categorical variable, with the values red, blue, and green, we can create three binary variables called red, blue, and green. Each of these variables takes the value of 1 if the observation has the corresponding color and 0 if it does not.
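The Color example can be sketched in Python as follows, building one dictionary per observation with a 0/1 entry for red, blue, and green. The constant `COLORS`, the function `encode_color`, and the sample observations are hypothetical. The sketch assumes the list of colors is known in advance, so the three new variables always appear in the same order.

```python
# Fixed list of categories; each becomes one binary variable
COLORS = ["red", "blue", "green"]


def encode_color(observations):
    """Return one dict per observation with a 0/1 value for each color (illustrative helper)."""
    return [{color: int(obs == color) for color in COLORS} for obs in observations]


encoded = encode_color(["blue", "red", "green", "blue"])
for row in encoded:
    print(row)
# {'red': 0, 'blue': 1, 'green': 0}
# {'red': 1, 'blue': 0, 'green': 0}
# {'red': 0, 'blue': 0, 'green': 1}
# {'red': 0, 'blue': 1, 'green': 0}
```

Fixing the category list up front, rather than deriving it from the data, keeps the encoded variables consistent when a new batch of observations happens to lack one of the colors.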
A categorical...