Data Preprocessing
To fit models to the data, it must be represented in numerical format since the mathematics used in all machine learning algorithms only works on matrices of numbers (you cannot perform linear algebra on an image). This will be one goal of this section: to learn how to encode all the features into numerical representations. For example, in binary text, values that contain one of two possible values may be represented as zeros or ones. An example of this can be seen in the following diagram. Since there are only two possible values, the value 0 is assumed to be a cat and the value 1 is assumed to be a dog.
We can also rename the column for interpretation:
Figure 1.6: A numerical encoding of binary text values
Another goal will be to appropriately represent the data in numerical format - by appropriately, we mean that we want to encode relevant information numerically through the distribution of numbers. For example, one method to encode...