CrossValidation and hyperparameter tuning
We will look at one example each of CrossValidation and hyperparameter tuning, starting with CrossValidation.
CrossValidation
As stated before, we've used the default parameters of the machine learning algorithm, and we don't know if they are a good choice. In addition, instead of simply splitting our data into training and testing sets, or training, testing, and validation sets, CrossValidation can be a better choice because it ensures that every data point is eventually used for both training and evaluation.
Note
CrossValidation splits the complete available training data into a number of folds; this parameter k can be specified. The whole Pipeline is then run once per fold: in each run, one fold is held out for evaluation and a machine learning model is trained on the remaining k-1 folds, so k models are trained in total. Finally, the evaluation results of these models are combined, typically by averaging the per-fold scores, which gives a more robust estimate of performance than a single train/test split.
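The procedure can be sketched in plain Python. This is a minimal illustration of k-fold splitting and score averaging; the helper names (`k_fold_indices`, `cross_validate`) and the toy "model" used in the usage example are assumptions for demonstration, not part of any particular library.

```python
def k_fold_indices(n_samples, k):
    """Split the indices 0..n_samples-1 into k contiguous folds.

    The first n_samples % k folds get one extra element so that
    every sample lands in exactly one fold.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(data, k, train_fn, score_fn):
    """Train k models, evaluate each on the fold it did not see,
    and return the average of the k evaluation scores."""
    folds = k_fold_indices(len(data), k)
    scores = []
    for i, test_idx in enumerate(folds):
        test = [data[j] for j in test_idx]
        # Training data is everything outside the held-out fold.
        train = [data[j] for f, fold in enumerate(folds)
                 if f != i for j in fold]
        model = train_fn(train)
        scores.append(score_fn(model, test))
    return sum(scores) / k


# Usage with a toy "model": the mean of the training values,
# scored by negative mean absolute error on the held-out fold.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
avg_score = cross_validate(
    data, k=3,
    train_fn=lambda train: sum(train) / len(train),
    score_fn=lambda m, test: -sum(abs(m - t) for t in test) / len(test),
)
```

Note that every sample appears in exactly one held-out fold, which is precisely what makes CrossValidation use all available data for evaluation.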
The following figure illustrates ten-fold CrossValidation: