SPLIT VALIDATION
A crucial part of machine learning is partitioning the data into two separate sets using a technique called split validation. The first set, called the training data, is used to build the prediction model. The second set, called the test data, is kept in reserve and used to assess the accuracy of the model developed from the training data. The training and test data are typically split 70/30 or 80/20, with the training data representing the larger portion. Once the model has been optimized and validated against the test data for accuracy, it’s ready to generate predictions using new input data.
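As a rough illustration of the 70/30 split described above, the following minimal sketch partitions a dataset using only Python's standard library. The function name split_validation, its parameters, and the placeholder rows are assumptions made for this example and do not come from any particular library. The rows are shuffled before splitting, which is also an assumption of the sketch, made so that neither set simply inherits the original row order.

```python
import random


def split_validation(rows, test_fraction=0.3, seed=42):
    """Shuffle rows and split them into (training_data, test_data) lists.

    Hypothetical helper for illustration only.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                   # copy so the caller's data keeps its order
    random.Random(seed).shuffle(shuffled)   # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_fraction)
    # The first n_test shuffled rows are held back as test data;
    # the remainder becomes the training data.
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(100))  # stand-in for 100 rows of real data
training_data, test_data = split_validation(rows, test_fraction=0.3)
print(len(training_data), len(test_data))  # 70 30
```

Passing test_fraction=0.2 would give the 80/20 split instead. In practice, libraries such as scikit-learn offer an equivalent ready-made function (train_test_split), so a hand-written helper like this one is mainly useful for seeing what the split does.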
Although the model is run on both the training and test sets, it is built from the training data alone. The test data serves only as input for generating predictions and assessing the model’s accuracy; it should never be used to create the model. As the test data cannot be used to build and optimize the model, data...