Though it may seem counterintuitive, getting this high a score likely means that our model is not working well at all, or that it’s working especially well on the training data but won’t perform as well on new datasets. The model is trying to get as close to 1 as it can, yet getting too close is quite suspicious. That’s because we always expect a model to be imperfect; there will always be some loss. When a model performs exceedingly well on training data and gets a very high score, it could just mean that the model was calibrated to that data sample and won’t perform as well on a new one.
This phenomenon is called overfitting, and it’s a big topic of conversation in data science and ML circles. The reason for this is that, fundamentally, all models are flawed and are not to be trusted until you’ve done your due diligence in selecting the best model. This game of choosing the right model, training it, and releasing it into the wild must be done under intense supervision. This is especially true if you’re charging for a product or service and attempting to win the confidence of customers who will be vouching for you and your products someday. If you’re an AI/ML product manager, you should look for good performance that improves incrementally over time, and you should be highly suspicious of excellent model performance from the get-go. I’ve had an experience where model performance during training was taken for granted, and it wasn’t until we had already sold a contract to a client company that we realized the model performed terribly when applied to the client’s real-world data. As a result, we had to go back to the drawing board and train a new model to get the performance we were looking for before deploying it into our client’s workflows.
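To make the idea of overfitting concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the synthetic data, the `memorizer_predict` function (a nearest-neighbor lookup that simply replays the training answers), and the use of an R² score as the metric that tops out at 1. It is not the model or data discussed in this chapter; it only shows how a model can score perfectly on the data it was trained on and noticeably worse on data it has never seen.

```python
import random


def r2_score(y_true, y_pred):
    """R-squared: 1.0 means a perfect fit; lower means a worse fit."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def make_data(n, rng):
    """Hypothetical noisy data: y is roughly 2 * x plus random noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0, 3.0) for x in xs]
    return xs, ys


def memorizer_predict(train_x, train_y, x):
    """Return the answer of the closest training point (pure memorization)."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


rng = random.Random(0)
train_x, train_y = make_data(50, rng)  # data the model has seen
test_x, test_y = make_data(50, rng)    # new data from the same source

train_pred = [memorizer_predict(train_x, train_y, x) for x in train_x]
test_pred = [memorizer_predict(train_x, train_y, x) for x in test_x]

train_score = r2_score(train_y, train_pred)
test_score = r2_score(test_y, test_pred)
gap = train_score - test_score

print(f"training score: {train_score:.3f}")
print(f"held-out score: {test_score:.3f}")
print(f"gap:            {gap:.3f}")
```

The number to watch is the gap between the two scores. A perfect training score paired with a clearly lower held-out score is the warning sign described above, and it only becomes visible when some data is kept out of training.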
A quick note on neural networks: while training generative AI models will be a bit different considering the subject matter and purpose of your model, it will follow a similar process. You’re still going to put a premium on a clean and diverse data sample, you’re still going to be thoughtful about which neural network will work best for the performance you want, and you’re still going to need to account for (and optimize) your loss function to the best of your ability. This process will continue through various loops of training and validating until you feel confident that your generative AI model will be able to generate new outputs based on the training examples you’ve given it. Your goals of tweaking hyperparameters for performance, minimizing loss where you can, and amassing enough data to set your model up for success remain the same as they are for other ML models.
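The loop of training and validating described above can be sketched in a few lines. This is an illustration under stated assumptions, not a recipe for training a generative model: a tiny linear model stands in for the neural network, mean squared error stands in for the loss function, and the stopping rule (commonly called early stopping) is one possible way to decide when to stop looping. The function name `train_with_early_stopping`, its parameters, and the synthetic data are all hypothetical.

```python
import random


def mse(w, b, xs, ys):
    """Mean squared error of the line y = w * x + b on the given data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train_with_early_stopping(train, val, lr=0.1, max_epochs=500, patience=10):
    """Gradient descent that stops once validation loss stops improving."""
    train_x, train_y = train
    val_x, val_y = val
    n = len(train_x)
    w, b = 0.0, 0.0
    best_val, best_params, epochs_since_best = float("inf"), (w, b), 0
    history = []  # one (training loss, validation loss) pair per epoch

    for _ in range(max_epochs):
        # Training step: nudge the parameters to reduce training loss.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(train_x, train_y)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(train_x, train_y)) / n
        w -= lr * grad_w
        b -= lr * grad_b

        # Validation step: measure loss on data not used for training.
        train_loss = mse(w, b, train_x, train_y)
        val_loss = mse(w, b, val_x, val_y)
        history.append((train_loss, val_loss))

        if val_loss < best_val - 1e-6:
            best_val, best_params, epochs_since_best = val_loss, (w, b), 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:
                break  # no meaningful improvement for `patience` epochs

    return best_params, best_val, history


def make_split(n, rng):
    """Hypothetical data: y is roughly 3 * x + 1 plus a little noise."""
    xs = [rng.uniform(0, 1) for _ in range(n)]
    ys = [3.0 * x + 1.0 + rng.gauss(0, 0.1) for x in xs]
    return xs, ys


rng = random.Random(42)
train = make_split(80, rng)
val = make_split(40, rng)

(best_w, best_b), best_val, history = train_with_early_stopping(train, val)
print(f"epochs run: {len(history)}")
print(f"first validation loss: {history[0][1]:.4f}")
print(f"best validation loss:  {best_val:.4f}")
```

Tracking the training and validation losses side by side is the design choice that matters here. The learning rate and patience values are the kind of hyperparameters you would tweak, and a validation loss that stalls or climbs while training loss keeps falling is a signal to stop and reassess.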
Once you have comprehensive, representative data to train your models on, and you’ve trained and adjusted those models enough times to get the performance you’re seeking (and promising to customers), you’re ready to move forward!