Training knowledge
As discussed in Chapter 1, training knowledge refers to information that is inherently stored in the model through its training data. Every LLM begins with a vast repository of inherent knowledge derived from the massive datasets (typically, a large corpus of internet text) on which it was initially trained. This inherent knowledge gives the model two key advantages:
- It is quickly retrievable: Because the inherent knowledge is “baked” into the model weights, the model can retrieve it very quickly; retrieval is typically limited only by the LLM’s compute speed
- It has wide coverage: Because the training data is vast (a large slice of the internet), the inherent knowledge spans a broad range of topics in considerable detail
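
To make this concrete, the minimal sketch below queries a model with no retrieval step at all; the completion comes entirely from knowledge stored in the weights. It assumes the Hugging Face transformers library, and the model name and prompt are purely illustrative.

```python
# A minimal sketch of querying a model's parametric (training) knowledge:
# the answer comes entirely from the weights, with no external retrieval.
# The model name and prompt are illustrative; any causal LM from the
# Hugging Face Hub would behave the same way.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "The capital of France is"
output = generator(prompt, max_new_tokens=5, do_sample=False)

# The completion is produced directly from knowledge stored in the
# model weights -- no documents or databases are consulted.
print(output[0]["generated_text"])
```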
The process of changing the model’s inherent knowledge is called fine-tuning. Unlike prompting or retrieval-based techniques that guide a model’s behavior at inference time without touching its weights, fine-tuning updates the weights themselves, so that new information becomes part of the model’s inherent knowledge.
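
The sketch below shows what a small fine-tuning run can look like in practice, assuming the Hugging Face transformers and datasets libraries; the gpt2 model, toy corpus, and hyperparameters are illustrative placeholders, not a recommended recipe.

```python
# A minimal fine-tuning sketch: gradient updates modify the model's weights,
# i.e., its inherent knowledge. Model, data, and hyperparameters are toy values.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # small model used purely for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Toy corpus standing in for the domain text we want to "bake" into the weights.
texts = [
    "Acme Corp's flagship product is the Widget 3000.",
    "The Widget 3000 was released in 2024.",
]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=128),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="finetuned-model",
        num_train_epochs=3,
        per_device_train_batch_size=2,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # weight updates change what the model "knows"
```

After a run like this, the toy facts live in the updated weights and can be retrieved with an ordinary generation call, just like the knowledge acquired during pretraining.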