WHAT IS DRIFT?
In machine learning, drift refers to any change in a distribution over time. Model drift refers to a change (usually a decline) in the accuracy of a model’s predictions, whereas data drift refers to a change in the distribution of the input data that is collected. Note that data drift is also called input drift, feature drift, or covariate drift.
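As a rough illustration of data drift on a single numeric feature, the following sketch computes the two-sample Kolmogorov-Smirnov statistic, which is the largest gap between the empirical distributions of a reference sample and a newer sample. The function name, the synthetic Gaussian data, and the sample sizes are assumptions made for this example. It is one common way to compare two samples, not the only one.

```python
from bisect import bisect_right
import random

def ks_statistic(reference, current):
    """Largest absolute gap between the empirical CDFs of two samples."""
    ref = sorted(reference)
    cur = sorted(current)
    gap = 0.0
    # The largest gap always occurs at one of the observed values.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap

# Synthetic data (an assumption for illustration only).
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # data seen at training time
same = [rng.gauss(0.0, 1.0) for _ in range(1000)]       # new data, same distribution
shifted = [rng.gauss(1.0, 1.0) for _ in range(1000)]    # new data whose mean has moved

print("no drift:", ks_statistic(reference, same))
print("drift:   ", ks_statistic(reference, shifted))
```

A statistic close to 0 suggests that the two samples follow similar distributions, whereas a value closer to 1 suggests that the feature has drifted. The threshold for raising an alert depends on the sample size and on the application.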
There are several factors that influence the value of data, such as accuracy, relevance, and age. For example, physical stores that sell mobile phones are much more likely to sell recent phone models than older models, so sales data about older models loses relevance over time. In some cases data drift occurs gradually over a period of time, and in other cases it occurs because some data is no longer relevant due to feature-related changes in an application. Keep in mind that multiple factors can contribute to data drift in a specific dataset.
Two techniques for detecting data drift are the domain classifier and the black-box shift detector, both of which are discussed here:
https://blog.dataiku.com/towards-reliable-mlops-with-drift-detectors
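The idea behind a domain classifier can be sketched in a few lines: label the reference rows 0 and the new rows 1, train a classifier to tell them apart, and measure its accuracy on held-out rows. An accuracy near 0.5 means the two samples are hard to distinguish, whereas an accuracy well above 0.5 points to drift. The nearest-centroid classifier, the function names, and the synthetic data below are assumptions chosen to keep the example dependency-free. They are not the implementation from the linked article, and in practice a stronger classifier would normally be used.

```python
import random

def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def domain_classifier_accuracy(reference, current, seed=0):
    """Held-out accuracy of a nearest-centroid classifier that tries to
    separate reference rows (label 0) from current rows (label 1)."""
    rng = random.Random(seed)
    labeled = [(row, 0) for row in reference] + [(row, 1) for row in current]
    rng.shuffle(labeled)
    split = len(labeled) // 2
    train, test = labeled[:split], labeled[split:]
    c0 = centroid([row for row, label in train if label == 0])
    c1 = centroid([row for row, label in train if label == 1])
    correct = 0
    for row, label in test:
        predicted = 1 if sq_dist(row, c1) < sq_dist(row, c0) else 0
        correct += predicted == label
    return correct / len(test)

# Synthetic two-feature data (an assumption for illustration only).
rng = random.Random(42)
reference = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(500)]
same_source = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(500)]
drifted = [[rng.gauss(1.5, 1), rng.gauss(1.5, 1)] for _ in range(500)]

print("no drift:", domain_classifier_accuracy(reference, same_source))
print("drift:   ", domain_classifier_accuracy(reference, drifted))
```

One appeal of this approach is that it works on many features at once, so it can reveal drift that is not visible when each feature is inspected separately.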
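In addition to the preceding types of drift, other types of change (usually called shifts) can occur in a dataset. Some of them are listed below; note that covariate shift is another name for the data drift described earlier: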
• concept shift
• covariate shift
• domain shift
• prior probability shift
• spurious correlation shift
• subpopulation shift
• time shift
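To make one item in this list concrete, prior probability shift refers to a change in the distribution of the labels themselves. A minimal sketch, assuming hypothetical "spam" and "ham" labels and using the total variation distance as the comparison metric (one reasonable choice among several), might look like this:

```python
from collections import Counter

def label_distribution(labels):
    """Map each label to its relative frequency."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}

def total_variation_distance(p, q):
    """Half the sum of absolute differences between two distributions
    (0 means identical, 1 means no overlap at all)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

# Hypothetical label counts (an assumption for illustration only).
train_labels = ["spam"] * 20 + ["ham"] * 80  # 20% spam at training time
prod_labels = ["spam"] * 50 + ["ham"] * 50   # 50% spam in production

tvd = total_variation_distance(
    label_distribution(train_labels),
    label_distribution(prod_labels),
)
print("total variation distance:", round(tvd, 3))
```

Here the share of spam rises from 20% to 50%, which yields a distance of about 0.3. This check requires labels for the newer data, which are often available only after a delay.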
An online search will turn up more information about each of the shift types listed above. Finally, the following open-source Python tools provide drift detection:
• alibi-detect (https://github.com/SeldonIO/alibi-detect)
• evidently (https://github.com/evidentlyai/evidently)