
You're reading from Data Literacy With Python: A Comprehensive Guide to Understanding and Analyzing Data with Python

Product type: Paperback

Published in: Jul 2024

Publisher: Mercury_Learning

ISBN-13: 9781836640097

Length: 271 pages

Edition: 1st Edition

Languages: Python

Tools: Matplotlib

Concepts: Data Analysis

Authors (2): Mercury Learning and Information, Oswald Campesato


WHAT IS DRIFT?

In machine learning, drift refers to any change in a distribution over a period of time. Model drift refers to a change in the accuracy of a model's predictions, whereas data drift refers to a change in the distribution of the input data that is collected. Note that data drift is also called input drift, feature drift, or covariate drift.
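One common way to check a single numeric feature for data drift is to compare a reference sample with a more recent sample. The sketch below uses the two-sample Kolmogorov-Smirnov statistic, which is a choice made here for illustration and is not prescribed by the text. The data is simulated, the function names `ks_statistic` and `ks_threshold` are hypothetical helpers, and the threshold relies on the standard large-sample approximation at a 5% significance level.

```python
import bisect
import math
import random

def ks_statistic(sample_a, sample_b):
    """Largest gap between the empirical CDFs of two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    max_gap = 0.0
    for x in a + b:
        # Fraction of each sample that is <= x (empirical CDF at x).
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap

def ks_threshold(n, m, alpha=0.05):
    """Large-sample critical value for the two-sample KS statistic."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    return c_alpha * math.sqrt((n + m) / (n * m))

# Simulated feature values (assumption): the recent data has a shifted mean.
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(500)]
recent = [rng.gauss(1.0, 1.0) for _ in range(500)]

statistic = ks_statistic(reference, recent)
threshold = ks_threshold(len(reference), len(recent))
drift_detected = statistic > threshold
print(f"KS statistic={statistic:.3f}, threshold={threshold:.3f}, drift={drift_detected}")
```

A statistic above the threshold suggests that the two samples come from different distributions. In practice this check would be run per feature, and a library implementation of the test would normally be preferred over a hand-written one.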

Several factors influence the value of data, including its accuracy, relevance, and age. For example, physical stores that sell mobile phones are much more likely to sell recent phone models than older models, so sales data about older models becomes less relevant over time. In some cases, data drift occurs gradually over a period of time; in other cases, it occurs because some data is no longer relevant after feature-related changes in an application. Keep in mind that multiple factors can contribute to data drift in a specific dataset.

Two techniques for detecting data drift are the domain classifier and the black-box shift detector, both of which are discussed here:

https://blog.dataiku.com/towards-reliable-mlops-with-drift-detectors
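The general idea behind a domain classifier can be sketched in a few lines: label the reference rows 0 and the recent rows 1, train a classifier to tell them apart, and check its accuracy on held-out rows. Accuracy close to 0.5 means that the two datasets are hard to distinguish, whereas accuracy well above 0.5 suggests drift. The code below is a minimal sketch under stated assumptions and is not the implementation from the linked article: it uses simulated data, a simple nearest-centroid classifier as a stand-in for a real model, and hypothetical helper names such as `domain_classifier_accuracy`.

```python
import random

def centroid(rows):
    """Component-wise mean of a list of equal-length feature rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def domain_classifier_accuracy(reference, recent, test_fraction=0.3, seed=0):
    """Held-out accuracy of a nearest-centroid classifier that tries to
    separate reference rows (label 0) from recent rows (label 1)."""
    rng = random.Random(seed)
    labeled = [(row, 0) for row in reference] + [(row, 1) for row in recent]
    rng.shuffle(labeled)
    n_test = int(len(labeled) * test_fraction)
    test, train = labeled[:n_test], labeled[n_test:]

    # "Training" is just computing one centroid per domain label.
    centroids = {
        label: centroid([row for row, y in train if y == label])
        for label in (0, 1)
    }
    correct = 0
    for row, y in test:
        predicted = min(
            centroids, key=lambda label: squared_distance(row, centroids[label])
        )
        correct += predicted == y
    return correct / len(test)

def make_rows(rng, n, mean):
    """Simulated two-feature rows (assumption: independent Gaussian features)."""
    return [[rng.gauss(mean, 1.0), rng.gauss(mean, 1.0)] for _ in range(n)]

rng = random.Random(42)
reference = make_rows(rng, 1000, mean=0.0)
same_source = make_rows(rng, 1000, mean=0.0)  # no drift
shifted = make_rows(rng, 1000, mean=1.5)      # drifted features

acc_same = domain_classifier_accuracy(reference, same_source)
acc_shifted = domain_classifier_accuracy(reference, shifted)
print(f"no drift: accuracy={acc_same:.2f}")
print(f"drift:    accuracy={acc_shifted:.2f}")
```

The held-out split matters here: measuring accuracy on the training rows would overstate how separable the two datasets are.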

In addition to the preceding types of drift, other types of change can occur in a dataset, some of which are listed below:

• concept shift

• covariate shift

• domain shift

• prior probability shift

• spurious correlation shift

• subpopulation shift

• time shift
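As one illustration from the list above, prior probability shift is a change in the distribution of the labels themselves, such as a class becoming more frequent over time. The following sketch compares label proportions in two samples using the total variation distance. The labels and counts are made up for illustration, and the helper names are hypothetical.

```python
from collections import Counter

def label_proportions(labels):
    """Map each label to its relative frequency in the sample."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}

def total_variation_distance(p, q):
    """Half the sum of absolute differences between two label distributions."""
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(label, 0.0) - q.get(label, 0.0)) for label in labels)

# Made-up label samples (assumption): spam becomes more common over time.
training_labels = ["spam"] * 20 + ["ham"] * 80
recent_labels = ["spam"] * 45 + ["ham"] * 55

distance = total_variation_distance(
    label_proportions(training_labels),
    label_proportions(recent_labels),
)
print(f"total variation distance between label distributions: {distance:.2f}")
```

A distance of 0 means that the label proportions are identical, and a distance of 1 means that the two samples share no labels. The cutoff that counts as a meaningful shift depends on the application.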

Perform an online search to find more information about the preceding list of shift types. Finally, the following open-source Python-based tools provide drift detection:

• alibi-detect (https://github.com/SeldonIO/alibi-detect)

• evidently (https://github.com/evidentlyai/evidently)