You're reading from Python Data Analysis Cookbook: Clean, scrape, analyze, and visualize data with the power of Python!

Product type Paperback

Published in Jul 2016

ISBN-13 9781785282287

Length 462 pages

Edition 1st Edition

Languages

Python

Concepts

Data Analysis

Author (1):

Ivan Idris

Graphing Anscombe's quartet

Anscombe's quartet is a classic example that illustrates why visualizing data is important. The quartet consists of four datasets with nearly identical summary statistics; each dataset has a series of x values and dependent y values. We will tabulate these statistics in an IPython notebook. When plotted, however, the four datasets look strikingly different from one another.
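To make the claim about similar statistics concrete, here is a minimal sketch that uses only the Python standard library rather than the libraries this recipe relies on. The data values are Anscombe's published numbers; the names `X_SHARED`, `QUARTET`, `summarize`, and `summaries` are illustrative and do not come from the book.

```python
from statistics import mean, variance

# Anscombe's published values; datasets I-III share the same x series.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED,
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED,
           [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED,
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return sample statistics and the least-squares line for one dataset."""
    mx, my = mean(x), mean(y)
    vx, vy = variance(x), variance(y)  # sample variances (n - 1 denominator)
    # Sample covariance of x and y, using the same n - 1 denominator.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    slope = cov / vx
    return {
        "mean_x": mx,
        "var_x": vx,
        "mean_y": my,
        "var_y": vy,
        "corr": cov / (vx * vy) ** 0.5,
        "slope": slope,
        "intercept": my - slope * mx,
    }


summaries = {name: summarize(x, y) for name, (x, y) in QUARTET.items()}
for name, stats in summaries.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

All four rows should come out almost the same: the mean of x is 9, the mean of y is about 7.5, the correlation is about 0.82, and the fitted line is roughly y = 3 + 0.5x. None of these numbers reveals how different the scatter plots are.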

How to do it...

For this recipe, you need to perform the following steps:

Start with the following imports:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib as mpl
from dautil import report
from dautil import plotting
import numpy as np
from tabulate import tabulate

Define the following function to compute, for each dataset, the mean and variance of x and y, the correlation between x and y, and the slope and intercept of a linear fit:
```
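# Load the quartet in long format (columns: dataset, x, y)
df = sns.load_dataset("anscombe")

# Per-dataset mean and variance of the x and y columns
agg = df.groupby('dataset')\
        .agg([np.mean, np.var])\
        ...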
```
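One way the remaining metrics named in this step (correlation, slope, and intercept) could be computed per dataset is sketched below with plain pandas and NumPy. This is not the book's listing and it does not use the `dautil` helpers. The function name `summarize` and the toy frame are hypothetical; the sketch assumes a long-format frame with `dataset`, `x`, and `y` columns, as in seaborn's "anscombe" dataset.

```python
import numpy as np
import pandas as pd


def summarize(df):
    """Return one row per dataset with corr, slope, and intercept.

    Assumes long-format columns 'dataset', 'x', and 'y'.
    """
    rows = {}
    for name, group in df.groupby("dataset"):
        # A degree-1 polynomial fit returns the coefficients [slope, intercept].
        slope, intercept = np.polyfit(group["x"], group["y"], 1)
        rows[name] = {
            "corr": group["x"].corr(group["y"]),  # Pearson correlation
            "slope": slope,
            "intercept": intercept,
        }
    return pd.DataFrame.from_dict(rows, orient="index")


# Toy frame for illustration only (not the Anscombe data): y = 2x + 1 exactly.
toy = pd.DataFrame({
    "dataset": ["A"] * 4,
    "x": [0.0, 1.0, 2.0, 3.0],
    "y": [1.0, 3.0, 5.0, 7.0],
})
summary = summarize(toy)
print(summary)
```

Passing the frame returned by `sns.load_dataset("anscombe")` instead of `toy` should give one row for each of the four datasets.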