You're reading from Mastering SciPy

Product type: Book

Published in: Nov 2015

Reading level: Intermediate

ISBN-13: 9781783984749

Edition: 1st Edition

Languages: Python

Tools: SciPy, Matplotlib

Concepts: Scientific Computing

Authors (2):

Francisco Javier Blanco-Silva

Francisco Javier B Silva

Chapter 7. Descriptive Statistics

This and the following chapter are mainly aimed at SAS, SPSS, or Minitab users, and especially those employing the languages R or S for statistical computing. We will develop an environment for working effectively in the field of data analysis, with the aid of IPython sessions powered up with the following resources from the SciPy stack:

The probability and statistics submodule of the library of symbolic computations, sympy.stats.
The two libraries of statistical functions, scipy.stats and scipy.stats.mstats (the latter for data provided by masked arrays), together with the module statsmodels, for data exploration, estimation of statistical models, and performing statistical tests in a numerical setting. Under the hood, the package statsmodels uses the powerful library patsy to describe statistical models and build design matrices in Python (R or S users will find patsy compatible with their formula mini-language).
For statistical inference, we again use...

Motivation

On Tuesday, September 8, 1857, the steamboat SS Central America left Havana at 9 A.M. for New York, carrying about 600 passengers and crew members. Inside this vessel, precious cargo was stored—a set of manuscripts by John James Audubon, and three tons of gold bars and coins. The manuscripts documented an expedition through the yet uncharted southwestern United States and California, and contained 200 sketches and paintings of its wildlife. The gold, fruit of many years of prospecting and mining during the California Gold Rush, was meant to start anew the lives of many of the passengers aboard.

On the 9th, the vessel ran into a storm which developed into a hurricane. The steamboat endured four hard days at sea, and by Saturday morning the ship was doomed. The captain arranged to have women and children taken off to the brig Marine, which offered them assistance at about noon. In spite of the efforts of the remaining crew and passengers to save the ship, the inevitable happened...

Probability

In the SciPy stack, we have two means for determining probability: a symbolic setting and a numerical setting. In this brief section, we are going to compare both with a sequence of examples.

For the symbolic treatment of random variables, we employ the module sympy.stats, while for the numerical treatment, we use the module scipy.stats. In both cases, the goal is the same—the instantiation of any random variable, and the following three kinds of operations on them:

Description of the probability distribution of a random variable with numbers (parameters).
Description of a random variable in terms of functions.
Computation of associated probabilities.
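
As a minimal sketch of the three kinds of operations above, the following compares both settings on one fair 6-sided die. The choice of a die and all variable names are assumptions made for illustration; this is not the chapter's own code.

```python
from sympy import Eq, Rational
from sympy.stats import Die, E, P, variance
from scipy.stats import randint

# Symbolic setting: exact rational results.
X = Die('X', 6)
sym_mean = E(X)             # parameter: expected value
sym_var = variance(X)       # parameter: variance
sym_pmf_3 = P(Eq(X, 3))     # value of the probability mass function at 3
sym_tail = P(X > 4)         # an associated probability

# Numerical setting: floating-point results.
# randint(low, high) is uniform on the integers low, ..., high - 1.
Y = randint(1, 7)
num_mean = Y.mean()
num_var = Y.var()
num_pmf_3 = Y.pmf(3)        # probability mass function at 3
num_tail = Y.sf(4)          # survival function: P(Y > 4)

print(sym_mean, sym_var, sym_pmf_3, sym_tail)
print(num_mean, num_var, num_pmf_3, num_tail)
```

The symbolic results are exact rationals (for example, a mean of 7/2), while the numerical ones are floating-point approximations of the same quantities.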

Let's observe several situations through the scope of the two different settings.

Symbolic setting

Let's start with discrete random variables. For instance, let's consider several random variables used to describe the process of rolling three 6-sided dice, one 100-sided die, and the possible outcomes:

In [1]: from sympy import var; \...
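
The session above is truncated in this excerpt. As a hedged stand-in rather than a reconstruction of it, here is one way the dice just described could be set up with sympy.stats. The variable names and the particular probabilities computed are assumptions chosen for illustration.

```python
from sympy import Rational
from sympy.stats import Die, E, P

# Three independent 6-sided dice and one 100-sided die (hypothetical names).
D6_1, D6_2, D6_3 = Die('D6_1', 6), Die('D6_2', 6), Die('D6_3', 6)
D100 = Die('D100', 100)

# Probability that the three 6-sided dice add up to more than 15.
p_sum_high = P(D6_1 + D6_2 + D6_3 > 15)

# Expected value of the 100-sided die, and the probability of rolling 6 or less.
mean_d100 = E(D100)
p_d100_low = P(D100 <= 6)

print(p_sum_high, mean_d100, p_d100_low)
```

Because the setting is symbolic, each result is an exact rational number rather than a floating-point estimate.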

Data exploration

Data exploration is generally performed by presenting a meaningful synthesis of its distribution—it could be through a sequence of graphs, by describing it with a set of numerical parameters, or by approximating it with simple functions. Now let's explore different possibilities, and how to accomplish them with different tools in the SciPy stack.

Picturing distributions with graphs

The type of graph depends on the type of variable (categorical, quantitative, or dates).

Bar plots and pie charts

When our data is described in terms of categorical variables, we often use pie charts or bar graphs to represent it. For example, we access the Consumer Complaint Database from the Consumer Financial Protection Bureau, at http://catalog.data.gov/dataset/consumer-complaint-database. The database was created in February 2014 to contain complaints received by the Bureau about financial products and services. In its updated version in March of the same year, it consisted of almost 300,000...
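
As a rough sketch of how a bar plot and a pie chart can be drawn for a categorical variable with matplotlib, consider the following. The product labels are made-up placeholder values, not records from the Consumer Complaint Database, and the variable names are hypothetical.

```python
from collections import Counter

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Placeholder categorical observations (NOT real complaint data).
products = ["Mortgage", "Credit card", "Mortgage", "Debt collection",
            "Credit card", "Mortgage", "Bank account"]

# Count the observations per category, most frequent first.
counts = Counter(products)
labels, values = zip(*counts.most_common())

fig, (ax_bar, ax_pie) = plt.subplots(1, 2, figsize=(10, 4))

# Bar plot: one bar per category, bar height equal to the count.
positions = range(len(labels))
ax_bar.bar(positions, values)
ax_bar.set_xticks(list(positions))
ax_bar.set_xticklabels(labels, rotation=45, ha="right")
ax_bar.set_ylabel("Number of complaints")

# Pie chart: one wedge per category, wedge area proportional to the count.
ax_pie.pie(values, labels=labels, autopct="%1.1f%%")
ax_pie.set_aspect("equal")

fig.tight_layout()
```

To look at the result, one could call fig.savefig with a file name of choice, or drop the Agg backend line and call plt.show() in an interactive session.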

Summary

This concludes the first part of our two-chapter coverage of data analysis, where we have explored advanced Python tools in the SciPy stack for the computation and visualization of descriptive statistics. In the next chapter, we give a similar treatment to inferential statistics, data mining, and machine learning.

Authors (2)

Francisco Javier Blanco-Silva

I will always be indebted to Bradley J. Lucier and Rodrigo Bañuelos, for being a constant inspiration, for their guidance and teachings. Special thanks to my editors, Sriram Neelakantam, Bharat Patil, Nikhil Potdukhe, and Mohammad Rizvi. Many colleagues have contributed with encouragement and fruitful discussions. In particular, I would like to mention Parsa Bakhtary, Aaron Dutle, Edsel Peña, Pablo Sprechmann, Adam Taylor, and Holly Watson. But the most special thanks go without a doubt to my wife and daughter. Grace's love and smiles alone provided all the motivation, enthusiasm and skills to overcome any difficulties encountered during the pursuit of this book, and everything life threw at me ever since she was born.