You're reading from The Statistics and Calculus with Python Workshop

Product type: Book

Published in: Aug 2020

Reading Level: Beginner

Publisher: Packt

ISBN-13: 9781800209763

Edition: 1st Edition

Languages

Python

Concepts

Statistics

Authors (6):

Peter Farrell

Alvaro Fuentes

Ajinkya Sudhir Kolhe

Quan Nguyen

Alexander Joseph Sarver

Marios Tsatsos

7. Doing Basic Statistics with Python

Overview

In this chapter, we'll learn how to use the main descriptive statistics metrics and also produce and understand the main visualizations used in Exploratory Data Analysis.

By the end of this chapter, you will be able to load and prepare a dataset for basic statistical analysis, calculate and use the main descriptive statistics metrics, use descriptive statistics to understand numerical and categorical variables, and use visualizations to study relationships between variables.

Introduction

Python and its analytical libraries, such as pandas and Matplotlib, make it very easy to perform both simple and complex statistical calculations on many types of datasets. This chapter introduces the first steps for any statistical analysis: defining and understanding the problem, loading and preparing the dataset, and after that, understanding the variables individually and exploring some relationships between them.

This chapter consists of three sections. In the first section, we introduce the dataset used throughout the chapter along with a hypothetical (yet very realistic) business problem. We then load the dataset and perform many of the common data preparation tasks, including changing variable types and filtering for useful observations. With the dataset ready, the second section presents a brief conceptual introduction to the main metrics of descriptive statistics and immediately applies this knowledge to the dataset we are working with. As part...

Data Preparation

Applied statistics always starts with a dataset and a problem to solve. In the real world, we never do statistical analysis in a vacuum: there is always a business problem to solve, a topic that needs to be quantitatively understood, or a scientific question to answer. Understanding the problem is the very first step of any statistical analysis. The second step is to collect and prepare the data. Data collection is outside the scope of this book, so we will go directly into data preparation. Before doing any statistical calculations, we need to make sure that we understand our business problem and that our dataset is prepared.
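To make the idea of data preparation concrete, here is a minimal sketch of the kinds of tasks involved: changing a variable's type, filtering for useful observations, imputing a missing value, and creating a new variable. The data, the column names (`avg_rating`, `price`, `is_free`), and the `prepare` helper are all hypothetical and are not taken from the chapter's dataset. Imputing with the median is only one possible choice.

```python
import pandas as pd

# Hypothetical toy data; the columns are illustrative, not the book's dataset.
raw = pd.DataFrame({
    "name": ["Alpha", "Bravo", "Charlie", "Delta"],
    "avg_rating": ["4.5", "3.0", None, "4.0"],  # ratings stored as text
    "price": [0.0, None, 1.99, 4.99],           # one price is missing
})


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of df (hypothetical helper for illustration)."""
    out = df.copy()
    # Change a variable's type: text to numeric (unparseable values become NaN).
    out["avg_rating"] = pd.to_numeric(out["avg_rating"], errors="coerce")
    # Filter for useful observations: keep only rows that have a rating.
    out = out.dropna(subset=["avg_rating"]).copy()
    # Impute missing values: here, fill a missing price with the median price.
    out["price"] = out["price"].fillna(out["price"].median())
    # Create a new variable derived from an existing one.
    out["is_free"] = out["price"] == 0
    return out.reset_index(drop=True)


games = prepare(raw)
print(games)
```

Working on a copy keeps the raw data untouched, so each preparation step can be revisited without reloading the file.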

Introducing the Dataset

In this subsection, we will introduce the dataset we will use in this chapter and perform some basic data preparation tasks. Knowing the dataset will give you a bit more context when we define the business problem.

We are going to use the strategy games dataset, which contains real-world information...

Calculating and Using Descriptive Statistics

Descriptive statistics is a set of methods that we use to summarize the information of a set of measurements (data), which helps us to make sense of it. In this section, we will first explain the need for descriptive statistics. After that, we will introduce the most common metrics of descriptive statistics, including mean, median, and standard deviation. First, we will understand them at a conceptual level using a simple set of measurements, and then we will apply what we have learned about them to the dataset we prepared in the previous section.
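As a small illustration of these three metrics on a simple set of measurements, the following sketch uses Python's standard `statistics` module rather than pandas, so that it runs without any extra libraries. The measurements are made up for the example. In pandas, the `mean()`, `median()`, and `std()` methods of a Series play the same roles.

```python
import statistics

# Hypothetical measurements, chosen for illustration only.
measurements = [2, 4, 4, 5, 7, 8, 40]

mean = statistics.mean(measurements)      # arithmetic average
median = statistics.median(measurements)  # middle value of the sorted data
stdev = statistics.stdev(measurements)    # sample standard deviation (n - 1 denominator)

print(f"mean={mean:.2f}, median={median}, stdev={stdev:.2f}")
```

Here the mean is 10 while the median is 5: the single large value (40) pulls the mean upward but leaves the median unchanged, which is why it helps to look at both.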

The Need for Descriptive Statistics

Why do we need descriptive statistics? Here is an example that will show you why we need these types of analytical tools: our brains are very good at a wide variety of tasks, such as recognizing the emotion expressed in a human face. Try to notice how much effort your brain puts into reading the emotion of the following face:

Figure...

Exploratory Data Analysis

In this section, we return to the business problem that we began analyzing in the first section of this chapter, which is as follows:

The CEO of the game development company you work for has come up with a plan to strengthen the position of the company in the gaming market. From his industry knowledge and other business reports, he knows that a very effective way to attract new customers is to develop a great reputation in the mobile game space. Given this fact, he has the following plan: develop a strategy game for the iOS platform that will get a lot of positive attention, which in turn will bring a large number of new customers to the company. He is sure his plan will work if and only if the game gets great ratings from users. Since he is new to the mobile game space, he asks for your help in answering the following question: What types of strategy games have great user ratings?
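One way to start on a question like this is to group games by type and compare descriptive statistics of their ratings. The sketch below assumes a hypothetical table with `genre` and `avg_rating` columns. The values are invented and say nothing about the real dataset or the analysis the chapter goes on to perform.

```python
import pandas as pd

# Hypothetical toy data; genres and ratings are made up for illustration.
games = pd.DataFrame({
    "genre": ["Puzzle", "Puzzle", "Board", "Board", "Simulation", "Simulation"],
    "avg_rating": [4.5, 4.0, 3.5, 4.0, 3.0, 2.5],
})

# Summarize ratings per genre, then sort so the best-rated genres come first.
summary = (
    games.groupby("genre")["avg_rating"]
    .agg(["count", "mean", "median"])
    .sort_values("mean", ascending=False)
)
print(summary)
```

Including the count next to the mean and median shows how many games each summary rests on, which matters when some groups are very small.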

In this section, we will do some...

Summary

In this chapter, we learned about the first steps toward performing any kind of statistical analysis. First, we defined our business problem and introduced the dataset. Based on the problem we wanted to solve, we prepared the dataset accordingly: we deleted some records, imputed missing values, transformed the types of some variables, and created new ones. Then we learned about the need for descriptive statistics, how easy it is to calculate them using pandas, and how to use and interpret those calculations. In the final section, we learned how to combine visualizations with descriptive statistics to get a deeper understanding of the relationships between variables in our datasets. The concepts and techniques covered in this chapter can be put into practice in any data analysis you perform. However, to get more sophisticated in your analysis, you need to have a good grasp of the basics of probability theory, which is the subject of...

Authors (6)

Peter Farrell

Peter Farrell learned to program from the Logo code in Seymour Papert's Mindstorms. A student introduced him to Python and he never looked back. In 2015, he self-published Hacking Math Class with Python on applying Python programming to learning and teaching high-school math. In 2019, No Starch Press published his second book, Math Adventures with Python. In his books, he also presents 21st-century topics, such as Cellular Automata, 3D Graphics, and Genetic Algorithms. Currently, he teaches Python and Math in the Dallas, Texas area.

Alvaro Fuentes

Alvaro Fuentes is a senior data scientist with a background in applied mathematics and economics. He has more than 14 years of experience in various analytical roles and is an analytics consultant at one of the 'Big Three' global management consulting firms, leading advanced analytics projects in industries such as banking, technology, and consumer goods. Alvaro is also an author and trainer in analytics and data science and has published courses and books, such as 'Become a Python Data Analyst' and 'Hands-On Predictive Analytics with Python'. He has also taught data science and related topics to thousands of students both on-site and online through platforms such as Springboard, Simplilearn, Udemy, and BSG Institute, among others.

Ajinkya Sudhir Kolhe

Ajinkya Sudhir Kolhe is a programmer working for a tech company in the Bay Area. He holds an M.S. in Computer Science and has 5+ years of experience in the tech industry. His areas of interest include problem solving, analytics, and applications in Python.

Quan Nguyen

Quan Nguyen, the author of the first edition of this book, is a Python programmer with a strong passion for machine learning. He holds a dual degree in mathematics and computer science, with a minor in philosophy, earned from DePauw University. Quan is deeply involved in the Python community and has authored multiple Python books, contributing to the Python Software Foundation and regularly sharing insights on DataScience portal. He is currently pursuing a Ph.D. in computer science at Washington University in St. Louis.

Alexander Joseph Sarver

Alexander Joseph Sarver is an ambitious data scientist and content creator with 6 years of mathematical teaching experience.

Marios Tsatsos

Marios Tsatsos has 8+ years of experience in research in Physics, analytical thinking, modeling, problem solving and decision making.