The Statistics and Calculus with Python Workshop

By Peter Farrell , Alvaro Fuentes , Ajinkya Sudhir Kolhe and 3 more
    What do you get with a Packt Subscription?

  • Instant access to this title and 7,500+ eBooks & Videos
  • Constantly updated with 100+ new titles each month
  • Breadth and depth in over 1,000+ technologies
  1. Free Chapter
    2. Python's Main Tools for Statistics
About this book
Are you looking to start developing artificial intelligence applications? Do you need a refresher on key mathematical concepts? Full of engaging practical exercises, The Statistics and Calculus with Python Workshop will show you how to apply your understanding of advanced mathematics in the context of Python. The book begins by giving you a high-level overview of the libraries you'll use while performing statistics with Python. As you progress, you'll perform various mathematical tasks using the Python programming language, such as solving algebraic functions with Python starting with basic functions, and then working through transformations and solving equations. Later chapters in the book will cover statistics and calculus concepts and how to use them to solve problems and gain useful insights. Finally, you'll study differential equations with an emphasis on numerical methods and learn about algorithms that directly calculate values of functions. By the end of this book, you’ll have learned how to apply essential statistics and calculus concepts to develop robust Python applications that solve business challenges.
Publication date:
August 2020
Publisher
Packt
Pages
740
ISBN
9781800209763

 

2. Python's Main Tools for Statistics

Overview

This chapter presents a practical introduction to the main libraries that most statistics practitioners use in Python. It will cover some of the most important and useful concepts, functions, and Application Programming Interfaces (APIs) of each of the key libraries. Almost all of the computational tools that will be needed for the rest of this book will be introduced in this chapter.

By the end of this chapter, you will understand the idea behind array vectorization of the NumPy library and be able to use its sampling functionalities. You'll be able to initialize pandas DataFrames to represent tabular data and manipulate their content. You'll also understand the importance of data visualization in data analysis and be able to utilize Python's two most popular visualization libraries: Matplotlib and Seaborn.

 

Introduction

After going through a refresher on the Python language in the previous chapter, we are now ready to tackle the main topics of this book: mathematics and statistics.

Among others, the general fields of computational mathematics and statistics can be broken up into three main tool-centric components: representation and engineering; analysis and computation; and finally, visualization. In the ecosystem of the Python programming language, specific libraries are dedicated to each of these components (namely, pandas, NumPy, Matplotlib, and Seaborn), making the process modular.

While there might be other similar packages and tools, the libraries that we will be discussing have been proven to possess a wide range of functionalities and support powerful options in terms of computation, data processing, and visualization, making them some of a Python programmer's preferred tools over the years.

In this chapter, we will be introduced to each of these libraries and learn...

 

Scientific Computing and NumPy Basics

The term scientific computing has been used several times in this workshop so far; in the broadest sense of the term, it denotes the process of using computer programs (or anything with computing capabilities) to model and solve a specific problem in mathematics, engineering, or science. Examples may include mathematical models to look for and analyze patterns and trends in biological and social data, or machine learning models to make future predictions using economic data. As you may have already noticed, this definition has a significant overlap with the general fields of data science, and sometimes the terms are even used interchangeably.

The main workhorse of many (if not most) scientific computing projects in Python is the NumPy library. Since NumPy is an external library that does not come preinstalled with Python, we need to download and install it. As you may already know, installing external libraries and packages in Python can be...

 

Working with Tabular Data in pandas

If NumPy is used on matrix data and linear algebraic operations, pandas is designed to work with data in the form of tables. Just like NumPy, pandas can be installed in your Python environment using the pip package manager:

$ pip install pandas

If you are using Anaconda, you can download it using the following command:

$ conda install pandas

Once the installation process completes, fire off a Python interpreter and try importing the library:

>>> import pandas as pd

If this command runs without any error message, then you have successfully installed pandas. With that, let's move on with our discussions, beginning with the most commonly used data structure in pandas, DataFrame, which can represent table data: two-dimensional data with row and column labels. This is to be contrasted with NumPy arrays, which can take on any dimension but do not support labeling.

Initializing a DataFrame Object

There are multiple...

 

Data Visualization with Matplotlib and Seaborn

Data visualization is undoubtedly an integral part of any data pipeline. Good visualizations can not only help scientists and researchers find unique insights about their data, but also help convey complex, advanced ideas in an intuitive, easy to understand way. In Python, the backend of most of the data visualization tools is connected to the Matplotlib library, which offers an incredibly wide range of options and functionalities, as we will see in this upcoming discussion.

First, to install Matplotlib, simply run either of the following commands, depending on which one is your Python package manager:

$ pip install matplotlib
$ conda install matplotlib

The convention in Python is to import the pyplot package from the Matplotlib library, like so:

>>> import matplotlib.pyplot as plt

This pyplot package, whose alias is now plt, is the main workhorse for any visualization functionality in Python and will therefore...

 

Summary

This chapter went through the core tools for data science and statistical computing in Python, namely, NumPy for linear algebra and computation, pandas for tabular data processing, and Matplotlib and Seaborn for visualization. These tools will be used extensively in later chapters of this book, and they will prove useful in your future projects. In the next chapter, we will go into the specifics of a number of statistical concepts that we will be using throughout this book and learn how to implement them in Python.

About the Authors
  • Peter Farrell

    Peter Farrell learned to program from the Logo code in Seymour Paperts Mindstorms. A student introduced him to Python and he never looked back. In 2015, he self-published Hacking Math Class with Python on applying Python programming to learning and teaching high-school math. In 2019, No Starch Press published his second book, Math Adventures with Python. In his books, he also presents 21st-century topics, such as Cellular Automata, 3D Graphics, and Genetic Algorithms. Currently, he teaches Python and Math in the Dallas, Texas area.

    Browse publications by this author
  • Alvaro Fuentes

    Alvaro Fuentes is a senior data scientist with a background in applied mathematics and economics. He has more than 14 years of experience in various analytical roles and is an analytics consultant at one of the ‘Big Three’ global management consulting firms, leading advanced analytics projects in different industries like banking, technology, and consumer goods. Alvaro is also an author and trainer in analytics and data science and has published courses and books, such as 'Become a Python Data Analyst' and 'Hands-On Predictive Analytics with Python'. He has also taught data science and related topics to thousands of students both on-site and online through different platforms such as Springboard, Simplilearn, Udemy, and BSG Institute, among others.

    Browse publications by this author
  • Ajinkya Sudhir Kolhe

    Ajinkya Sudhir Kolhe is a programmer working for a tech company in the Bay area. He holds a M.S. in Computer Science and has experience in the tech industry of 5+ years. His area of interests include problem solving, analytics and applications in Python.

    Browse publications by this author
  • Quan Nguyen

    Quan Nguyen is a Python programmer and machine learning enthusiast. He is interested in solving decision-making problems under uncertainty. Quan has authored several books on Python programming and scientific computing. He is currently pursuing a Ph.D. degree in computer science at Washington University in St. Louis, researching Bayesian methods in machine learning.

    Browse publications by this author
  • Alexander Joseph Sarver

    Alexander Joseph Sarver is an ambitious data scientist and content creator with 6 years of mathematical teaching experience.

    Browse publications by this author
  • Marios Tsatsos

    Marios Tsatsos has 8+ years of experience in research in Physics, analytical thinking, modeling, problem solving and decision making.

    Browse publications by this author
The Statistics and Calculus with Python Workshop
Unlock this book and the full library FREE for 7 days
Start now