Applying Math with Python - Second Edition


Product type: Book
Published: Dec 2022
Publisher: Packt
ISBN-13: 9781804618370
Pages: 376
Edition: 2nd
Author: Sam Morley

Table of Contents (13 chapters)

Preface
Chapter 1: An Introduction to Basic Packages, Functions, and Concepts
Chapter 2: Mathematical Plotting with Matplotlib
Chapter 3: Calculus and Differential Equations
Chapter 4: Working with Randomness and Probability
Chapter 5: Working with Trees and Networks
Chapter 6: Working with Data and Statistics
Chapter 7: Using Regression and Forecasting
Chapter 8: Geometric Problems
Chapter 9: Finding Optimal Solutions
Chapter 10: Improving Your Productivity
Index
Other Books You May Enjoy

Working with Randomness and Probability

In this chapter, we will discuss randomness and probability. We will start by briefly exploring the fundamentals of probability by selecting elements from a set of data. Then, we will learn how to generate (pseudo) random numbers using Python and NumPy, and how to generate samples according to a specific probability distribution. We will conclude the chapter by looking at a number of more advanced topics: random processes, Bayesian techniques, and using Markov chain Monte Carlo (MCMC) methods to estimate the parameters of a simple model.

Probability is a quantification of the likelihood of a specific event occurring. We use probabilities intuitively all of the time, although the formal theory can sometimes be quite counterintuitive. Probability theory aims to describe the behavior of random variables whose value is not known in advance, but for which the probabilities that the value falls within certain ranges are known. These...

Technical requirements

For this chapter, we require the standard scientific Python packages: NumPy, Matplotlib, and SciPy. We will also require the PyMC package for the final recipe. You can install this using your favorite package manager, such as pip:

python3.10 -m pip install pymc

This command will install the most recent version of PyMC, which, at the time of writing, is 4.0.1. This package provides facilities for probabilistic programming, which involves performing many calculations driven by randomly generated data to understand the likely distribution of a solution to a problem.

Note

At the time of writing the previous edition, the current version was 3.9.2 and the package was named PyMC3; since then, PyMC version 4.0 has been released, and with this update the name reverted from PyMC3 to PyMC.

The code for this chapter can be found in the Chapter 04 folder of the GitHub repository at https://github.com/PacktPublishing/Applying-Math-with-Python-2nd-Edition/tree/main/Chapter%2004.

Selecting items at random

At the core of probability and randomness is the idea of selecting an item from some kind of collection. As we know, the probability of selecting an item from a collection quantifies the likelihood of that item being selected. Randomness describes the selection of items from a collection according to probabilities without any additional bias. The opposite of a random selection might be described as a deterministic selection. In general, it is very difficult to replicate a purely random process using a computer because computers and their processing are inherently deterministic. However, we can generate sequences of pseudorandom numbers that, when properly constructed, demonstrate a reasonable approximation of randomness.

In this recipe, we will select items from a collection and learn about some of the key terminology associated with probability and randomness that we will need throughout this chapter.

Getting ready

The Python Standard Library contains...
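The setup details are elided here, but as a hedged sketch of the kind of selection this recipe covers, the standard library's random module provides choice, choices, and sample for the three common cases (the data values below are made up for illustration):

```python
import random

# Seeded Random instance so the selections are reproducible
rng = random.Random(12345)

data = [15, 18, 21, 24, 27, 30, 33, 36]

# Select a single item uniformly at random
single = rng.choice(data)

# Select 5 items with replacement (items may repeat)
with_replacement = rng.choices(data, k=5)

# Select 5 distinct items without replacement
without_replacement = rng.sample(data, k=5)

print(single, with_replacement, without_replacement)
```

Selection with replacement models independent repeated trials, while selection without replacement models drawing from a shrinking pool.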

Generating random data

Many tasks involve generating large quantities of random numbers, which, in their most basic form, are either integers or (double-precision) floating-point numbers lying within the range [0, 1). Ideally, these numbers should be selected uniformly, so that if we draw a large number of them, they are distributed roughly evenly across the range [0, 1).

In this recipe, we will see how to generate large quantities of random integers and floating-point numbers using NumPy, and show the distribution of these numbers using a histogram.

Getting ready

Before we start, we need to import the default_rng routine from the NumPy random module and create an instance of the default random number generator to use in the recipe:

from numpy.random import default_rng
rng = default_rng(12345) # seed for reproducibility

We have discussed this process in the Selecting items at random recipe.

We also import the Matplotlib pyplot module under the plt alias.

...
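As an illustrative sketch of what this recipe builds toward (using np.histogram in place of the Matplotlib plot so that the bin counts can be inspected directly; the sample size and bin count are arbitrary choices):

```python
import numpy as np
from numpy.random import default_rng

rng = default_rng(12345)  # seed for reproducibility

# One million uniform floats in [0, 1) and integers in the range 0-9
floats = rng.random(size=1_000_000)
ints = rng.integers(0, 10, size=1_000_000)

# Bin the floats; each of the 20 bins should hold roughly 50,000 samples
# if the values are uniformly distributed
counts, edges = np.histogram(floats, bins=20, range=(0.0, 1.0))
print(counts)
```

Plotting these counts as a bar chart (or calling plt.hist directly on the samples) should show an approximately flat histogram across [0, 1).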

Changing the random number generator

The random module in NumPy provides several alternatives to the default PRNG, which uses a 128-bit permuted congruential generator (PCG64). While this is a good general-purpose random number generator, it might not be sufficient for your particular needs. For example, this algorithm is very different from the Mersenne Twister used in Python's internal random number generator. We will follow the guidelines for best practice set out in the NumPy documentation for running repeatable but suitably random simulations.

In this recipe, we will show you how to change to an alternative PRNG and how to use seeds effectively in your programs.

Getting ready

As usual, we import NumPy under the np alias. Since we will be using multiple items from the random package, we import that module from NumPy, too, using the following code:

from numpy import random

You will need to select one of the alternative random number generators that are provided by NumPy (or...
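For example, here is a sketch of swapping in the Mersenne Twister bit generator and seeding it via a SeedSequence, which the NumPy documentation recommends for deriving high-quality generator state (the recipe's actual choice of generator may differ):

```python
from numpy.random import Generator, MT19937, PCG64, SeedSequence

# A SeedSequence turns a simple integer seed into high-quality entropy
seed_seq = SeedSequence(12345)

# The default 128-bit permuted congruential generator
pcg_rng = Generator(PCG64(seed_seq))

# Mersenne Twister, the algorithm used by Python's built-in random module
mt_rng = Generator(MT19937(seed_seq))

pcg_samples = pcg_rng.random(3)
mt_samples = mt_rng.random(3)
print(pcg_samples, mt_samples)

# Spawn independent child sequences, e.g. for parallel simulations
child_seqs = seed_seq.spawn(2)
workers = [Generator(PCG64(s)) for s in child_seqs]
```

Both generators expose the same Generator interface, so the rest of your code does not change when you swap the underlying bit generator.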

Generating normally distributed random numbers

In the Generating random data recipe, we generated random floating-point numbers following a uniform distribution between 0 and 1, but not including 1. However, in most cases where we require random data, we need to follow one of several different distributions instead. Roughly speaking, a distribution function is a function, F(x), that describes the probability that a random variable has a value that is below x. In practical terms, the distribution describes the spread of the random data over a range. In particular, if we create a histogram of data that follows a particular distribution, then it should roughly resemble the graph of the distribution function. This is best seen by example.

One of the most common distributions is the normal distribution, which appears frequently in statistics and forms the basis for many statistical methods that we will see in Chapter 6, Working with Data and Statistics. In this recipe, we will demonstrate how...
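As a minimal sketch of normally distributed sampling with NumPy (the mean and standard deviation here are arbitrary illustrative values, not the recipe's):

```python
from numpy.random import default_rng

rng = default_rng(12345)  # seed for reproducibility

# Draw 100,000 samples from a normal distribution with
# mean 5.0 and standard deviation 2.0
samples = rng.normal(loc=5.0, scale=2.0, size=100_000)

# The sample statistics should be close to the requested parameters
print(samples.mean())  # approximately 5.0
print(samples.std())   # approximately 2.0
```

A histogram of these samples should roughly trace out the familiar bell curve centered at 5.0.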

Working with random processes

In this recipe, we will examine a simple example of a random process that models the number of bus arrivals at a stop over time. This process is called a Poisson process. A Poisson process, N(t), has a single parameter, λ, which is usually called the intensity or rate, and the probability that N(t) takes the value n at a given time t is given by the following formula:

P(N(t) = n) = ((λt)^n / n!) e^(-λt)

This equation describes the probability that n buses have arrived by time t. Mathematically, this equation means that N(t) has a Poisson distribution with the parameter λt. There is, however, an easy way to construct a Poisson process by taking sums of inter-arrival times that follow an exponential distribution. For instance, let X_i be the time between the (i-1)-st arrival and the i-th arrival, where these inter-arrival times are exponentially distributed with parameter λ. Now, we take the following equation:

N(t) = max{n : X_1 + X_2 + … + X_n ≤ t}

Here, the number n is the maximum such that the sum of the first n inter-arrival times is at most t. This is the construction that we will work through in this recipe. We will...
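The inter-arrival construction can be sketched directly with NumPy (the rate and time horizon below are arbitrary illustrative values):

```python
import numpy as np
from numpy.random import default_rng

rng = default_rng(12345)  # seed for reproducibility
rate = 4.0    # intensity λ: an average of 4 arrivals per unit time
t_end = 50.0  # time horizon to examine

# Exponentially distributed inter-arrival times; NumPy's scale
# parameter is 1/λ
inter_arrival = rng.exponential(scale=1.0 / rate, size=1000)

# Cumulative sums give the arrival times of the 1st, 2nd, ... bus
arrival_times = np.cumsum(inter_arrival)

# N(t) is the number of arrivals occurring no later than t
def n_at(t):
    return int(np.searchsorted(arrival_times, t, side="right"))

count = n_at(t_end)
print(count)  # roughly rate * t_end = 200 arrivals by t = 50
```

By the construction above, count should be close to λt on average, matching the mean of a Poisson distribution with parameter λt.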

Analyzing conversion rates with Bayesian techniques

Bayesian probability allows us to systematically update our understanding (in a probabilistic sense) of a situation by considering data. In more technical language, we update the prior distribution (our current understanding) using data to obtain a posterior distribution. This is particularly useful, for example, when examining the proportion of users who go on to buy a product after viewing a website. We start with our prior belief distribution. For this, we will use the beta distribution, which models the probability of success given a number of observed successes (completed purchases) against failures (no purchases). For this recipe, we will assume that our prior belief is that we expect 25 successes from 100 views (75 fails). This means that our prior belief follows a beta (25, 75) distribution. Let’s say that we wish to calculate the probability that the true rate of success is at least 33%.

Our method is roughly divided...
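The recipe's own method is elided here, but the prior-to-posterior update can be sketched with SciPy's beta distribution (the follow-up observation counts below are hypothetical, introduced only to show the update step):

```python
from scipy.stats import beta

# Prior belief: 25 successes out of 100 views -> beta(25, 75)
prior = beta(25, 75)

# Probability under the prior that the true rate is at least 33%
prior_prob = prior.sf(0.33)

# Suppose we then observe 30 purchases and 40 non-purchases
# (hypothetical data); conjugacy gives posterior beta(25+30, 75+40)
posterior = beta(25 + 30, 75 + 40)
posterior_prob = posterior.sf(0.33)

print(prior_prob, posterior_prob)
```

Because the beta distribution is conjugate to this success/failure likelihood, updating the prior amounts to simply adding the observed successes and failures to its two parameters.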

Estimating parameters with Monte Carlo simulations

Monte Carlo methods broadly describe techniques that use random sampling to solve problems. These techniques are especially powerful when the underlying problem involves some kind of uncertainty. The general method involves performing large numbers of simulations, each sampling different inputs according to a given probability distribution, and then aggregating the results to give a better approximation of the true solution than any individual sample solution.
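The classic illustration of this idea is estimating π by random sampling; this example is not from the recipe, but it shows the sample-then-aggregate pattern in its simplest form:

```python
import numpy as np
from numpy.random import default_rng

rng = default_rng(12345)  # seed for reproducibility
n = 1_000_000

# Sample points uniformly in the unit square [0, 1) x [0, 1)
x = rng.random(n)
y = rng.random(n)

# The fraction of points falling inside the quarter circle of
# radius 1 estimates pi/4, so scaling by 4 estimates pi
inside = (x**2 + y**2) <= 1.0
pi_estimate = 4.0 * inside.mean()
print(pi_estimate)  # close to 3.14159
```

Each individual point tells us almost nothing, but aggregating a million of them pins the answer down to a few decimal places, with the error shrinking like 1/√n.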

MCMC is a specific kind of Monte Carlo simulation in which we construct a Markov chain of successively better approximations of the true distribution that we seek. This works by accepting or rejecting a proposed state, sampled at random, based on carefully selected acceptance probabilities at each stage, with the aim of constructing a Markov chain whose unique stationary distribution is precisely the unknown distribution that we wish to find.
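A minimal Metropolis sampler illustrates the accept/reject mechanism described above. This recipe actually uses PyMC; the normal-mean model, proposal scale, and chain length below are assumptions chosen purely for illustration:

```python
import numpy as np
from numpy.random import default_rng

rng = default_rng(12345)  # seed for reproducibility

# Hypothetical observed data: 50 points from a normal distribution
# whose mean we pretend not to know (known standard deviation 1.0)
data = rng.normal(loc=2.0, scale=1.0, size=50)

def log_posterior(mu):
    # Flat prior on mu, so the log posterior is just the log likelihood
    return -0.5 * np.sum((data - mu) ** 2)

# Metropolis algorithm: propose a nearby state, accept it with
# probability min(1, posterior(proposal) / posterior(current))
chain = []
current = 0.0
for _ in range(5000):
    proposal = current + rng.normal(scale=0.5)
    log_accept = log_posterior(proposal) - log_posterior(current)
    if np.log(rng.random()) < log_accept:
        current = proposal
    chain.append(current)

# Discard the early burn-in samples before the chain has converged
samples = np.array(chain[1000:])
print(samples.mean())  # close to the sample mean of the data
```

The stationary distribution of this chain is the posterior over the unknown mean, so the retained samples approximate draws from it; PyMC automates exactly this kind of construction with far better samplers.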

In this recipe, we will use...

Further reading

A good, comprehensive reference for probability and random processes is the following book:

  • Grimmett, G. and Stirzaker, D. (2009). Probability and random processes. 3rd ed. Oxford: Oxford University Press.

An easy introduction to Bayes’ theorem and Bayesian statistics is the following:

  • Kurt, W. (2019). Bayesian statistics the fun way. San Francisco, CA: No Starch Press, Inc.