Applying Math with Python - Second Edition

Improving Your Productivity

In this chapter, we will look at several topics that don’t fit within the categories discussed in the previous chapters of this book. Most of these topics are concerned with different ways to speed up, scale, and otherwise optimize the execution of our code. The others are concerned with working with specific kinds of data or file formats.

The aim of this chapter is to provide you with some tools that, while not strictly mathematical in nature, often appear in mathematical problems. These include distributed computing and optimization, both of which help you solve problems more quickly; validating data and calculations; loading and storing data in file formats commonly used in scientific computation; and other topics that will generally help you be more productive with your code.

In the first two recipes, we will cover packages that help keep track of units and uncertainties in calculations. These are very important...

Technical requirements

This chapter requires many different packages due to the nature of the recipes it contains. The list of packages we need is as follows:

  • Pint
  • uncertainties
  • netCDF4
  • xarray
  • Pandas
  • Scikit-learn
  • GeoPandas
  • Geoplot
  • Jupyter
  • Papermill
  • Cerberus
  • Cython
  • Dask

All of these packages can be installed using your favorite package manager, such as pip:

python3.10 -m pip install pint uncertainties netCDF4 xarray pandas scikit-learn geopandas geoplot jupyter papermill cerberus cython

To install the Dask package, we also need to install the various extras associated with it. We can do this using the following pip command in the terminal (the quotes stop some shells, such as zsh, from trying to interpret the square brackets):

python3.10 -m pip install "dask[complete]"

In addition to these Python packages, we will also need to install some supporting software. For the Working with geographical data recipe, the GeoPandas and Geoplot libraries have numerous lower-level dependencies that might need...

Keeping track of units with Pint

Correctly keeping track of units in calculations can be very difficult, particularly when different units can appear at different points in the same calculation. For example, it is very easy to forget to convert between different units (feet or inches into meters) or between metric prefixes (1 km into 1,000 m, for instance).

In this recipe, we’ll learn how to use the Pint package to keep track of units of measurement in calculations.

Getting ready

For this recipe, we need the Pint package, which can be imported as follows:

import pint

How to do it...

The following steps show you how to use the Pint package to keep track of units in calculations:

  1. First, we need to create a UnitRegistry object:
    ureg = pint.UnitRegistry(system="mks")
  2. To create a quantity with a unit, we multiply the number by the appropriate attribute of the registry object:
    distance = 5280 * ureg.feet
  3. We can change the units of the quantity...
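Putting these steps together, here is a minimal sketch of the pattern (the conversion target and printed values are illustrative, not the recipe’s own code):

import pint

# Create a unit registry using the metre-kilogram-second (mks) system
ureg = pint.UnitRegistry(system="mks")

# Attach units by multiplying a number by a registry attribute
distance = 5280 * ureg.feet

# Convert with the to method; magnitude strips the units off again
in_meters = distance.to(ureg.meters)
print(in_meters)            # 1609.344 meter
print(in_meters.magnitude)  # 1609.344

Arithmetic between quantities with compatible units is handled automatically, while combining incompatible units raises a DimensionalityError instead of silently producing a wrong number.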

Accounting for uncertainty in calculations

Most measuring devices are not 100% accurate; instead, they are accurate to within some tolerance, usually somewhere between 0 and 10%. For instance, a thermometer might be accurate to within 1%, while a pair of digital calipers might be accurate to within 0.1%. In both cases, the true value is unlikely to be exactly the reported value, although it will be fairly close. Keeping track of the uncertainty in a value is difficult, especially when multiple uncertainties are combined in different ways. Rather than keeping track of this by hand, it is much better to use a consistent library to do it for you. This is what the uncertainties package does.

In this recipe, we will learn how to quantify the uncertainty of variables and see how these uncertainties propagate through a calculation.

Getting ready

For this recipe, we will need the uncertainties package, from which we will import the ufloat class and the umath module:

from...
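As a brief illustration of what these imports enable (a minimal sketch with made-up measurement values, not the recipe’s own code):

from uncertainties import ufloat
from uncertainties import umath

# A value of 3.0 measured with a standard deviation of 0.1
x = ufloat(3.0, 0.1)

# Uncertainties propagate automatically through arithmetic...
y = 2 * x + 1
print(y)  # 7.00+/-0.20

# ...and through the functions in the umath module
z = umath.sqrt(x)
print(z)  # 1.732+/-0.029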

Loading and storing data from NetCDF files

Many scientific applications require us to work with large quantities of multi-dimensional data in a robust format. NetCDF is one such format, developed by the weather and climate community. Unfortunately, the complexity of the data means that we can’t simply use the utilities from the Pandas package, for example, to load this data for analysis. We need the netCDF4 package to read and import the data into Python, but we also need xarray. Unlike the Pandas library, xarray can handle higher-dimensional data while still providing a Pandas-like interface.

In this recipe, we will learn how to load data from and store data in NetCDF files.

Getting ready

For this recipe, we will need to import the NumPy package as np, the Pandas package as pd, the Matplotlib pyplot module as plt, and an instance of the default random number generator from NumPy:

import numpy as np
import pandas...
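A minimal sketch of the NetCDF round trip, using a small made-up dataset and an illustrative filename:

import numpy as np
import xarray as xr

rng = np.random.default_rng(12345)

# Build a small two-dimensional dataset with labeled coordinates
ds = xr.Dataset(
    {"temperature": (("time", "station"), rng.normal(15.0, 2.0, size=(4, 3)))},
    coords={"time": np.arange(4), "station": ["a", "b", "c"]},
)

# Write the data to a NetCDF file (using the netCDF4 engine if available)
ds.to_netcdf("sample.nc")

# Load it back, preserving dimensions, coordinates, and metadata
loaded = xr.load_dataset("sample.nc")
print(loaded["temperature"])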

Working with geographical data

Many applications involve working with geographical data. For example, when tracking global weather, we might want to plot the temperature as measured by various sensors around the world at their position on a map. For this, we can use the GeoPandas package and the Geoplot package, both of which allow us to manipulate, analyze, and visualize geographical data.

In this recipe, we will use the GeoPandas and Geoplot packages to load and visualize some sample geographical data.

Getting ready

For this recipe, we will need the GeoPandas package, the Geoplot package, and the Matplotlib pyplot package imported as plt:

import geopandas
import geoplot
import matplotlib.pyplot as plt

How to do it...

Follow these steps to create a simple plot of the capital cities plotted on a map of the world using sample data:

  1. First, we need to load the sample data from the GeoPandas package, which contains the world geometry information:
    world = geopandas...
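A minimal sketch of the whole pattern follows. Note that the bundled Natural Earth sample datasets were removed in GeoPandas 1.0, so depending on your version you may need to download the shapefiles separately and pass their paths to read_file instead:

import geopandas
import geoplot
import matplotlib.pyplot as plt

# Load the sample world geometries and capital cities
# (available via geopandas.datasets in GeoPandas versions before 1.0)
world = geopandas.read_file(geopandas.datasets.get_path("naturalearth_lowres"))
cities = geopandas.read_file(geopandas.datasets.get_path("naturalearth_cities"))

ax = geoplot.polyplot(world)      # draw the country outlines
geoplot.pointplot(cities, ax=ax)  # overlay the capital cities
plt.show()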

Executing a Jupyter notebook as a script

Jupyter notebooks are a popular medium for writing Python code for scientific and data-based applications. A Jupyter notebook is really a sequence of blocks, stored as JavaScript Object Notation (JSON) in a file with the .ipynb extension. Each block can be one of several types, such as code or Markdown. These notebooks are typically accessed through a web application that interprets the blocks and executes the code in a background kernel, which then returns the results to the web application. This is great if you are working on a personal PC, but what if you want to run the code contained within a notebook remotely on a server? In this case, it might not even be possible to access the web interface provided by the Jupyter Notebook software. The papermill package allows us to parameterize and execute notebooks from the command line.

In this recipe, we’ll learn how to execute a Jupyter notebook from the command line using...
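A minimal sketch using papermill’s Python API, with hypothetical notebook names and a hypothetical parameter:

import papermill as pm

# Execute analysis.ipynb from top to bottom, writing the executed copy
# (including cell outputs) to analysis-run.ipynb; the parameters are
# injected into a cell tagged "parameters" in the input notebook
pm.execute_notebook(
    "analysis.ipynb",
    "analysis-run.ipynb",
    parameters={"n_samples": 1000},
)

The equivalent terminal command is papermill analysis.ipynb analysis-run.ipynb -p n_samples 1000.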

Validating data

Data often arrives in a raw form and might contain anomalies or incorrect or malformed values, which will obviously present a problem for later processing and analysis. It is usually a good idea to build a validation step into a processing pipeline. Fortunately, the Cerberus package provides a lightweight and easy-to-use validation tool for Python.

For validation, we have to define a schema, which is a technical description of what the data should look like and the checks that should be performed on the data. For example, we can check the type and place bounds on the maximum and minimum values. Cerberus validators can also perform type conversions during the validation step, which allows us to plug data loaded directly from CSV files into the validator.

In this recipe, we will learn how to use Cerberus to validate data loaded from a CSV file.

Getting ready

For this recipe, we need to import the csv module from the Python Standard Library (https://docs...
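To give a flavor of how this works, here is a minimal sketch with a hypothetical schema for one row of a CSV file:

from cerberus import Validator

# Hypothetical schema: the coerce rule converts the strings produced
# by the csv module into numbers before the other checks run
schema = {
    "id": {"type": "integer", "coerce": int},
    "temperature": {"type": "float", "coerce": float, "min": -50.0, "max": 60.0},
}

validator = Validator(schema)
row = {"id": "1", "temperature": "21.5"}  # values as read from a CSV file

if validator.validate(row):
    print(validator.document)  # {'id': 1, 'temperature': 21.5}
else:
    print(validator.errors)    # field names mapped to error messages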

Accelerating code with Cython

Python is often criticized for being a slow programming language – an endlessly debatable statement. Many of these criticisms can be addressed by using a high-performance compiled library with a Python interface – such as the scientific Python stack – to greatly improve performance. However, there are some situations where it is difficult to avoid the fact that Python is not a compiled language. One way to improve performance in these (fairly rare) situations is to write a C extension (or even rewrite the code entirely in C) to speed up the critical parts. This will certainly make the code run more quickly, but it might make the package more difficult to maintain. Instead, we can use Cython, an extension of the Python language that is transpiled into C and compiled, yielding great performance improvements.

For example, we can consider some code that’s used to generate an image of the Mandelbrot set. For comparison,...
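To give a flavor of what Cython code looks like, here is a minimal sketch of an escape-time function (not the book’s Mandelbrot code); the cdef declarations give the variables static C types so that the hot loop compiles down to plain C:

# mandel.pyx
def escape_time(double cr, double ci, int max_iter=100):
    cdef double zr = 0.0, zi = 0.0, tmp
    cdef int n
    for n in range(max_iter):
        if zr * zr + zi * zi > 4.0:
            return n
        tmp = zr * zr - zi * zi + cr
        zi = 2.0 * zr * zi + ci
        zr = tmp
    return max_iter

Compiling this in place with cythonize -i mandel.pyx produces an extension module that can be imported like any other Python module.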

Distributing computations with Dask

Dask is a library for distributing computations across multiple threads, processes, or even computers, in order to perform computations effectively at a huge scale. This can greatly improve performance and throughput, even if you are working on a single laptop. Dask provides replacements for most of the data structures from the Python scientific stack, such as NumPy arrays and Pandas DataFrames. These replacements have very similar interfaces, but under the hood, they are built for distributed computing so that work can be shared between multiple threads, processes, or computers. In many cases, switching to Dask is as simple as changing the import statement and possibly adding a couple of extra method calls to start concurrent computations.

In this recipe, we will learn how to use Dask to do some simple computations on a DataFrame.

Getting ready

For this recipe, we will need to import the dataframe module from the Dask...
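A minimal sketch of the pattern with a small made-up DataFrame (a real workload would more likely load partitioned data with something like dd.read_csv):

import dask.dataframe as dd
import pandas as pd

pdf = pd.DataFrame({"x": range(10000), "y": range(10000)})

# Split the data into four partitions that can be processed concurrently
ddf = dd.from_pandas(pdf, npartitions=4)

# Operations build a lazy task graph; compute triggers actual execution
result = (ddf["x"] * ddf["y"]).mean().compute()
print(result)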

Writing reproducible code for data science

One of the fundamental principles of the scientific method is the idea that results should be reproducible and independently verifiable. Sadly, this principle is often undervalued in favor of “novel” ideas and results. As practitioners of data science, we have an obligation to do our part to make our analyses and results as reproducible as possible.

Since data science is typically done entirely on computers – that is, it doesn’t usually involve the instrumental errors involved in measurements – some might expect that all data science is inherently reproducible. This is certainly not the case. It is easy to overlook simple things such as seeding randomness (see Chapter 4) when using randomized hyperparameter searches or stochastic gradient descent-based optimization. Moreover, more subtle non-deterministic factors (such as the use of threading or multiprocessing) can dramatically change results if you are not aware...
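For instance, seeding NumPy’s default random number generator is a one-line habit that makes randomized steps repeatable:

import numpy as np

# Creating the generator with an explicit seed makes every run identical
rng = np.random.default_rng(seed=12345)
print(rng.normal(size=3))  # the same three numbers on every run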
