The Statistics and Calculus with Python Workshop

By Peter Farrell , Alvaro Fuentes , Ajinkya Sudhir Kolhe and 3 more
  • Instant online access to over 7,500+ books and videos
  • Constantly updated with 100+ new titles each month
  • Breadth and depth in over 1,000+ technologies
  1. 1. Fundamentals of Python

About this book

Are you looking to start developing artificial intelligence applications? Do you need a refresher on key mathematical concepts? Full of engaging practical exercises, The Statistics and Calculus with Python Workshop will show you how to apply your understanding of advanced mathematics in the context of Python.

The book begins by giving you a high-level overview of the libraries you'll use while performing statistics with Python. As you progress, you'll perform various mathematical tasks using the Python programming language, such as solving algebraic functions with Python starting with basic functions, and then working through transformations and solving equations. Later chapters in the book will cover statistics and calculus concepts and how to use them to solve problems and gain useful insights. Finally, you'll study differential equations with an emphasis on numerical methods and learn about algorithms that directly calculate values of functions.

By the end of this book, you’ll have learned how to apply essential statistics and calculus concepts to develop robust Python applications that solve business challenges.

Publication date:
August 2020
Publisher
Packt
Pages
740
ISBN
9781800209763

 

1. Fundamentals of Python

Overview

This chapter reviews the basic Python data structures and tools that will be used in future discussions. These concepts will allow us to refresh our memory regarding Python's most fundamental and important features, while simultaneously preparing us for advanced topics in later chapters.

By the end of this chapter, you will be able to use control flow methods to design your Python programs and initialize common Python data structures, as well as manipulate their content. You will solidify your understanding of functions and recursion in Python algorithm design. You will also be able to facilitate debugging, testing, and version control for Python programs. Finally, in the activity at the end of this chapter, you will create a Sudoku solver.

 

Introduction

Python has enjoyed an unprecedented increase in popularity and usage in recent years, especially in mathematics, which is the main topic of this chapter. However, before we delve into the advanced topics in mathematics, we will need to solidify our understanding of the fundamentals of the language.

This chapter will offer a refresher on the general concepts of Python; the topics covered will allow you to be in the best position for later discussions in this book. Specifically, we will be reviewing elementary concepts in general programming such as conditionals and loops, as well as Python-specific data structures such as lists and dictionaries. We will also discuss functions and the algorithm design process, which is an important part in any medium or large Python project that includes mathematics-related programs. All of this will be done through hands-on exercises and activities.

By the end of this chapter, you will be well positioned to tackle more complex, interesting problems in later chapters of this book.

 

Control Flow Methods

Control flow is a general term that denotes any programming syntax that can redirect the execution of a program. Control flow methods in general are what allow programs to be dynamic in their execution and computation: depending on the current state of a program or its input, the execution of that program and thus its output will dynamically change.

if Statements

The most common form of control flow in any programming language is conditionals, or if statements. if statements are used to check for a specific condition about the current state of the program and, depending on the result (whether the condition is true or false), the program will execute different sets of instructions.

In Python, the syntax of an if statement is as follows:

if [condition to check]:
    [instruction set to execute if condition is true]

Given the readability of Python, you can probably already guess how conditionals work: when the execution of a given program reaches a conditional and checks the condition in the if statement, if the condition is true, the indented set of instructions inside the if statement will be executed; otherwise, the program will simply skip those instructions and move on.

Within an if statement, it is possible for us to check for a composite condition, which is a combination of multiple individual conditions. For example, using the and keyword, the following if block is executed when both of its conditions are satisfied:

if [condition 1] and [condition 2]:
    [instruction set]

Instead of doing this, we can use the or keyword in a composite condition, which will display positive (true) if either the condition to the left or to the right of the keyword is true. It is also possible to keep extending a composite condition with more than one and/or keyword to implement conditionals that are nested on multiple levels.

When a condition is not satisfied, we might want our program to execute a different set of instructions. To implement this logic, we can use elif and else statements, which should immediately follow an if statement. If the condition in the if statement is not met, our program will move on and evaluate the subsequent conditions in the elif statements; if none of the conditions are met, any code inside an else block will be executed. An if...elif...else block in Python is in the following form:

if [condition 1]:
    [instruction set 1]
elif [condition 2]:
    [instruction set 2]
...
elif [condition n]:
    [instruction set n]
else:
    [instruction set n + 1]

This control flow method is very valuable when there is a set of possibilities that our program needs to check for. Depending on which possibility is true at a given moment, the program should execute the corresponding instructions.

Exercise 1.01: Divisibility with Conditionals

In mathematics, the analysis of variables and their content is very common, and one of the most common analyses is the divisibility of an integer. In this exercise, we will use if statements to consider the divisibility of a given number by 5, 6, or 7.

Perform the following steps in order to achieve this:

  1. Create a new Jupyter notebook and declare a variable named x whose value is any integer, as shown in the following code:
    x = 130
  2. After that declaration, write an if statement to check whether x is divisible by 5 or not. The corresponding code block should print out a statement indicating whether the condition has been met:
    if x % 5 == 0:
        print('x is divisible by 5')

    Here, % is the modulo operator in Python; the var % n expression returns the remainder when we divide the var variable by the number, n.

  3. In the same code cell, write two elif statements to check whether x is divisible by 6 and 7, respectively. Appropriate print statements should be placed under their corresponding conditionals:
    elif x % 6 == 0:
        print('x is divisible by 6')
    elif x % 7 == 0:
        print('x is divisible by 7')
  4. Write the final else statement to print out a message stating that x is not divisible by either 5, 6, or 7 (in the same code cell):
    else:
        print('x is not divisible by 5, 6, or 7')
  5. Run the program with a different value assigned to x each time to test the conditional logic we have. The following output is an example of this with x assigned with the value 104832:
    x is divisible by 6
  6. Now, instead of printing out a message about the divisibility of x, we would like to write that message to a text file. Specifically, we want to create a file named output.txt that will contain the same message that we printed out previously.

    To do this, we can use the with keyword together with the open() function to interact with the text file. Note that the open() function takes in two arguments: the name of the file to write to, which is output.txt in our case, and w (for write), which specifies that we would like to write to file, as opposed to reading the content from a file:

    if x % 5 == 0:
        with open('output.txt', 'w') as f:
            f.write('x is divisible by 5')
    elif x % 6 == 0:
        with open('output.txt', 'w') as f:
            f.write('x is divisible by 6')
    elif x % 7 == 0:
        with open('output.txt', 'w') as f:
            f.write('x is divisible by 7')
    else:
        with open('output.txt', 'w') as f:
            f.write('x is not divisible by 5, 6, or 7')
  7. Check the message in the output text file for its correctness. If the x variable still holds the value of 104832, your text file should have the following contents:
    x is divisible by 6

In this exercise, we applied the usage of conditionals to write a program that determines the divisibility of a given number by 6, 3, and 2 using the % operator. We also saw how to write content to a text file in Python. In the next section, we will start discussing loops in Python.

Note

The code lines in the elif block are executed sequentially, and breaks from sequence, when any one of the conditions is true. This implies that when x is assigned the value 30, once x%5==0 is satisfied, x%6==0 is not checked.

To access the source code for this specific section, please refer to https://packt.live/3dNflxO.

You can also run this example online at https://packt.live/2AsqO8w.

Loops

Another widely used control flow method is the use of loops. These are used to execute the same set of instructions repeatedly over a specified range or while a condition is met. There are two types of loops in Python: while loops and for loops. Let's understand each one in detail.

The while Loop

A while loop, just like an if statement, checks for a specified condition to determine whether the execution of a given program should keep on looping or not. For example, consider the following code:

>>> x = 0
>>> while x < 3:
...     print(x)
...     x += 1
0
1
2

In the preceding code, after x was initialized with the value 0, a while loop was used to successively print out the value of the variable and increment the same variable at each iteration. As you can imagine, when this program executes, 0, 1, and 2 will be printed out and when x reaches 3, the condition specified in the while loop is no longer met, and the loop therefore ends.

Note that the x += 1 command corresponds to x = x + 1, which increments the value of x during each iteration of the loop. If we remove this command, then we would get an infinite loop printing 0 each time.

The for Loop

A for loop, on the other hand, is typically used to iterate through a specific sequence of values. Using the range function in Python, the following code produces the exact same output that we had previously:

>>> for x in range(3):
...     print(x)
0
1
2

The in keyword is the key to any for loop in Python: when it is used, the variable in front of it will be assigned values inside the iterator that we'd like to loop through sequentially. In the preceding case, the x variable is assigned the values inside the range(3) iterator—which are, in order, 0, 1, and 2—at each iteration of the for loop.

Instead of range(), other types of iterators can also be used in a Python for loop. The following table gives a brief summary of some of the most common iterators to be used in for loops. Don't worry if you are not familiar with the data structures included in this table; we will cover those concepts later in this chapter:

Figure 1.1: List of datasets and their examples

Figure 1.1: List of datasets and their examples

It is also possible to nest multiple loops inside one another. While the execution of a given program is inside a loop, we can use the break keyword to exit the current loop and move on with the execution.

Exercise 1.02: Number Guessing Game

For this exercise, we will put our knowledge of loops to practice and write a simple guessing game. A target integer between 0 and 100 is randomly selected at the beginning of the program. Then, the program will take in user inputs as guesses of what this number is. In response, the program will print out a message saying Lower if the guess is greater than the actual target, or Higher if the opposite is true. The program should terminate when the user guesses correctly.

Perform the following steps to complete this exercise:

  1. In the first cell of a new Jupyter notebook, import the random module in Python and use its randint function to generate random numbers:
    import random
    true_value = random.randint(0, 100)

    Every time the randint() function is called, it generates a random integer between the two numbers passed to it; in our case, an integer between 0 and 100 will be generated.

    While they are not needed for the rest of this exercise, if you are curious about other functionalities that the random module offers, you can take a look at its official documentation at https://docs.python.org/3/library/random.html.

    Note

    The rest of the program should also be put in the current code cell.

  2. Use the input() function in Python to take in the user's input and assign the returned value to a variable (guess, in the following code). This value will be interpreted as the guess of what the target is from the user:
    guess = input('Enter your guess: ')
  3. Convert the user input into an integer using the int() function and check it against the true target. Print out appropriate messages for all possible cases of the comparison:
    guess = int(guess)
    if guess == true_value:
        print('Congratulations! You guessed correctly.')
    elif guess > true_value:
        print('Lower.')  # user guessed too high
    else:
        print('Higher.')  # user guessed too low

    Note

    The # symbol in the code snippet below denotes a code comment. Comments are added into code to help explain specific bits of logic.

  4. With our current code, the int() function will throw an error and crash the entire program if its input cannot be converted into an integer (for example, when the input is a string character). For this reason, we need to implement the code we have inside a try...except block to handle the situation where the user enters a non-numeric value:
    try:
        if guess == true_value:
            print('Congratulations! You guessed correctly.')
        elif guess > true_value:
            print('Lower.')  # user guessed too high
        else:
            print('Higher.')  # user guessed too low
    # when the input is invalid
    except ValueError:
        print('Please enter a valid number.')
  5. As of now, the user can only guess exactly once before the program terminates. To implement the feature that would allow the user to repeatedly guess until they find the target, we will wrap the logic we have developed so far in a while loop, which will break if and only if the user guesses correctly (implemented by a while True loop with the break keyword placed appropriately).

    The complete program should look similar to the following code:

    import random
    true_value = random.randint(0, 100)
    while True:
        guess = input('Enter your guess: ')
        try:
            guess = int(guess)
            if guess == true_value:
                print('Congratulations! You guessed correctly.')
                break
            elif guess > true_value:
                print('Lower.')  # user guessed too high
            else:
                print('Higher.')  # user guessed too low
        # when the input is invalid
        except ValueError:
            print('Please enter a valid number.')
  6. Try rerunning the program by executing the code cell and test out different input options to ensure that the program can process its instructions nicely, as well as handle cases of invalid inputs. For example, the output the program might produce when the target number is randomly selected to be 13 is as follows:
    Enter your guess: 50
    Lower.
    Enter your guess: 25
    Lower.
    Enter your guess: 13
    Congratulations! You guessed correctly.

In this exercise, we have practiced using a while loop in a number guessing game to solidify our understanding of the usage of loops in programming. In addition, you have been introduced to a method of reading in user input and the random module in Python.

Note

To access the source code for this specific section, please refer to https://packt.live/2BYK6CR.

You can also run this example online at https://packt.live/2CVFbTu.

Next, we will start considering common Python data structures.

 

Data Structures

Data structures are types of variables that represent different forms of information that you might want to create, store, and manipulate in your program. Together with control flow methods, data structures are the other fundamental building block of any programming language. In this section, we will go through some of the most common data structures in Python, starting with strings.

Strings

Strings are sequences of characters that are typically used to represent textual information (for example, a message). A Python string is denoted by any given textual data inside either single- or double-quotation marks. For example, in the following code snippet, the a and b variables hold the same information:

a = 'Hello, world!'
b = "Hello, world!"

Since strings are roughly treated as sequences in Python, common sequence-related operations can be applied to this data structure. In particular, we can concatenate two or more strings together to create a long-running string, we can iterate through a string using a for loop, and individual characters and substrings can be accessed using indexing and slicing. The effects of these operations are demonstrated in the following code:

>>> a = 'Hello, '
>>> b = 'world!'
>>> print(a + b)
Hello, world!
>>> for char in a:
...     print(char)
H
e
l
l
o
,
 # a blank character printed here, the last character in string a
>>> print(a[2])
l
>>> print(a[1: 4]) 
ell

One of the most important features that was added in Python 3.6 was f-strings, a syntax to format strings in Python. Since we are using Python 3.7, we can avail this feature. String formatting is used when we would like to insert the value of a given variable into a predefined string. Before f-strings, there were two other formatting options, which you may be familiar with: %-formatting and str.format(). Without going into too much detail, these two methods have a few undesirable characteristics, and f-strings was therefore developed to address those problems.

The syntax for f-strings is defined with curly brackets, { and }. For example, we can combine the printed value of a variable using an f-string as follows:

>>> a = 42
>>> print(f'The value of a is {a}.')
The value of a is 42.

When a variable is put inside the f-string curly brackets, its __str__() representation will be used in the final printed output. This means you can obtain further flexibility with f-strings by overwriting and customizing the dunder method, __str__(), while working with Python objects.

Common numeric formatting options for strings such as specifying the number of digits after the decimal or datetime formatting can be done in f-strings using the colon, as demonstrated here:

>>> from math import pi
>>> print(f'Pi, rounded to three decimal places, is {pi:.3f}.')
Pi, rounded to three decimal places, is 3.142.
>>> from datetime import datetime
>>> print(f'Current time is {datetime.now():%H:%M}.')
Current time is 21:39.

Another great thing about f-strings is that they are faster to render and process than the other two string formatting methods. Next, let's discuss Python lists.

Lists

Lists are arguably the most used data structure in Python. It is Python's own version of an array in Java or C/C++. A list is a sequence of elements that can be accessed or iterated over in order. Unlike, say, Java arrays, elements in a Python list do not have to be of the same data structure, as demonstrated here:

>>> a = [1, 'a', (2, 3)]  # a list containing a number, a string, and a tuple

Note

We'll talk more about tuples in the next section.

As we have discussed previously, elements in a list can be iterated over in a for loop in a similar way as characters in a string. Lists can also be indexed and sliced in the same way as strings:

>>> a = [1, 'a', (2, 3), 2]
>>> a[2]
(2, 3)
>>> a[1: 3]
['a', (2, 3)]

There are two ways to add new elements to a Python list: append() inserts a new single element to the end of a list, while list concatenation simply concatenates two or more strings together, as shown here:

>>> a = [1, 'a', (2, 3)]
>>> a.append(3)
>>> a
[1, 'a', (2, 3), 3]
>>> b = [2, 5, 'b']
>>> a + b
[1, 'a', (2, 3), 3, 2, 5, 'b']

To remove an element from a list, the pop() method, which takes in the index of the element to be removed, can be used.

One of the operations that make Python lists unique is list comprehension: a Pythonic syntax to efficiently initialize lists using a for loop placed inside square brackets. List comprehension is typically used when we want to apply an operation to an existing list to create a new list. For example, say we have a list variable, a, containing some integers:

>>> a = [1, 4, 2, 9, 10, 3]

Now, we want to create a new list, b, whose elements are two times the elements in a, in order. We could potentially initialize b as an empty list and iteratively loop through a and append the appropriate values to b. However, with list comprehension, we can achieve the same result with a more elegant syntax:

>>> b = [2 * element for element in a]
>>> b
[2, 8, 4, 18, 20, 6]

Furthermore, we can even combine conditionals inside a list comprehension to implement complex logic in this process of creating Python lists. For example, to create a list of twice the elements in a that are odd numbers, we can do the following:

>>> c = [2 * element for element in a if element % 2 == 1]
>>> c
[2, 18, 6]

Another Python data structure that is very often contrasted with list is tuple, which we will discuss in the next section. However, before moving forward, let's go through an exercise on a new concept: multi-dimensional lists/arrays.

Multi-dimensional arrays, also known as tables or matrices (and sometimes tensors), are common objects in the field of mathematics and machine learning. Given the fact that elements in a Python list can be any Python objects, we can model arrays that span more than one dimension using lists in a list. Specifically, imagine that, within an overarching Python list, we have three sublists, each having three elements in it. This object can be thought of as a 2D, 3 x 3 table. In general, we can model n-dimensional arrays using Python lists that are nested inside other lists n times.

Exercise 1.03: Multi-Dimensional Lists

In this exercise, we will familiarize ourselves with the concept of multi-dimensional lists and the process of iterating through them. Our goal here is to write logic commands that dynamically display the content of a 2D list.

Perform the following steps to complete this exercise:

  1. Create a new Jupyter notebook and declare a variable named a in a code cell, as follows:
    a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

    This variable represents a 3 x 3 2D table, with the individual sublists in the list representing the rows.

  2. In a new code cell, iterate through the rows by looping through the elements in list a (do not run the cell just yet):
    for row in a:
  3. At each iteration in this for loop, a sublist in a is assigned to a variable called row. We can then access the individual cells in the 2D table by indexing the individual rows. The following for loop will print out the first element in each sublist, or in other words, the number in the first cell of each row in the table (1, 4, and 7):
    for row in a:
        print(row[0])
  4. In a new code cell, print out the values of all the cells in table a by having a nested for loop, whose inner loop will iterate through the sublists in a:
    for row in a:
        for element in row:
            print(element)

    This should print out the numbers from 1 to 9, each in a separate row.

  5. Finally, in a new cell, we need to print out the diagonal elements of this table in a nicely formatted message. To do this, we can have an indexing variable — i, in our case — loop from 0 to 2 to access the diagonal elements of the table:
    for i in range(3):
        print(a[i][i])

    Your output should be 1, 5, and 9, each in a separate row.

    Note

    This is because the row index and the column index of a diagonal element in a table/matrix are equal.

  6. In a new cell, change the preceding print statements using f-strings to format our printed output:
    for i in range(3):
        print(f'The {i + 1}-th diagonal element is: {a[i][i]}')

    This should produce the following output:

    The 1-th diagonal element is: 1
    The 2-th diagonal element is: 5
    The 3-th diagonal element is: 9

In this exercise, we have combined what we have learned about loops, indexing, and f-string formatting to create a program that dynamically iterates through a 2D list.

Note

To access the source code for this specific section, please refer to https://packt.live/3dRP8OA.

You can also run this example online at https://packt.live/3gpg4al.

Next, we'll continue our discussion about other Python data structures.

Tuples

Declared with parentheses instead of square brackets, Python tuples are still sequences of different elements, similar to lists (although the parentheses can be omitted in assignment statements). The main difference between these two data structures is that tuples are immutable objects in Python—this means they cannot be mutated, or changed, in any way after their initialization, as shown here:

>>> a = (1, 2)
>>> a[0] = 3  # trying to change the first element
Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>> a.append(2)  # trying to add another element
Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
AttributeError: 'tuple' object has no attribute 'append'

Given this key difference between tuples and lists, we can utilize these data structures accordingly: when we want a sequence of elements to be immutable for any number of reasons (for example, to ensure the logical integrity functions), a tuple can be used; if we allow the sequence to be able to be changed after its initialization, it can be declared as a list.

Next, we will be discussing a common data structure in mathematical computing: sets.

Sets

If you are already familiar with the mathematical concept, the definition of a Python set is essentially the same: a Python set is a collection of unordered elements. A set can be initialized with curly brackets, and a new element can be added to a set using the add() method, like so:

>>> a = {1, 2, 3}
>>> a.add(4)
>>> a
{1, 2, 3, 4}

Since a set is a collection of Python elements, or in other words, an iterator, its elements can still be iterated over using a for loop. However, given its definition, there is no guarantee that those elements will be iterated in the same order as they are initialized in or added to the set.

Furthermore, when an element that already exists in a set is added to that set, the statement will have no effect:

>>> a
{1, 2, 3, 4}
>>> a.add(3)
>>> a
{1, 2, 3, 4}

Taking the union or the intersection of two given sets are the most common set operations and can be achieved via the union() and intersection() methods in Python, respectively:

>>> a = {1, 2, 3, 4}
>>> b = {2, 5, 6}
>>> a.union(b)
{1, 2, 3, 4, 5, 6}
>>> a.intersection(b)
{2}

Finally, to remove a given element from a set, we can use either the discard() method or the remove() method. Both remove the item passed to them from a set. However, if the item does not exist in the set, the former will not mutate the set, while the latter will raise an error. Just like tuples and lists, you can choose to use one of these two methods in your program to implement specific logic, depending on your goal.

Moving on, the last Python data structure that we will be discussing in this section is dictionaries.

Dictionaries

Python dictionaries are the equivalent of hash maps in Java, where we can specify key-value pair relationships and perform lookups on a key to obtain its corresponding value. We can declare a dictionary in Python by listing out key-value pairs in the form of key: value, separated by commas inside curly brackets.

For example, a sample dictionary containing students' names mapped to their final scores in a class may look as follows:

>>> score_dict = {'Alice': 90, 'Bob': 85, 'Carol': 86}
>>> score_dict
{'Alice': 90, 'Bob': 85, 'Carol': 86}

In this case, the names of the students ('Alice', 'Bob', and 'Carol') are the keys of the dictionary, while their respective scores are the values that the keys are mapped to. A key cannot be used to map to multiple different values. The value of a given key can be accessed by passing the key to the dictionary inside square brackets:

>>> score_dict['Alice']
90
>>> score_dict['Carol']
86
>>> score_dict['Chris']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'Chris'

Note that in the last statement in the preceding snippet, 'Chris' is not a key in the dictionary, so when we attempt to access its value, KeyError is returned by the Python interpreter.

Changing the value of an existing key or adding a new key-value pair to an existing dictionary can be done using the same syntax:

>>> score_dict['Alice'] = 89
>>> score_dict
{'Alice': 89, 'Bob': 85, 'Carol': 86}
>>> score_dict['Chris'] = 85
>>> score_dict
{'Alice': 89, 'Bob': 85, 'Carol': 86, 'Chris': 85}

Similar to list comprehension, a Python dictionary can be declared using dictionary comprehension. For instance, the following statement initializes a dictionary mapping integers from -1 to 1 (inclusively) to their respective squares:

>>> square_dict = {i: i ** 2 for i in range(-1, 2)}
>>> square_dict
{-1: 1, 0: 0, 1: 1}

As we can see, this dictionary contains the key-value pairs xx ** 2 for every x between -1 and 1, which was done by placing the for loop inside the initialization of the dictionary.

To delete a key-value pair from a dictionary, we would need to use the del keyword. Say we would like to delete the 'Alice' key and its corresponding value. We would do this like so:

>>> del score_dict['Alice']

Attempting to access a deleted key will cause the Python interpreter to raise an error:

>>> score_dict['Alice']
KeyError: 'Alice'

One of the most important aspects of Python dictionaries to keep in mind is the fact that only immutable objects can be dictionary keys. In the examples so far, we have seen strings and numbers as dictionary keys. Lists, which can be mutated and changed after initialization, cannot be used as dictionary keys; tuples, on the other hand, can.

Exercise 1.04: Shopping Cart Calculations

In this exercise, we will use the dictionary data structure to build a skeletal version of a shopping application. This will allow us to review and further understand the data structure and the operations that can be applied to it.

Perform the following steps to complete this exercise:

  1. Create a new Jupyter notebook and declare a dictionary representing any given items available for purchase and their respective prices in the first code cell. Here, we'll add three different types of laptops with their prices in dollars:
    prices = {'MacBook 13': 1300, 'MacBook 15': 2100, \
              'ASUS ROG': 1600}

    Note

    The code snippet shown here uses a backslash ( \ ) to split the logic across multiple lines. When the code is executed, Python will ignore the backslash, and treat the code on the next line as a direct continuation of the current line.

  2. In the next cell, initialize a dictionary representing our shopping cart. The dictionary should be empty at the beginning, but it should map an item in the cart to how many copies are to be purchased:
    cart = {}
  3. In a new cell, write a while True loop that represents each step of the shopping process and asks the user whether they would like to continue shopping or not. Use conditionals to handle different cases of the input (you can leave the case where the user wants to continue shopping until the next step):
    while True:
        _continue = input('Would you like to continue '\
                          'shopping? [y/n]: ')
        if _continue == 'y':
            ...
        elif _continue == 'n':
            break
        else:
            print('Please only enter "y" or "n".')
  4. Inside the first conditional case, take in another user input to ask which item should be added to the cart. Use conditionals to increment the count of the item in the cart dictionary or handle invalid cases:
        if _continue == 'y':
            print(f'Available products and prices: {prices}')
            new_item = input('Which product would you like to '\
                             'add to your cart? ')
            if new_item in prices:
                if new_item in cart:
                    cart[new_item] += 1
                else:
                    cart[new_item] = 1
            else:
                print('Please only choose from the available products.')
  5. In the next cell, loop through the cart dictionary and calculate the total amount of money the user has to pay (by looking up the quantity and price of each item in the cart):
    # Calculation of total bill.
    running_sum = 0
    for item in cart:
        running_sum += cart[item] * prices[item]  # quantity times price
  6. Finally, in a new cell, print out the items in the cart and their respective amount in different lines via a for loop and at the end the total bill. Use an f-string to format the printed output:
    print(f'Your final cart is:')
    for item in cart:
        print(f'- {cart[item]} {item}(s)')
    print(f'Your final bill is: {running_sum}')
  7. Run the program and experiment with different carts to ensure our program is correct. For example, if you were to add two MacBook 13s and one ASUS ROG to my shopping cart and stop, the corresponding output would be as follows:
    Figure 1.2: Output of the shopping cart application

Figure 1.2: Output of the shopping cart application

And that concludes our shopping cart exercise, through which we have familiarized ourselves with the use of dictionaries to look up information. We have also reviewed the use of conditionals and loops to implement control flow methods in a Python program.

Note

To access the source code for this specific section, please refer to https://packt.live/2C1Ra1C.

You can also run this example online at https://packt.live/31F7QXg.

In the next section, we will discuss two integral components of any complex program: functions and algorithms.

 

Functions and Algorithms

While functions denote a specific object in Python programming with which we can order and factor our programs, the term algorithm typically refers to the general organization of a sequence of logic to process its given input data. In data science and scientific computing, algorithms are ubiquitous, commonly taking the form of machine learning models that are used to process data and potentially make predictions.

In this section, we will discuss the concept and syntax of Python functions and then tackle some example algorithm-design problems.

Functions

In its most abstract definition, a function is simply an object that can take in an input and producing an output, according to a given set of instructions. A Python function is of the following form:

def func_name(param1, param2, ...):
     […]
    return […]

The def keyword denotes the start of a Python function. The name of a function can be anything, though the rule is to avoid special characters at the beginning of the name and to use snake case. Between the parentheses are the parameters that the function takes in, which are separated by commas and can be used inside the indented code of the function.

For example, the following function takes in a string (though this requirement is unspecified) and prints out a greeting message:

>>> def greet(name):
...     print(f'Hello, {name}!')

Then, we can call this function on any string that we want and achieve the effect that we intended with the instruction inside the function. If we somehow mis-specify the arguments that a function takes in (for example, the last statement in the following code snippet), the interpreter will return an error:

>>> greet('Quan')
Hello, Quan!
>>> greet('Alice')
Hello, Alice!
>>> greet()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: greet() missing 1 required positional argument: 'name'

It is important to note that any local variables (variables declared inside a function) cannot be used outside of the scope of the function. In other words, once a function finishes its execution, none of its variables will be accessible by other code.

Most of the time, we would want our functions to return some sort of value at the end, which is facilitated by the return keyword. As soon as a return statement is executed, the execution of a program will exit out of a given function and return to the parent scope that called the function. This allows us to design a number of dynamic logic sequences.

For example, imagine a function that takes in a Python list of integers and returns the first element that is divisible by 2 (and returns False if there is no even element in the list):

def get_first_even(my_list):
    [...]
    return  # should be the first even element

Now, the natural way to write this function is to loop through the elements in the list and check for their 2-divisibility:

def get_first_even(my_list):
    for item in my_list:
        if item % 2 == 0:
            [...]
    return  # should be the first even element

However, if and when the condition is met (that is, when the current element we are iterating over is divisible by 2), that very element should be the return value of the function, since it is the first element in the list that is divisible by 2. This means we can actually return it within the if block (and finally return False at the end of the function):

def get_first_even(my_list):
    for item in my_list:
        if item % 2 == 0:
            return item
    return False

This approach is to be contrasted with an alternative version where we only return the element that satisfies our condition at the end of the loop, which will be more time-consuming (execution-wise) and require an additional check as to whether there is an even element in the input list. We will examine a variation of this logic in depth in the next exercise.

Exercise 1.05: Finding the Maximum

Finding the maximum/minimum of an array, or list, is a common exercise in any introductory programming class. In this exercise, we will consider an elevated version of the problem, in which we need to write a function that returns the index and the actual value of the maximum element within a list (if tie-breaking is needed, we return the last maximum element).

Perform the following steps to complete this exercise:

  1. Create a new Jupyter notebook and declare the general structure of our target function in a code cell:
    def get_max(my_list):
        ...
        return ...
  2. Create a variable that keeps track of the index of the current maximum element called running_max_index, which should be initialized to 0:
    def get_max(my_list):
        running_max_index = 0
        ...
        return ...
  3. Loop through the values in the parameter list and their corresponding indices using a for loop and the enumerate operation:
    def get_max(my_list):
        running_max_index = 0
        # Iterate over index-value pairs.
        for index, item in enumerate(my_list):
             [...]
        return ...
  4. At each step of the iteration, check to see if the current element is greater than or equal to the element corresponding to the running indexing variable. If that is the case, assign the index of the current element to the running maximum index:
    def get_max(my_list):
        running_max_index = 0
        # Iterate over index-value pairs.
        for index, item in enumerate(my_list):
            if item >= my_list[running_max_index]:
                running_max_index = index
        return [...]
  5. Finally, return the running maximum index and its corresponding value as a tuple:
    def get_max(my_list):
        running_max_index = 0
        # Iterate over index-value pairs.
        for index, item in enumerate(my_list):
            if item >= my_list[running_max_index]:
                running_max_index = index
        return running_max_index, my_list[running_max_index]
  6. In a new cell, call this function on various lists to test for different cases. An example of this is as follows:
    >>> get_max([1, 3, 2])
    (1, 3)
    >>>  get_max([1, 3, 56, 29, 100, 99, 3, 100, 10, 23])
    (7, 100)

This exercise helped us review the general syntax of a Python function and also offered a refresher on looping. Furthermore, variations of the logic that we considered are commonly found in scientific computing projects (for example, finding the minimum or an element in an iterator that satisfies some given conditions).

Note

To access the source code for this specific section, please refer to https://packt.live/2Zu6KuH.

You can also run this example online at https://packt.live/2BUNjDk.

Next, let's discuss a very specific style of function design called recursion.

Recursion

The term recursion in programming denotes the style of solving a problem using functions by having a function recursively call itself. The idea is that each time the function is called, its logic will take a small step toward the solution of the problem, and by doing this many times, the original problem will be finally solved. The idea is that if we somehow have a way to translate our problem into a small one that can be solved in the same way, we can repeatedly break down the problem to arrive at the base case and ensure that the original, bigger problem is solved.

Consider the problem of computing the sum of n integers. If we somehow already have the sum of the first n - 1 integers, then we can simply add the last number to that sum to compute the total sum of the n numbers. But how can the sum of the first n - 1 numbers be computed? With recursion, we once again assume that if we have the sum of the first n - 2 numbers, then we add in that last number. This process repeats until we reach the first number in the list, and the whole process completes.

Let's consider this function in the following example:

>>> def find_sum(my_list):
...     if len(my_list) == 1:
...             return my_list[0]
...     return find_sum(my_list[: -1]) + my_list[-1]

We can see that, in the general case, the function computes and returns the result of adding the last element of the input list, my_list[-1], to the sum of the sublist without this last element my_list[: -1], which is in turn computed by the find_sum() function itself. Again, we rationalize that if the find_sum() function can somehow solve the problem of summing a list in a smaller case, we can generalize the result to any given non-empty list.

Handling the base case is therefore an integral part of any recursive algorithm. Here, our base case is when the input list is a single-valued one (checked by our if statement), in which case we should simply return that very element in the list.

We can see that this function correctly computes the sum of any non-empty list of integers, as shown here:

>>> find_sum([1, 2, 3])
6
>>> find_sum([1])
1

This is a somewhat basic example, as finding the sum of a list can be easily done by maintaining a running sum and using a for loop to iterate over all the elements in the input list. In fact, most of the time, recursion is less efficient than iteration, as there is significant overhead in repeatedly calling function after function in a program.

However, there are situations, as we will see in the following exercise, where, by abstracting our approach to a problem to a recursive algorithm, we can significantly simplify how the problem is solved.

Exercise 1.06: The Tower of Hanoi

The Tower of Hanoi is a well-known mathematical problem and a classic application of recursion. The problem statement is as follows.

There are three disk stacks where disks can be placed in and n disks, all of different sizes. At the beginning, the disks are stacked in ascending order (the largest at the bottom) in one single stack. At each step of the game, we can take the top disk of one stack and put it at the top of another stack (which can be an empty stack) with the condition that no disk can be placed on top of a disk that is smaller than it.

We are asked to compute the minimum number of moves necessary to move the entire stack of n disk from one stack to another. While the problem can be quite complex if we think about it in a linear way, it becomes simpler when we employ a recursive algorithm.

Specifically, in order to move the n disks, we need to move the top n - 1 disks to another stack, move the bottom, biggest disk to the last stack, and finally move the n - 1 disks in the other stack to the same stack as the biggest disk. Now, imagine we can compute the minimum number of steps taken to move (n - 1) disks from one stack to another, denoted as S(n - 1), then to move n disks, we need 2 S(n - 1) + 1 steps.

That is the recursively analytical solution to the problem. Now, let's write a function to actually compute this quantity for any given n.

Perform the following steps to complete this exercise:

  1. In a new Jupyter notebook, define a function that takes in an integer named n and returns the quantity that we arrived at previously:
    def solve(n):
        return 2 * solve(n - 1) + 1
  2. Create a conditional in the function to handle the base case where n = 1 (note that it only takes one step to move a single disk):
    def solve(n):
        if n == 1:
            return 1
        return 2 * solve(n - 1) + 1
  3. In a different cell, call the function on different inputs to verify that the function returns the correct analytical solution to the problem, which is 2n - 1:
    >>> print(solve(3) == 2 ** 3 - 1)
    True
    >>> print(solve(6) == 2 ** 6 - 1)
    True

    Here, we are using the == operator to compare two values: the returned value from our solve() function and the analytical expression of the solution. If they are equal, we should see the Boolean True printed out, which is the case for both comparisons we have here.

While the code in this exercise is short, it has illustrated the point that recursion can offer elegant solutions to a number of problems and has hopefully solidified our understanding of the procedure of a recursive algorithm (with the general step and a base case).

Note

To access the source code for this specific section, please refer to https://packt.live/2NMrGrk.

You can also run this example online at https://packt.live/2AnAP6R.

With that, we'll move on and start discussing the general process of algorithm design in the next section.

Algorithm Design

Designing algorithms is actually something that we have been doing all along, especially in this section, which is all about functions and algorithms: discussing what a functional object should take in, how it should process that input, and what output it should return at the end of its execution. In this section, we will briefly discuss some practices in a general algorithm-design procedure and then consider a somewhat complex problem called the N-Queens problem as an exercise.

While writing Python functions, some programmers might choose to implement subfunctions (functions within other functions). Following the idea of encapsulation in software development, a subfunction should be implemented when it is only called by instructions within another function. If this is the case, the first function can be viewed as a helper function of the second and therefore should be placed inside that second function. This form of encapsulation allows us to be more organized with our programs/code and ensure that if a piece of code does not need to use the logic within a given function, then it should not have access to it.

The next point of discussion involves recursive search algorithms, which we'll look at in the next exercise. Specifically, when an algorithm is recursively trying to find a valid solution to a given problem, it can reach a state in which there are no valid solutions (for example, when we are trying to find an even element in a list of only odd integers). This leads to the need for a way to indicate that we have reached an invalid state.

In our find-the-first-even-number example, we chose to return False to indicate an invalid state where our input list only consists of odd numbers. Returning some sort of flag such as False or 0 is actually a common practice that we will follow in later examples in this chapter as well.

With that in mind, let's jump into the exercise of this section.

Exercise 1.07: The N-Queens Problem

Another classic algorithm-design problem in mathematics and computer science, the N-Queens problem asks us to place n queen pieces in the game of chess on an n x n chessboard so that no queen piece can attack another. A queen can attack another piece if they share the same row, column, or diagonal, so the problem is essentially finding a combination of locations for the queen pieces so that any two given queens are in different rows, columns, and diagonals.

For this exercise, we will design a backtracking algorithm that searches for a valid solution to this problem for any positive integer, n. The algorithm is as follows:

  1. Given the requirements of the problem, we argue that in order to place n pieces, each row of the chessboard needs to include exactly one piece.
  2. For each row, we iteratively go through all the cells of that row and check to see whether a new queen piece can be placed in a given cell:

    a. If such a cell exists, we place a piece in that cell and move on to the next row.

    b. If a new queen piece cannot be placed in any cell in the current row, we know that we have reached an invalid state and thus return False.

  3. We repeat this process until a valid solution is found.

The following diagram describes how this algorithm works with n = 4:

Figure 1.3: Recursion with the N-Queens problem

Figure 1.3: Recursion with the N-Queens problem

Now, let's actually implement the algorithm:

  1. Create a new Jupyter notebook. In its first cell, declare a variable named N to represent the size of our chessboard, as well as the number of queen pieces we need to place on the board:
    N = 8
  2. A chessboard will be represented as a 2D, n x n list with 0 representing an empty cell and 1 representing a cell with a queen piece. Now, in a new code cell, implement a function that takes in a list of this form and print it out in a nice format:
    # Print out the board in a nice format.
    def display_solution(board):
        for i in range(N):
            for j in range(N):
                print(board[i][j], end=' ')
            print()

    Note that the end=' ' argument in our print statement specifies that instead of ending the printed output with a newline character, it should simply be a space character. This is so that we can print out the cells in the same row using different print statements.

  3. In the next cell, write a function that takes in a board, a row number, and a column number. The function should check to see whether a new queen piece can be placed on this board at the location given by the row and column numbers.

    Note that since we are iteratively placing pieces rows to rows, each time we check to see whether a new piece can be placed at a given location, we only need to check for the rows above the location:

    # Check if a queen can be placed in the position.
    def check_next(board, row, col):
        # Check the current column.
        for i in range(row):
            if board[i][col] == 1:
                return False
        # Check the upper-left diagonal.
        for i, j in zip(range(row, -1, -1), \
                        range(col, -1, -1)):
            if board[i][j] == 1:
                return False
        # Check the upper-right diagonal.
        for i, j in zip(range(row, -1, -1), \
                        range(col, N)):
            if board[i][j] == 1:
                return False
        return True
  4. In the same code cell, implement a function that takes in a board and a row number. This function should go through all the cells in the given row and check to see whether a new queen piece can be placed at a particular cell (using the check_next() function written in the preceding step).

    For such a cell, place a queen in that cell (by changing the cell value to 1) and recursively call the function itself with the next row number. If a final solution is valid, return True; otherwise, remove the queen piece from the cell (by changing it back to 0).

    If, after we have considered all the cells of the given row, no valid solution is found, return False to indicate an invalid state. The function should also have a conditional at the beginning to check for whether the row number is larger than the board size N, in which case we simply return True to indicate that we have reached a valid final solution:

    def recur_generate_solution(board, row_id):
        # Return if we have reached the last row.
        if row_id >= N:
            return True
        # Iteratively try out cells in the current row.
        for i in range(N):
            if check_next(board, row_id, i):
                board[row_id][i] = 1 
                # Return if a valid solution is found.
                final_board = recur_generate_solution(\
                              board, row_id + 1)
                if final_board:
                    return True
                board[row_id][i] = 0  
        # When the current board has no valid solutions.
        return False
  5. In the same code cell, write a final solver function that wraps around the two functions, check_next() and recur_generate_solution() (in other words, the two functions should be subfunctions of the function we are writing). The function should initialize an empty 2D n x n list (representing the chessboard) and call the recur_generate_solution() function with row number 0.

    The function should also print out the solution at the end:

    # Generate a valid solution.
    def generate_solution():
        # Check if a queen can be placed in the position.
        def check_next(board, row, col):
            [...]
        # Recursively generate a solution.
        def recur_generate_solution(board, row_id):
            [...]
        # Start out with en empty board.
        my_board = [[0 for _ in range(N)] for __ in range(N)]
        final_solution = recur_generate_solution(my_board, 0)
        # Display the final solution.
        if final_solution is False:
            print('A solution cannot be found.')
        else:
            print('A solution was found.')
            display_solution(my_board)
  6. In a different code cell, run the overarching function from the preceding step to generate and print out a solution:
    >>> generate_solution()
    A solution was found.
    1 0 0 0 0 0 0 0 
    0 0 0 0 1 0 0 0 
    0 0 0 0 0 0 0 1 
    0 0 0 0 0 1 0 0 
    0 0 1 0 0 0 0 0 
    0 0 0 0 0 0 1 0 
    0 1 0 0 0 0 0 0 
    0 0 0 1 0 0 0 0 

Throughout this exercise, we have implemented a backtracking algorithm that is designed to search for a valid solution by iteratively making a move toward a potential solution (placing a queen piece in a safe cell), and if the algorithm somehow reaches an invalid state, it will backtrack by undoing its previous move (in our case, by removing the last piece we placed) and looking for a new move to make. As you can probably tell, backtracking is closely related to recursion, and that is why we chose to implement our algorithm using a recursive function, thus consolidating our understanding of the general concept.

Note

To access the source code for this specific section, please refer to https://packt.live/2Bn7nyt.

You can also run this example online at https://packt.live/2ZrKRMQ.

In the next and final section of this chapter, we will consider a number of administrative tasks in Python programming that are often overlooked, namely debugging, testing, and version control.

 

Testing, Debugging, and Version Control

It is important to note that, in programming, the actual task of writing code is not the only element of the process. There are other administrative procedures that play important roles in the pipeline that are often overlooked. In this section, we will discuss each task one by one and consider the process of implementing them in Python, starting with testing.

Testing

In order to make sure that a piece of software that we have written works as we intended and produces correct results, it is necessary to put it through specific tests. In software development, there are numerous types of testing that we can apply to a program: integration testing, regression testing, system testing, and so on. One of the most common is unit testing, which is our topic of discussion in this section.

Unit testing denotes the focus on individual small units of the software, as opposed to the entire program. Unit testing is typically the first step of a testing pipeline—once we are reasonably confident that the individual components of our program are working correctly, we can move on to test how these components work together and see whether they produce the results we want (with integration or system testing).

Unit testing in Python can be easily implemented using the unittest module. Taking an object-oriented approach, unittest allows us to design tests for our programs as Python classes, making the process more modular. Such a class needs to inherit from the TestCase class from unittest, and individual tests are to be implemented in separate functions, as follows:

import unittest
class SampleTest(unittest.TestCase):
    def test_equal(self):
        self.assertEqual(2 ** 3 - 1, 7)
        self.assertEqual('Hello, world!', 'Hello, ' + 'world!')
    
    def test_true(self):
        self.assertTrue(2 ** 3 < 3 ** 2)
        for x in range(10):
            self.assertTrue(- x ** 2 <= 0)

In the SampleTest class, we placed two test cases where we want to check whether two given quantities are equal or not using the assertEqual() method in the test_equal() function. Here, we test whether 23 - 1 is indeed equal to 7, and whether string concatenation in Python is correct.

Similarly, the assertTrue() methods used in the test_true() function test for whether the given parameter is evaluated True or not. Here, we test whether 23 is less than 32, and whether the negative of perfect squares of integers between 0 and 10 are non-positive.

To run the tests we have implemented, we can use the following statement:

>>> unittest.main()
test_equal (__main__.SampleTest) ... ok
test_true (__main__.SampleTest) ... ok
----------------------------------------------------------------------
Ran 2 tests in 0.001s
OK

The produced output tells us that both of our tests returned positive. One important side note to keep in mind is that if you are running a unit test in a Jupyter notebook, the last statement needs to be as follows:

unittest.main(argv=[''], verbosity=2, exit=False)

As a result of the fact that the unit tests are to be implemented as functions in a Python class, the unittest module also offers two convenient methods, setUp() and tearDown(), which are to be run automatically before and after each test, respectively. We will see an example of this in our next exercise. For now, we will move on and talk about debugging.

Debugging

The term debugging literally means the removal of one or many bugs from a given computer program, thus making it work correctly. In most cases, a debugging process follows a failed test where it is determined that there is a bug in our program. Then, to debug the program, we need to identify the source of the error that caused the test to fail and attempt to fix the code related to that error.

There are multiple forms of debugging that a program might employ. These include the following:

  • Print debugging: Arguably one of the most common and elementary methods of debugging, print debugging involves identifying the variables that might play a role in causing the bug, placing print statements for those variables at various places in our program so that we can track the changes in the values of those variables. Once a change in the value of a variable is found to be undesirable or unwanted, we look at where specifically that print statement is in the program and therefore (roughly) identify the location of the bug.
  • Logging: If instead of printing the values of our variables to standard output, we decide to write the output to a log file, this is called logging. Logging is often done to keep track of specific events taking place in the execution of the program we are debugging or simply monitoring.
  • Tracing: To debug a program, we, in this case, will follow the low-level function calls and execution stack of the program when it executes. By looking at the order in which variables and functions are used from that low-level perspective, we can identify the source of the error as well. Tracing can be implemented in Python using the sys.settrace() method from the sys module.

In Python, it is quite easy to employ print debugging, as we simply need to use print statements. For more complex functionalities, we can use a debugger, a module/library that is specifically designed for debugging purposes. The most dominant debugger in Python is the built-in pdb module, which used to be run via the pdb.set_trace() method.

Starting from Python 3.7, we can opt for a simpler syntax by placing calls to the built-in breakpoint() function. At each place where a breakpoint() function is called, the execution of the program will pause and allow us to inspect the behavior and current characteristics of the program, including the values of its variables.

Specifically, once the execution of the program reaches a breakpoint() function, an input prompt will appear, where we can enter a pdb command. There are many commands that you can take advantage of that are included in the documentation of the module. Some notable commands are as follows:

  • h: For help, which prints out the complete list of commands you can use.
  • u/d: For up and down, respectively, which move the running frame count one level in a direction.
  • s: For step, which executes the instruction that the program is currently at and pauses at the first possible place in the execution. This command is very useful in terms of observing the immediate effect of a line of code on the state of the program.
  • n: For next, which executes the instruction that the program is currently at and only pauses at the next instruction in the current function and when the execution is returned. This command works somewhat similarly to s, though it skips through instructions at a much higher rate.
  • r: For return, which continues the execution until the current function returns.
  • c: For continue, which continues the execution until the next breakpoint() statement is reached.
  • ll: For longlist, which prints out the source code for the current instruction.
  • p [expression]: For print, which evaluates and prints out the value of the given expression

Overall, once the execution of a program is paused by a breakpoint() statement, we can utilize a combination of the preceding different commands to inspect the state of the program and identify a potential bug. We'll look at an example of this in the following exercise.

Exercise 1.08: Testing for Concurrency

In this exercise, we will consider a well-known bug in concurrency- or parallelism-related programs called a race condition. This will serve as a nice use case to try out our testing and debugging tools. Since the integration of pdb and other debugging tools is still underdeveloped in Jupyter Notebooks, we will be working with .py scripts in this exercise.

Perform the following steps to complete this exercise:

  1. The setup of our program (which is implemented in the following steps) is as follows. We have a class that implements a counter object that can be manipulated by multiple threads in parallel. The value of an instance of this counter object (stored in its value attribute, initialized to 0) is incremented every time its update() method is called. The counter also has a target that its value should be incremented to. When its run() method is called, multiple threads will be spawned. Each thread will call the update() method, thus incrementing its value attribute a number of times that is equal to the original target. In theory, the final value of the counter should be the same as the target, but we will see that this is not the case due to a race condition. Our goal is to apply pdb to track the changes of variables inside this program to analyze this race condition.
  2. Create a new .py script and enter the following code:
    import threading
    import sys; sys.setswitchinterval(10 ** -10)
    class Counter:
        def __init__(self, target):
            self.value = 0
            self.target = target        
        def update(self):
            current_value = self.value
            # breakpoint()
            self.value = current_value + 1
            
        def run(self):
            threads = [threading.Thread(target=self.update) \
                                        for _ in range(self.target)]
            for t in threads:
                t.start()
            for t in threads:
                t.join()

    This code implements the Counter class that we discussed earlier. Note that there is a line of code that sets the switch interval of our system; we will discuss this later.

  3. With the hope that the value of a counter object should be incremented to its true target, we will test its performance with three different target values. In the same .py script, enter the following code to implement our unit tests:
    import unittest
    class TestCounter(unittest.TestCase):
        def setUp(self):
            self.small_params = 5
            self.med_params = 5000
            self.large_params = 10000
        
        def test_small(self):
            small_counter = Counter(self.small_params)
            small_counter.run()
            self.assertEqual(small_counter.value, \
                             self.small_params)
            
        def test_med(self):
            med_counter = Counter(self.med_params)
            med_counter.run()
            self.assertEqual(med_counter.value, \
                             self.med_params)
            
        def test_large(self):
            large_counter = Counter(self.large_params)
            large_counter.run()
            self.assertEqual(large_counter.value, \
                             self.large_params)
        if __name__ == '__main__':
            unittest.main()

    Here, we can see that in each testing function, we initialize a new counter object, run it, and finally compare its value with the true target. The targets for the test cases are declared in the setUp() method, which, as we mentioned previously, is run before the tests are carried out:

    Run this Python script:test_large (__main__.TestCounter) ... FAIL
    test_med (__main__.TestCounter) ... FAIL
    test_small (__main__.TestCounter) ... ok
    ====================================================================
    FAIL: test_large (__main__.TestCounter)
    --------------------------------------------------------------------
    Traceback (most recent call last):
        File "<ipython-input-57-4ed47b9310ba>", line 22, in test_large
        self.assertEqual(large_counter.value, self.large_params)
    AssertionError: 9996 != 10000
    ====================================================================
    FAIL: test_med (__main__.TestCounter)
    --------------------------------------------------------------------
    Traceback (most recent call last):
        File "<ipython-input-57-4ed47b9310ba>", line 17, in test_med
        self.assertEqual(med_counter.value, self.med_params)
    AssertionError: 4999 != 5000
    --------------------------------------------------------------------
    Ran 3 tests in 0.890s
    FAILED (failures=2)

    As you can see, the program failed at two tests: test_med (where the final value of the counter was only 4,999 instead of 5,000) and test_large (where the value was 9,996 instead of 10,000). It is possible that you might obtain a different output.

  4. Rerun this code cell multiple times to see that the result might vary.
  5. Now that we know there is a bug in our program, we will attempt to debug it. Reimplement our Counter class by placing a breakpoint() statement between the two instructions in the update() method, as shown in the following code, and rerun the code cell:
    class Counter:
        ...
        def update(self):
            current_value = self.value
            breakpoint()
            self.value = current_value + 1
        ...
  6. In the main scope of our Python script, comment out the call to the unit tests. Instead, declare a new counter object and run the script using the Terminal:
    sample_counter = Counter(10)
    sample_counter.run()

    Here, you will see a pdb prompt appear in the Terminal (you might need to press Enter first to make the debugger proceed):

    Figure 1.4: pdb interface

    Figure 1.4: pdb interface

  7. Input ll and hit Enter to see where in the program we are pausing:
    (Pdb) ll
      9         def update(self):
     10             current_value = self.value
     11             breakpoint()
     12  ->         self.value = current_value + 1

    Here, the output indicates that we are currently pausing between the two instructions that increment the value of our counter inside the update() method.

  8. Hit Enter again to return to the pdb prompt and run the p self.value command:
    (Pdb) p self.value
    0

    We can see that the current value of the counter is 0.

  9. Return to the prompt and enter the n command. After this, use the p self.value command again to inspect the value of the counter:
    (Pdb) n
    --Return--
    > <ipython-input-61-066f5069e308>(12)update()->None
    -> self.value = current_value + 1
    (Pdb) p self.value
    1
  10. We can see that the value has been incremented by 1. Repeat this process of alternating between n and p self.value to observe that the value stored in self.value is not updated as we proceed through the program. In other words, the value typically stays at 1. This is how the bug manifests itself in large values of the counter, as we have seen in our unit tests.
  11. Exit the debugger using Ctrl + C.

    Note

    To access the source code for this specific section, please refer to https://packt.live/2YPCZFJ.

    This section does not currently have an online interactive example and will need to be run locally.

For those who are interested, the bug of our program stems from the fact that multiple threads can increment the value of the counter at roughly the same time, overwriting the changes made by one another. With a large number of threads (such as 5,000 or 10,000, which we have in our test cases), the probability of this event taking place becomes higher. This phenomenon, as we mentioned previously, is called a race condition, and it is one of the most common bugs in concurrent and parallel programs.

Aside from demonstrating some pdb commands, this exercise also illustrates the fact that it is necessary to design tests to cover different situations. While the program passed our small test with the target being 5, it failed with larger values of the target. In real life, we should have the tests for a program to simulate a wide range of possibilities, ensuring that the program still works as intended, even in edge cases.

And with that, let's move on to the last topic of this chapter, version control.

Version Control

In this section, we will briefly talk about the general theory behind version control and then discuss the process of implementing it with Git and GitHub, arguably the most popular version control systems in the industry. Version control is to a programming project what backing up data is to regular files. In essence, version control systems allow us to save our progress in a project separately from our local files so that we can come back to it later on, even if the local files are lost or damaged.

With the functionalities that current version control systems such as Git and GitHub provide, we can also do a lot more. For example, the branching and merging features from these systems offer their users a way to create multiple versions of a common project so that different directions can be explored; the branch that implements the most preferred direction will then be merged with the main branch in the end. Additionally, Git and GitHub allow work between users on their platform to be seamless, which is greatly appreciated in team projects.

To understand the available features that we can take advantage of with Git and GitHub, let's go through the following exercise.

Exercise 1.09: Version Control with Git and GitHub

This exercise will walk us through all of the steps necessary to get started with Git and GitHub. If you do not have any experience working with version control yet, this exercise will be beneficial to you.

Perform the following steps to complete this exercise:

  1. First, if you haven't already, register for a GitHub account by going to https://www.github.com/ and sign up. This will allow you to host the files that you want to version control on their cloud storage.
  2. Go to https://git-scm.com/downloads and download the Git client software for your system and install it. This Git client will be responsible for communicating with the GitHub server. You know if your Git client is successfully installed if you can run the git command in your Terminal:
    $ git
    usage: git [--version] [--help] [-C <path>] [-c <name>=<value>]
               [--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]
               [-p | --paginate | -P | --no-pager] [--no-replace-objects] [--bare]
               [--git-dir=<path>] [--work-tree=<path>] [--namespace=<name>]
               <command> [<args>]

    Otherwise, your system might need to be rebooted for the installation to take full effect.

  3. Now, let's start the process of applying version control to a sample project. First, create a dummy folder and generate a Jupyter notebook and a text file named input.txt with the following content in it:
    1,1,1
    1,1,1
  4. In the first cell of the Jupyter notebook, write a function called add_elements() that takes in two lists of numbers and adds them up element-wise. The function should return a list that consists of the element-wise sums; you can assume that the two parameter lists are of the same length:
    def add_elements(a, b):
        result = []
        for item_a, item_b in zip(a, b):
            result.append(item_a + item_b)
        return result
  5. In the next code cell, read in the input.txt file using a with statement and extract the last two lines of the file using the readlines() function and list indexing:
    with open('input.txt', 'r') as f:
        lines = f.readlines()
    last_line1, last_line2 = lines[-2], lines[-1]

    Note that in the open() function, the second argument, 'r', specifies that we are reading in the file, as opposed to writing to the file.

  6. In a new code cell, convert these two strings of text input into lists of numbers, first using the str.split() function with the ',' argument to isolate the individual numbers in each line, and then the map() and int() functions to apply the conversion to integers element-wise:
    list1 = list(map(int, last_line1[: -1].split(',')))
    list2 = list(map(int, last_line2[: -1].split(',')))
  7. In a new code cell, call add_elements() on list1 and list2. Write the returned list to the same input file in a new line in the same comma-separated values (CSV) format:
    new_list = add_elements(list1, list2)
    with open('input.txt', 'a') as f:
        for i, item in enumerate(new_list):
            f.write(str(item))
            
            if i < len(new_list) - 1:
                f.write(',')
            else:
                f.write('\n')

    Here the 'a' argument specifies that we are writing to append a new line to the file, as opposed to reading and overwriting the file completely.

  8. Run the code cell and verify that the text file has been updated to the following:
    1,1,1
    1,1,1
    2,2,2
  9. This is the current setup of our sample project so far: we have a text file and a Python script inside a folder; the script can alter the content of the text file when run. This setup is fairly common in real-life situations: you can have a data file that contains some information that you'd like to keep track of and a Python program that can read in that data and update it in some way (maybe through prespecified computation or adding new data that was collected externally).

    Now, let's implement version control in this sample project.

  10. Go to your online GitHub account, click on the plus sign icon (+) in the top-right corner of the window, and choose the New repository option, as illustrated here:
    Figure 1.5: Creating a new repository

    Figure 1.5: Creating a new repository

    Input a sample name for your new repository in the form and finalize the creation process. Copy the URL to this new repository to your clipboard as we will need it later.

    This, as the name suggests, will create a new online repository that will host the code we want to version control.

  11. On your local computer, open your Terminal and navigate to the folder. Run the following command to initialize a local Git repository, which will be associated with our folder:
    $ git init
  12. Still in the Terminal, run the following command to add everything in our project to Git and commit them:
    git add .
    git commit -m [any message with double quotes]

    Instead of git add ., you can replace . with the names of the files that you want to register with Git. This option is helpful when you only want to register a file or two, as opposed to every file you have in a folder.

  13. Now, we need to link our local repository and the online repository that we have created. To do that, run the following command:
    git remote add origin [URL to GitHub repository]

    Note that "origin" is simply a conventional nickname for the URL.

  14. Finally, upload the locally registered files to the online repository by running the following command:
    git push origin master
  15. Go to the website for the online repository to verify that the local files we created have indeed been uploaded to GitHub.
  16. On your local computer, run the script included in the Jupyter notebook and change the text file.
  17. Now, we would like to commit this change to the GitHub repository. In your Terminal, run the following commands again:
    git add .
    git commit
    git push origin master
  18. Go to or refresh the GitHub website to verify that the change we made the second time has also been made on GitHub.

With this exercise, we have walked through a sample version control pipeline and seen some examples of how Git and GitHub can be used in this respect. We also saw a refresher on the process of reading and writing to files in Python using the with statement.

Note

To access the source code for this specific section, please refer to https://packt.live/2VDS0IS.

You can also run this example online at https://packt.live/3ijJ1pM.

This also concludes the last topic of the first chapter of this book. In the next section, we have provided an activity that will serve as a hands-on project that encapsulates the important topics and discussions we have gone through in this chapter.

Activity 1.01: Building a Sudoku Solver

Let's test what we have learned so far with a more complex problem: writing a program that can solve Sudoku puzzles. The program should be able to read in a CSV text file as input (which contains the initial puzzle) and output the complete solution to that puzzle.

This activity serves as a warmup consisting of multiple procedures that are common in scientific computing and data science projects, such as reading in data from external files and manipulating that information via an algorithm.

  1. Use the sudoku_input_2.txt file from the GitHub repository of this chapter as the input file for our program by copying it to the same location as the Jupyter notebook you will be creating in the next step (or create your own input file in the same format where empty cells are represented with zeros).
  2. In the first code cell of a new Jupyter notebook, create a Solver class that takes in the path to an input file. It should store the information read from the input file in a 9 x 9 2D list (a list of nine sublists, each of which contains the nine values of individual rows in the puzzle).
  3. Add a helper method that prints out the puzzle in a nice format, as follows:
    -----------------------
    0 0 3 | 0 2 0 | 6 0 0 | 
    9 0 0 | 3 0 5 | 0 0 1 | 
    0 0 1 | 8 0 6 | 4 0 0 | 
    -----------------------
    0 0 8 | 1 0 2 | 9 0 0 | 
    7 0 0 | 0 0 0 | 0 0 8 | 
    0 0 6 | 7 0 8 | 2 0 0 | 
    -----------------------
    0 0 2 | 6 0 9 | 5 0 0 | 
    8 0 0 | 2 0 3 | 0 0 9 | 
    0 0 5 | 0 1 0 | 3 0 0 | 
    -----------------------
  4. Create a get_presence(cells) method in the class that takes in any 9 x 9 2D list, representing an unsolved/half-solved puzzle, and returns a sort of indicator regarding whether a given number (between 1 and 9) is present in a given row, column, or quadrant.

    For instance, in the preceding example, the returned value of this method should be able to tell you that 2, 3, and 6 are present in the first row, while no number is present in the second column.

  5. Create a get_possible_values(cells) method in the class that also takes in any 2D list representing an incomplete solution and returns a dictionary, whose keys are the locations of currently empty cells and the corresponding values are the lists/sets of possible values that those cells can take.

    These lists of possible values should be generated by taking into account whether a number is present in the same row, column, or quadrant as a given empty cell.

  6. Create a simple_update(cells) method in the class that takes in any 2D incomplete solution list and calls the get_possible_values() method on that list. From the returned value, if there is an empty cell that holds only one possible solution, update that cell with that value.

    If such an update does happen, the method should call itself again to keep updating the cells. This is because after an update, the list of possible values for the remaining empty cells might change. The method should return the updated 2D list in the end.

  7. Create a recur_solve(cells) method in the class that takes in any 2D incomplete solution list and performs backtracking. First, this method should call simple_update() and return whether or not the puzzle is completely solved (that is, whether or not there are empty cells in the 2D list).

    Next, consider the possible values of the remaining empty cells. If there are empty cells remaining and you have no possible values, return a negative result to indicate that we have reached an invalid solution.

    On the other hand, if all cells have at least two possible values, find the cell that has the fewest number of possible values. Loop through these possible values, sequentially fill them in the empty cell, and call recur_solve() inside itself with the updated cells to implement the recursive nature of the algorithm. At each iteration, return whether the final solution is valid. If no valid final solution is found via any of the possible values, return a negative result.

  8. Wrap the preceding methods in a solve() method, which should print out the initial puzzle, pass it to the recur_solve() method, and print out the returned solution from that method.

    For example, with the preceding puzzle, a Solver instance, when solve() is called, will print out the following output.

    Initial puzzle:

    -----------------------
    0 0 3 | 0 2 0 | 6 0 0 | 
    9 0 0 | 3 0 5 | 0 0 1 | 
    0 0 1 | 8 0 6 | 4 0 0 | 
    -----------------------
    0 0 8 | 1 0 2 | 9 0 0 | 
    7 0 0 | 0 0 0 | 0 0 8 | 
    0 0 6 | 7 0 8 | 2 0 0 | 
    -----------------------
    0 0 2 | 6 0 9 | 5 0 0 | 
    8 0 0 | 2 0 3 | 0 0 9 | 
    0 0 5 | 0 1 0 | 3 0 0 | 
    -----------------------

    Solved puzzle:

    -----------------------
    4 8 3 | 9 2 1 | 6 5 7 | 
    9 6 7 | 3 4 5 | 8 2 1 | 
    2 5 1 | 8 7 6 | 4 9 3 | 
    -----------------------
    5 4 8 | 1 3 2 | 9 7 6 | 
    7 2 9 | 5 6 4 | 1 3 8 | 
    1 3 6 | 7 9 8 | 2 4 5 | 
    -----------------------
    3 7 2 | 6 8 9 | 5 1 4 | 
    8 1 4 | 2 5 3 | 7 6 9 | 
    6 9 5 | 4 1 7 | 3 8 2 | 
    -----------------------

    Extensions

    1. Go to the Project Euler website, https://projecteuler.net/problem=96, to test out your algorithm against the included puzzles.

    2. Write a program that generates Sudoku puzzles and includes unit tests that check whether the solutions generated by our solver are correct.

    Note

    The solution for this activity can be found on page 648.

 

Summary

This chapter introduced the most fundamental building blocks of Python programming: control flow, data structures, algorithm design, and various house-keeping tasks (debugging, testing, and version control). The knowledge that we have gained in this chapter will prepare us for discussions in future chapters, where we'll learn about other more complex and specialized tools in Python. In particular, in the next chapter, we will talk about the main tools and libraries that Python offers in the fields of statistics, scientific computing, and data science.

PGM59

MAF28

About the Authors

  • Peter Farrell

    Peter Farrell learned to program from the Logo code in Seymour Papert’s Mindstorms. A student introduced him to Python and he never looked back. In 2015, he self-published Hacking Math Class with Python on applying Python programming to learning and teaching high-school math. In 2019, No Starch Press published his second book, Math Adventures with Python. In his books, he also presents 21st-century topics, such as Cellular Automata, 3D Graphics, and Genetic Algorithms. Currently, he teaches Python and Math in the Dallas, Texas area.

    Browse publications by this author
  • Alvaro Fuentes

    Alvaro Fuentes is a senior data scientist with a background in applied mathematics and economics. He has more than 14 years of experience in various analytical roles and is an analytics consultant at one of the ‘Big Three’ global management consulting firms, leading advanced analytics projects in different industries like banking, technology, and consumer goods. Alvaro is also an author and trainer in analytics and data science and has published courses and books, such as 'Become a Python Data Analyst' and 'Hands-On Predictive Analytics with Python'. He has also taught data science and related topics to thousands of students both on-site and online through different platforms such as Springboard, Simplilearn, Udemy, and BSG Institute, among others.

    Browse publications by this author
  • Ajinkya Sudhir Kolhe

    Ajinkya Sudhir Kolhe is a programmer working for a tech company in the Bay area. He holds a M.S. in Computer Science and has experience in the tech industry of 5+ years. His area of interests include problem solving, analytics and applications in Python.

    Browse publications by this author
  • Quan Nguyen

    Quan Nguyen is a programmer with a special interest in scientific computing, data analysis, and artificial intelligence. Before publishing his first book with Packt, he was a primary contributor to the book Python for Scientists and Engineers and various open-source projects on GitHub. He is also a writer for the Python Software Foundation and Oracle's AI and Data Science blog.

    Browse publications by this author
  • Alexander Joseph Sarver

    Alexander Joseph Sarver is an ambitious data scientist and content creator with 6 years of mathematical teaching experience.

    Browse publications by this author
  • Marios Tsatsos

    Marios Tsatsos has 8+ years of experience in research in Physics, analytical thinking, modeling, problem solving and decision making.

    Browse publications by this author

Recommended For You

Hands-On Genetic Algorithms with Python

Explore the ever-growing world of genetic algorithms to solve search, optimization, and AI-related tasks, and improve machine learning models using Python libraries such as DEAP, scikit-learn, and NumPy

By Eyal Wirsansky
The Python Workshop

Cut through the noise and get real results with a step-by-step approach to learning Python 3.X programming

By Andrew Bird and 4 more
Artificial Intelligence with Python - Second Edition

New edition of the bestselling guide to artificial intelligence with Python, updated to Python 3.x, with seven new chapters that cover RNNs, AI and Big Data, fundamental use cases, chatbots, and more.

By Alberto Artasanchez and 1 more
Hands-On Mathematics for Deep Learning

A comprehensive guide to getting well-versed with the mathematical techniques for building modern deep learning architectures

By Jay Dawani