You're reading from Functional Python Programming, 3rd edition - Third Edition

Product type Book

Published in Dec 2022

Publisher Packt

ISBN-13 9781803232577

Pages 576 pages

Edition 3rd Edition

Languages

Concepts

Functional Programming

Author (1):

Steven F. Lott

Table of Contents (18) Chapters

Preface

1. Chapter 1: Understanding Functional Programming

2. Chapter 2: Introducing Essential Functional Concepts

3. Chapter 3: Functions, Iterators, and Generators

4. Chapter 4: Working with Collections

5. Chapter 5: Higher-Order Functions

6. Chapter 6: Recursions and Reductions

7. Chapter 7: Complex Stateless Objects

8. Chapter 8: The Itertools Module

9. Chapter 9: Itertools for Combinatorics – Permutations and Combinations

10. Chapter 10: The Functools Module

11. Chapter 11: The Toolz Package

12. Chapter 12: Decorator Design Techniques

13. Chapter 13: The PyMonad Library

14. Chapter 14: The Multiprocessing, Threading, and Concurrent.Futures Modules

15. Chapter 15: A Functional Approach to Web Services

16. Other Books You Might Enjoy

17. Index

1.6 Exercises

The exercises in this book are based on code available from Packt Publishing on GitHub. See https://github.com/PacktPublishing/Functional-Python-Programming-3rd-Edition.

In some cases, the reader will notice that the code provided on GitHub includes partial solutions to some of the exercises. These serve as hints, allowing the reader to explore alternative solutions.

In many cases, exercises will need unit test cases to confirm they actually solve the problem. These are often identical to the unit test cases already provided in the GitHub repository. The reader will need to replace the book’s example function name with their own solution to confirm that it works.

1.6.1 Convert an imperative algorithm to functional code

The following algorithm is stated as imperative assignment statements and a while construct to indicate processing something iteratively.

What does this appear to compute? Given Python built-in functions like sum, can this be simplified?

It helps to write this in Python and refactor the code to be sure that correct answers are created.

A test case is the following:

V ← {7.46,6.77,12.74,7.11,7.81,8.84,6.08,5.39,8.15,6.42,5.73}

The computed value for m is approximately 7.5.

1.6.2 Convert step-wise computation to functional code

The following algorithm is stated as a long series of single assignment statements. The rad(x) function converts degrees to radians, rad(d) = π × 1d80 . See the math module for an implementation.

Is this code easy to understand? Can you summarize this computation as a short mathematical-looking formula?

Breaking it down into sections, lines 1 to 8 seem to be focused on some conversions, differences, and mid-point computations. Lines 9 to 12 compute two values, x and y. Can these be summarized or simplified? The final four lines do a relatively direct computation of d. Can this be summarized or simplified? As a hint, look at math.hypot() for a function that might be applicable in this case.

It helps to write this in Python and refactor the code.

A test case is the following:

  lat₁ ← 32.82950
  lon₁ ←−79.93021
  lat₂ ← 32.74412
  lon₂ ←−79.85226

The computed value for d is approximately 6.4577.

Refactoring the code can help to confirm your understanding.

1.6.3 Revise the sqrt() function

The sqrt() function defined in the A classic example of functional programming section has only a single parameter value, n. Rewrite this to create a more advanced version using default parameter values to make changes possible. An expression such as sqrt(1.0, 0.000_01, 3) will start with an approximation of 1.0 and compute the value to a precision of 0.00001. The final parameter value, 3, is the value of n, the number we need to compute the square root of.

1.6.4 Data cleansing steps

A file of source data has US ZIP codes in a variety of formats. This problem often arises when spreadsheet software is used to collect or transform data.

Some ZIP codes were processed as numbers. This doesn’t work out well for places in New England, where ZIP codes have a leading zero. For example, one of Portsmouth, New Hampshire’s codes should be stated as 03801. In the source file, it is 3801. For the most part, these numbers will have five or nine digits, but some codes in New England will be four or eight digits when a single leading zero was dropped. For Puerto Rico, there may be two leading zeroes.
Some ZIP codes are stored as strings, 12345−0100, where a four-digit extension for a post-office box has been appended to the base five-digit code.

A CSV-format file has only text values. However, when data in the file has been processed by a spreadsheet, problems can arise. Because a ZIP code has only digits, it can be treated as numeric data. This means the original data values will have been converted to a number, and then back to a text representation. These conversions will drop the leading zeroes. There are a number of workarounds in various spreadsheet applications to prevent this problem. If they’re not used, the data can have anomalous values that can be cleansed to restore the original representation.

The objective of the exercise is to compute a histogram of the most popular ZIP codes in the source data file. The data must be cleansed to have the following two ZIP formats:

Five characters with no post-office box, for example 03801
Ten characters with a hyphen, for example 03899-9876

The essential histogram can be done with a collections.Counter object as follows.

from collections import Counter 
import csv 
from pathlib import Path 
 
DEFAULT_PATH = Path.cwd() / "address.csv" 
 
def main(source_path: Path = DEFAULT_PATH) -> None: 
    frequency: Counter[str] = Counter() 
    with source_path.open() as source: 
        rdr = csv.DictReader(source) 
        for row in rdr: 
            if "-" in row[’ZIP’]: 
                text_zip = row[’ZIP’] 
                missing_zeroes = 10 - len(text_zip) 
                if missing_zeroes: 
                    text_zip = missing_zeroes*’0’ + text_zip 
            else: 
                text_zip = row[’ZIP’] 
                if 5 < len(row[’ZIP’]) < 9: 
                    missing_zeroes = 9 - len(text_zip) 
                else: 
                    missing_zeroes = 5 - len(text_zip) 
                if missing_zeroes: 
                    text_zip = missing_zeroes*’0’ + text_zip 
            frequency[text_zip] += 1 
    print(frequency) 
 
if __name__ == "__main__": 
    main()

This makes use of imperative processing features to read a file. The overall design, using a for statement to process rows of a file, is an essential Pythonic feature that we can preserve.

On the other hand, the processing of the text_zip and missing_zeroes variables through a number of state changes seems like it’s a potential source for confusion.

This can be refactored through several rewrites:

Decompose the main() function into two parts. A new zip_histogram() function should be written to contain much of the processing detail. This function will process the opened file, and return a Counter object. A suggested signature is the following:
```
    def zip_histogram( 
            reader: csv.DictReader[str]) -> Counter[str]: 
        pass
```
The main() function is left with the responsibility to open the file, create the csv.DictReader instance, evaluate zip_histogram(), and print the histogram.
Once the zip_histogram() function has been defined, the cleansing of the ZIP attribute can be refactored into a separate function, with a name like zip_cleanse(). Rather than setting the value of the text_zip variable, this function can return the cleansed result. This can be tested separately to be sure the various cases are handled gracefully.
The distinction between long ZIP codes with a hyphen and without a hyphen is something that should be fixed. Once the zip_cleanse() works in general, add a new function to inject hyphens into ZIP codes with only digits. This should transform 38011234 to 03801-1234. Note that short, five-digit ZIP codes do not need to have a hyphen added; this additional transformation only applies to nine-digit codes to make them into ten-position strings.

The final zip_histogram() function should look something like the following:

def zip_histogram( 
        reader: csv.DictReader[str]) -> Counter[str]: 
    return Counter( 
        zip_cleanse( 
            row[’ZIP’] 
        ) for row in reader 
    )

This provides a framework for performing a focused data cleanup in the given column. It allows us to distinguish between CSV and file processing features, and the details of how to clean up a specific column of data.

1.6.5 (Advanced) Optimize this functional code

The following algorithm is stated as a single ”step” that has been decomposed into three separate formulae. The decomposition is more a concession to the need to fit the expression into the limits of a printed page than a useful optimization. The rad(x) function converts degrees to radians, rad(d) = π × -d- 180 .

There are a number of redundant expressions, like rad(lat₁) and rad(lat₂). If these are assigned to local variables, can the expression be simplified?

The final computation of d does not match the conventional understanding of computing a hypotenuse, ∘ ------- x2 + y2 . Should the code be refactored to match the definition in math.hypot?