# 1

Understanding Functional Programming

Functional programming defines a computation using expressions and evaluation; often, they are encapsulated in function definitions. It de-emphasizes or avoids the complexity of state change and mutable objects. This tends to create programs that are more succinct and expressive. In this chapter, we’ll introduce some of the techniques that characterize functional programming. We’ll identify some of the ways to map these features to Python. Finally, we’ll also address some ways in which the benefits of functional programming accrue when we use these design patterns to build Python applications.

This book doesn’t contain a tutorial introduction to the Python language. We assume the reader knows some Python. In many cases, if the reader knows a functional programming language, then that knowledge can be applied to Python via the examples in this book. For background information on Python, see Python in a Nutshell, 4th Edition, or any of the Python introductions from Packt Publishing.

Python has a broad variety of programming features, including numerous ways to support functional programming. As we will see throughout this book, Python is not a purely functional programming language; instead, it relies on a mixture of features. We’ll see that the language offers enough of the right kinds of features to provide the benefits of functional programming. It also retains all the optimization power of an imperative programming language. Further, we can mix the object-oriented and functional features to make use of the best aspects of both paradigms.

We’ll also look at a problem domain that we’ll use for many of the examples in this book. We’ll try to stick closely to Exploratory Data Analysis (EDA). For more information, see https://www.itl.nist.gov/div898/handbook/eda/eda.htm. The idea of ”exploratory” means doing data collection followed by analysis, with a goal of inferring what model would be appropriate to describe the data. This is a helpful domain because many of the algorithms are good examples of functional programming. Furthermore, the benefits of functional programming accrue rapidly when exploring data to locate trends and relationships.

Our goal is to establish some essential principles of functional programming. The more serious Python code will begin in Chapter 2, Introducing Essential Functional Concepts.

In this chapter, we’ll focus on the following topics:

Comparing and contrasting the functional paradigm with other ways of designing software. We’ll look at how Python’s approach can be called a ”hybrid” between functional programming and object-oriented programming.

We’ll look in depth at a specific example extracted from the functional programming literature.

We’ll conclude with an overview of EDA and why this discipline seems to provide numerous examples of functional programming.

We’ll focus on Python 3.10 features in this book. This includes the new `match`

statement.

Throughout this book, we’ll include Python 3 type hints in the examples. Type hints can help a reader visualize the essential purpose behind a function definition. Type hints are analyzed with the mypy tool. As with unit testing, mypy can be part of a tool chain to produce high-quality software.

# 1.1 The functional style of programming

We’ll define functional programming through a series of examples. The distinguishing feature between these examples is the concept of state, specifically the state of the computation.

Python’s strong imperative traits mean that the state of a computation is defined by the values of the variables in the various namespaces. Some kinds of statements make a well-defined change to the state by adding, changing, or removing a variable. We call this imperative because specific kinds of statements change the state.

In Python, the assignment statement is the primary way to change the state. Python has other statements, such as `global`

or `nonlocal`

, which modify the rules for variables in a particular namespace. Statements such as `def`

, `class`

, and `import`

change the processing context. The bulk of the remaining statements provide ways to choose which assignment statements get executed. The focus of all these various statement types, however, is on changing the state of the variables.

In a functional language, we replace the state—the changing values of variables—with a simpler notion of evaluating functions. Each function evaluation creates a new object or objects from existing objects. Since a functional program is a composition of functions, we can design lower-level functions that are easy to understand, and then create compositions of functions that can also be easier to visualize than a complex sequence of statements.

Function evaluation more closely parallels mathematical formalisms. Because of this, we can often use simple algebra to design an algorithm that clearly handles the edge cases and boundary conditions. This makes us more confident that the functions work. It also makes it easy to locate test cases for formal unit testing.

It’s important to note that functional programs tend to be relatively succinct, expressive, and efficient compared to imperative (object-oriented or procedural) programs. The benefit isn’t automatic; it requires careful design. This design effort for functional programming is often smaller than for procedural programming. Some developers experienced in imperative and object-oriented styles may find it a challenge to shift their focus from stateful designs to functional designs.

# 1.2 Comparing and contrasting procedural and functional styles

We’ll use a tiny example program to illustrate a non-functional, or procedural, style of programming. This example computes a sum of a sequence of numbers. Each of the numbers has a specific property that makes it part of the sequence.

```
def sum_numeric(limit: int = 10) -> int:
s = 0
for n in range(1, limit):
if n % 3 == 0 or n % 5 == 0:
s += n
return s
```

The sum computed by this function includes only numbers that are multiples of 3 or 5. We’ve made this program strictly procedural, avoiding any explicit use of Python’s object features. The function’s state is defined by the values of the variables `s`

and `n`

. The variable `n`

takes on values such that 1 ≤ n < 10. As the iteration involves an ordered exploration of values for the `n`

variable, we can prove that it will terminate when the value of `n`

is equal to the value of `limit`

.

There are two explicit assignment statements, both setting values for the `s`

variable. These state changes are visible. The value of `n`

is set implicitly by the `for`

statement. The state change in the `s`

variable is an essential element of the state of the computation.

Now let’s look at this again from a purely functional perspective. Then, we’ll examine a more Pythonic perspective that retains the essence of a functional approach while leveraging a number of Python’s features.

## 1.2.1 Using the functional paradigm

In a functional sense, the sum of the multiples of 3 and 5 can be decomposed into two parts:

The sum of a sequence of numbers

A sequence of values that pass a simple test condition, for example, being multiples of 3 and 5

To be super formal, we can define the sum as a function using simpler language components. The sum of a sequence has a recursive definition:

```
from collections.abc import Sequence
def sumr(seq : Sequence[int]) -> int:
if len(seq) == 0:
return 0
return seq[0] + sumr(seq[1:])
```

We’ve defined the sum in two cases. The base case states that the sum of a zero-length sequence is 0. The recursive case states that the sum of a sequence is the first value plus the sum of the rest of the sequence. Since the recursive definition depends on a shorter sequence, we can be sure that it will (eventually) devolve to the base case.

Here are some examples of how this function works:

```
>>> sumr([7, 11])
18
>>> sumr([11])
11
>>> sumr([])
0
```

The first example computes the sum of a list with multiple items. The second example shows how the recursion rule works by adding the first item, `seq[0]`

, to the sum of the remaining items, `sumr(seq[1:])`

. Eventually, the computation of the result involves the sum of an empty list, which is defined as 0.

The `+`

operator on the last line of the `sumr`

function and the initial value of 0 in the base case characterize the equation as a sum. Consider what would happen if we changed the operator to `*`

and the initial value to 1: this new expression would compute a product. We’ll return to this simple idea of generalization in the following chapters.

Similarly, generating a sequence of values with a given property can have a recursive definition, as follows:

```
from collections.abc import Sequence, Callable
def until(
limit: int,
filter_func: Callable[[int], bool],
v: int
) -> list[int]:
if v == limit:
return []
elif filter_func(v):
return [v] + until(limit, filter_func, v + 1)
else:
return until(limit, filter_func, v + 1)
```

In this function, we’ve compared a given value, `v`

, against the upper bound, `limit`

. If `v`

has reached the upper bound, the resulting list must be empty. This is the base case for the given recursion.

There are two more cases defined by an externally defined `filter_func()`

function. The value of `v`

is passed by the `filter_func()`

function; if this returns a very small list, containing one element, this can be concatenated with any remaining values computed by the `until()`

function.

If the value of `v`

is rejected by the `filter_func()`

function, this value is ignored and the result is simply defined by any remaining values computed by the `until()`

function.

We can see that the value of `v`

will increase from an initial value until it reaches `limit`

, assuring us that we’ll reach the base case.

Before we can see how to use the `until()`

function, we’ll define a small function to filter values that are multiples of 3 or 5:

```
def mult_3_5(x: int) -> bool:
return x % 3 == 0 or x % 5 == 0
```

We could also have defined this as a lambda object to emphasize succinct definitions of simple functions. Anything more complex than a one-line expression requires the `def`

statement.

This function can be combined with the `until()`

function to generate a sequence of values, which are multiples of 3 and 5. Here’s an example:

```
>>> until(10, mult_3_5, 0)
[0, 3, 5, 6, 9]
```

Looking back at the decomposition at the top of this section, we now have a way to compute sums and a way to compute the sequence of values.

We can combine the `sumr()`

and `until()`

functions to compute a sum of values. Here’s the resulting code:

```
def sum_functional(limit: int = 10) -> int:
return sumr(until(limit, mult_3_5, 0))
```

This small application to compute a sum doesn’t make use of the assignment statement to set the values of variables. It is a purely functional, recursive definition that matches the mathematical abstractions, making it easier to reason about. We can be confident each piece works separately, giving confidence in the whole.

As a practical matter, we’ll use a number of Python features to simplify creating functional programs. We’ll take a look at a number of these optimizations in the next version of this example.

## 1.2.2 Using a functional hybrid

We’ll continue this example with a mostly functional version of the previous example to compute the sum of multiples of 3 and 5. Our hybrid functional version might look like the following:

```
def sum_hybrid(limit: int = 10) -> int:
return sum(
n for n in range(1, limit)
if n % 3 == 0 or n % 5 == 0
)
```

We’ve used a generator expression to iterate through a collection of values and compute the sum of these values. The `range(1,`

` 10)`

object is an iterable; it generates a sequence of values {n∣1 ≤ n < 10}, often summarized as “values of n such that 1 is less than or equal to n and n is less than 10.” The more complex expression `n`

` for`

` n`

` in`

` range(1,`

` 10)`

` if`

` n`

` %`

` 3`

` ==`

` 0`

` or`

` n`

`%`

` 5`

` ==`

` 0`

is also a generator. It produces a set of values, {n∣1 ≤ n < 10 ∧ (n ≡ 0 mod 3 ∨n ≡ 0 mod 5)}; something we can describe as “values of n such that 1 is less than or equal to n and n is less than 10 and n is equivalent to 0 modulo 3 or n is equivalent to 0 modulo 5.” These are multiples of 3 and 5 taken from the set of values between 1 and 10. The variable `n`

is bound, in turn, to each of the values provided by the `range`

object. The `sum()`

function consumes the iterable values, creating a final object, 23.

The bound variable, `n`

, doesn’t exist outside the generator expression. The variable `n`

isn’t visible elsewhere in the program.

The variable `n`

in this example isn’t directly comparable to the variable `n`

in the first two imperative examples. A `for`

statement (outside a generator expression) creates a proper variable in the local namespace. The generator expression does not create a variable in the same way that a `for`

statement does:

```
>>> sum(
... n for n in range(1, 10)
... if n % 3 == 0 or n % 5 == 0
... )
23
>>> n
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name ’n’ is not defined
```

The generator expression doesn’t pollute the namespace with variables, like `n`

, which aren’t relevant outside the very narrow context of the expression. This is a pleasant feature that ensures we won’t be confused by the values of variables that don’t have a meaning outside a single expression.

## 1.2.3 The stack of turtles

When we use Python for functional programming, we embark down a path that will involve a hybrid that’s not strictly functional. Python is not Haskell, OCaml, or Erlang. For that matter, our underlying processor hardware is not functional; it’s not even strictly object-oriented, as CPUs are generally procedural.

All programming languages rest on abstractions, libraries, frameworks and virtual machines. These abstractions, in turn, may rely on other abstractions, libraries, frameworks and virtual machines. The most apt metaphor is this: the world is carried on the back of a giant turtle. The turtle stands on the back of another giant turtle. And that turtle, in turn, is standing on the back of yet another turtle.

It’s turtles all the way down.

— Anonymous

There’s no practical end to the layers of abstractions. Even something as concrete as circuits and electronics may be an abstraction to help designers summarize the details of quantum electrodynamics.

More importantly, the presence of abstractions and virtual machines doesn’t materially change our approach to designing software to exploit the functional programming features of Python.

Even within the functional programming community, there are both purer and less pure functional programming languages. Some languages make extensive use of monads to handle stateful things such as file system input and output. Other languages rely on a hybridized environment that’s similar to the way we use Python. In Python, software can be generally functional, with carefully chosen procedural exceptions.

Our functional Python programs will rely on the following three stacks of abstractions:

Our applications will be functions—all the way down—until we hit the objects;

The underlying Python runtime environment that supports our functional programming is objects—all the way down—until we hit the libraries;

The libraries that support Python are a turtle on which Python stands.

The operating system and hardware form their own stack of turtles. These details aren’t relevant to the problems we’re going to solve.

# 1.3 A classic example of functional programming

As part of our introduction, we’ll look at a classic example of functional programming. This is based on the paper Why Functional Programming Matters by John Hughes. The article appeared in a paper called Research Topics in Functional Programming, edited by D. Turner, published by Addison-Wesley in 1990.

Here’s a link to one of the papers in Research Topics in Functional Programming, “Why Functional Programming Matters”: http://www.cs.kent.ac.uk/people/staff/dat/miranda/whyfp90.pdf

This paper is a profound discussion of functional programming. There are several examples given. We’ll look at just one: the Newton-Raphson algorithm for locating any roots of a function. In this case, we’ll define a function that will compute a square root of a number.

It’s important because many versions of this algorithm rely on the explicit state managed via loops. Indeed, the Hughes paper provides a snippet of the Fortran code that emphasizes stateful, imperative processing.

The backbone of this approximation is the calculation of the next approximation from the current approximation. The `next_()`

function takes `x`

, an approximation to the `sqrt(n)`

value, and calculates a next value that brackets the proper root. Take a look at the following example:

```
def next_(n: float, x: float) -> float:
return (x + n / x) / 2
```

This function computes a series of values that will quickly converge on some value x such that x = , which means x = .

Note that the name `next()`

would collide with a built-in function. Calling it `next_()`

lets us follow the original presentation as closely as possible, using Pythonic names.

Here’s how the function looks when used in Python’s interactive REPL:

```
>>> n = 2
>>> f = lambda x: next_(n, x)
>>> a0 = 1.0
>>> [round(x, 4)
... for x in (a0, f(a0), f(f(a0)), f(f(f(a0))),)
... ]
[1.0, 1.5, 1.4167, 1.4142]
```

We defined the `f()`

function as a lambda that will converge on where n = 2. We started with 1.0 as the initial value for a_{0}. Then we evaluated a sequence of recursive evaluations: a_{1} = f(a_{0}), a_{2} = f(f(a_{0})), and so on. We evaluated these functions using a generator expression so that we could round each value to four decimal places. This makes the output easier to read and easier to use with `doctest`

. The sequence appears to converge rapidly on . To get a more precise answer, we must continue to perform the series of steps after the first four shown above.

We can write a function that will (in principle) generate an infinite sequence of a_{i} values. This series will converge on the proper square root:

```
from collections.abc import Iterator, Callable
def repeat(
f: Callable[[float], float],
a: float
) -> Iterator[float]:
yield a
yield from repeat(f, f(a))
```

This function will generate a sequence of approximations using a function, `f()`

, and an initial value, `a`

. If we provide the `next_()`

function defined earlier, we’ll get a sequence of approximations to the square root of the `n`

argument.

The `repeat()`

function expects the `f()`

function to have a single argument; however, our `next_()`

function has two arguments. We’ve used a lambda object, `lambda`

` x:`

` next_(n,`

` x)`

, to create a partial version of the `next_()`

function with one of two variables bound.

The Python generator functions can’t be trivially recursive; they must explicitly iterate over the recursive results, yielding them individually.

Attempting to use a simple `return`

` repeat(f,`

` f(a))`

will end the iteration, returning a generator expression instead of yielding values.

There are two ways to return all the values instead of returning a generator expression, which are as follows:

We can write an explicit

`for`

statement to yield values as follows:`for x in some_iter: yield x`

We can use the

`yield`

`from`

expression as follows:`yield from some_iter`

Both techniques of yielding the values of a recursive generator function are will have similar results. We’ll try to emphasize `yield`

` from`

.

It turns out that `yield`

and `yield`

` from`

are a bit more sophisticated than we’ve shown here. For our purposes, we’ll limit ourselves to working with recursive results. For more information on the full feature set for `yield`

and `yield`

` from`

, see PEP 342 and PEP 380: https://peps.python.org/pep-0342/ and https://peps.python.org/pep-0380/.

Of course, we don’t want the entire infinite sequence created by the `repeat()`

function. It’s essential to stop generating values when we’ve found the square root we’re looking for. The common symbol for the limit we can consider “close enough” is the Greek letter epsilon, 𝜖.

In Python, we have to be a little clever when taking items from an infinite sequence one at a time. It works out well to use a simple interface function that wraps a slightly more complex recursion. Take a look at the following code snippet:

```
from collections.abc import Iterator
def within(
𝜖: float,
iterable: Iterator[float]
) -> float:
def head_tail(
𝜖: float,
a: float,
iterable: Iterator[float]
) -> float:
b = next(iterable)
if abs(a-b) <= 𝜖:
return b
return head_tail(𝜖, b, iterable)
return head_tail(𝜖, next(iterable), iterable)
```

We’ve defined an internal function, `head_tail()`

, which accepts the tolerance, 𝜖, an item from the iterable sequence, `a`

, and the rest of the iterable sequence, `iterable`

. The first item from the iterable, extracted with the `next()`

function, is bound to a name, `b`

. If |a − b|≤ 𝜖, the two values of `a`

and `b`

are close enough to call the value of `b`

the square root; the difference is less than or equal to the very small value of 𝜖. Otherwise, we use the `b`

value in a recursive invocation of the `head_tail()`

function to examine the next pair of values.

Our `within()`

function properly initializes the internal `head_tail()`

function with the first value from the `iterable`

parameter.

We can use the three functions, `next_()`

, `repeat()`

, and `within()`

, to create a square root function, as follows:

```
def sqrt(n: float) -> float:
return within(
𝜖=0.0001,
iterable=repeat(
lambda x: next_(n, x),
1.0
)
)
```

We’ve used the `repeat()`

function to generate a (potentially) infinite sequence of values based on the `next_(n,x)`

function. Our `within()`

function will stop generating values in the sequence when it locates two values with a difference less than 𝜖.

This definition of the `sqrt()`

function provides useful default values to the underlying `within()`

function. It provides an 𝜖 value of 0.0001 and an initial a_{0} value of 1.0.

A more advanced version could use default parameter values to make changes possible. As an exercise, the definition of `sqrt()`

can be rewritten so an expression such as `sqrt(1.0,`

` 0.000_01,`

` 3)`

will start with an approximation of 1.0 and compute the value of to within 0.00001. For most applications, the initial a_{0} value can be 1.0. However, the closer it is to the actual square root, the more rapidly this algorithm converges.

The original example of this approximation algorithm was shown in the Miranda language. It’s easy to see there are some profound differences between Miranda and Python. In spite of the differences, the similarities give us confidence that many kinds of functional programming can be easily implemented in Python.

The `within`

function shown here is written to match the original article’s function definition. Python’s `itertools`

library provides a `takewhile()`

function that might be better for this application than the `within()`

function. Similarly, the `math.isclose()`

function may be better than the `abs(a-b)`

` <=`

` `

𝜖 expression used here. Python offers a great many pre-built functional programming features; we’ll look closely at these functions in Chapter 8, The Itertools Module and Chapter 9, Itertools for Combinatorics – Permutations and Combinations.

# 1.4 Exploratory data analysis

Later in this book, we’ll use the field of exploratory data analysis as a source for concrete examples of functional programming. This field is rich with algorithms and approaches to working with complex datasets; functional programming is often a very good fit between the problem domain and automated solutions.

While details vary from author to author, there are several widely accepted stages of EDA. These include the following:

Data preparation: This might involve extraction and transformation for source applications. It might involve parsing a source data format and doing some kind of data scrubbing to remove unusable or invalid data. This is an excellent application of functional design techniques.

David Mertz’s superb book Cleaning Data for Effective Data Science( https://www.packtpub.com/product/cleaning-data-for-effective-data-science/9781801071291) provides additional information on data cleaning. This is a crucial subject for all data science and analytical work.

Data exploration: This is a description of the available data. This usually involves the essential statistical functions. This is another excellent place to explore functional programming. We can describe our focus as univariate and bivariate statistics, but that sounds too daunting and complex. What this really means is that we’ll focus on mean, median, mode, and other related descriptive statistics. Data exploration may also involve data visualization. We’ll skirt this issue because it doesn’t involve very much functional programming.

For more information on Python visualization, see Interactive Data Visualization with Python, https://www.packtpub.com/product/interactive-data-visualization-with-python-second-edition/9781800200944. See https://www.projectpro.io/article/python-data-visualization-libraries/543 for some additional visualization libraries.

Data modeling and machine learning: This tends to be prescriptive as it involves extending a model to new data. We’re going to skirt around this because some of the models can become mathematically complex. If we spend too much time on these topics, we won’t be able to focus on functional programming.

Evaluation and comparison: When there are alternative models, each must be evaluated to determine which is a better fit for the available data. This can involve ordinary descriptive statistics of model outputs, which can benefit from functional design techniques.

One goal of EDA is often to create a model that can be deployed as a decision support application. In many cases, a model might be a simple function. A functional programming approach can apply the model to new data and display results for human consumption.

# 1.5 Summary

In this chapter, we’ve looked at programming paradigms with an eye toward distinguishing the functional paradigm from the imperative paradigm. For our purposes, object-oriented programming is a kind of imperative programming; it relies on explicit state changes. Our objective in this book is to explore the functional programming features of Python. We’ve noted that some parts of Python don’t allow purely functional programming; we’ll be using some hybrid techniques that meld the good features of succinct, expressive functional programming with some high-performance optimizations in Python.

In the next chapter, we’ll look at five specific functional programming techniques in detail. These techniques will form the essential foundation for our hybridized functional programming in Python.

# 1.6 Exercises

The exercises in this book are based on code available from Packt Publishing on GitHub. See https://github.com/PacktPublishing/Functional-Python-Programming-3rd-Edition.

In some cases, the reader will notice that the code provided on GitHub includes partial solutions to some of the exercises. These serve as hints, allowing the reader to explore alternative solutions.

In many cases, exercises will need unit test cases to confirm they actually solve the problem. These are often identical to the unit test cases already provided in the GitHub repository. The reader will need to replace the book’s example function name with their own solution to confirm that it works.

## 1.6.1 Convert an imperative algorithm to functional code

The following algorithm is stated as imperative assignment statements and a while construct to indicate processing something iteratively.

What does this appear to compute? Given Python built-in functions like `sum`

, can this be simplified?

It helps to write this in Python and refactor the code to be sure that correct answers are created.

A test case is the following:

The computed value for m is approximately `7.5`

.

## 1.6.2 Convert step-wise computation to functional code

The following algorithm is stated as a long series of single assignment statements. The rad(x) function converts degrees to radians, rad(d) = π ×. See the `math`

module for an implementation.

Is this code easy to understand? Can you summarize this computation as a short mathematical-looking formula?

Breaking it down into sections, lines 1 to 8 seem to be focused on some conversions, differences, and mid-point computations. Lines 9 to 12 compute two values, x and y. Can these be summarized or simplified? The final four lines do a relatively direct computation of d. Can this be summarized or simplified? As a hint, look at `math.hypot()`

for a function that might be applicable in this case.

It helps to write this in Python and refactor the code.

A test case is the following:

_{1}← 32.82950

lon

_{1}←−79.93021

lat

_{2}← 32.74412

lon

_{2}←−79.85226

The computed value for d is approximately `6.4577`

.

Refactoring the code can help to confirm your understanding.

## 1.6.3 Revise the sqrt() function

The `sqrt()`

function defined in the A classic example of functional programming section has only a single parameter value, `n`

. Rewrite this to create a more advanced version using default parameter values to make changes possible. An expression such as `sqrt(1.0,`

` 0.000_01,`

` 3)`

will start with an approximation of 1.0 and compute the value to a precision of 0.00001. The final parameter value, `3`

, is the value of n, the number we need to compute the square root of.

## 1.6.4 Data cleansing steps

A file of source data has US ZIP codes in a variety of formats. This problem often arises when spreadsheet software is used to collect or transform data.

Some ZIP codes were processed as numbers. This doesn’t work out well for places in New England, where ZIP codes have a leading zero. For example, one of Portsmouth, New Hampshire’s codes should be stated as

`03801`

. In the source file, it is`3801`

. For the most part, these numbers will have five or nine digits, but some codes in New England will be four or eight digits when a single leading zero was dropped. For Puerto Rico, there may be two leading zeroes.Some ZIP codes are stored as strings, 12345−0100, where a four-digit extension for a post-office box has been appended to the base five-digit code.

A CSV-format file has only text values. However, when data in the file has been processed by a spreadsheet, problems can arise. Because a ZIP code has only digits, it can be treated as numeric data. This means the original data values will have been converted to a number, and then back to a text representation. These conversions will drop the leading zeroes. There are a number of workarounds in various spreadsheet applications to prevent this problem. If they’re not used, the data can have anomalous values that can be cleansed to restore the original representation.

The objective of the exercise is to compute a histogram of the most popular ZIP codes in the source data file. The data must be cleansed to have the following two ZIP formats:

Five characters with no post-office box, for example

`03801`

Ten characters with a hyphen, for example

`03899-9876`

The essential histogram can be done with a `collections.Counter`

object as follows.

```
from collections import Counter
import csv
from pathlib import Path
DEFAULT_PATH = Path.cwd() / "address.csv"
def main(source_path: Path = DEFAULT_PATH) -> None:
frequency: Counter[str] = Counter()
with source_path.open() as source:
rdr = csv.DictReader(source)
for row in rdr:
if "-" in row[’ZIP’]:
text_zip = row[’ZIP’]
missing_zeroes = 10 - len(text_zip)
if missing_zeroes:
text_zip = missing_zeroes*’0’ + text_zip
else:
text_zip = row[’ZIP’]
if 5 < len(row[’ZIP’]) < 9:
missing_zeroes = 9 - len(text_zip)
else:
missing_zeroes = 5 - len(text_zip)
if missing_zeroes:
text_zip = missing_zeroes*’0’ + text_zip
frequency[text_zip] += 1
print(frequency)
if __name__ == "__main__":
main()
```

This makes use of imperative processing features to read a file. The overall design, using a `for`

statement to process rows of a file, is an essential Pythonic feature that we can preserve.

On the other hand, the processing of the `text_zip`

and `missing_zeroes`

variables through a number of state changes seems like it’s a potential source for confusion.

This can be refactored through several rewrites:

Decompose the

`main()`

function into two parts. A new`zip_histogram()`

function should be written to contain much of the processing detail. This function will process the opened file, and return a`Counter`

object. A suggested signature is the following:`def zip_histogram( reader: csv.DictReader[str]) -> Counter[str]: pass`

The

`main()`

function is left with the responsibility to open the file, create the`csv.DictReader`

instance, evaluate`zip_histogram()`

, and print the histogram.Once the

`zip_histogram()`

function has been defined, the cleansing of the`ZIP`

attribute can be refactored into a separate function, with a name like`zip_cleanse()`

. Rather than setting the value of the`text_zip`

variable, this function can return the cleansed result. This can be tested separately to be sure the various cases are handled gracefully.The distinction between long ZIP codes with a hyphen and without a hyphen is something that should be fixed. Once the

`zip_cleanse()`

works in general, add a new function to inject hyphens into ZIP codes with only digits. This should transform`38011234`

to`03801-1234`

. Note that short, five-digit ZIP codes do not need to have a hyphen added; this additional transformation only applies to nine-digit codes to make them into ten-position strings.

The final `zip_histogram()`

function should look something like the following:

```
def zip_histogram(
reader: csv.DictReader[str]) -> Counter[str]:
return Counter(
zip_cleanse(
row[’ZIP’]
) for row in reader
)
```

This provides a framework for performing a focused data cleanup in the given column. It allows us to distinguish between CSV and file processing features, and the details of how to clean up a specific column of data.

## 1.6.5 (Advanced) Optimize this functional code

The following algorithm is stated as a single ”step” that has been decomposed into three separate formulae. The decomposition is more a concession to the need to fit the expression into the limits of a printed page than a useful optimization. The rad(x) function converts degrees to radians, rad(d) = π ×.

There are a number of redundant expressions, like rad(lat_{1}) and rad(lat_{2}). If these are assigned to local variables, can the expression be simplified?

The final computation of d does not match the conventional understanding of computing a hypotenuse, . Should the code be refactored to match the definition in `math.hypot`

?

It helps to start by writing this in Python and then refactoring the code.

A test case is the following:

_{1}← 32.82950

lon

_{1}←−79.93021

lat

_{2}← 32.74412

lon

_{2}←−79.85226

The computed value for d is approximately `6.4577`

.

Refactoring the code can help to confirm your understanding of what this code really does.

# Join our community Discord space

Join our Python Discord workspace to discuss and know more about the book: https://packt.link/dHrHU