1.8 A potential mess
It’s common for folks who know some Python but haven’t made extensive use of the OOP features to wonder whether all the brain-calories that are burned really do lead to better software. We’ll touch on a few questions (really, objections phrased as questions) first. Then, we can look at a concrete example of restating a script as objects.
One common question is: “Isn’t a class just a bunch of functions with shared data?” The short answer is “yes.” The object-oriented feature that’s important here is bundling the data and the related functions into a single namespace, called a class definition. When we write a batch of closely related functions, we often give them similar-looking names to be sure that the relationship is obvious. This is the purpose of a class: it provides a common container name for related functions.
Additionally, a class definition lets us create multiple instances of the shared data. This helps us encapsulate the processing for multiple objects with similar behavior but distinct states.
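As a small illustration of both points, here is a sketch that contrasts functions sharing a name prefix with an equivalent class. The names `tally_new`, `tally_bump`, and `Tally` are hypothetical, invented only for this example.

```python
# Hypothetical example: related functions that share a name prefix
# and operate on a shared data structure passed in by the caller.
def tally_new() -> dict[str, int]:
    return {"count": 0}


def tally_bump(state: dict[str, int]) -> None:
    state["count"] += 1


# The same idea as a class: the class name is the common container,
# and each instance carries its own, distinct state.
class Tally:
    def __init__(self) -> None:
        self.count = 0

    def bump(self) -> None:
        self.count += 1


passed = Tally()
failed = Tally()
passed.bump()
passed.bump()
failed.bump()
print(passed.count, failed.count)  # 2 1
```

The two `Tally` instances have the same behavior but separate counts, which is the "multiple instances of the shared data" idea in miniature.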
Another question is: “Why is a collection of class definitions easier to understand than one long function?” The short answer is “chunking.” To keep complicated ideas in our heads, we break things into chunks. For example, we don’t read a long number as a haphazard string of digits; we decompose it into blocks of digits. This is why we throw punctuation into things such as telephone numbers. In North America, we write “(111)222-3333” to break a 10-digit phone number into three small chunks. When we talk about an automobile’s “interior” or “engine,” we’re decomposing the complicated whole into more intellectually manageable chunks.
A long script or a long function is generally hard to understand. The programmer will often break the long script into sections using comments. Sometimes, the comments are big billboards announcing major steps in the processing. Each of these sections could have been a smaller function. Smaller functions that are closely related often manage the state of a single object; these are methods of a class.
1.8.1 Reading a big script
Imagine a long Python script that summarizes details from a number of files in JSON format. It opens files, parses the JSON content, locates the details, and accumulates a summary. It does a lot of things, and reflects poorly managed complexity. Here’s an outline of the code:
    import json
    from pathlib import Path
    import shlex

    def main():
        optional = {"type"}
        result_dir = Path.cwd() / "data"
        for file in result_dir.glob("*.json"):
            # 1. Load file
            result = json.loads(file.read_text())
            # 2. Set Outcome
            app_name = file.stem
            env_outcome = None
            # 3. Examine environments
            for env_name, env in result['testenvs'].items():
                # 2a. Skip special names
                if env_name.startswith("."):
                    continue
                # 2b. Accumulate outcomes
                if env:
                    if env['result']['success']:
                        if env_outcome is None:
                            env_outcome = "ok"
                    else:
                        for step in env['test']:
                            if step['retcode'] != 0:
                                command = Path(step['command'][0]).stem
                                args = shlex.join(step['command'][1:])
                                message = f"{env_name} failed {command} {args}"
                                if env_outcome is None or env_outcome == "ok":
                                    env_outcome = message
                                else:
                                    env_outcome = f"{env_outcome}, {message}"
                else:
                    if env_outcome is None:
                        env_outcome = f"{env_name} did not run"
                    elif env_outcome == "ok" and env_name in optional:
                        env_outcome = f"ok (except {env_name})"
                    else:
                        env_outcome = f"{env_outcome}, {env_name} did not run"
            # 4. Write summary
            print(f"{app_name:20s} {env_outcome}")
This script is just shy of 50 lines of code. Within this function, there are numerous shifts in focus: first the paths, then the JSON document on each path, then the environments that were tested, and then the commands that were executed. Ultimately, there are some complicated rules that define a final status that’s printed. These shifts can make sense to the original author, but they are very hard for anyone else to grasp.
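Further, code this tangled is difficult to test: the rules for computing the outcome are buried inside a function that also reads files from a directory and prints the result.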
Buried in the clutter of processing details are a few essential ideas. This is for a suite of application instances, where each application is tested with the tox tool. The tool produces the JSON-formatted files with the details of the test outcomes for each application. (This tool is one of many available in the PyPI repository that are commonly added to projects to automate testing.) The tool will exercise each application in a number of environments. Each environment has a number of commands, using tools such as pytest, pyright, and ruff. An environment can have a result where the success attribute is true, meaning all the commands worked. Otherwise, at least one command failed.
Note that we have started to identify the key concepts that may need to be implemented as classes of objects: an application, several environment instances, and several command instances. A command, for example, has a JSON representation as a list of strings. An environment is represented in JSON by a simple string key, such as "3.13", whose value is a dictionary of supporting details.
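To make these representations concrete, here is a hand-written sketch of the part of a result document that the script reads. Only the keys mirror what the script accesses; the environment names, paths, commands, and values are invented for illustration, and a real file produced by the tool contains a great deal more detail.

```python
import json

# Invented sample data; only the keys match what the script reads.
sample_text = """
{
  "testenvs": {
    ".pkg": {},
    "3.13": {
      "test": [
        {"command": ["/project/.tox/3.13/bin/pytest", "-q"], "retcode": 0},
        {"command": ["/project/.tox/3.13/bin/ruff", "check", "src"], "retcode": 1}
      ],
      "result": {"success": false}
    },
    "type": {}
  }
}
"""

result = json.loads(sample_text)
for env_name, env in result["testenvs"].items():
    if env_name.startswith("."):
        continue  # special names are skipped, as in the script
    if not env:
        print(env_name, "has no details")
        continue
    # Each command is a list of strings; keep the ones that failed.
    failed = [step["command"] for step in env["test"] if step["retcode"] != 0]
    print(env_name, "success:", env["result"]["success"], "failed:", failed)
```

Each level of nesting here (document, environment, command) corresponds to one of the shifts in focus inside the script.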
The script dives into details of the file, environment, and command. It’s rare for a script like this to provide any sort of overview to help clarify the three varieties of outcomes for each application that’s being tested:
- All commands in all environments were successful. The application is ready for deployment.
- A command failed in an environment. The application needs debugging.
- Something else went wrong and there’s no JSON file at all. This also suggests the application isn’t ready for deployment. Or, it may suggest something else is wrong with the entire test framework.
We have three classes of objects in the problem domain:
- An application, associated with one or more environments. The application is also associated with a summary that reduces the environment and command details to a final decision.
- An environment, associated with one or more commands. The environment object will have a summary of the commands, in the form of a status of success or failure.
- A command, which has details of each step performed. These are mostly interesting when they record a failure.
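One possible starting point for these three classes is sketched below. This is an assumption-laden outline, not a finished design: the class names, attribute names, and the `summary()` method are all hypothetical, the environment's success is derived from return codes rather than read from the JSON document, and the script's "did not run" and optional-environment rules are left out.

```python
from dataclasses import dataclass, field
from pathlib import Path
import shlex


@dataclass
class Command:
    """One step run in an environment: its argument list and return code."""
    args: list[str]
    retcode: int

    @property
    def failed(self) -> bool:
        return self.retcode != 0

    def __str__(self) -> str:
        # Program name without its directory, followed by the quoted arguments.
        return f"{Path(self.args[0]).stem} {shlex.join(self.args[1:])}"


@dataclass
class Environment:
    """A named test environment and the commands run within it."""
    name: str
    commands: list[Command] = field(default_factory=list)

    @property
    def success(self) -> bool:
        # Simplification: computed from the commands, not read from the file.
        return not any(command.failed for command in self.commands)


@dataclass
class Application:
    """An application and the environments it was tested in."""
    name: str
    environments: list[Environment] = field(default_factory=list)

    def summary(self) -> str:
        failures = [
            f"{env.name} failed {command}"
            for env in self.environments
            for command in env.commands
            if command.failed
        ]
        return ", ".join(failures) if failures else "ok"


app = Application(
    "demo",
    [Environment("3.13", [Command(["/bin/pytest", "-q"], 0),
                          Command(["/bin/ruff", "check"], 1)])],
)
print(app.summary())  # 3.13 failed ruff check
```

Each class answers one question about the problem domain, so the rules for a final decision no longer need to be interleaved with file handling and JSON navigation.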
When looking at the script, we see a lot of navigation through JSON data structures. While this is an important implementation detail, it tends to obscure the overall objective of understanding applications and environments.
Note that it’s common to gloss over some other categories of objects that are part of the implementation details:
- The Path object with a glob() method to locate all the files
- The dict objects created by the json module
- The list objects that contain the commands within an environment
When programming in Python, it helps to recognize that these implementation classes — the pathlib.Path, dict, and list classes — are the essence of object-oriented programming. These are classes we did not write, but we use them to create our applications. It turns out that parts of any Python script are already object-oriented, even if the script — as a whole — doesn’t seem to have a robust design.
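A few lines at the interactive prompt can confirm this. The following sketch uses a tiny, invented JSON string; the variable names are hypothetical.

```python
import json
from pathlib import Path

# Invented, minimal document; only its shape matters here.
document = json.loads('{"testenvs": {"3.13": {"test": [["pytest", "-q"]]}}}')
here = Path.cwd()

# Every value is an instance of a class we did not write.
print(type(document).__name__)                               # dict
print(type(document["testenvs"]["3.13"]["test"]).__name__)   # list
print(isinstance(here, Path))                                # True

# A method call is a function defined in the class, applied to the instance.
print(list(dict.items(document)) == list(document.items()))  # True
```

The script's calls to `glob()`, `items()`, and `startswith()` are all method calls on objects of these ready-made classes.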
Revising the script permits us to emphasize the processing of applications, environments, and commands. You should consider ways to adjust this code to be object-oriented. We won’t dive into these details. Instead, we’ll look more broadly at the ideas behind refactoring throughout this book. We’ll end the chapter having exposed the problem and a path toward a solution. There are two interrelated concepts here:
- Python is already object-oriented; the built-in types are all based on class definitions
- Good object-oriented design is a shift in focus from implementation details to the concepts behind the problem being solved