Clean Code in Python

3.6 (5 reviews total)
By Mariano Anaya
  • Instant online access to over 8,000+ books and videos
  • Constantly updated with 100+ new titles each month
  • Breadth and depth in over 1,000+ technologies
  1. Introduction, Code Formatting, and Tools

About this book

Python is currently used in many different areas such as software construction, systems administration, and data processing.

In all of these areas, experienced professionals can find examples of inefficiency, problems, and other perils, as a result of bad code. After reading this book, readers will understand these problems, and more importantly, how to correct them.

The book begins by describing the basic elements of writing clean code and how it plays an important role in Python programming. You will learn about writing efficient and readable code using the Python standard library and best practices for software design. You will learn to implement the SOLID principles in Python and use decorators to improve your code. The book delves more deeply into object oriented programming in Python and shows you how to use objects with descriptors and generators. It will also show you the design principles of software testing and how to resolve software problems by implementing design patterns in your code. In the final chapter we break down a monolithic application to a microservice one, starting from the code as the basis for a solid platform.

By the end of the book, you will be proficient in applying industry approved coding practices to design clean, sustainable and readable Python code.

Publication date:
August 2018
Publisher
Packt
Pages
332
ISBN
9781788835831

 

Chapter 1. Introduction, Code Formatting, and Tools

In this chapter, we will explore the first concepts related to clean code, starting with what it is and what it means. The main point of the chapter is to understand that clean code is not just a nice thing to have or a luxury in software projects. It's a necessity. Without quality code, the project will face the perils of failing due to an accumulated technical debt.

Along the same lines, but going into a bit more detail, are the concepts of formatting and documenting the code. This also might sound like a superfluous requirement or task, but again, we will discover that it plays a fundamental role in keeping the code base maintainable and workable.

We will analyze the importance of adopting a good coding guideline for this project. Realizing that maintaining the code align to the reference is a continuous task, and we will see how we can get help from automated tools that will ease our work. For this reason, we quickly discuss how to configure the main tools so that they automatically run on the project as part of the build.

After reading this chapter, you will have an idea of what clean code is, why it is important, why formatting and documenting the code are crucial tasks, and how to automate this process. From this, you should acquire the mindset for quickly organizing the structure of a new project, aiming for good code quality.

After reading this chapter, you will have learned the following:

  • That clean code really means something far more important than formatting in software construction
  • That even so, having a standard formatting is a key component to have in a software project, for the sake of its maintainability
  • How to make the code self-documenting by using the features that Python provides
  • How to configure tools to help arrange the layout of the code in a consistent way so that team members can focus on the essence of the problem
 

The meaning of clean code


There is no sole or strict definition of clean code. Moreover, there is probably no way of formally measuring clean code, so you cannot run a tool on a repository that could tell you how good, bad, or maintainable or not that code is. Sure, you can run tools such as checkers, linters, static analyzers, and so on. And those tools are of much help. They are necessary, but not sufficient. Clean code is not something a machine or script could tell (so far), but rather something that us, as professionals, can decide.

For decades of using the terms programming languages, we thought that they were languages to communicate our ideas to the machine, so it can run our programs. We were wrong. That's not the truth, but part of the truth. The real language behind programming languages is to communicate our ideas to other developers.

Here is where the true nature of clean code lies. It depends on other engineers to be able to read and maintain the code. Therefore, we, as professionals, are the only ones who can judge this. Think about it; as developers, we spend much more time reading code than actually writing it. Every time we want to make a change or add a new feature, we first have to read all the surroundings of the code we have to modify or extend. The language (Python), is what we use to communicate among ourselves.

So, instead of giving you a definition (or my definition) of clean code, I invite you to go through the book, read all about idiomatic Python, see the difference between good and bad code, identify traits of good code and good architecture, and then come up with your own definition. After reading this book, you will be able to judge and analyze code for yourself, and you will have a more clear understanding of clean code. You will know what it is and what it means, regardless of any definition given to you.

 

The importance of having clean code


There are a huge number of reasons why clean code is important. Most of them revolve around the ideas of maintainability, reducing technical debt, working effectively with agile development, and managing a successful project.

 

The first idea I would like to explore is in regards to agile development and continuous delivery. If we want our project to be able to successfully deliver features constantly at a steady and predictable pace, then having a good and maintainable code base is a must.

Imagine you are driving a car on a road toward a destination you want to reach at a certain point in time. You have to estimate your arrival time so that you can tell the person who is waiting for you. If the car works fine, and the road is flat and perfect, then I do not see why you would miss your estimation by a large margin. Now, if the road is broken and you have to step out to move rocks out of the way, or avoid cracks, stop to check the engine every few kilometers, and so on, then it is very unlikely that you will know for sure when are you going to arrive (or if you are). I think the analogy is clear; the road is the code. If you want to move at a steady, constant, and predictable pace, the code needs to be maintainable and readable. If it is not, every time product management asks for a new feature, you will have to stop to refactor and fix the technical debt.

Technical debt refers to the concept of problems in the software as a result of a compromise, and a bad decision being made. In a way, it's possible to think about technical debt in two ways. From the present to the past. What if the problems we are currently facing are the result of previously written bad code? From the present to the future—if we decide to take the shortcut now, instead of investing time in a proper solution, what problems are we creating for ourselves in the future?

The word debt is a good choice. It's a debt because the code will be harder to change in the future than it would be to change it now. That incurred cost is the interests of the debt. Incurring in technical debt means that tomorrow, the code will be harder and more expensive (it would be possible to even measure this) than today, and even more expensive the day after, and so on.

Every time the team cannot deliver something on time and has to stop to fix and refactor the code is paying the price of technical debt.

The worst thing about technical debt is that it represents a long-term and underlying problem. It is not something that raises a high alarm. Instead, it is a silent problem, scattered across all parts of the project, that one day, at one particular time, will wake up and become a show-stopper.

The role of code formatting in clean code

Is clean code about formatting and structuring the code, according to some standards (for example, PEP-8, or a custom standard defined by the project guidelines)? The short answer is no.

 

Clean code is something else that goes way beyond coding standards, formatting, linting tools, and other checks regarding the layout of the code. Clean code is about achieving quality software and building a system that is robust, maintainable, and avoiding technical debt. A piece of code or an entire software component could be 100% with PEP-8 (or any other guideline), and still not satisfy these requirements.

However, not paying attention to the structure of the code has some perils. For this reason, we will first analyze the problems with a bad code structure, how to address them, and then we will see how to configure and use tools for Python projects in order to automatically check and correct problems.

To sum this up, we can say that clean code has nothing to do with things like PEP-8 or coding styles. It goes way beyond that, and it means something more meaningful to the maintainability of the code and the quality of the software. However, as we will see, formatting the code correctly is important in order to work efficiently.

Adhering to a coding style guide on your project

A coding guideline is a bare minimum a project should have to be considered being developed under quality standards. In this section, we will explore the reasons behind this, so in the following sections, we can start looking at ways to enforce this automatically by the means of tools.

The first thing that comes to my mind when I try to find good traits in a code layout is consistency. I would expect the code to be consistently structured so that it is easier to read and follow. If the code is not correct or consistently structured, and everyone on the team is doing things in their own way, then we will end up with code that will require extra effort and concentration to be followed correctly. It will be error-prone, misleading, and bugs or subtleties might slip through easily.

We want to avoid that. What we want is exactly the opposite of that—code that we can read and understand as quickly as possible at a single glance.

If all members of the development team agree on a standardized way of structuring the code, the resulting code would look much more familiar. As a result of that, you will quickly identify patterns (more about this in a second), and with these patterns in mind, it will be much easier to understand things and detect errors. For example, when something is amiss, you will notice that somehow, there is something odd in the patterns you are used to seeing, which will catch your attention. You will take a closer look, and you will more than likely spot the mistake!

As it was stated in the classical book, Code Complete, an interesting analysis of this was done on the paper titled Perceptions in Chess (1973), where an experiment was conducted in order to identify how different people can understand or memorize different chess positions. The experiment was conducted on players of all levels (novices, intermediate, and chess masters), and with different chess positions on the board. They found out that when the position was random, the novices did as well as the chess masters; it was just a memorization exercise that anyone could do at reasonably the same level. When the positions followed a logical sequence that might occur in a real game (again, consistency, adhering to a pattern), then the chess masters performed exceedingly better than the rest.

Now imagine this same situation applied to software. We, as the software engineers experts in Python, are like the chess masters in the previous example. When the code is structured randomly, without following any logic, or adhering to any standard, then it would be as difficult for us to spot mistakes as a novice developer. On the other hand, if we are used to reading code in a structured fashion, and we have learned to quickly get the ideas from the code by following patterns, then we are at a considerable advantage.

In particular, for Python, the sort of coding style you should follow is PEP-8. You can extend it or adopt some of its parts to the particularities of the project you are working on (for example, the length of the line, the notes about strings, and so on). However, I do suggest that regardless of whether you are using just plain PEP-8 or extending it, you should really stick to it instead of trying to come up with another different standard from scratch.

The reason for this is that this document already takes into consideration many of the particularities of the syntax of Python (that would not normally apply for other languages), and it was created by core Python developers who actually contributed to the syntax of Python. For this reason, it is hard to think that the accuracy of PEP-8 can be otherwise matched, not to mention, improved.

In particular, PEP-8 has some characteristics that carry other nice improvements when dealing with code, such as following:

  • Grepability: This is the ability to grep tokens inside the code; that is, to search in certain files (and in which part of those files) for the particular string we are looking for. One of the items introduced by this standard is something that differentiates the way of writing the assignment of values to variables, from the keyword arguments being passed to functions.

To see this better, let's use an example. Let's say we are debugging, and we need to find where the value to a parameter named location is being passed. We can run the following grep command, and the result will tell us the file and the line we are looking for:

$ grep -nr "location=" . 
./core.py:13: location=current_location,

Now, we want to know where this variable is being assigned this value, and the following command will also give us the information we are looking for:

$ grep -nr "location =" .
./core.py:10: current_location = get_location()

PEP-8 establishes the convention that, when passing arguments by keyword to a function, we don't use spaces, but we do when we assign variables. For that reason, we can adapt our search criteria (no spaces around the = on the first search, and one space on the second) and be more efficient in our search. That is one of the advantages of following a convention. 

  • Consistency: If the code looks like a uniform format, the reading of it will be much easier. This is particularly important for onboarding, if you want to welcome new developers to your project, or even hire new (and probably less experienced) programmers on your team, and they need to become familiar with the code (which might even consist of several repositories). It will make their lives much easier if the code layout, documentation, naming convention, and such is identical across all files they open, in all repositories.
  • Code quality:By looking at the code in a structured fashion, you will become more proficient at understanding it at a glance (again, like in Perception in Chess), and you will spot bugs and mistakes more easily. In addition to that, tools that check for the quality of the code will also hint at potential bugs. Static analysis of the code might help to reduce the ratio of bugs per line of code.
 

Docstrings and annotations


This section is about documenting the code in Python, from within the code. Good code is self-explanatory but is also well-documented. It is a good idea to explain what it is supposed to do (not how).

One important distinction; documenting the code is not the same as adding comments on it. Comments are bad, and they should be avoided. By documentation, we refer to the fact of explaining the data types, providing examples of them, and annotating the variables.

This is relevant in Python, because being dynamically typed, it might be easy to get lost on the values of variables or objects across functions and methods. For this reason, stating this information will make it easier for future readers of the code.

There is another reason that specifically relates to annotations. They can also help in running some automatic checks, such as type hinting, through tools such as Mypy. We will find that, in the end, adding annotations pays off.

Docstrings

In simple terms, we can say that docstrings are basically documentation embedded in the source code. A docstring is basically a literal string, placed somewhere in the code, with the intention of documenting that part of the logic.

Notice the emphasis on the word documentation. This subtlety is important because it's meant to represent explanation, not justification. Docstrings are not comments; they are documentation.

Having comments in the code is a bad practice for multiple reasons. First, comments represent our failure to express our ideas in the code. If we actually have to explain why or how we are doing something, then that code is probably not good enough. For starters, it fails to be self-explanatory. Second, it can be misleading. Worst than having to spend some time reading a complicated section is to read a comment on how it is supposed to work, and figuring out that the code actually does something different. People tend to forget to update comments when they change the code, so the comment next to the line that was just changed will be outdated, resulting in a dangerous misdirection.

Sometimes, on rare occasions, we cannot avoid having comments. Maybe there is an error on a third-party library that we have to circumvent. In those cases, placing a small but descriptive comment might be acceptable.

With docstrings, however, the story is different. Again, they do not represent comments, but the documentation of a particular component (a module, class, method, or function) in the code. Their use is not only accepted but also encouraged. It is a good practice to add docstrings whenever possible.

 

 

 

 

 

 

The reason why they are a good thing to have in the code (or maybe even required, depending on the standards of your project) is that Python is dynamically typed. This means that, for example, a function can take anything as the value for any of its parameters. Python will not enforce, nor check, anything like this. So, imagine that you find a function in the code that you know you will have to modify. You are even lucky enough that the function has a descriptive name, and that its parameters do as well. It might still not be quite clear what types you should pass to it. Even if this is the case, how are they expected to be used?

Here is where a good docstring might be of help. Documenting the expected input and output of a function is a good practice that will help the readers of that function understand how it is supposed to work.

Consider this good example from the standard library:

In [1]: dict.update??
Docstring:
D.update([E, ]**F) -> None. Update D from dict/iterable E and F.
If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]
If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v
In either case, this is followed by: for k in F: D[k] = F[k]
Type: method_descriptor

Here, the docstring for the update method on dictionaries gives us useful information, and it is telling us that we can use it in different ways:

  1. We can pass something with a .keys() method (for example, another dictionary), and it will update the original dictionary with the keys from the object passed per parameter:
>>> d = {}
>>> d.update({1: "one", 2: "two"})
>>> d
{1: 'one', 2: 'two'}
  1. We can pass an iterable of pairs of keys and values, and we will unpack them to update:
>>> d.update([(3, "three"), (4, "four")])
>>> d
{1: 'one', 2: 'two', 3: 'three', 4: 'four'}

 

 

 

 

In any case, the dictionary will be updated with the rest of the keyword arguments passed to it.

This information is crucial for someone that has to learn and understand how a new function works, and how they can take advantage of it.

Notice that in the first example, we obtained the docstring of the function by using the double question mark on it (dict.update??). This is a feature of the IPython interactive interpreter. When this is called, it will print the docstring of the object you are expecting. Now, imagine that in the same way, we obtained help from this function of the standard library; how much easier could you make the lives of your readers (the users of your code), if you place docstrings on the functions you write so that others can understand their workings in the same way?

The docstring is not something separated or isolated from the code. It becomes part of the code, and you can access it. When an object has a docstring defined, this becomes part of it via its __doc__ attribute:

>>> def my_function():
... """Run some computation"""
... return None
...
>>> my_function.__doc__
'Run some computation'

This means that it is even possible to access it at runtime and even generate or compile documentation from the source code. In fact, there are tools for that. If you run Sphinx, it will create the basic scaffold for the documentation of your project. With the autodoc extension (sphinx.ext.autodoc) in particular, the tool will take the docstrings from the code and place them in the pages that document the function.

Once you have the tools in place to build the documentation, make it public so that it becomes part of the project itself. For open source projects, you can use read the docs, which will generate the documentation automatically per branch or version (configurable). For companies or projects, you can have the same tools or configure these services on-premise, but regardless of this decision, the important part is that the documentation should be ready and available to all members of the team.

There is, unfortunately, one downside to docstrings, and it is that, as it happens with all documentation, it requires manual and constant maintenance. As the code changes, it will have to be updated. Another problem is that for docstrings to be really useful, they have to be detailed, which requires multiple lines.

 

Maintaining proper documentation is a software engineering challenge that we cannot escape from. It also makes sense to be like this. If you think about it, the reason for documentation to be manually written is because it is intended to be read by other humans. If it were automated, it would probably not be of much use. For the documentation to be of any value, everyone on the team must agree that it is something that requires manual intervention, hence the effort required. The key is to understand that software is not just about code. The documentation that comes with it is also part of the deliverable. Therefore, when someone is making a change on a function, it is equally important to also update the corresponding part of the documentation to the code that was just changed, regardless of whether its a wiki, a user manual, a README file, or several docstrings.

Annotations 

PEP-3107 introduced the concept of annotations. The basic idea of them is to hint to the readers of the code about what to expect as values of arguments in functions. The use of the word hint is not casual; annotations enable type hinting, which we will discuss later on in this chapter, after the first introduction to annotations.

Annotations let you specify the expected type of some variables that have been defined. It is actually not only about the types, but any kind of metadata that can help you get a better idea of what that variable actually represents.

Consider the following example:

class Point:
    def __init__(self, lat, long):
        self.lat = lat
        self.long = long


def locate(latitude: float, longitude: float) -> Point:
    """Find an object in the map by its coordinates"""

Here, we use float to indicate the expected types of latitude and longitude. This is merely informative for the reader of the function so that they can get an idea of these expected types. Python will not check these types nor enforce them.

We can also specify the expected type of the returned value of the function. In this case, Point is a user-defined class, so it will mean that whatever is returned will be an instance of Point.

 

 

 

 

 

 

However, types or built-ins are not the only kind of thing we can use as annotations. Basically, everything that is valid in the scope of the current Python interpreter could be placed there. For example, a string explaining the intention of the variable, a callable to be used as a callback or validation function, and so on.

With the introduction of annotations, a new special attribute is also included, and it is __annotations__. This will give us access to a dictionary that maps the name of the annotations (as keys in the dictionary) with their corresponding values, which are those we have defined for them. In our example, this will look like the following:

>>> locate.__annotations__
{'latitude': float, 'longitue': float, 'return': __main__.Point}

We could use this to generate documentation, run validations, or enforce checks in our code if we think we have to.

Speaking of checking the code through annotations, this is when PEP-484 comes into play. This PEP specifies the basics of type hinting; the idea of checking the types of our functions via annotations. Just to be clear again, and quoting PEP-484 itself:

"Python will remain a dynamically typed language, and the authors have no desire to ever make type hints mandatory, even by convention."

The idea of type hinting is to have extra tools (independent from the interpreter) to check and assess the correct use of types throughout the code and to hint to the user in case any incompatibilities are detected. The tool that runs these checks, Mypy, is explained in more detail in a later section, where we will talk about using and configuring the tools for the project. For now, you can think of it as a sort of linter that will check the semantics of the types used on the code. This sometimes helps in finding bugs early on, when the tests and checks are run. For this reason, it is a good idea to configure Mypy on the project and use it at the same level as the rest of the tools for static analysis.

However, type hinting means more than just a tool for checking the types on the code. Starting with Python 3.5, the new typing module was introduced, and this significantly improved how we define the types and the annotations in our Python code.

The basic idea behind this is that now the semantics extend to more meaningful concepts, making it even easier for us (humans) to understand what the code means, or what is expected at a given point. For example, you could have a function that worked with lists or tuples in one of its parameters, and you would have put one of these two types as the annotation, or even a string explaining it. But with this module, it is possible to tell Python that it expects an iterable or a sequence. You can even identify the type or the values on it; for example, that it takes a sequence of integers.

 

There is one extra improvement made in regards to annotations at the time of writing this book, and that is that starting from Python 3.6, it is possible to annotate variables directly, not just function parameters and return types. This was introduced in PEP-526, and the idea is that you can declare the types of some variables defined without necessarily assigning a value to them, as shown in the following listing:

class Point:
    lat: float
    long: float

>>> Point.__annotations__
{'lat': <class 'float'>, 'long': <class 'float'>}

Do annotations replace docstrings?

This is a valid question, since on older versions of Python, long before annotations were introduced, the way of documenting the types of the parameters of functions or attributes was done by putting docstrings on them. There are even some conventions on formats on how to structure docstrings to include the basic information for a function, including types and meaning of each parameter, type, and meaning of the result, and possible exceptions that the function might raise.

Most of this has been addressed already in a more compact way by means of annotations, so one might wonder if it is really worth having docstrings as well. The answer is yes, and this is because they complement each other.

It is true that a part of the information previously contained on the docstring can now be moved to the annotations. But this should only leave more room for a better documentation on the docstring. In particular, for dynamic and nested data types, it is always a good idea to provide examples of the expected data so that we can get a better idea of what we are dealing with.

Consider the following example. Let's say we have a function that expects a dictionary to validate some data:

def data_from_response(response: dict) -> dict:
    if response["status"] != 200:
        raise ValueError
    return {"data": response["payload"]}

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Here, we can see a function that takes a dictionary and returns another dictionary. Potentially, it can raise an exception if the value under the key "status" is not the expected one. However, we do not have much more information about it. For example, what does a correct instance of a response object look like? What would an instance of result look like? To answer both of these questions, it would be a good idea to document examples of the data that is expected to be passed in by a parameter and returned by this function.

Let's see if we can explain this better with the help of a docstring:

def data_from_response(response: dict) -> dict:
    """If the response is OK, return its payload.

    - response: A dict like::

    {
        "status": 200, # <int>
        "timestamp": "....", # ISO format string of the current
        date time
        "payload": { ... } # dict with the returned data
    }

    - Returns a dictionary like::

    {"data": { .. } }

    - Raises:
    - ValueError if the HTTP status is != 200
    """
    if response["status"] != 200:
        raise ValueError
    return {"data": response["payload"]}

Now, we have a better idea of what is expected to be received and returned by this function. The documentation serves as valuable input, not only for understanding and getting an idea of what is being passed around, but also as a valuable source for unit tests. We can derive data like this to use as input, and we know what would be the correct and incorrect values to use on the tests. Actually, the tests also work as actionable documentation for our code, but this will be explained in more detail.

The benefit is that now we know what the possible values of the keys are, as well as their types, and we have a more concrete interpretation of what the data looks like. The cost is that, as we mentioned earlier, it takes up a lot of lines, and it needs to be verbose and detailed to be effective.

 

Configuring the tools for enforcing basic quality gates

In this section, we will explore how to configure some basic tools and automatically run checks on the code, with the goal of leveraging part of the repetitive verification checks.

This is an important point: remember that code is for us, people, to understand, so only we can determine what is good or bad code. We should invest time in code reviews, thinking about what good code is, and how readable and understandable it is. When looking at the code written by a peer, you should ask such questions:

  • Is this code easy to understand and follow for a fellow programmer?
  • Does it speak in terms of the domain of the problem?
  • Would a new person joining the team be able to understand it and work with it effectively?

As we saw previously, code formatting, consistent layout, and proper indentation are required but not sufficient traits to have in a code base. Moreover, this is something that we, as engineers with a high sense of quality, would take for granted, so we would read and write code far beyond the basic concepts of its layout. Therefore, we are not willing to waste time reviewing these kinds of items, so we can invest our time more effectively by looking at actual patterns in the code in order to understand its true meaning and provide valuable results.

All of these checks should be automated. They should be part of the tests or checklist, and this, in turn, should be part of the continuous integration build. If these checks do not pass, make the build fail. This is the only way to actually ensure the continuity of the structure of the code at all times. It also serves as an objective parameter for the team to have as a reference. Instead of having some engineers or the leader of the team always having to tell the same comments about PEP-8 on code reviews, the build will automatically fail, making it something objective.

Type hinting with Mypy

Mypy (http://mypy-lang.org/) is the main tool for optional static type checking in Python. The idea is that, once you install it, it will analyze all of the files on your project, checking for inconsistencies on the use of the types. This is useful since, most of the time, it will detect actual bugs early, but sometimes it can give false positives.

 

You can install it with pip, and it is recommended to include it as a dependency for the project on the setup file:

$ pip install mypy

Once it is installed in the virtual environment, you just have to run the preceding command and it will report all of the findings on the type checks. Try to adhere to its report as much as possible, because most of the time, the insights provided by it help to avoid errors that might otherwise slip into production. However, the tool is not perfect, so if you think it is reporting a false positive, you can ignore that line with the following marker as a comment:

type_to_ignore = "something" # type: ignore

Checking the code with Pylint

There are many tools for checking the structure of the code (basically, this is compliance with PEP-8) in Python, such as pycodestyle (formerly known as PEP-8), Flake8, and many more. They all are configurable and are as easy to use as running the command they provide. Among all of them, I have found Pylint to be the most complete (and strict). It is also configurable.

Again, you just have to install it in the virtual environment with pip:

$ pip install pylint

Then, just running the pylint command would be enough to check it in the code.

It is possible to configure Pylint via a configuration file named pylintrc.

In this file, you can decide the rules you would like to enable or disable, and parametrize others (for example, to change the maximum length of the column).

Setup for automatic checks

On Unix development environments, the most common way of working is through makefiles. Makefiles are powerful tools that let us configure commands to be run in the project, mostly for compiling, running, and so on. Besides this, we can use a makefile in the root of our project, with some commands configured to run checks of the formatting and conventions on the code, automatically.

 

A good approach for this would be to have targets for the tests, and each particular test, and then have another one that will run altogether. For example:

typehint:
mypy src/ tests/

test:
pytest tests/

lint:
pylint src/ tests/

checklist: lint typehint test

.PHONY: typehint test lint checklist

Here, the command we should run (both in our development machines and in the continuous integration environment builds) is the following:

make checklist

This will run everything in the following steps:

  1. It will first check the compliance with the coding guideline (PEP-8, for instance)
  2. Then it will check for the use of types on the code
  3. Finally, it will run the tests

If any of these steps fail, consider the entire process a failure.

Besides configuring these checks automatically in the build, it is also a good idea if the team adopts a convention and an automatic approach for structuring the code. Tools such as Black (https://github.com/ambv/black) automatically format the code. There are many tools that will edit the code automatically, but the interesting thing about Black is that it does so in a unique form. It's opinionated and deterministic, so the code will always end up arranged in the same way.

For example, Black strings will always be double-quotes, and the order of the parameters will always follow the same structure. This might sound rigid, but it's the only way to ensure the differences in the code are minimal. If the code always respects the same structure, changes in the code will only show up in pull requests with the actual changes that were made, and no extra cosmetic modifications. It's more restrictive than PEP-8, but it's also convenient because, by formatting the code directly through a tool, we don't have to actually worry about that, and we can focus on the crux of the problem at hand.

 

 

 

At the time of writing this book, the only thing that can be configured is the length of the lines. Everything else is corrected by the criteria of the project.

The following code is PEP-8 correct, but it doesn't follow the conventions of black:

def my_function(name):
"""
>>> my_function('black')
'received Black'
"""
return 'received {0}'.format(name.title())

Now, we can run the following command to format the file:

black -l 79 *.py

Now, we can see what the tool has written:

def my_function(name):
"""
>>> my_function('black')
'received Black'
"""
return "received {0}".format(name.title())

On more complex code, a lot more would have changed (trailing commas, and more), but the idea can be seen clearly. Again, it's opinionated, but it's also a good idea to have a tool that takes care of details for us. It's also something that the Golang community learned a long time ago, to the point that there is a standard tool library, got fmt, that automatically formats the code according to the conventions of the language. It's good that Python has something like this now.

These tools (Black, Pylint, Mypy, and many more) can be integrated with the editor or IDE of your choice to make things even easier. It's a good investment to configure your editor to make these kinds of modifications either when saving the file or through a shortcut.

 

 

 

Summary


We now have a first idea of what clean code is, and a workable interpretation of it, which will serve us as a reference point for the rest of this book.

More importantly, we understood that clean code is something much more important than the structure and layout of the code. We have to focus on how the ideas are represented on the code to see if they are correct. Clean code is about readability, maintainability of the code, keeping technical debt to the minimum, and effectively communicating our ideas into the code so that others can understand the same thing we intended to write in the first place.

However, we discussed that the adherence to coding styles or guidelines is important for multiple reasons. We have agreed that this is a condition that is necessary, but not sufficient, and since it is a minimal requirement every solid project should comply with, it is clear that is something we better leave to the tools. Therefore, automating all of these checks becomes critical, and in this regard, we have to keep in mind how to configure tools such as Mypy, Pylint, and more.

The next chapter is going to be more focused on the Python-specific code, and how to express our ideas in idiomatic Python. We will explore the idioms in Python that make for more compact and efficient code. In this analysis, we will see that, in general, Python has different ideas or different ways to accomplish things compared to other languages.

About the Author

  • Mariano Anaya

    Mariano Anaya is a software engineer who spends most of his time creating software with Python and mentoring fellow programmers. Mariano's main areas of interests besides Python are software architecture, functional programming, distributed systems, and speaking at conferences. He was a speaker at Euro Python 2016 and 2017. To know more about him, you can refer to his GitHub account with the username rmariano. His speakerdeck username is rmariano.

    Browse publications by this author

Latest Reviews

(5 reviews total)
Haven't made it all the way through the book yet, but its a terrific textbook. I would recommend it to anybody who wants to improve their coding skills.
No hard copies received!!
Structure was not easy to follow, some topics very short or distributed over several chapters. Often a little too theoretical, but certainly a good comprehensive overview.

Recommended For You

Book Title
Access this book and the full library for just $5/m.
Access now