Modular programming is an essential tool for the modern developer. Gone are the days when you could just throw something together and hope that it works. To build robust systems that last, you need to understand how to organize your programs so that they can grow and evolve over time. Spaghetti coding is not an option. Modular programming techniques, and in particular the use of Python modules and packages, will give you the tools you need to succeed as a professional in the fast changing programming landscape.
In this chapter, we will:
Look at the fundamental aspects of modular programming
See how Python modules and packages can be used to organize your code
Discover what happens when modular programming techniques are not used
Learn how modular programming helps you stay on top of the development process
Take a look at the Python standard library as an example of modular programming
Create a simple program, built using modular techniques, to see how it works in practice
Let's get started by learning about modules and how they work.
For most beginner programmers, their first Python program is some version of the famous Hello World program. This program would look something like this:
print("Hello World!")
This one-line program would be saved in a file on disk, typically named something like hello.py
, and it would be executed by typing the following command into a terminal or command-line window:
python hello.py
The Python interpreter would then dutifully print out the message you have asked it to:
Hello World!
This hello.py
file is called a
Python source file. When you are first starting out, putting all your program code into a single source file is a great way of organizing your program. You can define functions and classes, and put instructions at the bottom which start your program when you run it using the Python interpreter. Storing your program code inside a Python source file saves you from having to retype it each time you want to tell the Python interpreter what to do.
As your programs get more complicated, however, you'll find that it becomes harder and harder to keep track of all the various functions and classes that you define. You'll forget where you put a particular piece of code and find it increasingly difficult to remember how all the various pieces fit together.
Modular programming is a way of organizing programs as they become more complicated. You can create a Python module, a source file that contains Python source code to do something useful, and then import this module into your program so that you can use it. For example, your program might need to keep track of various statistics about events that take place while the program is running. At the end, you might want to know how many events of each type have occurred. To achieve this, you might create a Python source file named stats.py
which contains the following Python code:
def init(): global _stats _stats = {} def event_occurred(event): global _stats try: _stats[event] = _stats[event] + 1 except KeyError: _stats[event] = 1 def get_stats(): global _stats return sorted(_stats.items())
The stats.py
Python source file defines a module named stats
—as you can see, the name of the module is simply the name of the source file without the .py
suffix. Your main program can make use of this module by importing it and then calling the various functions that you have defined as they are needed. The following frivolous example shows how you might use the stats
module to collect and display statistics about events:
import stats stats.init() stats.event_occurred("meal_eaten") stats.event_occurred("snack_eaten") stats.event_occurred("meal_eaten") stats.event_occurred("snack_eaten") stats.event_occurred("meal_eaten") stats.event_occurred("diet_started") stats.event_occurred("meal_eaten") stats.event_occurred("meal_eaten") stats.event_occurred("meal_eaten") stats.event_occurred("diet_abandoned") stats.event_occurred("snack_eaten") for event,num_times in stats.get_stats(): print("{} occurred {} times".format(event, num_times))
We're not interested in recording meals and snacks, of course—this is just an example—but the important thing to notice here is how the stats
module gets imported, and then how the various functions you defined within the stats.py
file get used. For example, consider the following line of code:
stats.event_occurred("snack_eaten")
Because the event_occurred()
function is defined within the stats
module, you need to include the name of the module whenever you refer to this function.
Note
There are ways in which you can import modules so you don't need to include the name of the module each time. We'll take a look at this in Chapter 3, Using Modules and Packages, when we look at namespaces and how the import
command works in more detail.
As you can see, the import
statement is used to load a module, and any time you see the module name followed by a period, you can tell that the program is referring to something (for example, a function or class) that is defined within that module.
In the same way that Python modules allow you to organize your functions and classes into separate Python source files, Python packages allow you to group multiple modules together.
A Python package is a directory with certain characteristics. For example, consider the following directory of Python source files:

This Python package, called animals
, contains five Python modules: cat
, cow
, dog
, horse
, and sheep
. There is also a special file with the rather unusual name __init__.py
. This file is called a
package initialization file; the presence of this file tells the Python system that this directory contains a package. The package initialization file can also be used to initialize the package (hence the name) and can also be used to make importing the package easier.
Note
Starting with Python version 3.3, packages don't always need to include an initialization file. However, packages without an initialization file (called namespace packages) are still quite uncommon and are only used in very specific circumstances. To keep things simple, we will be using regular packages (with the __init__.py
file) throughout this book.
Just like we used the module name when calling a function within a module, we use the package name when referring to a module within a package. For example, consider the following code:
import animals.cow animals.cow.speak()
In this example, the speak()
function is defined within the cow.py
module, which itself is part of the animals
package.
Packages are a great way of organizing more complicated Python programs. You can use them to group related modules together, and you can even define packages inside packages (called nested packages) to keep your program super-organized.
Note that the import
statement (and the related from...import
statement) can be used in a variety of ways to load packages and modules into your program. We have only scratched the surface here, showing you what modules and packages look like in Python so that you can recognize them when you see them in a program. We will be looking at the way modules and packages can be defined and imported in much more depth in Chapter 3, Using Modules and Packages.
Tip
Downloading the example code
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Modular-Programming-with-Python. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Modules and packages aren't just there to spread your Python code across multiple source files and directories—they allow you to organize your code to reflect the logical structure of what your program is trying to do. For example, imagine that you have been asked to create a web application to store and report on university examination results. Thinking about the business requirements that you have been given, you come up with the following overall structure for your application:

The program is broken into two main parts: a web interface, which interacts with the user (and with other computer programs via an API), and a backend, which handles the internal logic of storing information in a database, generating reports, and e-mailing results to students. As you can see, the web interface itself has been broken down into four parts:
A user authentication section, which handles user sign-up, sign-in, and sign-out
A web interface to view and enter exam results
A web interface to generate reports
An API, which allows other systems to retrieve exam results on request
As you consider each logical component of your application (that is, each of the boxes in the preceding illustration), you are also starting to think about the functionality that each component will provide. As you do this, you are already thinking in modular terms. Indeed, each of the logical components of your application can be directly implemented as a Python module or package. For example, you might choose to break your program into two main packages named web
and backend
, where:
The
web
package has modules namedauthentication
,results
,reports
, andapi
The
backend
package has modules nameddatabase
,reportgenerator
, andemailer
As you can see, each shaded box in the preceding illustration becomes a Python module, and each of the groupings of boxes becomes a Python package.
Once you have decided on the collection of packages and modules that you want to define, you can start to implement each component by writing the appropriate set of functions within each module. For example, the backend.database
module might have a function named get_students_results()
, which returns a single student's exam results for a given subject and year.
Note
In a real web application, your modular structure may actually be somewhat different. This is because you typically create a web application using a web application framework such as Django, which imposes its own structure on your program. However, in this example we are keeping the modular structure as simple as possible to show how business functionality translates directly into packages and modules.
Obviously, this example is fictitious, but it shows how you can think about a complex program in modular terms, breaking it down into individual components and then using Python modules and packages to implement each of these components in turn.
One of the great things about using modular design techniques, as opposed to just leaping in and writing code, is that they force you to think about the way your program should be structured and let you define a structure that will grow as your program evolves. Your program will be robust, easy to understand, easy to restructure as the scope of the program expands, and easy for others to work with too.
Woodworkers have a motto that equally applies to modular programming: there's a place for everything, and everything should be in its place. This is one of the hallmarks of high quality code, just as it's a hallmark of a well-organized woodworker's workshop.
To see why modular programming is such an important skill, imagine what would happen if you didn't apply modular techniques when writing a program. If you put all your Python code into a single source file, didn't try to logically arrange your functions and classes, and just randomly added new code to the end of the file, you would end up with a terrible mess of incomprehensible code. The following is an example of a program written without any sort of modular organization:
import configparser def load_config(): config = configparser.ConfigParser() config.read("config.ini") return config['config'] def get_data_from_user(): config = load_config() data = [] for n in range(config.getint('num_data_points')): value = input("Data point {}: ".format(n+1)) data.append(value) return data def print_results(results): for value,num_times in results: print("{} = {}".format(value, num_times)) def analyze_data(): data = get_data_from_user() results = {} config = load_config() for value in data: if config.getboolean('allow_duplicates'): try: results[value] = results[value] + 1 except KeyError: results[value] = 1 else: results[value] = 1 return results def sort_results(results): sorted_results = [] for value in results.keys(): sorted_results.append((value, results[value])) sorted_results.sort() return sorted_results if __name__ == "__main__": results = analyze_data() sorted_results = sort_results(results) print_results(sorted_results)
This program is intended to prompt the user for a number of data points and count how often each data point occurs. It does work, and the function and variable names do help to explain what each part of the program does—but it is still a mess. Just looking at the source code, it is hard to figure out what this program does. Functions were just added to the end of the file as the author decided to implement them, and even for a relatively small program, it is difficult to keep track of the various pieces. Imagine trying to debug or maintain a program like this if it was 10,000 lines long!
This program is an example of spaghetti coding—programming where everything is jumbled together and there is no overall organization to the source code. Unfortunately, spaghetti coding is often combined with other programming habits that make a program even harder to understand. Some of the more common problems include:
Poorly chosen variable and function names that don't hint at what each variable or function is for. A typical example of this is a program that uses variable names such as
a
,b
,c
, andd
.A complete lack of any documentation explaining what the code is supposed to do.
Functions that have unexpected side effects. For example, imagine if the
print_results()
function in our example program modified theresults
array as it was being printed. If you wanted to print the results twice or use the results after they had been printed, your program would fail in a most mysterious way.
While modular programming won't cure all these ills, the fact that it forces you to think about the logical organization of your program will help you to avoid them. Organizing your code into logical pieces will help you structure your program so that you know where each part belongs. Thinking about the packages and modules, and what each module contains, will encourage you to choose clear and appropriate names for the various parts of your program. Using modules and packages also makes it natural to include docstrings
to explain the functionality of each part of your program as you go along. Finally, using a logical structure encourages each part of your program to perform one particular task, reducing the likelihood of side effects creeping into your code.
Of course, like any programming technique, modular programming can be abused, but if it is used well it will vastly improve the quality of the programs you write.
Imagine that you are writing a program to calculate the price of overseas purchases. Your company is based in England, and you need to calculate the local price of something purchased in US dollars. Someone else has already written a Python module which downloads the exchange rate, so your program starts out looking something like the following:
def calc_local_price(us_dollar_amount): exchange_rate = get_exchange_rate("USD", "EUR") local_amount = us_dollar_amount * exchange_rate return local_amount
So far so good. Your program is included in your company's online ordering system and the code goes into production. However, two months later, your company starts ordering products not just from the US, but from China, Germany, and Australia as well. You scramble to update your program to support these alternative currencies, and write something like the following:
def calc_local_price(foreign_amount, from_country): if from_country == "United States": exchange_rate = get_exchange_rate("USD", "EUR") elif from_country == "China": exchange_rate = get_exchange_rate("CHN", "EUR") elif from_country == "Germany": exchange_rate = get_exchange_rate("EUR", "EUR") elif from_country = "Australia": exchange_rate = get_exchange_rate("AUS", "EUR") else: raise RuntimeError("Unsupported country: " + from_country) local_amount = us_dollar_amount * exchange_rate return local_amount
Once again, this program goes into production. Six months later, another 14 countries are added, and the project manager also decides to add a new feature, where the user can see how the price of a product has changed over time. As the programmer responsible for this code, you now have to add support for those 14 countries, and also add support for historical exchange rates going back in time.
This is a contrived example, of course, but it does show how programs typically evolve. Program code isn't something you write once and then leave forever. Your program is constantly changing and evolving in response to new requirements, newly discovered bugs, and unexpected consequences. Sometimes, a change that seems simple can be anything but. For example, consider the poor programmer who wrote the get_exchange_rate()
function in our previous example. This function now has to support not only the current exchange rate for any given pair of currencies, it also has to return historical exchange rates going back to any desired point in time. If this function is obtaining its information from a source that doesn't support historical exchange rates, then the whole function may need to be rewritten from scratch to support an alternative data source.
Sometimes, programmers and IT managers try to suppress change, for example by writing detailed specifications and then implementing one part of the program at a time (the so-called waterfall method of programming). But change is an integral part of programming, and trying to suppress it is like trying to stop the wind from blowing—it's much better to just accept that your program will change, and learn how to manage the process as well as you can.
Modular techniques are an excellent way of managing change in your programs. For example, as your program grows and evolves, you may find that a particular change requires the addition of a new module to your program:

You can then import and use that module in the other parts of your program that need to use this new functionality.
Alternatively, you might find that a new feature only requires you to change the contents of a module:

This is one of the major benefits of modular programming—since the details of how a particular feature is implemented is inside a module, you can often change the internals of a module without affecting any other parts of your program. The rest of your program continues to import and use the module as it did before—only the internal implementation of the module has changed.
Finally, you might find that you need to refactor your program. This is where you have to change the modular organization of your code to improve the way the program works:

Refactoring may involve moving code between modules as well as creating new modules, removing old ones, and changing the way modules work. In essence, refactoring is the process of rethinking the program so that it works better.
In all of these changes, the use of modules and packages help you to manage the changes you make. Because the various modules and packages each perform a well-defined task, you know exactly which parts of your program need to be changed, and you can limit the effects of your changes to only the affected modules and the parts of the system that use them.
Modular programming won't make change go away, but it will help you to deal with change—and the ongoing process of programming—in the best possible way.
One of the buzzwords used to describe Python is that it is a batteries included language, that is, it comes with a rich collection of built-in modules and packages called the Python Standard Library. If you've written any non-trivial Python program, you've almost certainly used modules from the Python Standard Library to do so. To get an idea of how vast the Python Standard Library is, here are a few example modules from this library:
Module |
Description |
---|---|
|
Defines classes to store and perform calculations using date and time values |
|
Defines a range of functions to work with temporary files and directories |
|
Supports reading and writing of CSV format files |
|
Implements cryptographically secure hashes |
|
Allows you to write log messages and manage log files |
|
Supports multi-threaded programming |
|
A collection of modules (that is, a package) used to parse and generate HTML documents |
|
A framework for creating and running unit tests |
|
A collection of modules to read data from URLs |
These are just a few of the over 300 modules available in the Python Standard Library. As you can see, there is a vast range of functionality provided, and all of this is built in to every Python distribution.
Because of the huge range of functionality provided, the Python Standard Library is an excellent example of modular programming. For example, the math
standard library module provides a range of mathematical functions that make it easier to work with integer and floating-point numbers. If you look through the documentation for this module (http://docs.python.org/3/library/math.html), you will find a large collection of functions and constants, all defined within the math
module, that perform almost any mathematical operation you could imagine. In this example, the various functions and constants are all defined within a single module, making it easy to refer to them when you need to.
In contrast, the xmlrpc
package allows you to make and respond to remote procedure calls that use the XML protocol to send and receive data. The xmlrpc
package is made up of two modules: xmlrpc.server
and xmlrpc.client
, where the server
module allows you to create an XML-RPC server, and the client
module includes code to access and use an XML-RPC server. This is an example of where a hierarchy of modules is used to logically group related functionality together (in this case, within the xmlrpc
package), while using sub-modules to separate out the particular parts of the package.
If you haven't already done so, it is worth spending some time to review the documentation for the Python Standard Library. This can be found at https://docs.python.org/3/library/. It is worth studying this documentation to see how Python has organized such a vast collection of features into modules and packages.
The Python Standard Library is not perfect, but it has been improved over time, and the library as it is today makes a great example of modular programming techniques applied to a comprehensive library, covering a wide range of features and functions.
Now that we've seen what modules are and how they can be used, let's implement our first real Python module. While this module is simple, you may find it a useful addition to the programs you write.
In computer programming, a cache is a way of storing previously calculated results so that they can be retrieved more quickly. For example, imagine that your program had to calculate shipping costs based on three parameters:
The weight of the ordered item
The dimensions of the ordered item
The customer's location
Calculating the shipping cost based on the customer's location might be quite involved. For example, you may have a fixed charge for deliveries within your city but charge a premium for out-of-town orders based on how far away the customer is. You may even need to send a query to a freight company's API to see how much it will charge to ship the given item.
Since the process of calculating the shipping cost can be quite complex and time consuming, it makes sense to use a cache to store the previously calculated results. This allows you to use the previously calculated results rather than having to recalculate the shipping cost each time. To do this, you would need to structure your calc_shipping_cost()
function to look something like the following:
def calc_shipping_cost(params): if params in cache: shipping_cost = cache[params] else: ...calculate the shipping cost. cache[params] = shipping_cost return shipping_cost
As you can see, we take the supplied parameters (in this case, the weight, dimensions, and the customer's location) and check whether there is already an entry in the cache for those parameters. If so, we retrieve the previously-calculated shipping cost from the cache. Otherwise, we go through the possibly time-consuming process of calculating the shipping cost, storing this in the cache using the supplied parameters, and then returning the shipping cost back to the caller.
Notice how the cache
variable in the preceding pseudo code looks very much like a Python dictionary—you can store entries in the dictionary based on a given key and then retrieve the entry using this key. There is, however, a crucial difference between a dictionary and a cache: a cache typically has a limit on the number of entries that it can contain, while the dictionary has no such limit. This means that a dictionary will continue to grow forever, possibly taking up all the computer's memory if the program runs for a long time, while a cache will never take too much memory, as the number of entries is limited.
Once the cache reaches its maximum size, an existing entry has to be removed each time a new entry is added so that the cache doesn't continue to grow:

While there are various ways of choosing the entry to remove, the most common way is to remove the least recently used entry, that is, the entry that hasn't been used for the longest period of time.
Caches are very commonly used in computer programs. In fact, even if you haven't yet used a cache in the programs you write, you've almost certainly encountered them before. Has someone ever suggested that you clear your browser's cache to solve a problem with your web browser? Yes, web browsers use a cache to hold previously downloaded images and web pages so that they don't have to be retrieved again, and clearing the contents of the browser cache is a common way of fixing a misbehaving web browser.
Let's now write our own Python module to implement a cache. Before we write it, let's think about the functionality that our cache module will require:
We're going to limit the size of our cache to 100 entries.
We will need an
init()
function to initialize the cache.We will have a
set(key, value)
function to store an entry in the cache.A
get(key)
function will retrieve an entry from the cache. If there is no entry for that key, this function should returnNone
.We'll also need a
contains(key)
function to check whether a given entry is in the cache.Finally, we'll implement a
size()
function which returns the number of entries in the cache.
Note
We are deliberately keeping the implementation of this module quite simple. A real cache would make use of a Cache
class to allow you to use multiple caches at once. It would also allow the size of the cache to be configured as necessary. To keep things simple, however, we will implement these functions directly within a module, as we want to concentrate on modular programming rather than combining it with object-oriented programming and other techniques.
Go ahead and create a new Python source file named cache.py
. This file will hold the Python source code for our new module. At the top of this module, enter the following Python code:
import datetime MAX_CACHE_SIZE = 100
We will be using the datetime
Standard Library module to calculate the least recently used entry in the cache. The second statement, defining MAX_CACHE_SIZE
, sets the maximum size for our cache.
Tip
Note that we are following the standard Python convention of defining constants using uppercase letters. This makes them easier to see in your source code.
We now want to implement the init()
function for our cache. To do this, add the following to the end of your module:
def init(): global _cache _cache = {} # Maps key to (datetime, value) tuple.
As you can see, we have created a new function named init()
. The first statement in this function, global _cache
, defines a new variable named _cache
. The global
statement makes this variable available as a module-level global variable, that is, this variable can be shared by all parts of the cache.py
module.
Notice the underscore character at the start of the variable name. In Python, a leading underscore is a convention indicating that a name is private. In other words, the _cache
global is intended to be used as an internal part of the cache.py
module—the underscore tells you that you shouldn't need to use this variable outside of the cache.py
module itself.
The second statement in the init()
function sets the _cache
global to an empty dictionary. Notice that we've added a comment explaining how the dictionary will be used; it's good practice to add notes like this to your code so others (and you, when you look at this code after a long time working on something else) can easily see what this variable is used for.
In summary, calling the init()
function has the effect of creating a private _cache
variable within the module and setting it to an empty dictionary. Let's now write the set()
function, which will use this variable to store an entry in the cache.
Add the following to the end of your module:
def set(key, value): global _cache if key not in _cache and len(_cache) >= MAX_CACHE_SIZE: _remove_oldest_entry() _cache[key] = [datetime.datetime.now(), value]
Once again, the set()
function starts with a global _cache
statement. This makes the _cache
module-level global variable available for the function to use.
The if
statement checks to see whether the cache is going to exceed the maximum allowed size. If so, we call a new function, named _remove_oldest_entry()
, to remove the oldest entry from the cache. Notice how this function name also starts with an underscore—once again, this indicates that this function is private and should only be used by code within the module itself.
Finally, we store the entry in the _cache
dictionary. Notice that we store the current date and time as well as the value in the cache; this will let us know when the cache entry was last used, which is important when we have to remove the oldest entry.
Let's now implement the get()
function. Add the following to the end of your module:
def get(key): global _cache if key in _cache: _cache[key][0] = datetime.datetime.now() return _cache[key][1] else: return None
You should be able to figure out what this code does. The only interesting part to note is that we update the date and time for the cache entry before returning the associated value. This lets us know when the cache entry was last used.
With these functions implemented, the remaining two functions should also be easy to understand. Add the following to the end of your module:
def contains(key): global _cache return key in _cache def size(): global _cache return len(_cache)
There shouldn't be any surprises here.
There's only one more function left to implement: our private _remove_oldest_entry()
function. Add the following to the end of your module:
def _remove_oldest_entry(): global _cache oldest = None for key in _cache.keys(): if oldest == None: oldest = key elif _cache[key][0] < _cache[oldest][0]: oldest = key if oldest != None: del _cache[oldest]
This completes the implementation of our cache.py
module itself, with the five main functions we described earlier, as well as one private function and one private global variable which are used internally to help implement our public functions.
Let's now write a simple test program to use this cache
module and verify that it's working properly. Create a new Python source file, which we'll call test_cache.py
, and add the following to this file:
import random import string import cache def random_string(length): s = '' for i in range(length): s = s + random.choice(string.ascii_letters) return s cache.init() for n in range(1000): while True: key = random_string(20) if cache.contains(key): continue else: break value = random_string(20) cache.set(key, value) print("After {} iterations, cache has {} entries".format(n+1, cache.size()))
This program starts by importing three modules: two from the Python Standard Library, and the cache
module we have just written. We then define a utility function named random_string()
, which generates a string of random letters of a given length. After this, we initialize the cache by calling cache.init()
and then generate 1,000 random entries to add to the cache. After adding each cache entry, we print out the number of entries we have added as well as the current cache size.
If you run this program, you can see that it's working as expected:
$ python test_cache.py After 1 iterations, cache has 1 entries After 2 iterations, cache has 2 entries After 3 iterations, cache has 3 entries ... After 98 iterations, cache has 98 entries After 99 iterations, cache has 99 entries After 100 iterations, cache has 100 entries After 101 iterations, cache has 100 entries After 102 iterations, cache has 100 entries ... After 998 iterations, cache has 100 entries After 999 iterations, cache has 100 entries After 1000 iterations, cache has 100 entries
The cache continues to grow until it reaches 100 entries, at which point the oldest entry is removed to make room for a new one. This ensures that the cache stays the same size, no matter how many new entries are added.
While there is a lot more we could do with our cache.py
module, this is enough to demonstrate how to create a useful Python module and then use it within another program. Of course, you aren't just limited to importing modules within a main program—modules can import other modules as well.
In this chapter, we introduced the concept of Python modules and saw how Python modules are simply Python source files, which are imported and used by another source file. We then took a look at Python packages and saw that these are collections of modules identified by a package initialization file named __init__.py
.
We explored how modules and packages can be used to organize your program's source code and why the use of these modular techniques is so important for the development of large systems. We also explored what spaghetti code looks like and discovered some of the other pitfalls that can occur if you don't modularize your programs.
Next, we looked at programming as a process of constant change and evolution and how modular programming can help deal with a changing codebase in the best possible way. We then learned that the Python Standard Library is an excellent example of a large collection of modules and packages, and finished by creating our own simple Python module that demonstrates effective modular programming techniques. In implementing this module, we learned how a module can use leading underscores in variable and function names to mark them as private to the module, while making the remaining functions and other definitions available for other parts of the system to use.
In the next chapter, we will apply modular techniques to the development of a more sophisticated program consisting of several modules working together to solve a more complex programming problem.