In this chapter, we will talk about Python modules, specifically covering the following topics:
- Using and importing modules and namespaces
- Implementing virtual Python environments
- Python package installation options
- Utilizing requirement files and resolving conflicts
- Using local patches and constraint files
- Working with packages
- Creating wheels and bundles
- Comparing source code to bytecode
- How to create and reference module packages
- Operating system-specific binaries
- How to upload programs to PyPI
- Project packaging
- Uploading to PyPI
Python modules are the highest-level components of Python programs. As suggested by their name, modules are modular, capable of being plugged in with other modules as part of an overall program to provide better separation of code while combining together to create a cohesive application.
Modules allow easy reuse of code, and provide separate namespaces to prevent variable shadowing between blocks of code. Variable shadowing involves having duplicate variables in different namespaces, possibly causing the interpreter to use an incorrect variable. Each Python file a developer creates is considered a separate module, allowing different files to be imported into a single, overall file that forms the final application.
Realistically, any Python file can be used as a module by referring to it without the .py extension; this is most commonly seen when importing libraries. Python packages are collections of modules; what makes a package special is the inclusion of an __init__.py file. We will cover the differences in detail later, so for now just recognize that there are several names for the same items.
A key point with modules is that they produce separate namespaces. A namespace (also called a scope) is simply the domain of control that a module, or component of a module, has. Normally, objects within a module are not visible outside that module, that is, attempting to call a variable located in a separate module will produce an error.
Namespaces are also used to segregate objects within the same program. For example, a variable defined within a function is only visible while operating within that function. Attempting to use that variable from another function will result in an error. This is why global variables are available; they can be read by any function. It is also why global variables are discouraged as a practice: a global variable can be modified without realizing it, causing a breakage later on in the program.
Scope essentially works inside-out. If a variable is used in a function, the Python interpreter will first look within that function for the variable's assignment. If it's not there, Python will look in any enclosing functions and then for a globally defined variable in the module. If not found there, Python will look in the built-in names that are always available. If still not found, Python will raise a NameError. In terms of flow, it looks something like this: local scope -> enclosing scope -> global scope -> built-in module -> error.
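The lookup order can be sketched with a short script. This is only an illustration: the names x, y, z, outer(), inner(), and lookup_fails() are made up for the example, and the enclosing-function step applies only when functions are nested.

```python
x = "global x"  # module-level (global) name


def outer():
    y = "enclosing y"  # local to outer(), enclosing for inner()

    def inner():
        z = "local z"  # local to inner()
        # z is found locally, y in the enclosing function, x globally,
        # and len among the built-in names.
        return z, y, x, len("abc")

    return inner()


found = outer()
print(found)  # ('local z', 'enclosing y', 'global x', 3)


def lookup_fails():
    return undefined_name  # not defined in any scope


try:
    lookup_fails()
except NameError as exc:
    error_message = str(exc)
    print(error_message)  # name 'undefined_name' is not defined
```

The final call fails only when lookup_fails() runs, because names are resolved at the time the code executes, not when the function is defined.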
One slight change to the scope discovery process comes when importing modules. Imported modules will be examined for object calls as well, with the caveat that an error will still be generated unless the desired object is explicitly identified via dot-nomenclature.
For example, if you want to generate a random number between 0 and 1,000, you can't just call the randint() function without importing the random library. Once a module is imported, any publicly available classes, methods, functions, and variables can be used by expressly calling them with <module_name> and <object_name>. Following is an example of this:
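A minimal sketch of this behavior, written as a script rather than an interactive session; the variable names bare_call_error and number are illustrative only:

```python
# Calling randint() before anything named randint exists raises NameError.
try:
    randint(0, 1000)
except NameError as exc:
    bare_call_error = str(exc)
    print("Error:", bare_call_error)

import random

# After the import, the function is reached through the module's namespace.
number = random.randint(0, 1000)  # both endpoints are inclusive
print(number)
```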
In the preceding example, randint() is first called on its own. Since it is not part of the normal Python built-in functions, the interpreter knows nothing about it, thus throwing an error.
However, after importing the random library that actually contains the various random number generation functions, randint() can then be explicitly called via dot-nomenclature, that is, random.randint(). This tells the Python interpreter to look for randint() within the random library, resulting in the desired result.
To clarify, when importing modules into a program, Python assumes some things about namespaces. If a normal import is performed, that is, import foo, then both the main program and foo maintain their separate namespaces. To use a function within the foo module, you have to expressly identify it using dot-nomenclature: foo.bar().
On the other hand, if part of a module is imported, for example, from foo import bar, then that imported component becomes a part of the main program's namespace. This also happens if all components are imported using a wildcard: from foo import *.
The following example shows these properties in action:
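As a hedged sketch of a partial import (the variable names value and randrange_error are illustrative only):

```python
from random import randint

# randint is now part of this program's namespace, so no prefix is needed.
value = randint(0, 10)
print(value)

# randrange was not imported, so the bare name is unknown here.
try:
    randrange(0, 10)
except NameError as exc:
    randrange_error = str(exc)
    print("Error:", randrange_error)
```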
In the preceding example, the randint() function from the random module is expressly imported by itself; this importation puts randint() within the main program's namespace. This allows randint() to be called without having to clarify it as random.randint(). However, when attempting to do the same thing with the randrange() function, an error occurs because it wasn't imported.
To illustrate scope, we will create nested functions, where a function is defined and then called within an enclosing function:
nested_functions.py includes a nested function, and ends with calling the nested function:
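The following is a sketch of what such a file might contain, not the original listing. The function names first_funct() and second_funct() and the variable x come from the discussion later in this section; the values, return statements, and the try/except wrapper are assumptions added so that the script runs from start to finish.

```python
def first_funct():
    x = 1
    print("first_funct: x =", x)

    def second_funct():
        x = 2  # a separate x, local to second_funct()
        print("second_funct: x =", x)
        return x

    # The nested function is only callable from inside first_funct().
    inner_value = second_funct()
    return x, inner_value


values = first_funct()
print(values)  # (1, 2)

# Outside first_funct(), the nested function's name does not exist.
try:
    second_funct()
except NameError as exc:
    nested_error = str(exc)
    print("Error:", nested_error)
```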
- First, call the parent function and check the results:
- Next, call the nested function directly and notice that an error is received:
- To work with another module, import the desired module:
- Below, we call the sin() function from within the module in the form <module>.<function>:
- Calling a function without using dot-nomenclature to specify its module, as demonstrated below, results in an error:
- Alternatively, the example below shows how to import all items from a module using the * wildcard to place the items within the current program's namespace:
- A common way to run modules as scripts is to simply call the module explicitly from the command line, providing any arguments as necessary. This can be set up by configuring the module to accept command-line arguments, as shown in print_funct.py:
- print_mult_args.py shows that, if more than one argument is expected, and the quantity is known, each one can be specified using its index value in the arguments list:
- Alternatively, where the function can capture multiple arguments but the quantity is unknown, the *args parameter can be used, as shown below:
The location of a named assignment within the code determines its namespace visibility. In the preceding example, steps 1-3, if you directly call second_funct() immediately after calling first_funct(), you'll get an error stating second_funct() is not defined. This is true, because globally, the second function doesn't exist; it's nested within the first function and can't be seen outside the first function's scope. Everything within the first function is part of its namespace, just as the value for x within the second function can't be called directly but has to use the second_funct() call to get its value.
In the preceding examples, steps 4-7, the math module is imported in its entirety, but it keeps its own namespace. Thus, calling math.sin() provides a result, but calling sin() by itself results in an error.
Then, the math module is imported using a wildcard. This tells the Python interpreter to import all the functions into the main namespace, rather than keeping them within the separate math namespace. This time, when sin() is called by itself, it provides the correct answer.
This demonstrates the point that namespaces are important to keep code separated while allowing the use of the same variables and function names. By using dot-nomenclature, the exact object can be called with no fear of name shadowing causing the wrong result to be provided.
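The math behavior described above can be sketched as a single script. The pow() lines are an addition for illustration: they show one concrete case where a wildcard import shadows a built-in name, since the math module also defines a pow() function.

```python
import math

print(math.sin(0))  # 0.0

# The bare name is unknown while math keeps its own namespace.
try:
    sin(0)
except NameError as exc:
    sin_error = str(exc)
    print("Error:", sin_error)

builtin_result = pow(2, 3)  # built-in pow() returns the int 8

# A wildcard import copies math's public names into this namespace.
from math import *

print(sin(0))  # 0.0

# pow now refers to math.pow(), which always returns a float.
shadowed_result = pow(2, 3)
print(builtin_result, shadowed_result)  # 8 8.0
```

Calling math.pow() through dot-nomenclature would have left the built-in pow() untouched, which is the practical argument for avoiding wildcard imports.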
In the preceding examples, steps 7-10, sys.argv holds the command-line arguments as a list of strings, ready for use. sys.argv[0] is always the name of the program taking the arguments, so it can be safely ignored. All other arguments follow it in the list and can, therefore, be accessed by their index value.
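A small sketch of reading arguments by index. The helper describe_args() is hypothetical and is not taken from print_funct.py or print_mult_args.py; it accepts any argv-style list so it can be tried without a command line.

```python
import sys


def describe_args(argv):
    """Summarize an argv-style list; argv[0] is the program name."""
    program = argv[0]
    arguments = argv[1:]
    lines = [f"program: {program}"]
    for index, value in enumerate(arguments, start=1):
        lines.append(f"argv[{index}] = {value}")
    return lines


if __name__ == "__main__":
    # When run as a script, report the real command-line arguments.
    for line in describe_args(sys.argv):
        print(line)
```

Note that every entry in sys.argv is a string, so numeric arguments need to be converted, for example with int(), before use.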
Using *args tells Python to accept any number of positional arguments, allowing the program to accept a varying number of input values. An alternative version, **kwargs, does the same thing but with keyword:value pairs.
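As a brief illustration, with collect() being a made-up function name: inside the function, args arrives as a tuple and kwargs as a dictionary.

```python
def collect(*args, **kwargs):
    """Return the positional and keyword arguments that were passed in."""
    return args, kwargs


positional, keyword = collect(1, 2, 3, sides=6, rolls=2)
print(positional)  # (1, 2, 3)
print(keyword)     # {'sides': 6, 'rolls': 2}
```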
In addition to knowing about namespaces, there are some other important terms to know about when installing and working with modules:
- https://pypi.python.org/pypi is the primary database for third-party Python packages.
- pip is the primary installer program for third-party modules and, since Python 3.4, has been included by default with Python binary installations.
- A virtual Python environment allows packages to be installed for a particular application's development, rather than being installed system-wide.
- venv has been the primary tool for creating virtual Python environments since Python 3.3. With Python 3.4, it automatically installs pip and setuptools in all virtual environments.
- The following are common terms for Python files: module, package, library, and distribution. While they have distinct definitions (https://packaging.python.org/glossary/), this book will use them interchangeably at times.
The following is part of dice_roller.py, an example of embedded tests from one of the first Python programs this author wrote when first learning Python:
In this example, we are simply creating a random number generator that simulates rolling different polyhedral dice (commonly used in role-playing games). The random library is imported, then the function defining how the dice rolls are generated is created. For each die roll, the integer provided indicates how many sides the die has. With this method, any number of possible values can be simulated with a single integer input.
The key part of this program is at the end. The line if __name__ == "__main__" tells Python that, if the module's name is __main__, that is, it is being run as the main program and not imported into another program, then the interpreter should run the code below this line. Otherwise, when the module is imported, that code is skipped and only the code above this line is available to the importing program. (It's also worth noting that this line matters for cross-platform compatibility with Windows, for example when using the multiprocessing module.)
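The pattern can be sketched as follows. This is a reconstruction under assumptions, not the author's original listing: the function name randomNumGen() appears in the surrounding text, but its body, the argument checks, and the usage message are guesses made for illustration.

```python
import random


def randomNumGen(choice):
    """Simulate one roll of a die with `choice` sides (assumed behavior)."""
    return random.randint(1, choice)


if __name__ == "__main__":
    # This block runs only when the file is executed directly,
    # not when it is imported by another program.
    import sys

    if len(sys.argv) > 1 and sys.argv[1].isdigit() and int(sys.argv[1]) > 0:
        print(randomNumGen(int(sys.argv[1])))
    else:
        print("usage: python dice_roller.py <number_of_sides>")
```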
When this program is called from the command line, the sys library is imported. Then, the first argument provided to the program is read from the command line and passed into the randomNumGen() function as an argument. The result is printed to the screen. Following are some examples of results from this program:
Configuring a module in this manner is an easy way to allow a user to interface directly with the module on a stand-alone basis. It is also a great way to run tests on the script; the tests are only run when the file is called as a stand-alone, otherwise the tests are ignored. dice_roller_tests.py is the full dice-rolling simulator that this author wrote:
This program builds on the previous random-dice program by allowing multiple dice to be added together. In addition, the test() function only runs when the program is called by itself to provide a sanity check of the code. The test code would probably be better placed below the if __name__ == "__main__" line rather than in a module-level function alongside the rest of the code, as it is still accessible when the module is imported, as shown below:
So, if you have any code you don't want to be accessible when the module is imported, make sure to include it below the if __name__ == "__main__" line.
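One way to see this for yourself is sketched below. The module name dice_demo, its contents, and the use of a temporary file are all assumptions made for the demonstration; only the standard library's importlib, tempfile, and pathlib modules are used.

```python
import importlib.util
import tempfile
from pathlib import Path

# Source for a throwaway module: test() sits at module level,
# while the flag is only changed below the __main__ check.
MODULE_SOURCE = '''
ran_as_script = False

def test():
    return "test() is reachable"

if __name__ == "__main__":
    ran_as_script = True
'''

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "dice_demo.py"
    path.write_text(MODULE_SOURCE)
    # Import the file as a module named dice_demo.
    spec = importlib.util.spec_from_file_location("dice_demo", path)
    dice_demo = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(dice_demo)

print(dice_demo.test())         # test() is reachable
print(dice_demo.ran_as_script)  # False
```

The module-level function is visible to the importer, while the code below the check never ran because the module's name was dice_demo rather than __main__.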