
Secret Recipes of the Python Ninja

About this book
This book covers the unexplored secrets of Python: it delves into Python's depths and uncovers its mysteries. You'll unearth secrets related to the implementation of the standard library by looking at how modules actually work. You'll understand the implementation of the collections, decimal, and fractions modules. If you haven't used decorators, coroutines, and generator functions much before, you'll learn what you've been missing as you work through the recipes. We'll cover internal special methods in detail, so you understand what they are and how they can be used to improve the engineering decisions you make. Next, you'll explore the CPython interpreter, which is a treasure trove of secret hacks that not many programmers are aware of. We'll take you through the depths of the PyPy project, where you'll come across several exciting ways to improve speed and concurrency. Finally, we'll take time to explore the PEPs of the latest versions to discover some interesting hacks.
Publication date: May 2018
Publisher: Packt
Pages: 380
ISBN: 9781788294874

 

Chapter 1. Working with Python Modules

In this chapter, we will talk about Python modules, specifically covering the following topics:

  • Using and importing modules and namespaces
  • Implementing virtual Python environments
  • Python package installation options
  • Utilizing requirement files and resolving conflicts
  • Using local patches and constraint files
  • Working with packages
  • Creating wheels and bundles
  • Comparing source code to bytecode
  • How to create and reference module packages
  • Operating system-specific binaries
  • How to upload programs to PyPI
  • Project packaging
  • Uploading to PyPI
 

Introduction


Python modules are the highest-level components of Python programs. As suggested by their name, modules are modular, capable of being plugged in with other modules as part of an overall program to provide better separation of code while combining together to create a cohesive application.

Modules allow easy reuse of code, and provide separate namespaces to prevent variable shadowing between blocks of code. Variable shadowing involves having duplicate variables in different namespaces, possibly causing the interpreter to use an incorrect variable. Each Python file a developer creates is considered a separate module, allowing different files to be imported into a single, overall file that forms the final application.

Realistically, any Python file is already a module; the .py extension is simply dropped from the name when the file is imported, which is most commonly seen when importing libraries. Python packages are collections of modules; what makes a package special is the inclusion of an __init__.py file. We will cover the differences in detail later, so for now just recognize that there are several names for the same items.

 

Using and importing modules and namespaces


A key point with modules is that they produce separate namespaces. A namespace (also called a scope) is simply the domain of control that a module, or component of a module, has. Normally, objects within a module are not visible outside that module, that is, attempting to call a variable located in a separate module will produce an error.

Namespaces are also used to segregate objects within the same program. For example, a variable defined within a function is only visible for use while operating within that function; attempting to call that variable from another function will result in an error. This is why global variables exist: they can be accessed and modified by any function. It is also why global variables are frowned upon as a best practice: a global can be modified without you realizing it, causing a breakage later on in the program.

Scope essentially works from the inside out. If a variable is used in a function, the Python interpreter first looks within that function for the variable's declaration. If it's not there, Python moves outward through any enclosing functions, then looks for a globally-defined variable. If it is still not found, Python looks in the built-in names that are always available. If the search fails everywhere, Python throws an error. In terms of flow, it looks something like this: local scope -> enclosing scope -> global scope -> built-ins -> error.

One slight change to the scope discovery process comes when importing modules. Imported modules will be examined for object calls as well, with the caveat that an error will still be generated unless the desired object is explicitly identified via dot-nomenclature.

For example, if you want to generate a random number between 0 and 1,000, you can't just call the randint() function without importing the random library. Once a module is imported, any publicly available classes, methods, functions, and variables can be used by expressly calling them in the form <module_name>.<object_name>. Following is an example of this:

>>> randint(0, 1000)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'randint' is not defined
>>> import random
>>> random.randint(0, 1000)
607

In the preceding example, randint() is first called on its own. Since it is not part of the normal Python built-in functions, the interpreter knows nothing about it, thus throwing an error.

However, after importing the random library that actually contains the various random number generation functions, randint() can then be explicitly called via dot-nomenclature, that is, random.randint(). This tells the Python interpreter to look for randint() within the random library, resulting in the desired result.

To clarify, when importing modules into a program, Python assumes some things about namespaces. If a normal import is performed, that is, import foo, then both the main program and foo maintain their separate namespaces. To use a function within the foo module, you have to expressly identify it using dot-nomenclature: foo.bar().

On the other hand, if part of a module is imported, for example, from foo import bar, then that imported component becomes a part of the main program's namespace. This also happens if all components are imported using a wildcard: from foo import *.

The following example shows these properties in action:

>>> from random import randint
>>> randint(0, 10)
2
>>> randrange(0, 25)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'randrange' is not defined

In the preceding example, the randint() function from the random module is expressly imported by itself; this importation puts randint() within the main program's namespace. This allows randint() to be called without having to clarify it as random.randint(). However, when attempting to do the same thing with the randrange() function, an error occurs because it wasn't imported.

How to do it...

To illustrate scope, we will create nested functions, where a function is defined and then called within an enclosing function:

  1. nested_functions.py includes a nested function, and ends with calling the nested function:
      >>> def first_funct():
      ...    x = 1
      ...    print(x)
      ...    def second_funct():
      ...        x = 2
      ...        print(x)
      ...    second_funct()
      ...
  2. First, call the parent function and check the results:
      >>> first_funct()
      1
      2
  3. Next, call the nested function directly and notice that an error is received:
      >>> second_funct()
      Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      NameError: name 'second_funct' is not defined
  4. To work with another module, import the desired module:
      >>> import math
  5. Below, we call the sin() function from within the module in the form <module>.<function>:
      >>> math.sin(45)
      0.8509035245341184
  6. Trying to call a function without using dot-nomenclature to specify its library, as demonstrated below, results in an error:
      >>> sin(45)
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
       NameError: name 'sin' is not defined
  7. Alternatively, the example below shows how to import all items from a module using the * wildcard to place the items within the current program's namespace:
      >>> from math import *
      >>> sin(45)
      0.8509035245341184
  8. A common way to run modules as scripts is to simply call the module explicitly from the command line, providing any arguments as necessary. This can be set up by configuring the module to accept command-line arguments, as shown in print_funct.py:
        def print_funct(arg):
            print(arg)

        if __name__ == "__main__":
            import sys
            print_funct(sys.argv[1])
  9. print_mult_args.py shows that, if more than one argument is expected and the quantity is known, each one can be specified using its respective index value in the arguments list:
        def print_funct(arg1, arg2, arg3):
            print(arg1, arg2, arg3)
        if __name__ == "__main__":
            import sys
            print_funct(sys.argv[1], sys.argv[2], sys.argv[3])
  10. Alternatively, where the function can capture multiple arguments but the quantity is unknown, the *args parameter can be used, as shown below:
      >>> def print_input(*args):
      ...   for val, input in enumerate(args):
      ...       print("{}. {}".format(val, input))
      ...
      >>> print_input("spam", "spam", "eggs", "spam")
      0. spam
      1. spam
      2. eggs
      3. spam

How it works...

The location of a named assignment within the code determines its namespace visibility. In the preceding example, steps 1-3, if you directly call second_funct() immediately after calling first_funct(), you'll get an error stating second_funct() is not defined. This is true, because globally, the second function doesn't exist; it's nested within the first function and can't be seen outside the first function's scope. Everything within the first function is part of its namespace, just as the value for x within the second function can't be called directly but has to use the second_funct() call to get its value.

In the preceding examples, steps 4-7, the math module is imported in its entirety, but it keeps its own namespace. Thus, calling math.sin() provides a result, but calling sin() by itself results in an error.

Then, the math module is imported using a wildcard. This tells the Python interpreter to import all the functions into the main namespace, rather than keeping them within the separate math namespace. This time, when sin() is called by itself, it provides the correct answer.

This demonstrates the point that namespaces are important to keep code separated while allowing the use of the same variables and function names. By using dot-nomenclature, the exact object can be called with no fear of name shadowing causing the wrong result to be provided.

In the preceding examples, steps 8-10, using sys.argv allows Python to parse command-line arguments and place them in a list for use. sys.argv[0] is always the name of the program being run, so it can be safely ignored. All other arguments are stored in the list and can, therefore, be accessed by their index value.

Using *args tells Python to accept any number of arguments, allowing the program to accept a varying number of input values. An alternative version, **kwargs, does the same thing but with keyword:value pairs.
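As a brief illustration (this snippet is not part of the recipe above, just a minimal sketch), a keyword-argument version of the previous function could look like this:

>>> def print_input(**kwargs):
...     for key, value in kwargs.items():
...         print("{} = {}".format(key, value))
...
>>> print_input(breakfast="spam", lunch="eggs")
breakfast = spam
lunch = eggs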

There's more...

In addition to knowing about namespaces, there are some other important terms to know about when installing and working with modules:

  • https://pypi.python.org/pypi is the primary database for third-party Python packages.
  • pip is the primary installer program for third-party modules and, since Python 3.4, has been included by default with Python binary installations.
  • A virtual Python environment allows packages to be installed for a particular application's development, rather than being installed system-wide.
  • venv has been the primary tool for creating virtual Python environments since Python 3.3. With Python 3.4, it automatically installs pip and setuptools in all virtual environments.
  • The following are common terms for Python files: module, package, library, and distribution. While they have distinct definitions (https://packaging.python.org/glossary/), this book will use them interchangeably at times.

The following is part of dice_roller.py, an example of embedded tests from one of the first Python programs this author wrote when first learning Python:

import random
def randomNumGen(choice):
    if choice == 1: #d6 roll
        die = random.randint(1, 6)
    elif choice == 2: #d10 roll
        die = random.randint(1, 10)
    elif choice == 3: #d100 roll
        die = random.randint(1, 100)
    elif choice == 4: #d4 roll
        die = random.randint(1, 4)
    elif choice == 5: #d8 roll
        die = random.randint(1, 8)
    elif choice == 6: #d12 roll
        die = random.randint(1, 12)
    elif choice == 7: #d20 roll
        die = random.randint(1, 20)
    else: #simple error message
        return "Shouldn't be here. Invalid choice"
    return die
if __name__ == "__main__":
    import sys
    print(randomNumGen(int(sys.argv[1])))

In this example, we are simply creating a random number generator that simulates rolling different polyhedral dice (commonly used in role-playing games). The random library is imported, then the function defining how the dice rolls are generated is created. For each die roll, the integer provided selects which type of die to roll, that is, how many sides it has. With this method, any of the supported dice can be simulated with a single integer input.

The key part of this program is at the end. The part if __name__ == "__main__" tells Python that, if the module's namespace is __main__, that is, it is being run as the main program and not imported into another program, then the interpreter should run the code below this line. Otherwise, when imported, only the code above this line is available to the importing program. (It's also worth noting that this guard is needed, among other things, for code that uses multiprocessing to work correctly on Windows.)

When this program is called from the command line, the sys library is imported. Then, the first argument provided to the program is read from the command line and passed into the randomNumGen() function as an argument. The result is printed to the screen. Following are some examples of results from this program:

$ python3 dice_roller.py 1
2
$ python3 dice_roller.py 2
10
$ python3 dice_roller.py 3
63
$ python3 dice_roller.py 4
2
$ python3 dice_roller.py 5
5
$ python3 dice_roller.py 6
6
$ python3 dice_roller.py 7
17
$ python3 dice_roller.py 8
Shouldn't be here. Invalid choice

Configuring a module in this manner is an easy way to allow a user to interface directly with the module on a stand-alone basis. It is also a great way to run tests on the script; the tests are only run when the file is called as a stand-alone, otherwise the tests are ignored. dice_roller_tests.py is the full dice-rolling simulator that this author wrote:

import random #randint
def randomNumGen(choice):
    """Get a random number to simulate a d6, d10, or d100 roll."""
    
    if choice == 1: #d6 roll
        die = random.randint(1, 6)
    elif choice == 2: #d10 roll
        die = random.randint(1, 10)
    elif choice == 3: #d100 roll
        die = random.randint(1, 100)
    elif choice == 4: #d4 roll
        die = random.randint(1, 4)
    elif choice == 5: #d8 roll
        die = random.randint(1, 8)
    elif choice == 6: #d12 roll
        die = random.randint(1, 12)
    elif choice == 7: #d20 roll
        die = random.randint(1, 20)
    else: #simple error message
        return "Shouldn't be here. Invalid choice"
    return die
def multiDie(dice_number, die_type):
    """Add die rolls together, e.g. 2d6, 4d10, etc."""
    
#---Initialize variables 
    final_roll = 0
    val = 0
    
    while val < dice_number:
        final_roll += randomNumGen(die_type)
        val += 1
    return final_roll
def test():
    """Test criteria to show script works."""
    
    _1d6 = multiDie(1,1) #1d6
    print("1d6 = ", _1d6, end=' ') 
    _2d6 = multiDie(2,1) #2d6
    print("\n2d6 = ", _2d6, end=' ')
    _3d6 = multiDie(3,1) #3d6
    print("\n3d6 = ", _3d6, end=' ')
    _4d6 = multiDie(4,1) #4d6
    print("\n4d6 = ", _4d6, end=' ')
    _1d10 = multiDie(1,2) #1d10
    print("\n1d10 = ", _1d10, end=' ')
    _2d10 = multiDie(2,2) #2d10
    print("\n2d10 = ", _2d10, end=' ')
    _3d10 = multiDie(3,2) #3d10
    print("\n3d10 = ", _3d10, end=' ')
    _d100 = multiDie(1,3) #d100
    print("\n1d100 = ", _d100, end=' ') 
    
if __name__ == "__main__": #run test() if calling as a separate program
    test()

This program builds on the previous random-dice program by allowing multiple dice to be added together. In addition, the test() function only runs when the program is called by itself, providing a sanity check of the code. Arguably, the test code would be better kept in a separate file, since test() is still accessible when the module is imported, as shown below:

>>> import dice_roller_tests
>>> dice_roller_tests.test()
1d6 = 1 
2d6 = 8 
3d6 = 10 
4d6 = 12 
1d10 = 5 
2d10 = 8 
3d10 = 6 
1d100 = 26

So, if you have any code you don't want to run automatically when the module is imported, make sure to place it below that line, as it were.

 

Implementing virtual Python environments


As touched on previously, Python virtual environments create separate Python environments, much like virtual machines allow multiple but separate operating systems. Python virtual environments are particularly useful when installing multiple instances of the same module.

For example, assume you are working on a project that requires version 1.2 of a particular library module for legacy support. Now assume you download a Python program that uses version 2.2 of the same library. If you install everything in the default global location on your hard drive, for example, /usr/lib/python3.6/site-packages, the new program will install the updated library into the same location, overwriting the legacy software. Since you were using an old library for legacy support, there's a good chance that the updated library will break your application.

Also, on shared systems (especially if you don't have admin rights), there is a strong possibility that you simply can't install modules on the system, at least not in the default global site-packages directory. You may luck out and be able to install software for your account but, if you can't, you have to either request permission to install it or go without.

This is where virtual Python environments come into play. Each environment has its own installation directories and there is no sharing of libraries between environments. This means that each version of a module within an environment stays the same, even if you update global libraries. It also means you can have multiple versions of modules installed on your computer at the same time without having conflicts.

Virtual environments have their own shells as well, allowing access to an OS shell that is independent of any other environment or the underlying operating system. This recipe also shows how to spawn a new Python shell from pipenv. Doing this ensures all commands will have access to the installed packages within the virtual environment.

Getting ready

The old way to manage virtual environments was with the venv tool, which has been part of the standard library since Python 3.3. On Debian and Ubuntu systems it may need to be installed separately, using the command sudo apt install python3-venv.

To manage virtual environments in a modern way, the pipenv module (https://docs.pipenv.org/) was developed; it automatically creates and manages virtual environments for projects, as well as adding and removing packages from Pipfile when you install/uninstall packages. It can be installed using pip install pipenv.

Pipfile is an alternative to requirements.txt, which is used to specify exact versions of modules to include in a program. Pipfile actually comprises two separate files: Pipfile and (optionally) Pipfile.lock. Pipfile is simply a listing of the source location of imported modules, the module names themselves (defaulting to the most recent version), and any development packages that are required. pipfile.py, below, is an example of a Pipfile from the Pipenv site (https://docs.pipenv.org/basics/#example-pipfile-pipfile-lock):

[[source]]
url = "https://pypi.python.org/simple"
verify_ssl = true
name = "pypi"

[packages]
requests = "*"


[dev-packages]
pytest = "*"

Pipfile.lock takes the Pipfile and sets actual version numbers to all the packages, as well as identifying specific hashes for those files. Hashed values are beneficial to minimize security risks; that is, if a particular module version has a vulnerability, its hash value allows it to be easily identified, rather than having to search by version name or some other method. pipfile_lock.py, below, is an example of a Pipfile.lock file from the Pipenv site (https://docs.pipenv.org/basics/#example-pipfile-pipfile-lock):

{
  "_meta": {
    "hash": {
      "sha256": "8d14434df45e0ef884d6c3f6e8048ba72335637a8631cc44792f52fd20b6f97a"
    },
    "host-environment-markers": {
      "implementation_name": "cpython",
      "implementation_version": "3.6.1",
      "os_name": "posix",
      "platform_machine": "x86_64",
      "platform_python_implementation": "CPython",
      "platform_release": "16.7.0",
      "platform_system": "Darwin",
      "platform_version": "Darwin Kernel Version 16.7.0: Thu Jun 15 17:36:27 PDT 2017; root:xnu-3789.70.16~2/RELEASE_X86_64",
      "python_full_version": "3.6.1",
      "python_version": "3.6",
      "sys_platform": "darwin"
    },
    "pipfile-spec": 5,
    "requires": {},
    "sources": [
      {
        "name": "pypi",
        "url": "https://pypi.python.org/simple",
        "verify_ssl": true
      }
    ]
  },
  "default": {
    "certifi": {
      "hashes": [
        "sha256:54a07c09c586b0e4c619f02a5e94e36619da8e2b053e20f594348c0611803704",
        "sha256:40523d2efb60523e113b44602298f0960e900388cf3bb6043f645cf57ea9e3f5"
      ],
      "version": "==2017.7.27.1"
    },
    "chardet": {
      "hashes": [
         "sha256:fc323ffcaeaed0e0a02bf4d117757b98aed530d9ed4531e3e15460124c106691",
         "sha256:84ab92ed1c4d4f16916e05906b6b75a6c0fb5db821cc65e70cbd64a3e2a5eaae"
      ],
      "version": "==3.0.4"
    },
***further entries truncated***

How to do it...

  1. The original, normal way to create a virtual environment comprises three separate steps. First, the virtual environment is created:
      $ python3 -m venv <dir_name>
  2. Next, the virtual environment is activated so it can be used:
      $ source <dir_name>/bin/activate
  3. Finally, pip is used to install the necessary module:
      $ pip install <module>
  4. To make this process easier, pipenv combines the pip and venv calls, so first we have to move to the desired directory where the virtual environment will be placed:
      $ cd <project_name>
  5. Next, we simply call pipenv to create the environment and install the desired module:
      $ pipenv install <module>
  6. Finally, use pipenv to spawn a shell with the shell command and wait for the new shell to be created. Observe that a virtual environment has been created and that the command prompt is now running inside that environment; the full command sequence is sketched below.
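For clarity, the combined sequence from steps 4-6 looks something like the following (the project directory and module name are only examples):

$ cd my_project
$ pipenv install requests   # creates the virtual environment and installs the module
$ pipenv shell              # spawns a shell inside the new environment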

How it works...

The preceding pipenv example shows the developer changing to the desired directory for the project, and then invoking pipenv to simultaneously create the virtual environment, activate it, and install the desired module.

In addition to creating the virtual environment, once you have created your Python program, you can run the program using pipenv as well:

>>> pipenv run python3 <program_name>.py

Doing this ensures all installed packages in the virtual environment are available to your program, thus reducing the likelihood of unexpected errors.

When launching a pipenv shell, a new virtual environment is created, with an indication of where the environment is created in the file system. In this case, two environment executables are created, referencing both the Python 3.6 command and the default Python command. (Depending on the system, these may actually reference different versions of Python. For example, the default Python command may call the Python 2.7 environment instead of Python 3.6.)

There's more...

On a side note, the -m option indicates that Python is to run the module as a stand-alone script, that is, its contents will be run within the __main__ namespace. Doing this means you don't have to know the full path to the module, as Python will look for the script on sys.path. In other words, modules that you would normally import into another Python file can be run directly from the command line.

In the example of running pipenv, the command takes advantage of the fact that Python allows the -m option to run a module directly or allow it to be imported; in this case, pipenv imports venv to create the virtual environment as part of the creation process.
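For instance, all of the following run standard-library modules directly from the command line (the directory and port names here are only illustrative):

$ python3 -m venv my_env        # run the venv module to create an environment
$ python3 -m pip list           # run pip through its __main__ entry point
$ python3 -m http.server 8080   # serve the current directory over HTTP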

 

Python package installation options


Installing packages normally happens by looking at http://pypi.python.org/pypi for the desired module, but pip supports installing from version control, local projects, and from distribution files as well.

Python wheels are pre-built archives that can speed up the package installation process compared to installing from source files. They can be compared to installing pre-made binary applications for an operating system rather than building and installing source files.

Wheels were developed to replace Python eggs, which performed wheels' functions before the new packaging standards were developed. Wheels improve on eggs by specifying the .dist-info directory (a database of installed Python packages that is very close to the on-disk format) and by implementing package metadata (which helps identify software dependencies).

pip installs from wheels whenever possible, though this behavior can be disabled with pip install --no-binary :all: <package>. If wheel files aren't available, pip will look for source files. Wheels can be downloaded from PyPI manually or pulled from a local repository; just tell pip where the local file is located.

How to do it...

  1. Use pip to pull the latest version of the package directly from PyPI:
      $ pip install <package_name>
  2. Alternatively, a specific version of the package can be downloaded:
      $ pip install <package_name>==1.2.2

For example, the pygments package installed earlier with pipenv could be downgraded in exactly this way.

  3. As a final option, a minimum version of a package can be downloaded; this is common when a package has a significant change between versions:
      $ pip install "<package_name> >= 1.1"
  4. If a PyPI package has a wheel file available, pip will automatically download the wheel; otherwise, it will pull the source code and compile it:
      $ pip install <some_package>
  5. To install a local wheel file, provide the full path to the file:
      $ pip install /local_files/SomePackage-1.2-py2.py3-none-any.whl

How it works...

The wheel file name format breaks down to <package_name>-<version>-<language_version>-<abi_tag>-<platform_tag>.whl. The package name is the name of the module to be installed, followed by the version of this particular wheel file.

The language version refers to Python 2 or Python 3; it can be as specific as necessary, such as py27 (any Python 2.7.x version) or py3 (any Python 3.x.x version).

The ABI tag refers to the Application Binary Interface. In the past, the underlying C API (Application Programming Interface) that the Python interpreter relies on changed with every release, typically by adding API features rather than changing or removing existing APIs. The Windows OS is particularly affected, as each Python feature release creates a new name for the Python DLL on Windows.

The ABI refers to Python's binary compatibility. While changes to Python structure definitions may not break API compatibility, ABI compatibility may be affected. Most ABI issues occur from changes in the in-memory structure layout.

Since version 3.2, a limited set of API features has been guaranteed to be stable for the ABI. Specifying an ABI tag allows the developer to specify which Python implementations a package is compatible with, for example, PyPy versus CPython. Generally speaking, this tag is set to none, implying there is no specific ABI requirement.

The platform tag specifies which OS and CPU the wheel package is designed to run on. This is normally any, unless the wheel's developer had a particular reason to limit the package to a specific system type.
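As a worked example, the local wheel file from the final step above breaks down as follows:

SomePackage-1.2-py2.py3-none-any.whl
#   SomePackage -> package name
#   1.2         -> package version
#   py2.py3     -> compatible with both Python 2 and Python 3
#   none        -> no specific ABI requirement
#   any         -> any operating system and CPU architecture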

 

Utilizing requirement files and resolving conflicts


As mentioned previously, a requirements file, requirements.txt, can be created to provide a list of packages to install all at once, via pip install -r requirements.txt. The requirements file can specify specific or minimum versions, or simply specify the library name and the latest version will be installed.

It should be noted that files pulled from the requirements file aren't necessarily installed in a particular order. If you require certain packages to be installed prior to others, you will have to take measures to ensure that the installation is sequential, such as having multiple pip install calls.

Requirements files can specify version numbers of packages explicitly. For example, two different modules (m1 and m2) both depend on a third module (m3). The module m1 requires m3 to be at least version 1.5, but m2 requires it to be no later than version 2.0; the current version of m3 is 2.3. In addition, the latest version of m2 (version 1.7) is known to contain a bug.

Hash digests can be used in requirements files to verify downloaded packages to guard against a compromise of the PyPI database or the HTTPS certificate chain. This is actually a good thing, as in 2017 ten Python libraries (https://www.bleepingcomputer.com/news/security/ten-malicious-libraries-found-on-pypi-python-package-index/) uploaded to PyPI were found to be hosting malicious files.

Because PyPI does not perform any security checks or code auditing when packages are uploaded, it is actually very easy to upload malicious software.

How to do it...

  1. Manually create requirements.txt by typing in the packages to include in the project; the file format is documented at https://pip.pypa.io/en/latest/reference/pip_install/#requirements-file-format.
  2. Alternatively, run pip freeze > requirements.txt. This automatically writes the currently installed packages to a properly formatted requirements file.
  3. To implement hash-checking mode, simply include the digest with the package name in the requirements file, as demonstrated below:
      FooProject == 1.2 --hash=sha256:<hash_digest>

Note

Note: Supported hash algorithms include: md5, sha1, sha224, sha384, sha256, and sha512.

  4. If there are module conflicts, or special versioning is needed, provide the first module required:
      m1
  5. Indicate the second module, but ensure the version installed pre-dates the known bad version:
      m2<1.7
  6. Provide the third module, ensuring it is at least equal to the minimum version required, but no greater than the maximum version that can be used:
      m3 >= 1.5, <= 2.0

Here is an example showing some of the different ways to specify module versions in requirements.txt:

        flask
        flask-pretty == 0.2.0
        flask-security <= 3.0
        flask-oauthlib >= 0.9.0, <= 0.9.4

How it works...

In this example, module m1 is specified as a requirement, but the version number doesn't matter; in this case, pip will install the latest version. However, because of the bug in the latest version of m2, an earlier version is specified to be installed. Finally, m3 must be a version between 1.5 and 2.0 to satisfy the installation. Naturally, if one of these conditions can't be met, the installation will fail and the offending library and version numbers will be displayed for further troubleshooting.

There's more...

It's worth noting that pip doesn't have true dependency resolution; it will simply install the first file specified. Thus, it is possible to have dependency conflicts or a sub-dependency that doesn't match the actual requirement. This is why a requirements file is useful, as it alleviates some dependency problems.

Verifying hashes also ensures that a package can't be changed without its version number changing as well, such as in an automated server deployment. This is an ideal situation for efficiency, as it eliminates the need for a private index server that maintains only approved packages.

 

Using local patches and constraint files


The benefit of open-source software is the ability to view and modify source code. If you are working on a project and create a local version of a PyPI module, such as customizing for a project or creating a patch, requirements.txt can be used to override the normal download of the file.

Constraints files are a modification of requirements files that simply indicate what version of a library is installed, but they don't actually control the installation of files.

One example of using a constraints file is when using a local patched version of a PyPI module, for example, ReqFile. Some software packages downloaded from PyPI rely on ReqFile, but other packages don't. Rather than writing a requirements file for every single package from PyPI that depends on ReqFile, a constraints file can be created as a master record and implemented across all Python projects. Any package being installed that requires ReqFile will see the constraints file and install from the local repository, rather than from PyPI.

In this manner, a single file can be used by every developer and it no longer matters what a PyPI package depends on; the correct version will either be pulled down from PyPI, or the local version will be used as needed.

How to do it...

  1. Tag the in-house version of the file. Assuming you are using Git, a tag is generated by using the following:
      git tag -a <tag_name> -m "<tag_message>"
      # git tag -a v0.3 -m "Changed the calculations"
  2. Upload it to the version control system.
  3. Indicate the local version in the requirements.txt file, as shown in the following example:
      git+https://<vcs>/<dependency>@<tag_name>#egg=<dependency>
      # git+https://gitlab/pump_laws@v0.3#egg=pump_laws
  4. Write the constraints.txt file in the same manner as a requirements.txt file. The following example comes from https://github.com/mldbai/mldb (this was released under the Apache v2.0 license by MLDB.ai):
      # math / science / graph stuff
      bokeh==0.11.1
      numpy==1.10.4
      pandas==0.17.1
      scipy==0.17.0
      openpyxl==2.3.3
      patsy==0.4.1
      matplotlib==1.5.1
      ggplot==0.6.8
      Theano==0.7.0
      seaborn==0.7.0
      scikit-learn==0.17

      pymldb==0.8.1
      pivottablejs==0.1.0

      # Progress bar
      tqdm==4.11.0

      # notebook and friends
      ipython==5.1.0
      jupyter==1.0.0
      jupyter-client==4.4.0
      jupyter-console==5.0.0
      jupyter-core==4.2.1

      # validator
      uWSGI==2.0.12
      pycrypto==2.6.1

      tornado==4.4.2

      ## The following requirements were added by pip freeze:
      backports-abc==0.5
      backports.shutil-get-terminal-size==1.0.0
      backports.ssl-match-hostname==3.5.0.1
      bleach==1.5.0

      ***further files truncated***
  5. Next, run pip install with the -c option, for example pip install -c constraints.txt <package>, so that the constraints are applied during installation.

How it works...

In the preceding example, <vcs> is the version control system being used; it could be a local server or an online service such as GitHub. <tag_name> is the version control tag used to identify this particular update to the control system.

If a required dependency was a top-level requirement for the project, then that particular line in the requirements file can simply be replaced. If it is a sub-dependency of another file, then the above command would be added as a new line.

There's more...

Constraints files differ from requirements files in one key way: putting a package in the constraints file does not cause the package to be installed, whereas a requirements file will install all packages listed. Constraints files are simply requirements files that control which version of a package will be installed, but provide no control over the actual installation.
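In practice, the two files are usually combined: the requirements file decides what gets installed, while the constraints file pins the versions. A typical invocation (the file names are conventional, not mandatory) looks like this:

$ pip install -r requirements.txt -c constraints.txt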

 

Working with packages


There are a variety of utilities available to work with Python packages. Every so often, a developer needs to uninstall Python packages from a system. Uninstalling packages is as easy as installing them.

As it is easy to install packages and forget what has been installed in the past, pip provides the ability to list all currently installed packages, as well as to indicate which ones are out of date. The examples in the next section come from the pip list (https://pip.pypa.io/en/stable/reference/pip_list/) and pip show (https://pip.pypa.io/en/stable/reference/pip_show/) documentation pages.

Finally, when looking for packages to install, rather than opening a browser and navigating to PyPI directly, it is possible to find packages from the command line.

How to do it...

  1. To uninstall a package, run the pip uninstall <package_name> command. This works for most packages installed on the system.
  2. Requirements files can be used to remove a number of packages at once, by using the -r option, such as pip uninstall -r <requirements_file>. The -y option allows for automatic confirmation of file removal.
  3. List currently installed packages by running pip list.
  4. To show packages that are outdated, use pip list --outdated, as follows:
      $ pip list --outdated
      docutils (Current: 0.10 Latest: 0.11)
      Sphinx (Current: 1.2.1 Latest: 1.2.2)

While it is possible to update all outdated packages at once, this is not available within pip itself. There are two primary options: the first involves using sed, awk, or grep to walk through the list of packages, find the outdated ones, and update them, as sketched below. Alternatively, install the pip-review package to see outdated packages and update them. In addition, a number of other tools have been created by different developers, as well as instructions on how to do it yourself, so you should decide which works best for you.
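As a rough sketch of the do-it-yourself approach (treat this as illustrative; exact flags can vary between pip versions and shells):

$ pip list --outdated --format=freeze | cut -d = -f 1 | xargs -n1 pip install -U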

Note

Note: Automatically upgrading all Python packages can break dependencies. You should only update packages on an as-needed basis.

  5. Details of a particular installed package can be shown using pip show <package_name>, as follows:
      $ pip show sphinx
      Name: Sphinx
      Version: 1.7.2
      Summary: Python documentation generator
      Home-page: http://sphinx-doc.org/
      Author: Georg Brandl
      Author-email: georg@python.org
      License: BSD
      Location: /my/env/lib/python2.7/site-packages
      Requires: docutils, snowballstemmer, alabaster, Pygments, 
                imagesize, Jinja2, babel, six
  6. Run the command pip search "query_string". The example below comes from https://pip.pypa.io/en/stable/reference/pip_search/, and shows how the output looks:
      $ pip search peppercorn
      pepperedform    - Helpers for using peppercorn with formprocess.
      peppercorn      - A library for converting a token stream into [...]

How it works...

When searching for packages, the query can be a package name or simply a word, as pip will find all packages with that string in the package name or in the package description. This is a useful way to locate a package if you know what you want to do but don't know the actual name of the package.

There's more...

Packages installed with python setup.py install, and program wrappers that were installed using python setup.py develop, cannot be uninstalled via pip, as they do not provide metadata about which files were installed.

A number of other options are available for listing files, such as listing only non-global packages, beta versions of packages, outputting the list in columns, and other tools that may prove useful.

Additional information can be shown by using the --verbose option with pip show.

The verbose option shows the same information as the default mode, but also includes extra details, such as the classifier information that would be found on the package's PyPI page. While this information could obviously be found simply by going to the PyPI site, if you are on a stand-alone computer or otherwise unable to connect to the internet, it can be useful when figuring out whether a package is supported by your current environment or when looking for similar packages within a particular topic.

 

Creating wheels and bundles


pip wheel allows the developer to bundle all project dependencies, along with any compiled files, into a single archive file. This is useful for installing when index servers aren't available, and eliminates recompiling code. However, recognize that compiled packages are normally OS- and architecture-specific, as they are normally C code, meaning they are generally not portable across different systems without recompiling. This is also a good use of hash-checking to ensure future wheels are built with identical packages.

How to do it...

To create an archive (from the official documentation: https://pip.pypa.io/en/latest/user_guide/#installation-bundles), perform the following:

  1. Create a temporary directory:
      $ tempdir=$(mktemp -d /tmp/archive_dir-XXXXX)
  2. Create the wheel files:
      $ pip wheel -r requirements.txt --wheel-dir=$tempdir
  3. Let the OS know where to place the archive file:
      $ cwd=`pwd`
  4. Change to the temporary directory and create the archive file:
      $ (cd "$tempdir"; tar -cjvf "$cwd/<archive>.tar.bz2" *)

To install from an archive, do the following:

  1. Create a temporary directory:
      $ tempdir=$(mktemp -d /tmp/wheelhouse-XXXXX)
  2. Change to the temporary directory and unarchive the file:
      $ (cd $tempdir; tar -xvf /path/to/<archive>.tar.bz2)
  3. Use pip to install the unarchived files:
      $ pip install --force-reinstall --ignore-installed --upgrade --no-index --no-deps $tempdir/*

How it works...

In the first example (creating an archive), a temporary directory is first made, then the wheel is created using a requirements file and placed in the temporary directory. Next, the cwd variable is created and set equal to the present working directory (pwd). Finally, a combined command is issued, changing to the temporary directory, and creating an archive file in cwd of all the files in the temporary directory.

In the second example (installing from an archive), a temporary directory is created. Then, a combined command is given to change to that temporary directory and extract the files that make up the archive file. Then, using pip, the bundled files are used to install the Python program onto the computer in the temporary directory.

There's more...

--force-reinstall will reinstall all packages when upgrading, even if they are already current. --ignore-installed forces a reinstall, ignoring whether the packages are already present. --upgrade upgrades all specified packages to the newest version available. --no-index ignores the package index and only looks at URLs to parse for archives. --no-deps ensures that no package dependencies are installed.

 

Comparing source code to bytecode


Interpreted languages, such as Python, typically take raw source code and generate bytecode. Bytecode is encoded instructions that are on a lower level than source code but not quite as optimized as machine code, that is, assembly language.

Bytecode is often executed within the interpreter (which is a type of virtual machine), though it can also be compiled further into assembly language. Bytecode is used primarily to allow easy, cross-platform compatibility. Python, Java, Ruby, Perl, and similar languages, are examples of languages that use bytecode interpreters for different architectures while the source code stays the same.

While Python automatically compiles source code into bytecode, there are some options and features that can be used to modify how the interpreter works with bytecode. These options can improve the performance of Python programs, a key consideration since interpreted languages are, by nature, slower than compiled languages.

How to do it...

  1. To create bytecode, simply execute a Python program via python <program>.py.
  2. When running a Python command from the command line, there are a couple of switches that can reduce the size of the compiled bytecode. Be aware that some programs may expect the statements that are removed from the following examples to function correctly, so only use them if you know what to expect.

-O removes assert statements from the compiled code. These statements provide some debugging help when testing the program, but generally aren't required for production code (see the combined example after this list).

-OO removes both assert and __doc__ strings for even more size reduction.

  3. Loading programs from bytecode into memory is faster than with source code, but actual program execution is no faster (due to the nature of the Python interpreter).
  4. The compileall module can generate bytecode for all modules within a directory. More information on the command can be found at https://docs.python.org/3.6/library/compileall.html.
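The following commands illustrate these options; my_program.py is only a placeholder name:

$ python3 -O my_program.py     # strip assert statements from the compiled bytecode
$ python3 -OO my_program.py    # also strip __doc__ strings
$ python3 -m compileall .      # pre-compile every module under the current directory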

How it works...

When source code (.py) is read by the Python interpreter, the bytecode is generated and stored in __pycache__ as <module_name>.<version>.pyc. The .pyc extension indicates that it is compiled Python code. This naming convention is what allows different versions of Python code to exist simultaneously on the system.

When source code is modified, Python will automatically check the date with the compiled version in cache and, if it's out of date, will automatically recompile the bytecode. However, a module that is loaded directly from the command line will not be stored in __pycache__ and is recompiled every time. In addition, if there is no source module, the cache can't be checked, that is, a bytecode-only package won't have a cache associated with it.
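For example, after the dice_roller module from earlier in this chapter has been imported under Python 3.6, the cache would contain something like this (the exact tag depends on the interpreter version):

$ ls __pycache__/
dice_roller.cpython-36.pyc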

There's more...

Because bytecode is platform-independent (due to being run through the platform's interpreter), Python code can be released either as .py source files or as .pyc bytecode. This is where bytecode-only packages come into play; to provide a bit of obfuscation and (subjective) security, Python programs can be released without the source code and only the pre-compiled .pyc files are provided. In this case, the compiled code is placed in the source directory rather than the source-code files.

 

How to create and reference module packages


We have talked about modules and packages, using the terms interchangeably. However, there is a difference between the two: packages are actually collections of modules and include an __init__.py file, which can simply be an empty file.

The dot-nomenclature used in modules to access specific functions or variables is also used in packages. This time, dotted names allow multiple modules within a package to be accessed without having name conflicts; each package creates its own namespace, and all the modules have their own namespaces.

When packages contain sub-packages (as in the following example), importing modules can be done with either absolute or relative paths. For example, to import the sepia.py module, one could import it with an absolute path: from video.effects.specialFX import sepia.

How to do it...

  1. When making a package, follow the normal filesystem hierarchy in terms of directory structure; that is, modules that relate to each other should be placed in their own directory.
  2. A possible package for a video file handler is shown in package_tree.py:
      video/                  # Top-level package
          __init__.py         # Top-level initialization
          formats/            # Sub-package for file formats
              __init__.py     # Package-level initialization
              avi_in.py
              avi_out.py
              mpg2_in.py
              mpg2_out.py
              webm_in.py
              webm_out.py
          effects/             # Sub-package for video effects
              specialFX/       # Sub-package for special effects
                  __init__.py
                  sepia.py
                  mosaic.py
                  old_movie.py
                  glass.py
                  pencil.py
                  tv.py
              transform/        # Sub-package for transform effects
                  __init__.py
                  flip.py
                  skew.py
                  rotate.py
                  mirror.py
                  wave.py
                  broken_glass.py
              draw/              # Sub-package for draw effects
                  __init__.py
                  rectangle.py
                  ellipse.py
                  border.py
                  line.py
                  polygon.py
  3. But, what happens if you were already in the specialFX/ directory and wanted to import from another package? Use relative paths to walk the directory and import using dots, just like changing directories on the command line:
      from . import mosaic
      from .. import transform
      from ..draw import rectangle

How it works...

In this example, the whole video package comprises two sub-packages, video formats and video effects, with video effects having several sub-packages of its own. Within each package, each .py file is a separate module. During module importation, Python looks for packages on sys.path.

The inclusion of the __init__.py files is necessary so that Python will treat the directories as packages. This prevents directories with common names from shadowing Python modules further along the search path. The files also allow modules to be run as stand-alone programs via Python's -m option.

Initialization files are normally empty but can contain initialization code for the package. They can also contain an __all__ list, which is a Python list of modules that should be imported whenever from <package> import * is used.

The reason for __all__ is for the developer to explicitly indicate which modules should be imported by a wildcard import. This prevents the delay of importing every module in a package when most of them aren't needed, and it limits the chance of undesired side effects when a module is inadvertently imported. The catch is that the developer needs to update the __all__ list every time the package changes.
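
For illustration, a minimal __init__.py for the specialFX/ sub-package shown earlier might read as follows; which modules to expose is entirely the developer's choice:

      # video/effects/specialFX/__init__.py
      # Package initialization code, if any, goes here and runs on first import.

      # Only these modules are pulled in by "from video.effects.specialFX import *"
      __all__ = ["sepia", "mosaic", "old_movie", "glass", "pencil", "tv"]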

Relative imports are based on the name of the current module. As the main module for a program always has the name "__main__", any modules that will be the main module of an application must use absolute imports.

To be honest, it is generally safer to use absolute imports just to make sure you know exactly what you're importing; with most development environments nowadays providing suggestions for paths, it is just as easy to write out the auto-populated path as it is to use relative paths.

There's more...

If __all__ is not defined in __init__.py, then import * only imports the names defined in the package's __init__.py itself (plus any sub-modules that have already been imported explicitly); it does not automatically pull in all sub-packages or their modules. For example, from video.formats import * will not reach into the effects/ directory, and it only loads the format modules if they are listed in __all__ or imported by the formats __init__.py.

This is a best practice for Python programmers: as the Zen of Python (https://www.python.org/dev/peps/pep-0020/) states, explicit is better than implicit. Thus, importing a specific sub-module from a package is a good thing, whereas import * is frowned upon because of the possibility of variable name conflicts.

Packages have a __path__ attribute, which is rarely used directly. It is a list containing the name of the directory that holds the package's __init__.py file, and it is set before the rest of the code in that file is run.

Modifying __path__ affects future searches for modules and sub-packages within the package. This is useful when the modules that belong to a package need to be found in more than one directory.
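
One documented way to do this is the standard library's pkgutil.extend_path(), which appends any other directories named like the package that are found on sys.path; a minimal sketch for the video package:

      # video/__init__.py
      from pkgutil import extend_path

      # Let sub-modules of "video" that live in other sys.path directories
      # be imported as part of this package.
      __path__ = extend_path(__path__, __name__)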

 

Operating system-specific binaries


Python programs are normally provided in source code or wheel files. However, there are times when a developer wants to provide OS-specific files, such as a Windows .exe, for ease of installation. Python has a number of options for developers to create stand-alone executable files.

py2exe (https://pypi.python.org/pypi/py2exe/) is one option for creating Windows-specific files. Unfortunately, it is difficult to tell how maintained this project is, as the last release on https://pypi.python.org/pypi/py2exe/0.9.2.2 was in 2014, while http://www.py2exe.org references a release from 2008. It also appears to be only available for Python 3.4 and older versions. However, if you believe this program may be useful, it does convert Python scripts into Windows executables without requiring the installation of Python.

py2app (https://py2app.readthedocs.io/en/latest/) is the primary tool for creating stand-alone Mac bundles. This tool is still maintained at https://bitbucket.org/ronaldoussoren/py2app, and the latest release came out in January 2018. Building is much like with py2exe, but there are several library dependencies required, listed at https://py2app.readthedocs.io/en/latest/dependencies.html.

There are more cross-platform tools for making OS-specific executable programs than there are for specific operating systems. This is good, as many developers use Linux as their development environment and may not have access to a Windows or Mac machine.

For developers who don't want to set up multiple operating systems themselves, there are several online services that allow you to rent operating systems online. For example, http://virtualmacosx.com allows you access to a hosted Mac environment, while there are multiple options for Windows hosting, from Amazon Web Services to regular web hosts.

For those desiring local control of binary execution, cx_Freeze (https://anthony-tuininga.github.io/cx_Freeze/) is one of the more popular executable creation programs for Python. It only works with Python 2.7 or newer, but that shouldn't be a problem for most developers. However, if you want to use it with Python 2 code, you will have to use cx_Freeze version 5; starting with version 6, support for Python 2 code has been dropped.

Note

The modules created by cx_Freeze are stored in ZIP files. Packages, by default, are stored in the file system but can be included in the same ZIP files, if desired.

PyInstaller (http://www.pyinstaller.org) has, as its main goal, compatibility with third-party packages, requiring no user intervention to make external packages work during binary creation. It is available for Python 2.7 and newer versions.

PyInstaller provides multiple ways to package your Python code: as a single directory (containing the executable as well as all necessary modules), as a single file (self-contained and requiring no external dependencies), or in custom mode.

The majority of third-party packages will work with PyInstaller with no additional configuration required. Conveniently, a list, located at https://github.com/pyinstaller/pyinstaller/wiki/Supported-Packages, is provided for packages known to work with PyInstaller; if there are any limitations, for example, only working on Windows, these are noted as well.

Cython (http://cython.org) is actually a superset of Python, designed to give C-like performance to Python code. This is done by allowing types to be added to the Python code; whereas Python is normally dynamically typed, Cython allows static typing of variables. The resulting code is translated into C and compiled into an extension module that the normal Python interpreter imports and runs, but at the speed of compiled C code.

While Cython is normally used to create extensions for Python, or to speed up Python processing, using the --embed flag with the cython command will create a C file containing a main() function, which can then be compiled into a normal application binary.

Naturally, this takes more knowledge of using gcc or your compiler of choice, as you have to know how to include the Python headers during compilation and which other directories and libraries need to be linked in. As such, Cython isn't recommended for developers unfamiliar with C code, but it can be a powerful way to make full-featured applications by utilizing both the Python and C languages.
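
As a rough sketch of that workflow (the exact commands are illustrative; compiler flags vary by platform, and on newer Python versions python3-config may also need its own --embed flag to emit the linking options):

      $ cython --embed -3 helloworld.pyx    # generates helloworld.c with a main() function
      $ gcc -o helloworld helloworld.c $(python3-config --cflags --ldflags)
      $ ./helloworld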

Nuitka (http://nuitka.net) is a relatively new Python compiler program. It is compatible with Python 2.6 and later, but also requires gcc or another C compiler. The latest version, 0.5.29, is beta-ware, but the author claims it is able to compile every Python construct currently available without a problem.

Nuitka functions much like Cython, in that it uses a C compiler to convert Python code into C code, and make executable files. Entire programs can be compiled, with the modules embedded in the file, but individual modules can be compiled by themselves, if desired.

By default, the resulting binary requires Python to be installed, plus the necessary C extension modules. However, it is possible to create true stand-alone executables by using the --standalone flag.

How to do it...

  1. Write your Python program.
  2. To create a Windows .exe file, create a setup.py file to tell the libraries what you want to do. This mainly means importing the setup() function from distutils, importing py2exe, and then calling setup() and telling it what type of application it is making (for example, a console application) and what the main Python file is. py2exe_setup.py, following, is an example of a setup.py file from the documentation:
      from distutils.core import setup
      import py2exe
      setup(console=['hello.py'])
  3. Run the setup script by calling python setup.py py2exe. This creates two directories: build/ and dist/. The dist/ directory is where the new files are placed, while build/ is used for temporary files during the creation process.
  4. Test the application by moving to the dist/ directory and running the .exe file located there.
  5. To make a macOS .app file, create the setup.py file (a minimal sketch is shown after these steps). Any icons or data files required for the application need to be included during this step.
  6. Clean up the build/ and dist/ directories to ensure there are no files that may be accidentally included.
  7. Use alias mode to build the application in-place, that is, not ready for distribution. This allows you to test the program before bundling for delivery.
  8. Test the application and verify it works correctly in alias mode.
  9. Clean up the build/ and dist/ directories again.
  10. Run python setup.py py2app to create the distributable .app file.
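
A minimal py2app setup.py, following the pattern shown in the py2app documentation, looks like the following sketch; the script name, data files, and options are placeholders to be replaced with your own:

      # py2app_setup.py -- illustrative sketch; hello.py is a placeholder name
      from setuptools import setup

      APP = ['hello.py']               # the main script of the application
      DATA_FILES = []                  # icons and other data files go here
      OPTIONS = {}                     # py2app-specific build options

      setup(
          app=APP,
          data_files=DATA_FILES,
          options={'py2app': OPTIONS},
          setup_requires=['py2app'],
      )

With this file in place, python setup.py py2app -A builds in alias mode for local testing, while python setup.py py2app produces the distributable .app bundle.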
  11. For cross-platform files, the easiest way to use cx_Freeze is the cxfreeze script:
      cxfreeze <program>.py --target-dir=<directory>

Other options are available for this command, such as compressing the bytecode, setting an initialization script, or even excluding modules.

If more functionality is required, a distutils setup script can be created. The command cxfreeze-quickstart can be used to generate a simple setup script; the cx_Freeze documentation provides an example setup.py file (cxfreeze_setup.py):

      import sys
      from cx_Freeze import setup, Executable

      # Dependencies are automatically detected, but it might need fine tuning.
      build_exe_options = {"packages": ["os"], "excludes": ["tkinter"]}

      # GUI applications require a different base on Windows (the default is for 
      # console application).
      base = None
      if sys.platform == "win32":
          base = "Win32GUI"

      setup(  name = "guifoo",
              version = "0.1",
              description = "My GUI application!",
              options = {"build_exe": build_exe_options},
              executables = [Executable("guifoo.py", base=base)])

To run the setup script, run the command python setup.py build. This will create the directory build/, which contains the subdirectory exe.xxx, where xxx is a platform-specific identifier for the executable binary.

    • For developers who need even more control, or are looking at creating C scripts for extending or embedding Python, manually working with the classes and modules within the cx_Freeze program is possible.
  12. If using PyInstaller, its use is like that of most other Python command-line tools; it is a simple command:
      pyinstaller <program>.py

This generates the binary bundle in the dist/ subdirectory. Naturally, there are many other options available when running this command:

    • Optionally, UPX (https://upx.github.io/) can be used to compress the executable files and libraries. When used, UPX compresses the files and wraps them in a self-decompressing file. When executed, the UPX wrapper decompresses the enclosed files and the resulting binary is executed normally.
    • To build binaries for multiple Python versions on a single operating system, it is recommended that you create a virtual Python environment for each Python version, then install PyInstaller in each environment and build the binary within that environment.
    • Like cx_Freeze, to create binaries for different operating systems, the other OSes must be available and PyInstaller used on each one.
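
As an illustration of PyInstaller's options, the following hypothetical invocation bundles everything into a single file and sets the name of the resulting binary; --onefile and --name are standard PyInstaller options, while myapp and myprogram.py are placeholders:

      pyinstaller --onefile --name myapp myprogram.py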
  13. When working with Cython, create your Python file and save it with the extension .pyx, for example, helloworld.pyx.
  14. Next, create a setup.py file that looks similar to cython_setup.py from http://docs.cython.org/en/latest/src/tutorial/cython_tutorial.html#the-basics-of-cython:
      from distutils.core import setup
      from Cython.Build import cythonize

      setup(
          ext_modules = cythonize("helloworld.pyx")
      )
  15. Build the Cython extension by running the following:
      $ python setup.py build_ext --inplace
  16. This creates a file in the local directory: helloworld.so on *nix and helloworld.pyd on Windows.
  17. To use the binary, simply import it into Python as normal.
  18. If your Python program doesn't require additional C libraries or a special build configuration, you can use the pyximport library. Its install() function allows .pyx files to be loaded directly when imported, rather than having to rerun setup.py every time the code changes, as in the sketch below.
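
A minimal sketch of pyximport in use, assuming helloworld.pyx is on the import path:

      import pyximport
      pyximport.install()    # registers an import hook that compiles .pyx files on demand

      import helloworld      # helloworld.pyx is compiled and imported transparently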
  19. To compile a program using Nuitka with all modules embedded, use the following command:
      nuitka --recurse-all <program>.py
  20. To compile a single module, use the following command:
      nuitka --module <module>.py
  21. To compile an entire package and embed all modules, the previous commands are combined into a similar format:
      nuitka --module <package> --recurse-directory=<package>
  22. To make a truly stand-alone binary, use the --standalone option, copy the <program>.dist directory to the destination system, and then run the .exe file inside that directory.

There's more...

Depending on a user's system configuration, you may need to provide the Microsoft Visual C runtime DLL. The py2exe documentation provides different files to choose from, depending on the version of Python you are working with.

In addition, py2exe does not create the installation builder, that is, installation wizard. While it may not be necessary for your application, Windows users generally expect a wizard to be available when running an .exe file. A number of free, open-source, and proprietary installation builders are available.

One benefit of building Mac binaries is that they are simple to pack for distribution; once the .app file is generated, right-click on the file and choose Create Archive. After that, your application is ready to be shipped out.

A common problem with cx_Freeze is that the program doesn't automatically detect a file that needs to be copied. This frequently occurs if you are dynamically importing modules into your program, for example, a plugin system.

Binaries created by cx_Freeze are specific to the OS on which cx_Freeze was run; for instance, to create a Windows .exe file, cx_Freeze has to be used on a Windows computer. Thus, to make a truly cross-platform Python program that is distributed as executable binaries, you must have access to the other operating systems. This can be alleviated by using virtual machines, cloud hosts, or simply purchasing the relevant systems.

When PyInstaller is run, it analyzes the supplied Python program and creates a <program>.spec file in the same folder as the Python program. In addition, the build/ subdirectory is placed in the same location.

The build/ directory contains log files and the working files used to actually create the binary. After the executable file is generated, a dist/ directory is placed in the same location as the Python program, and the binary is placed in the dist/ directory.

The executable file generated by Nuitka will have the .exe extension on all platforms. It is still usable on non-Windows OSes, but it is recommended to change the extension to a system-specific one to avoid confusion.

Unless the stand-alone option is used, the binary files Nuitka creates with the commands previously shown require Python to be installed on the end system, as well as any C extension modules that are used.

 

How to upload programs to PyPI


If you have developed a package and want to post it on PyPI for distribution, there are several things you need to do to ensure the proper uploading and registration of your project. While this section will highlight some of the key features of configuring your packages for distribution on PyPI, it is not all-inclusive. Make sure you look at the documentation on the PyPI site to ensure you have the latest information.

One of the first things to do is install the twine package into your Python environment. twine is a collection of utilities for interacting with PyPI. The prime reason for its use is that it authenticates your connection to the database using HTTPS; this ensures your username and password are encrypted when interacting with PyPI. While some people may not care whether a malicious entity captures their login credentials for a Python repository, many people use the same login name and password for multiple sites, meaning that someone learning their PyPI login information could potentially access other sites as well.

twine also allows you to pre-create your distribution files; that is, you can test your package files before releasing them to ensure everything works. As part of this, you can upload any packaging format, including wheels, to PyPI.

Finally, it allows you to digitally pre-sign your files and pass the .asc files to the command line when uploading the files. This ensures data security by verifying you are passing your credentials into the GPG application, and not something else.

Getting ready

Your project files need to be configured in the proper way so they are of use to other developers, and are listed properly on PyPI. The most important step of this process is setting up the setup.py file, which sits in the root of your project's directory.

setup.py contains configuration data for your project, particularly the setup() function, which defines the details of the project. It is also the command-line interface for running commands related to the packaging process.

A license (license.txt) should be included with the package. This file is important because, in some areas, a package without an explicit license cannot be legally used or distributed by anyone but the copyright holder. Including the license ensures both the creator and users are legally protected against copyright infringement issues.

How to do it...

  1. Create a manifest file.
  2. Configure setup.py by defining the options for the distutils setup() function.

How it works...

A manifest file is also important if you need to package files that aren't automatically included in the source distribution. By default, the following files are included in the package when generated (known as the standard include set):

  • All Python source files implied by the py_modules and packages options
  • All C source files listed in ext_modules or libraries options
  • Any scripts identified with the scripts option
  • Any test scripts, for instance, anything that looks like test*.py
  • Setup and readme files: setup.py, setup.cfg, and README.txt
  • All files that match the package_data and data_files metadata

Any files that don't meet these criteria, such as a license file, need to be listed in a MANIFEST.in template file. The manifest template is a list of instructions on how to generate the actual manifest file, which lists the exact files to include in the source distribution.

The manifest template can include or exclude any desired files; wildcards are available as well. For example, manifest_template.py, based on the distutils documentation, shows one way to list files:

include *.txt
recursive-include examples *.txt *.py
prune examples/sample?/build

This example indicates that all .txt files in the root directory should be included, as well as all .txt and .py files in the examples/ subdirectory. In addition, all directories that match examples/sample?/build will be excluded from the package.

The manifest template is processed after the standard include set is considered, so if you want to exclude files from that set, you can explicitly list them in the template. If, however, you want to ignore all the defaults in the standard set, you can use the --no-defaults option to disable it entirely.

The order of commands in the manifest template is important. After the standard include set is processed, the template commands are processed in order. Once that is done, the final resulting command set is processed; all files to be pruned are removed. The resulting list of files is written to the manifest file for future reference; the manifest file is then used to build the source distribution archive.

It is important to note that the manifest template does not affect binary distributions, such as wheels. It is only for use in source-file packaging.

As mentioned previously, setup.py is a key file for the packaging process, and the setup() function is what enables the details of the project to be defined.

There are a number of arguments that can be provided to the setup() function, some of which are covered in the following list; a good example is shown in the Listing Packages section, and a consolidated sketch follows this list:

  • name: The name of the project, as it will be listed on PyPI. Only ASCII alphanumeric characters, underscores, hyphens, and periods are acceptable. Must also start and end with an ASCII character. This is a required field. Project names are case-insensitive when pulled via pip, that is, My.Project = My-project = my-PROJECT, so make sure the name itself is unique, not just a different capitalization compared to another project.
  • version: The current version of your project. This is used to tell users whether they have the latest version installed, as well as indicating which specific versions they've tested their software against. This is a required field.

There is actually a document on PEP 440 (https://www.python.org/dev/peps/pep-0440/) that indicates how to write your version numbers. versioning.py is an example of versioning a project:

      2.1.0.dev1     # Development release
      2.1.0a1        # Alpha release
      2.1.0b1        # Beta release
      2.1.0rc1       # Release candidate
      2.1.0          # Final release
      2.1.0.post1    # Post release
      2018.04        # Date-based release
      19             # Serial release
  • description: A short and long description of your project. These will be displayed on PyPI when the project is published. The short description is required but the long description is optional.
  • url: The homepage URL for your project. This is an optional field.
  • author: The developer name(s) or organization name. This is an optional field.
  • author_email: The email address for the author listed above. Obfuscating the email address by spelling out the special characters, for example, your_name at your_organization dot com, is discouraged as this is a computer-readable field; use your_name@your_organization.com. This is an optional field.
  • classifiers: These categorize your project to help users find it on PyPI. There is a list of classifiers (https://pypi.python.org/pypi?%3Aaction=list_classifiers) that can be used, but they are optional. Some possible classifiers include: development status, framework used, intended use case, license, and so on.
  • keywords: List of keywords that describe your project. It is suggested that you use keywords a user searching for your project might type in. This is an optional field.
  • packages: List of packages used in your project. The list can be manually entered, but setuptools.find_packages() can be used to locate them automatically. A list of excluded packages can also be supplied to ignore packages that are not intended for release. This is a required field. An alternative for projects that distribute a single Python file is to replace the packages argument with py_modules, which then expects, for example, my_module.py to exist in the project.
  • install_requires: Specifies the minimum dependencies for the project to run. pip uses this argument to automatically identify dependencies, so these packages must be valid, existing projects. This is an optional field.
  • python_requires: Specifies the Python versions the project will run on. This will prevent pip from installing the project on invalid versions. This is an optional field. This is a relatively recent feature; setuptools version 24.2.0 is the minimum version required for creating source distributions and wheels to ensure pip properly recognizes this field. In addition, pip version 9.0.0 or newer is required; earlier versions will ignore this field and install the package regardless of Python version.
  • package_data: This is used to indicate additional files to be installed in the package, such as other data files or documentation. This argument is a dictionary mapping the package name to a list of relative path names. This is an optional field.
  • data_files: While package_data is the preferred method for identifying additional files, and is normally sufficient for the purpose, there are times when data files need to be placed outside your project package, for example, configuration files that need to be stored in a particular location in the file system. This is an optional field.
  • py_modules: List of names for the single-file modules included in the project. It is used in place of the packages argument when the project is distributed as individual modules rather than packages.
  • entry_points: Dictionary of executable scripts, such as plugins, that are defined within your project or that your project depends upon. Entry points provide cross-platform support and allow pip to create the appropriate executable form for the target platform. Because of these capabilities, entry points should be used in lieu of the scripts argument. This is an optional field.
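
Pulling several of these arguments together, a consolidated setup.py sketch for the video package used earlier in this chapter might look like the following; every value here is a placeholder (the URL, dependency, data files, and entry point are invented for illustration):

      # setup.py -- illustrative sketch only
      from setuptools import setup, find_packages

      setup(
          name="video",
          version="0.1.0",
          description="Toy video-processing package from this chapter's examples",
          url="https://example.com/video",                  # hypothetical homepage
          author="A. Developer",
          author_email="a.developer@example.com",
          classifiers=[
              "Development Status :: 3 - Alpha",
              "Programming Language :: Python :: 3",
          ],
          keywords="video effects formats",
          packages=find_packages(exclude=["tests"]),
          install_requires=["numpy"],                       # hypothetical dependency
          python_requires=">=3.4",
          package_data={"video": ["data/*.cfg"]},           # hypothetical data files
          entry_points={
              "console_scripts": ["video-convert=video.cli:main"],  # hypothetical module
          },
      )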
 

Project packaging


Everything we have talked about so far is just the basics required to get your project configured and set up for packaging; we haven't actually packaged it yet. To actually create a package that can be installed from PyPI or another package index, you need to run the setup.py script.

How to do it...

  1. Create a source code-based distribution. The minimum required for a package is a source distribution. A source distribution provides the metadata and essential source code files needed by pip for installation. A source distribution is essentially raw code and requires a build step prior to installation to build out the installation metadata from setup.py. A source distribution is created by running python setup.py sdist.
  2. While source distributions are a necessity, it is more convenient to create wheels. Wheel packages are highly recommended, as they are pre-built packages that can be installed without waiting for the build process. This means installation is significantly faster compared to working with a source distribution. There are several types of wheels, depending on whether the project is pure Python and whether it natively supports both Python 2 and 3. To build wheels, you must first install the wheel package: pip install wheel.
  3. The preferred wheel package is a universal wheel. Universal wheels are pure Python, that is, do not contain C-code compiled extensions, and natively support both Python 2 and 3 environments. Universal wheels can be installed anywhere using pip. To build a universal wheel, the following command is used:
      python setup.py bdist_wheel --universal

--universal should only be used when there are no C extensions in use and the Python code runs on both Python 2 and Python 3 without needing modifications, such as running 2to3. bdist_wheel signifies that the distribution is a binary one, as opposed to a source distribution. When used in conjunction with --universal, it does not check to ensure that it is being used correctly, so no warnings will be provided if the criteria are not met. The reason universal wheels shouldn't be used with C extensions is that pip prefers wheels over source distributions; since an incorrect wheel will most likely prevent the C extension from being built, the extension won't be available for use.
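
Rather than typing --universal on every build, the same setting can be recorded in a setup.cfg file next to setup.py; a small sketch, assuming the project genuinely is pure Python and runs on both major versions:

      [bdist_wheel]
      universal = 1

With this in place, a plain python setup.py bdist_wheel produces a universal wheel.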

  4. Alternatively, pure Python wheels can be used. Pure Python wheels are created when the Python source code doesn't natively support both Python 2 and 3 functionality. If the code can be modified for use between the two versions, such as via 2to3, you can manually create wheels for each version. To build a wheel, use the following command:
      python setup.py bdist_wheel

bdist_wheel will identify the code and build a wheel that is compatible for any Python installation with the same major version number, that is, 2.x or 3.x.

  5. Finally, platform wheels can be used when making packages for specific platforms. Platform wheels are binary builds specific to a certain platform/architecture due to the inclusion of compiled C extensions. Thus, if you need to make a program that is only used on macOS, a platform wheel must be used. The same command as for a pure Python wheel is used, but bdist_wheel will detect that the code is not pure Python and will build a wheel whose name identifies it as only usable on a specific platform. This is the same tag as referenced in the Installing from Wheels section.
 

Uploading to PyPI


Running the sdist or bdist_wheel commands against setup.py creates a new dist/ directory in your project's root directory; this is where the distribution files are placed for uploading. The distribution files are only created when a build command is run; any changes to the source code or configuration files require rebuilding them.

Getting ready

Before uploading to the main PyPI site, there is a PyPI test site (https://testpypi.python.org/pypi) you can practice with. This allows developers the opportunity to ensure they know what they are doing with the entire building and uploading process, so they don't break anything on the main site. The test site is cleaned up on a semi-regular basis, so it shouldn't be relied on as a storage site while developing.

In addition, check the long and short descriptions in your setup.py to ensure they are valid. Certain directives and URLs are forbidden and stripped during uploading; this is one reason why it is good to test your project on the PyPI test site to see if there are any problems with your configuration.

Before uploading to PyPI, you need to create a user account. Once you have manually created an account on the web site, you can create a $HOME/.pypirc file to store your username and password; this file will be referenced when uploading so you won't have to enter your credentials every time. Be aware, however, that your PyPI password is stored in it as plaintext, so if that concerns you, you will have to provide the password manually for every upload.
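
A minimal $HOME/.pypirc might look like the following sketch; the username and password are placeholders, and the password line can simply be omitted if you would rather type it at upload time:

      [distutils]
      index-servers =
          pypi

      [pypi]
      username = your_username
      password = your_password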

Once you have created a PyPI account, you can upload your distributions to PyPI via twine; for new distributions, twine will automatically handle the registration of the project on the site. Install twine as normal using pip.

How to do it...

  1. Create your distributions:
      python setup.py sdist bdist_wheel --universal
  2. Register your project (if this is its first upload):
      twine register dist/<project>-<version>.tar.gz
      twine register dist/<package_name>-<version>-<language_version>-<abi_tag>-<platform_tag>.whl
  3. Upload distributions:
      twine upload dist/*
  4. The following error indicates you need to register your package:
      HTTPError: 403 Client Error: You are not allowed to 
                         edit 'xyz' package information

How it works...

twine securely authenticates users to the PyPI database using HTTPS. The older way of uploading packages to PyPI was using python setup.py upload; this was insecure as the data was transferred via unencrypted HTTP, so your login credentials could be sniffed. With twine, connections are made through verified TLS to prevent credential theft.

This also allows a developer to pre-create distribution files, whereas setup.py upload only works with distributions that are created at the same time. Thus, using twine, a developer is able to test files prior to uploading them to PyPI, to ensure they work.

Finally, you can pre-sign your uploads with digital signatures and attach the .asc certification files to the twine upload. This ensures the developer's password is entered into GPG and not some other software, such as malware.
