Python for Geeks

By Muhammad Asif

About this book

Python is a multipurpose language that can be used for a wide range of use cases. Python for Geeks will teach you how to advance in your career with the help of expert tips and tricks.

You'll start by exploring the different ways of using Python optimally, both from the design and implementation point of view. Next, you'll understand the life cycle of a large-scale Python project. As you advance, you'll focus on different ways of creating an elegant design by modularizing a Python project and learn best practices and design patterns for using Python. You'll also discover how to scale out Python beyond a single thread and how to implement multiprocessing and multithreading in Python. In addition to this, you'll understand how you can not only use Python to deploy on a single machine but also use clusters in private as well as in public cloud computing environments. You'll then explore data processing techniques, focus on reusable, scalable data pipelines, and learn how to use these advanced techniques for network automation, serverless functions, and machine learning. Finally, you'll focus on strategizing web development design using the techniques and best practices covered in the book.

By the end of this Python book, you'll be able to do some serious Python programming for large-scale complex projects.

Publication date: October 2021
Publisher: Packt
Pages: 546
ISBN: 9781801070119

 

Chapter 1: Optimal Python Development Life Cycle

Keeping in mind your prior experience with Python, we have skipped the introductory details of the Python language in this chapter. First, we will have a short discussion of the broader open source Python community and its specific culture. That introduction is important, as this culture is reflected in code being written and shared by the Python community. Then, we will present the different phases of a typical Python project. Next, we will look at different ways of strategizing the development of a typical Python project.

Moving on, we will explore different ways of documenting the Python code. Later, we will look into various options of developing an effective naming scheme that can greatly help improve the maintenance of the code. We will also look into various options for using source control for Python projects, including situations where developers are mainly using Jupyter notebooks for development. Finally, we explore the best practices to deploy the code for use, once it is developed and tested.

We will cover the following topics in this chapter:

  • Python culture and community
  • Different phases of a Python project
  • Strategizing the development process
  • Effectively documenting Python code
  • Developing an effective naming scheme
  • Exploring choices for source control
  • Understanding strategies for deploying the code
  • Python development environments

This chapter will help you understand the life cycle of a typical Python project and its phases so that you can fully utilize the power of Python.

 

Python culture and community

Python is an interpreted, high-level language created by Guido van Rossum and first released in 1991. The Python community is special in the sense that it pays close attention to how code is written. To that end, the community has created and maintained a particular flavor in its design philosophy since the early days of the language. Today, Python is used in a wide variety of industries, ranging from education to medicine. But regardless of the industry in which it is used, the particular culture of the vibrant Python community is usually seen to be part and parcel of Python projects.

In particular, the Python community wants us to write simple code and avoid complexity wherever possible. In fact, there is an adjective for this, Pythonic: although there may be multiple ways to accomplish a certain task, the Pythonic way is the one preferred by the Python community's conventions and by the founding philosophy of the language. Python nerds try their best to create artifacts that are as Pythonic as possible. Obviously, unpythonic code means that we are not good coders in the eyes of these nerds. In this book, we will try to be as Pythonic as we can in our code and design.

And there is something official about being Pythonic as well. Tim Peters has concisely written the philosophy of Python in a short document, The Zen of Python. We know that Python is said to be one of the easiest languages to read, and The Zen of Python wants to keep it that way. It expects Python to be explicit through good documentation and as clean and clear as possible. We can read The Zen of Python ourselves, as explained next.

In order to read The Zen of Python, open up a Python console and run the import this command, as shown in the following screenshot:

Figure 1.1 – The Zen of Python
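
Since the screenshot is not reproduced here, the command can be sketched as follows. The decoding step is an extra illustration, not part of the original text: it assumes that CPython's this module keeps its text ROT13-encoded in the attribute this.s, which is an implementation detail rather than a documented API.

```python
import codecs
import this  # importing the module prints The Zen of Python once

# Assumption: CPython's `this` module stores the text ROT13-encoded in `this.s`.
zen_text = codecs.decode(this.s, "rot13")
first_line = zen_text.splitlines()[0]
print(first_line)
```

In an interactive console, typing import this on its own is enough to display the full text.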


At first glance, The Zen of Python can read like a cryptic text discovered in an old Egyptian tomb. Although it is deliberately written in this casual, cryptic way, there is a deeper meaning to each line of text. On a closer look, it can be used as a guideline for coding in Python. We will refer to different lines from The Zen of Python throughout the book. Let's first look into some excerpts from it, as follows:

  • Beautiful is better than ugly: It is important to write code that is well-written, readable, and self-explanatory. Not only should it work—it should be beautifully written. While coding, we should avoid using shortcuts in favor of a style that is self-explanatory.
  • Simple is better than complex: We should not unnecessarily complicate things. Whenever facing a choice, we should prefer the simpler solution. Nerdy, unnecessary, and complicated ways of writing code are discouraged. Even when it adds some more lines to the source code, simpler remains better than the complex alternative.
  • There should be one-- and preferably only one --obvious way to do it: In broader terms, for a given problem there should be one possible best solution. We should strive to discover this. As we iterate through the design to improve it, regardless of our approach, our solution is expected to evolve and converge toward that preferable solution.
  • Now is better than never: Instead of waiting for perfection, let's start solving the given problem using the information, assumptions, skills, tools, and infrastructure we have. Through the process of iteration, we will keep improving the solution. Let's keep things moving instead of idling. Do not slack while waiting for the perfect time. Chances are that the perfect time will never come.
  • Explicit is better than implicit: The code should be as self-explanatory as possible. This should be reflected in the choice of variable names, in the class and function design, and in the overall end-to-end (E2E) architecture. It is better to err on the side of caution. Always make it more explicit whenever facing a choice.
  • Flat is better than nested: A deeply nested structure may look concise, but it is harder to read and follow. Prefer a flat structure wherever possible.
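
As a rough illustration of the last excerpt, the two functions below compute the same result. The order dictionary, its keys, and the shipping prices are hypothetical; the point is only the difference in shape between nested conditions and early returns.

```python
def shipping_cost_nested(order):
    """Nested version: each condition adds another level of indentation."""
    if order is not None:
        if order["items"]:
            if order["country"] == "CA":
                return 5.0
            else:
                return 15.0
        else:
            return 0.0
    else:
        return 0.0


def shipping_cost_flat(order):
    """Flat version: guard clauses handle the edge cases first."""
    if order is None or not order["items"]:
        return 0.0
    if order["country"] == "CA":
        return 5.0
    return 15.0
```

Both versions behave identically, but the flat one can be read from top to bottom without tracking which else belongs to which if.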
 

Different phases of a Python project

Before we discuss the optimal development life cycle, let's start by identifying the different phases of a Python project. Each phase can be thought of as a group of activities that are similar in nature, as illustrated in the following diagram:

Figure 1.2 – Various phases of a Python project


The various phases of a typical Python project are outlined here:

  • Requirement analysis: This phase is about collecting the requirements from all key stakeholders and then analyzing them to understand what needs to be done and later think about the how part of it. The stakeholders can be our actual users of the software or business owners. It is important to collect the requirements in as much detail as possible. Wherever possible, requirements should be fully laid out, understood, and discussed with the end user and stakeholders before starting the design and development.

    An important point is to keep the requirement-analysis phase out of the iterative loop of the design, development, and testing phases. Requirement analysis should be fully conducted and complete before moving on to the next phases. The requirements should include both functional requirements (FRs) and non-functional requirements (NFRs). FRs should be grouped into modules. Within each module, the requirements should be numbered in an effort to map them as closely as possible with the code modules.

  • Design: Design is our technical response to the requirements as laid out in the requirement phase. In the design phase, we figure out the how part of the equation. It is a creative process where we use our experience and skills to come up with the right set and structure of modules and the interactions between them in the most efficient and optimal way.

    Note that coming up with the right design is an important part of a Python project. Missteps in the design phase are much more expensive to correct than missteps made in later phases. By some measures, changing the design and carrying that change through the subsequent phases (for example, the coding phase) takes 20 times more effort than a change of similar size that originates in the coding phase itself. For example, failing to identify the right classes, or misjudging the data and compute dimensions of the project, will have a far greater impact than a mistake made when implementing a single function. Also, because coming up with the right design is a conceptual process, mistakes may not be obvious and may not be caught by testing. Errors in the coding, on the other hand, can be caught by a well-thought-out exception-handling system.

    In the design phase, we perform the following activities:

    a) We design the structure of the code and identify the modules within the code.

    b) We decide the fundamental approach and decide whether we should be using functional programming, OOP, or a hybrid approach.

    c) We also identify the classes and functions and choose the names of these higher-level components.

    We also produce higher-level documentation.

  • Coding: This is the phase where we will implement the design using Python. We start by implementing the higher-level abstractions, components, and modules identified by the design first, followed by the detailed coding. We will keep a discussion about the coding phase to a minimum in this section as we will discuss it extensively throughout the book.
  • Testing: Testing is the process of verifying our code.
  • Deployment: Once thoroughly tested, we need to hand over the solution to the end user. The end user should not see the details of our design, coding, or testing. Deployment is the process of providing a solution to the end user that can be used to solve the problem as detailed in the requirements. For example, if we are working to develop a machine learning (ML) project to predict rainfall in Ottawa, the deployment is about figuring out how to provide a usable solution to the end user.

Having understood what the different phases of a project are, we will move on to see how we can strategize the overall process.

 

Strategizing the development process

Strategizing the development process is about planning each of the phases and looking into the process flow from one phase to another. To strategize the development process, we need to first answer the following questions:

  1. Are we looking for a minimal design approach and going straight to the coding phase with little design?
  2. Do we want test-driven development (TDD), whereby we first create tests using the requirements and then write the code that passes them?
  3. Do we want to create a minimum viable product (MVP) first and iteratively evolve the solution?
  4. What is the strategy for validating NFRs such as security and performance?
  5. Are we looking for a single-node development, or do we want to develop and deploy on the cluster or in the cloud?
  6. What are the volume, velocity, and variety of our input and output (I/O) data? Is it a Hadoop distributed file system (HDFS) or Simple Storage Service (S3) file-based structure, or a Structured Query Language (SQL) or NoSQL database? Is the data on-premises or in the cloud?
  7. Are we working on specialized use cases such as ML with specific requirements for creating data pipelines, testing models, and deploying and maintaining them?
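
To make question 2 more concrete, here is a minimal sketch of the TDD flow using the standard unittest module. The requirement (a temperature conversion) and all names are hypothetical: the test class is written first, directly from the requirement, and the function is added afterward with just enough logic to pass.

```python
import unittest


class TestCelsiusToFahrenheit(unittest.TestCase):
    """Tests written first, straight from a (hypothetical) requirement."""

    def test_freezing_point(self):
        self.assertEqual(celsius_to_fahrenheit(0), 32)

    def test_boiling_point(self):
        self.assertEqual(celsius_to_fahrenheit(100), 212)


def celsius_to_fahrenheit(celsius):
    """Implementation written afterward, just enough to pass the tests."""
    return celsius * 9 / 5 + 32


def run_tests():
    """Run the test case above and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(TestCelsiusToFahrenheit)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

In a real project, the tests would normally live in their own module and fail until the implementation exists; they are kept in one block here only for brevity.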

Based on the answers to these questions, we can strategize the steps for our development process. In recent years, iterative development processes in one form or another have become the preferred approach. The concept of MVP as a starting goal is also popular. We will discuss these in the next subsections, along with domain-specific development needs.

Iterating through the phases

Modern software development philosophy is based on short iterative cycles of design, development, and testing. The traditional waterfall model that was once used in code development has largely fallen out of favor. Selecting the right granularity, emphasis, and frequency of these phases depends on the nature of the project and our choice of code development strategy. If we want to choose a code development strategy with minimum design and want to go straight to coding, then the design phase is thin. But even starting the code straight away will require some thought in terms of the design of modules that will eventually be implemented.

No matter what strategy we choose, there is an inherent iterative relationship between the design, development, and testing phases. We initially start with the design phase, implement it in the coding phase, and then validate it by testing it. Once we have flagged the deficiencies, we need to go back to the drawing board by revisiting the design phase.

Aiming for MVP first

Sometimes, we select a small subset of the most important requirements to first implement the MVP with the aim of iteratively improving it. In an iterative process, we design, code, and test, until we create a final product that can be deployed and used.

Now, let's talk about how we will implement solutions for some specialized domains in Python.

Strategizing development for specialized domains

Python is currently being used for a wide variety of scenarios. Let's look into the following five important use cases to see how we can strategize the development process for each of them according to their specific needs:

  • ML
  • Cloud computing and cluster computing
  • Systems programming
  • Network programming
  • Serverless computing

We will discuss each of them in the following sections.

ML

Over the years, Python has become the most common language used for implementing ML algorithms. ML projects need to have a well-structured environment. Python has an extensive collection of high-quality libraries that are available for use for ML.

For a typical ML project, there is a Cross-Industry Standard Process for Data Mining (CRISP-DM) life cycle that specifies various phases of an ML project. A CRISP-DM life cycle looks like this:

Figure 1.3 – A CRISP-DM life cycle


For ML projects, designing and implementing data pipelines is estimated to be almost 70% of the development effort. While designing data processing pipelines, we should keep in mind that the pipelines will ideally have these characteristics:

  • They should be scalable.
  • They should be reusable as far as possible.
  • They should process both streaming and batch data by conforming to Apache Beam standards.
  • They should mostly be a concatenation of fit and transform functions, as we will discuss in Chapter 6, Advanced Tips and Tricks in Python.
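
The last characteristic, a pipeline as a concatenation of fit and transform functions, can be sketched in plain Python as follows. The class names MeanImputer, RangeScaler, and SimplePipeline are hypothetical and are not taken from any library; they only imitate the general fit/transform shape.

```python
class MeanImputer:
    """Replaces missing values (None) with the mean learned in fit()."""

    def fit(self, values):
        known = [v for v in values if v is not None]
        self.mean_ = sum(known) / len(known)
        return self

    def transform(self, values):
        return [self.mean_ if v is None else v for v in values]


class RangeScaler:
    """Rescales values to the [0, 1] range learned in fit()."""

    def fit(self, values):
        self.min_, self.max_ = min(values), max(values)
        return self

    def transform(self, values):
        span = (self.max_ - self.min_) or 1.0  # avoid division by zero
        return [(v - self.min_) / span for v in values]


class SimplePipeline:
    """Chains steps: each step is fitted on the output of the previous one."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, values):
        for step in self.steps:
            values = step.fit(values).transform(values)
        return self

    def transform(self, values):
        for step in self.steps:
            values = step.transform(values)
        return values


pipeline = SimplePipeline([MeanImputer(), RangeScaler()])
pipeline.fit([10.0, None, 30.0])
scaled = pipeline.transform([10.0, None, 30.0])
print(scaled)  # [0.0, 0.5, 1.0]
```

Because every step exposes the same two methods, steps can be reordered or reused in other pipelines without changing the pipeline class itself.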

Also, an important part of the testing phase for ML projects is the model evaluation. We need to figure out which of the performance metrics is the best one to quantify the performance of the model according to the requirement of the problem, nature of the data, and type of algorithm being implemented. Are we looking at accuracy, precision, recall, F1 score, or a combination of these performance metrics? Model evaluation is an important part of the testing process and needs to be conducted in addition to the standard testing done in other software projects.
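
For a binary classifier, the metrics mentioned above can be computed from the counts of true and false positives and negatives. The following is a minimal sketch in plain Python; the function name and the sample labels are made up for illustration.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical ground-truth labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # hypothetical model predictions
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Which of these numbers matters most depends on the problem: for example, recall is usually emphasized when missing a positive case is costly, and precision when false alarms are costly.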

Cloud computing and cluster computing

Cloud computing and cluster computing add complexity to the underlying infrastructure. Cloud service providers offer services that need specialized libraries. The architecture of Python, which starts with bare-minimum core packages and the ability to import any further package, makes it well suited for cloud computing. The platform independence offered by a Python environment is critical for cloud and cluster computing. Python is a popular language of choice for Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP).

Cloud computing and cluster computing projects have separate development, testing, and production environments. It is important to keep the development and production environments in sync.

When using infrastructure-as-a-service (IaaS), Docker containers can help a lot, and it is recommended to use them. Once we are using the Docker container, it does not matter where we are running the code as the code will have exactly the same environment and dependencies.

Systems programming

Python has interfaces to operating system services. Its core libraries have Portable Operating System Interface (POSIX) bindings that allow developers to create so-called shell tools, which can be used for system administration and various utilities. Shell tools written in Python are compatible across various platforms. The same tool can be used in Linux, Windows, and macOS without any change, making them quite powerful and maintainable.

For example, a shell tool that copies a complete directory developed and tested in Linux can run unchanged in Windows. Python's support for systems programming includes the following:

  • Defining environment variables
  • Support for files, sockets, pipes, processes, and multiple threads
  • Ability to specify a regular expression (regex) for pattern matching
  • Ability to provide command-line arguments
  • Support for standard stream interfaces, shell-command launchers, and filename expansion
  • File compression and archiving utilities (for example, for ZIP files)
  • Ability to parse Extensible Markup Language (XML) and JavaScript Object Notation (JSON) files
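
The directory-copying shell tool mentioned above could look roughly like the following sketch, which uses only the standard argparse, shutil, and pathlib modules. The function names and the command-line layout are assumptions made for this example.

```python
import argparse
import shutil
from pathlib import Path


def copy_directory(source, destination):
    """Recursively copy `source` to `destination` and return the new path."""
    source, destination = Path(source), Path(destination)
    if not source.is_dir():
        raise NotADirectoryError(f"{source} is not a directory")
    # shutil.copytree creates `destination`; it must not already exist.
    return Path(shutil.copytree(source, destination))


def main(argv=None):
    """Parse the command-line arguments and perform the copy."""
    parser = argparse.ArgumentParser(description="Copy a complete directory.")
    parser.add_argument("source", help="directory to copy")
    parser.add_argument("destination", help="path of the new copy")
    args = parser.parse_args(argv)
    copied = copy_directory(args.source, args.destination)
    print(f"Copied to {copied}")
    return 0
```

Saved as a script (hypothetically copydir.py) with a call to main() at the bottom, the tool could be run in the same way on Linux, Windows, or macOS, because pathlib and shutil hide the platform-specific path handling.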

When using Python for system development, the deployment phase is minimal and may be as simple as packaging the code as an executable file. It is important to mention that Python is not intended to be used for the development of system-level drivers or operating system libraries.

Network programming

In the digital transformation era, where Information Technology (IT) systems are moving quickly toward automation, networks are considered the main bottleneck in full-stack automation. The reason for this is the proprietary network operating systems from different vendors and a lack of openness. However, the prerequisites of digital transformation are changing this trend, and a lot of work is in progress to make the network programmable and consumable as a service (network-as-a-service, or NaaS). The real question is: Can we use Python for network programming? The answer is a big YES. In fact, it is one of the most popular languages in use for network automation.

Python support for network programming includes the following:

  • Socket programming including Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) sockets
  • Support for client and server communication
  • Support for port listening and processing data
  • Executing commands on a remote system over Secure Shell (SSH)
  • Uploading and downloading files using Secure Copy Protocol (SCP)/File Transfer Protocol (FTP)
  • Library support for Simple Network Management Protocol (SNMP)
  • Support for the REST-based RESTCONF protocol and the Network Configuration Protocol (NETCONF) for retrieving and updating configuration
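
As a small illustration of the first three items (TCP sockets, client and server communication, and port listening), the sketch below starts a TCP server on the local machine with the standard socketserver module and talks to it through a client socket. The handler name, the uppercase reply, and the helper function are arbitrary choices for this example.

```python
import socket
import socketserver
import threading


class UppercaseHandler(socketserver.BaseRequestHandler):
    """Server side: reads one message and sends it back in uppercase."""

    def handle(self):
        data = self.request.recv(1024)
        self.request.sendall(data.upper())


def uppercase_roundtrip(message):
    """Start a local TCP server, send `message`, and return the reply."""
    # Port 0 asks the operating system for any free port.
    with socketserver.TCPServer(("127.0.0.1", 0), UppercaseHandler) as server:
        host, port = server.server_address
        thread = threading.Thread(target=server.serve_forever, daemon=True)
        thread.start()
        try:
            # Client side: connect, send the message, and wait for the reply.
            with socket.create_connection((host, port), timeout=5) as client:
                client.sendall(message)
                reply = client.recv(1024)
        finally:
            server.shutdown()
            thread.join()
    return reply
```

Calling uppercase_roundtrip(b"hello") exercises both ends of the connection in a single process; a real service would keep the server running and handle messages larger than one recv call.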

Serverless computing

Serverless computing is a cloud-based application execution model in which the cloud service providers (CSPs) provide the computing resources and application servers to allow developers to deploy and execute the applications without any hassle of managing the computing resources and servers themselves. All of the major public cloud vendors (for example, Microsoft Azure with Azure Functions, AWS with AWS Lambda, and Google Cloud Platform, or GCP, with Cloud Functions) support serverless computing for Python.

We need to understand that there are still servers in a serverless environment, but those servers are managed by CSPs. As application developers, we are not responsible for installing and maintaining the servers, and we have no direct responsibility for their scalability and performance.

There are popular serverless libraries and frameworks available for Python. These are described next:

  • Serverless: The Serverless Framework is an open source framework for serverless functions or AWS Lambda services and is written using Node.js. Serverless is the first framework developed for building applications on AWS Lambda.
  • Chalice: This is a Python serverless microframework developed by AWS. It is a default choice for developers who want to quickly spin up and deploy a working serverless Python application on AWS Lambda that scales up and down on its own as required. Another key feature of Chalice is that it provides a utility to simulate your application locally before pushing it to the cloud.
  • Zappa: This is more of a deployment tool built in Python, and it makes the deployment of your Web Server Gateway Interface (WSGI) application easy.
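
Independently of the framework used for deployment, a Python function for AWS Lambda is an ordinary function that takes an event and a context argument. The sketch below follows that shape without using any framework; the name field in the event and the response layout (a status code and a JSON body, as commonly used behind an HTTP gateway) are assumptions for this example.

```python
import json


def lambda_handler(event, context):
    """Entry point in the shape AWS Lambda expects for Python functions."""
    name = event.get("name", "world")  # hypothetical input field
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


# Local invocation for a quick check; in the cloud, the platform calls the
# handler and supplies the real event and context objects.
response = lambda_handler({"name": "Python"}, context=None)
print(response["body"])
```

Because the handler is a plain function, it can be unit tested locally before it is packaged and deployed with any of the tools listed above.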

Now, let's look into effective ways of documenting Python code.

 

Effectively documenting Python code

Finding an effective way to document code is always important. The challenge is to develop a comprehensive yet simple way to document Python code. Let's first look into Python comments and then docstrings.

Python comments

In contrast with a docstring, a Python comment is ignored by the interpreter and is not available at runtime. Comments are used as notes to explain the code. They start with a # sign in Python, as shown in the following screenshot:

Figure 1.4 – An example of a comment in Python
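
In place of the screenshot, a comparable example is sketched here; the variable names and values are made up for illustration.

```python
# Conversion factor from kilometers to miles (a full-line comment)
KM_TO_MILES = 0.621371

distance_km = 10
distance_miles = distance_km * KM_TO_MILES  # an inline comment after code
print(distance_miles)
```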


Docstring

The main workhorse for documenting code is the docstring, a string literal (often spanning multiple lines) placed as the first statement of a module, function, class, or method. One of the features of the Python language is that docstrings are associated with an object and are available for inspection at runtime. The guidelines for docstrings are described in Python Enhancement Proposal (PEP) 257. According to the guidelines, their purpose is to provide an overview to the readers. They should strike a good balance between being concise and being descriptive. Docstrings use a triple-double-quote string format: (""").

Here are some general guidelines when creating a docstring:

  • A docstring should be placed right after the function or the class definition.
  • A docstring should be given a one-line summary followed by a more detailed description.
  • Blank lines should be used strategically to organize the docstring, but they should not be overused.

In the following sections, let's take a look at more detailed concepts of docstrings.

Docstring styles

A Python docstring has the following slightly different styles:

  • Google
  • NumPy/SciPy
  • Epytext
  • reStructuredText

Docstring types

While developing the code, various types of documentation need to be produced, including the following:

  • Line-by-line commentary
  • Functional or class-level documentation
  • Algorithmic details

Let's discuss them, one by one.

Line-by-line commentary

One simple use of the docstring (triple-quoted string) syntax is to create multiline comments, as shown here:

Figure 1.5 – An example of a line-by-line commentary-type docstring


Functional or class-level documentation

A powerful use of a docstring is for functional or class-level documentation. If we place the docstring just after the definition of a function or a class, Python associates the docstring with that function or class and stores it in its __doc__ attribute. We can print it at runtime either by reading the __doc__ attribute or by calling the built-in help function, as shown in the following example:

Figure 1.6 – An example of the help function
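
In place of the screenshot, the same idea can be sketched with a small, hypothetical function:

```python
def calculate_sum(first, second):
    """Return the sum of two numbers."""
    return first + second


print(calculate_sum.__doc__)  # the docstring is stored on the function object
help(calculate_sum)           # shows the signature together with the docstring
```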


When using a docstring for documenting classes, the recommended structure is as follows:

  • A summary: usually a single line
  • First blank line
  • Any further explanation regarding the class
  • Second blank line

An example of using a docstring on the class level is shown here:

Figure 1.7 – An example of a class-level docstring
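
In place of the screenshot, here is a sketch of a class-level docstring that follows the structure listed above (a summary line, a blank line, further explanation, and a second blank line). The Patient class is a hypothetical example.

```python
class Patient:
    """Represent a patient admitted to a hospital.

    The class keeps the name of the patient and the admission status.
    It only illustrates the layout of a class-level docstring.

    """

    def __init__(self, name):
        """Create a patient who has not been admitted yet."""
        self.name = name
        self.is_admitted = False


print(Patient.__doc__)
```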


Algorithmic details

More and more often, Python projects use descriptive or predictive analytics and other complex logic. The details of the algorithm that is used need to be clearly specified with all the assumptions that were made. If an algorithm is implemented as a function, then the best place to write the summary of the logic of the algorithm is before the signature of the function.

 

Developing an effective naming scheme

If developing and implementing the right logic in code is science, then making it pretty and readable is an art. Python developers are famous for paying special attention to the naming scheme and bringing The Zen of Python into it. Python is one of the few languages that have comprehensive naming guidelines co-authored by the creator of the language, Guido van Rossum. They are part of the PEP 8 style guide, which has a complete section on naming conventions and is followed by many code bases. You can read more about it at https://www.Python.org/dev/peps/pep-0008/.

The naming scheme suggested in PEP 8 can be summarized as follows:

  • In general, all module names should be all_lower_case.
  • All class names and exception names should be CamelCase.
  • All global and local variables should be all_lower_case.
  • All functions and method names should be all_lower_case.
  • All constants should be ALL_UPPER_CASE.
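
The conventions summarized above can be seen together in the following sketch of a small module. The module, its names, and its logic are hypothetical and exist only to show one example of each convention.

```python
"""Hypothetical module that would be saved as unit_converter.py."""

KM_PER_MILE = 1.609344  # constant: ALL_UPPER_CASE


class ConversionError(Exception):  # exception name: CamelCase
    """Raised when a distance cannot be converted."""


class DistanceConverter:  # class name: CamelCase
    """Converts distances between units."""

    def miles_to_km(self, distance_miles):  # method name: all_lower_case
        if distance_miles < 0:
            raise ConversionError("distance must not be negative")
        distance_km = distance_miles * KM_PER_MILE  # local variable
        return distance_km
```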

Some guidelines about the structure of the code from PEP 8 are given here:

  • Indentation is important in Python. Do not use Tab for indentation. Instead, use four spaces.
  • Limit nesting to four levels (a common rule of thumb rather than an explicit PEP 8 rule).
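  • Remember to limit the length of lines to 79 characters. To break long lines, prefer wrapping the expression in parentheses, brackets, or braces, and use the \ symbol only where that is not possible.
  • To make code readable, insert two blank lines to separate top-level functions and classes.
  • Insert a single blank line between various logical sections.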

Remember that PEP guidelines are just suggestions that may be customized by different teams. Any customized naming scheme should still use PEP 8 as the basic guideline.

Now, let's look in more detail at the naming scheme in the context of various Python language structures.

Methods

Method names should use lowercase. The name should consist of a single word or more than one word separated by underscores. You can see an example of this here:

calculate_sum

To make the code readable, the method name should preferably be a verb, related to the processing that the method is supposed to perform.

If a method is non-public, it should have a leading underscore. Here's an example of this:

_my_calculate_sum

Dunder (double underscore) or magic methods are methods whose names have two leading and two trailing underscores. Examples of dunder or magic methods are shown here:

  • __init__
  • __add__

It is never a good idea to invent your own method names with two leading and two trailing underscores, and the use of such names by developers is discouraged. This naming scheme is reserved for the special methods defined by Python itself.
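
The three kinds of method names discussed in this section can be seen side by side in the sketch below. The Money class is a hypothetical example; it implements two of Python's own special methods, but it does not invent any new double-underscore names.

```python
class Money:
    def __init__(self, amount):         # special method: called by Money(...)
        self.amount = amount

    def __add__(self, other):           # special method: called for a + b
        return Money(self._add_amounts(other))

    def _add_amounts(self, other):      # non-public helper: leading underscore
        return self.amount + other.amount

    def calculate_total(self, others):  # public method: lowercase verb phrase
        total = self
        for other in others:
            total = total + other
        return total


total = Money(5) + Money(7)
print(total.amount)  # 12
```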

Variables

Use a lowercase word or words separated by an underscore to represent variables. The variables should be nouns that correspond to the entity they are representing.

Examples of variables are given here:

  • x
  • my_var

The names of private variables should start with an underscore. An example is _my_secret_variable.

Boolean variables

Starting a Boolean variable with is or has makes it more readable. You can see a couple of examples of this here:

class Patient:
    is_admitted = False
    has_heartbeat = False

Collection variables

As collections are buckets of variables, it is a good idea to name them in a plural format, as illustrated here:

class Patient:
    admitted_patients = ['John', 'Peter']

Dictionary variables

The name of the dictionary is recommended to be as explicit as possible. For example, if we have a dictionary of people mapped to the cities they are living in, then a dictionary can be created as follows:

persons_cities = {'Imran': 'Ottawa', 'Steven': 'Los Angeles'}

Constant

Python has no built-in way to declare a variable as a constant. For example, in C++, we can use the const keyword to specify that a variable is immutable and is a constant. Python instead relies on naming conventions to identify constants. If the code reassigns a constant as if it were a regular variable, Python will not raise an error.

For constants, the recommendation is to use an uppercase word, or uppercase words separated by underscores. An example of a constant is given here:

CONVERSION_FACTOR
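
The following sketch shows that the convention is not enforced at runtime. The use of typing.Final is an addition that is not covered in the text above: it is a type hint (available from Python 3.8) that static type checkers can use to flag reassignments, while the interpreter still ignores it. The values are arbitrary.

```python
from typing import Final

CONVERSION_FACTOR = 2.54  # constant by naming convention only
CONVERSION_FACTOR = 3.0   # Python accepts the reassignment silently

# A static type checker can report later reassignments of a Final name;
# the interpreter itself does not enforce this.
MAX_RETRIES: Final = 3

print(CONVERSION_FACTOR, MAX_RETRIES)
```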

Classes

Classes should follow the CamelCase style. In other words, they should start with a capital letter. If we need to use more than one word, the words should not be separated by an underscore, but each word that is appended should have an initial capital letter. Classes should use a noun and should be named in a way to best represent the entity the class corresponds to. One way of making the code readable is to use class names with suffixes that have something to do with their type or nature, such as the following:

  • HadoopEngine
  • ParquetType
  • TextboxWidget

Here are some points to keep in mind:

  • Exception classes that represent errors should always have Error as the trailing word in their names. Here's an example of this:
    FileNotFoundError
  • Some of Python's built-in classes do not follow this naming guideline.
  • To make it more readable, for base or abstract classes, a Base or Abstract prefix can be used. An example could be this:
    AbstractCar
    BaseClass
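
These points can be sketched with the standard abc module as follows. All class names are hypothetical; using ABC and abstractmethod is one common way to define an abstract class, not the only one.

```python
from abc import ABC, abstractmethod


class PatientNotFoundError(Exception):
    """Exception class: the name ends with Error."""


class AbstractCar(ABC):
    """Abstract class: the name starts with Abstract."""

    @abstractmethod
    def start_engine(self):
        """Return a message describing how the engine starts."""


class ElectricCar(AbstractCar):
    """Concrete class: a noun in CamelCase."""

    def start_engine(self):
        return "silent start"
```

Trying to instantiate AbstractCar directly raises a TypeError, which makes the intent signaled by the prefix explicit at runtime as well.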

Packages

The use of an underscore is not encouraged while naming a package. The name should be short and all lowercase. If more than one word needs to be used, the additional word or words should also be lowercase. Here's an example of this:

mypackage

Modules

When naming a module, short and to-the-point names should be used. They need to be lowercase, and more than one word will be joined by underscores. Here's an example:

main_module.py

Import conventions

Over the years, the Python community has developed a convention for aliases that are used for commonly used packages. You can see an example of this here:

import numpy as np
import pandas as pd
import seaborn as sns
import statsmodels.api as sm
import matplotlib.pyplot as plt

Arguments

Arguments are recommended to have a naming convention similar to variables, because arguments of a function are, in fact, temporary variables.

Useful tools

There are a couple of tools that can be used to test how closely your code conforms to PEP 8 guidelines. Let's look into them, one by one.

Pylint

Pylint can be installed by running the following command:

$ pip install pylint

Pylint is a source code analyzer that checks the code, including its naming conventions, against PEP 8 and then prints a report. It can be customized to enforce other naming conventions.

pep8

pep8 can be installed by running the following command:

$ pip install pep8

pep8 checks the code with respect to PEP 8. The tool has since been renamed to pycodestyle, and newer releases are installed under that name.
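To give a feel for the kind of naming check such tools report, here is a toy sketch built on the standard library's ast module. It is not how Pylint or pep8 is implemented. The regular expressions and the find_naming_issues function are assumptions made for this example and cover only class and function names.

```python
import ast
import re

# Simplified patterns (assumptions for this sketch, not the tools' real rules).
CLASS_NAME = re.compile(r"^_?[A-Z][a-zA-Z0-9]*$")        # CamelCase
FUNCTION_NAME = re.compile(r"^_{0,2}[a-z][a-z0-9_]*$")   # lowercase_with_underscores


def find_naming_issues(source: str) -> list[str]:
    """Return one message per class or function whose name breaks the patterns."""
    issues = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef) and not CLASS_NAME.match(node.name):
            issues.append((node.lineno, f"class '{node.name}' is not CamelCase"))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if not FUNCTION_NAME.match(node.name):
                issues.append((node.lineno, f"function '{node.name}' is not lowercase"))
    # Sort by line number so the report reads top to bottom.
    return [f"line {lineno}: {message}" for lineno, message in sorted(issues)]


sample_source = (
    "class hadoop_engine:\n"
    "    def RunJob(self):\n"
    "        pass\n"
    "\n"
    "class ParquetType:\n"
    "    def read_rows(self):\n"
    "        pass\n"
)
report = find_naming_issues(sample_source)
```

For the sample above, the report flags hadoop_engine and RunJob and leaves ParquetType and read_rows alone. Real tools apply many more rules and are configurable.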

So far, we have learned about the various naming conventions in Python. Next, we will explore different choices for using source control for Python.

 

Exploring choices for source control

First, we will see a brief history of source control systems to provide a context. Modern source control systems are quite powerful. The evolution of the source control systems went through the following stages:

  • Stage 1: Source control started with local systems that kept the code history on a developer's hard drive. This local code collection was called a local repository.
  • Stage 2: But using source control locally was not suitable for larger teams. This solution eventually evolved into a central server-based repository that was shared by the members of the team working on a particular project. It solved the problem of code sharing among team members, but it also created an additional challenge of locking the files for the multiuser environment.
  • Stage 3: Modern version control systems such as Git evolved this model further. Every member of the team now has a full copy of the repository stored locally and can work on the code offline. Team members need to connect to the shared repository only when they need to share code.

What does not belong in the source control repository?

Let's look into what should not be checked into the source control repository.

Firstly, anything other than source code files generally shouldn't be checked in. In particular, computer-generated files should not be checked into source control. For example, let's assume that we have a Python source file named main.py. If we compile it, the generated code does not belong in the repository. The compiled code is a derived file and should not be checked into source control. There are three reasons for this, outlined as follows:

  • The derived file can be generated by any member of the team once we have the source code.
  • In many cases, the compiled code is much larger than the source code, and adding it to the repository will make it slow and sluggish. Also, remember that if there are 16 members in the team, every one of them gets a copy of that generated file, which slows down the whole system for no benefit.
  • Source control systems are designed to store the delta, that is, the changes you have made to the source files since your last commit. Files other than source code files are usually binary files. The source control system most likely cannot compute a meaningful diff for them, so it needs to store the whole file each time it is committed. This has a negative effect on the performance of the source control system.
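The first reason is easy to demonstrate. The following sketch uses the standard py_compile module to show that the compiled file for main.py can be thrown away and rebuilt from the source at any time. The regenerate_bytecode helper and the temporary directory layout are assumptions made for the illustration.

```python
import py_compile
import tempfile
from pathlib import Path


def regenerate_bytecode(source_file: Path, target_file: Path) -> Path:
    """Compile a source file to bytecode; the result is a derived file."""
    py_compile.compile(str(source_file), cfile=str(target_file), doraise=True)
    return target_file


with tempfile.TemporaryDirectory() as work_dir:
    source_file = Path(work_dir) / "main.py"
    source_file.write_text("print('hello')\n")

    compiled_file = regenerate_bytecode(source_file, Path(work_dir) / "main.pyc")
    compiled_file.unlink()  # losing the derived file costs nothing...
    regenerate_bytecode(source_file, compiled_file)  # ...because the source rebuilds it
    was_regenerated = compiled_file.exists()
```

In a Git repository, such files are typically kept out of commits with ignore rules for __pycache__ directories and .pyc files.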

Secondly, anything that is confidential does not belong in source control. This includes API keys and passwords.
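One common way to keep secrets out of the repository is to read them from the environment at runtime rather than hardcoding them. The sketch below assumes a hypothetical environment variable named SERVICE_API_KEY, and the load_api_key function is likewise invented for the example.

```python
import os


def load_api_key(variable_name: str = "SERVICE_API_KEY") -> str:
    """Read a secret from the environment instead of storing it in the code."""
    api_key = os.environ.get(variable_name)
    if not api_key:
        # Fail early with a clear message instead of running without credentials.
        raise RuntimeError(f"Set the {variable_name} environment variable before running")
    return api_key
```

The key itself then lives in the deployment environment or a secrets manager, and only the variable name appears in the committed code.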

For the source repository, GitHub is the preferred choice of the Python community. The source code of many well-known Python packages is also hosted on GitHub. If the Python code is to be utilized across teams, then the right protocol and procedures need to be developed and maintained.

 

Understanding strategies for deploying the code

For projects where the development team is not the end user, it is important to come up with a strategy to deploy the code for the end user. For relatively large-scale projects, when there is a well-defined DEV and PROD environment, deploying the code and strategizing it becomes important.

Python is the language of choice for cloud and cluster computing environments as well.

Issues related to deploying the code are listed as follows:

  • Exactly the same transformations need to happen in DEV, TEST, and PROD environments.
  • As the code keeps getting updated in the DEV environment, how will the changes be synced to the PROD environment?
  • What type of testing do you plan to do in the DEV and PROD environments?

Let's look into two main strategies for deploying the code.

Batch development

This is the traditional development process. We develop the code, compile it, and then test it. This process is repeated iteratively until all the requirements are met. Then, the developed code is deployed.

Employing continuous integration and continuous delivery

Continuous integration/continuous delivery (CI/CD) in the context of Python refers to continuous integration and deployment instead of conducting it as a batch process. It helps to create a development-operations (DevOps) environment by bridging the gap between development and operations.

CI refers to continuously integrating, building, and testing various modules of the code as they are being updated. For a team, this means that the code developed individually by each team member is integrated, built, and tested, typically many times a day. Once they are tested, the repository in the source control is updated.

An advantage of CI is that problems or bugs are caught and fixed early. A bug fixed on the day it was introduced typically takes much less time to resolve than one fixed days, weeks, or months later, when it has already trickled down to other modules and created multilevel dependencies.

Unlike natively compiled languages such as C++, Python is an interpreted language, which means the code can be executed on any target machine that has a compatible interpreter. In comparison, natively compiled code is typically built for one type of target machine. In both cases, the code may be developed by different members of the team. Once we have figured out which steps need to be followed each time a change is made, we can automate them.

As Python code is dependent on external packages, keeping track of their names and versions is part of automating the build process. A good practice is to list all these packages in a file named requirements.txt. The name can be anything, but the Python community typically tends to call it requirements.txt.

To install the packages, we will execute the following command:

$ pip install -r requirements.txt

To create a requirements file that represents the packages used in our code, we can use the following command:

$ pip freeze > requirements.txt
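As part of an automated build, it can be useful to confirm that the environment actually matches the pinned versions. The following sketch parses name==version lines, the format pip freeze writes for most packages, and compares them with what is installed using the standard importlib.metadata module. The two function names are hypothetical, and the parser deliberately ignores anything that is not a simple pin.

```python
from importlib import metadata


def parse_pinned_requirements(text: str) -> dict[str, str]:
    """Extract {package: version} from 'name==version' lines."""
    pins = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # skip blanks and unpinned entries in this simple sketch
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins: dict[str, str]) -> dict[str, str]:
    """Report packages that are missing or installed at a different version."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            mismatches[name] = "not installed"
            continue
        if installed != wanted:
            mismatches[name] = f"installed {installed}, wanted {wanted}"
    return mismatches
```

A build script could read requirements.txt, pass its contents through both functions, and stop the build if the returned dictionary is not empty.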

The goal of integration is to catch errors and defects early, but it has the potential to make the development process unstable. There will be times when a member of the team introduces a major bug that breaks the code, and other team members may have to wait until that bug is resolved. Robust self-testing by team members and choosing the right frequency for integration will help to mitigate the issue. For robust testing, tests should be run each time a change is made, and this testing process should eventually be completely automated. In the case of errors, the build should fail and the team member responsible for the defective module should be notified. To make sure other team members are not blocked, that team member can choose to provide a quick fix first and then take the time to resolve and fully test the problem.
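A minimal sketch of such an automated check, using the standard unittest module, is shown here. The function under test (normalize_package_name), the test case, and run_build_checks are all hypothetical. The point is only that the test run produces an exit code that a CI system can use to fail the build.

```python
import unittest


def normalize_package_name(name: str) -> str:
    """Hypothetical function under test."""
    return name.strip().lower().replace("_", "-")


class NormalizePackageNameTest(unittest.TestCase):
    def test_lowercases_and_strips(self):
        self.assertEqual(normalize_package_name("  NumPy "), "numpy")

    def test_replaces_underscores(self):
        self.assertEqual(normalize_package_name("scikit_learn"), "scikit-learn")


def run_build_checks() -> int:
    """Run the test suite and return a process exit code (0 means success)."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizePackageNameTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return 0 if result.wasSuccessful() else 1
```

A CI job would typically end by passing the value of run_build_checks() to sys.exit, so that any failing test marks the build as failed and triggers a notification.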

Once the code is built and tested, we can choose to update the deployed code as well. That will implement the CD part. If we choose to have a complete CI/CD process, it means that each time a change is made, it is built and tested and the changes are reflected in the deployed code. If managed properly, the end user will benefit from having a constantly evolving solution. In some use cases, each CI/CD cycle may be an iterative move from a minimum viable product (MVP) to a full solution. In other use cases, we are trying to capture and formulate a fast-changing real-world problem, discarding obsolete assumptions and incorporating new information. An example is the pattern analysis of the COVID-19 situation, which is changing by the hour. New information arrives at a rapid pace, and any use case related to it may benefit from CI/CD, whereby developers constantly update their solutions based on newly emerging facts and information.

Next, we will discuss commonly used development environments for Python.

 

Python development environments

Text editors are a tempting choice for editing Python code. But for any medium-to-large-sized project, we have to seriously consider Python integrated development environments (IDEs), which are very helpful for writing, debugging, and troubleshooting the code, working with version control, and easing deployments. There are many IDEs available, most of them free. In this section, we will review a few of them. Note that we will not try to rank them in any order but will emphasize the value each of them brings, and it is up to the reader to make the best choice based on their past experience, project requirements, and the complexity of their projects.

IDLE

Integrated Development and Learning Environment (IDLE) is a default editor that comes with Python and is available for all main platforms (Windows, macOS, and Linux). It is free and is a decent IDE for beginners for learning purposes. It is not recommended for advanced programming.

Sublime Text

Sublime Text is another popular code editor and can be used for multiple languages. It is free for evaluation purposes only. It is also available for all main platforms (Windows, macOS, and Linux). It comes with basic Python support, and its powerful extensions framework lets us customize it into a full development environment, although doing so takes extra skill and time. Integration with a version control system such as Git or Subversion (SVN) is possible with plugins but may not expose full version control features.

Atom is another popular editor that is also in the same category as Sublime Text. It is free.

PyCharm

PyCharm is one of the best IDEs available for Python programming, and it is available for Windows, macOS, and Linux. It is a complete IDE tailored for Python programming, which helps programmers with code completion, debugging, refactoring, smart search, access to popular database servers, integration with version control systems, and many more features. The IDE provides a plugin platform for developers to extend the base functionalities as needed. PyCharm is available in the following editions:

  • Community version, which is free and comes for pure Python development
  • Professional version, which is not free and comes with support for web development such as HyperText Markup Language (HTML), JavaScript, and SQL

Visual Studio Code

Visual Studio Code (VS Code) is an open source environment developed by Microsoft. For Windows, VS Code is the best Python IDE. It does not come with a Python development environment by default. The Python extensions for VS Code can make it a Python development environment.

It is lightweight, free, and also available for macOS and Linux. Its features include code completion, debugging, refactoring, searching, accessing database servers, version control system integration, and much more.

PyDev

If you are using or have used Eclipse, you may like to consider PyDev, which is a third-party plugin for Eclipse. It is regarded as one of the best Python IDEs and can also be used for Jython and IronPython. It is free. As PyDev is just a plugin on top of Eclipse, it is available for all the major platforms that Eclipse supports. This IDE comes with all the bells and whistles of Eclipse, and on top of that, it streamlines integration with Django, unit testing, and Google App Engine (GAE).

Spyder

If you are planning to use Python for data science and ML, you may want to consider Spyder as your IDE. Spyder is written in Python. This IDE offers tools for full editing, debugging, interactive execution, deep inspection, and advanced visualization capabilities. Additionally, it supports integration with Matplotlib, SciPy, NumPy, Pandas, Cython, IPython, and SymPy, which makes it a natural default IDE for data scientists.

Based on the review of different IDEs in this section, we can recommend PyCharm and PyDev for professional application developers. But if you are more into data science and ML, Spyder is surely worth exploring.

 

Summary

In this chapter, we laid down the groundwork for the advanced Python concepts discussed in the later chapters of this book. We started by presenting the flavor, guidance, and ambience of a Python project. We started the technical discussion by first identifying different phases of the Python project and then exploring different ways of optimizing it based on the use cases we are working on. For a terse language such as Python, good-quality documentation goes a long way to make the code readable and explicit.

We also looked into the recommended ways of documenting Python code and studied the naming schemes that can help us make code more readable. Next, we looked into the different ways we can use source control and the different ways of deploying Python code. Finally, we reviewed a few development environments for Python to help you choose one based on your background and the type of project you are going to work on.

The topics we covered in this chapter are beneficial for anyone who is starting a new project involving Python. These discussions help in making strategy and design decisions for a new project promptly and efficiently. In the next chapter, we will investigate how we can modularize the code of a Python project.

 

Questions

  1. What is The Zen of Python?
  2. In Python, what sort of documentation is available at runtime?
  3. What is a CRISP-DM life cycle?
 

Further reading

  • Modern Python Cookbook – Second Edition, by Steven F. Lott
  • Python Programming Blueprints, by Daniel Furtado
  • Secret Recipes of the Python Ninja, by Cody Jackson
 

Answers

  1. A collection of 19 guidelines written by Tim Peters that apply to the design of Python projects.
  2. As opposed to regular comments, docstrings are retained at runtime and can be accessed, for example, through an object's __doc__ attribute or the built-in help() function.
  3. CRISP-DM stands for Cross-Industry Standard Process for Data Mining. It applies to a Python project life cycle in the ML domain and identifies different phases of a project.

About the Author

  • Muhammad Asif

    Muhammad Asif is a principal solution architect with a wide range of multi-disciplinary experience in web development, network and cloud automation, virtualization, and machine learning. With a strong multi-domain background, he has led many large-scale projects to successful deployment. Although moving to more leadership roles in recent years, Muhammad has always enjoyed solving real-world problems by using appropriate technology and by writing the code himself. He earned a Ph.D. in computer systems from Carleton University, Ottawa, Canada in 2012 and currently works for Nokia as a solution lead.

