Reader small image

You're reading from  Advanced Python Programming - Second Edition

Product typeBook
Published inMar 2022
PublisherPackt
ISBN-139781801814010
Edition2nd Edition
Right arrow
Author (1)
Quan Nguyen
Quan Nguyen
author image
Quan Nguyen

Quan Nguyen is a Python programmer and machine learning enthusiast. He is interested in solving decision-making problems under uncertainty. Quan has authored several books on Python programming and scientific computing. He is currently pursuing a Ph.D. degree in computer science at Washington University in St. Louis, researching Bayesian methods in machine learning.
Read more about Quan Nguyen

Right arrow

Chapter 2: Pure Python Optimizations

As mentioned in the previous chapter, one of the most effective ways of improving the performance of applications is through the use of better algorithms and data structures. The Python standard library provides a large variety of ready-to-use algorithms and data structures that can be directly incorporated into your applications. With the tools learned from this chapter, you will be able to use the right algorithm for the task and achieve massive speed gains.

Even though many algorithms have been around for quite a while, they are especially relevant in today's world as we continuously produce, consume, and analyze ever-increasing amounts of data. Buying a larger server or micro-optimizing can work for some time, but achieving better scaling through algorithmic improvement can solve the problem once and for all.

In this chapter, we will learn how to achieve better scaling using standard algorithms and data structures. More advanced use...

Technical requirements

Using the right algorithms and data structures

Algorithmic improvements are especially effective in increasing performance because they typically allow the application to scale better with increasingly large inputs.

Algorithm running times can be classified according to their computational complexity, a characterization of the resources required to perform a task.

Such classification is expressed through the Big O notation, an upper bound on the operations required to execute the task, which usually depends on the input size. Specifically, Big O notation describes how the runtime or memory requirement of an algorithm grows in terms of the input size. For this reason, a lower Big O denotes a more efficient algorithm, which is what we aim for.

For example, incrementing each element of a list can be implemented using a for loop, as follows:

    input = list(range(10))
    for i, _ in enumerate(input):
     ...

Improved efficiency with caching and memoization

Caching is a great technique used to improve the performance of a wide range of applications. The idea behind caching is to store expensive results in a temporary location, called a cache, that can be located in memory, on disk, or in a remote location.

Web applications make extensive use of caching. In a web application, it is often the case that multiple users request a certain page at the same time. In this case, instead of recomputing the page for each user, the web application can compute it once and serve the user the already rendered page. Ideally, caching also needs a mechanism for invalidation so that if the page needs to be updated, we can recompute it before serving it again. Intelligent caching allows web applications to handle the increasing number of users with fewer resources. Caching can also be done preemptively, such as the later sections of a video getting buffered when watching a video online.

Caching is also...

Efficient iteration with comprehensions and generators

In this section, we will explore a few simple strategies to speed up Python loops using comprehensions and generators. In Python, comprehension and generator expressions are fairly optimized operations and should be preferred in place of explicit for loops, as they are designed to avoid many unnecessary computational overheads during their construction. Another reason to use this construct is readability; even if the speedup over a standard loop is modest, the comprehension and generator syntax is more compact and (most of the time) more intuitive.

In the following example, we can see that both the list comprehension and generator expressions are faster than an explicit loop when combined with the sum function:

    def loop(): 
        res = [] 
        for i in range(100000): 
        ...

Summary

Algorithmic optimization can improve how your application scales as we process increasingly large data. In this chapter, we demonstrated use cases and running times of the most common data structures available in Python, such as lists, deques, dictionaries, heaps, and tries. We also covered caching, a technique that can be used to trade some space, in memory or on disk, in exchange for the increased responsiveness of an application. We also demonstrated how to get modest speed gains by replacing for loops with fast constructs, such as list comprehensions and generator expressions.

Overall, we have seen that by utilizing a specifically designed data structure or technique that is appropriate in certain situations, the efficiency of our program can be greatly improved. The topics covered in this chapter offer us the ability to do just that across a wide range of use cases.

In the subsequent chapters, we will learn how to improve performance further using numerical libraries...

Questions

  1. Identify the best/most appropriate from the data structures covered in this chapter concerning the following use cases:
    1. Mapping items to another set of items (set being used in the most general sense)
    2. Accessing, modifying, and appending elements
    3. Maintaining a collection of unique elements
    4. Keeping track of the minimum/maximum of a set (in the most general sense)
    5. Appending and removing elements at the endpoints of a sequence (in the most general sense).
    6. Fast searching according to some similarity criterion (for example, being used by autocompletion engines).
  2. What is the difference between caching and memoization?
  3. Why are comprehensions and generators (in most situations) more preferred than explicit for loops?
  4. Consider the problem of representing a pairwise association between a set of letters and a set of numbers (for example, a  2, b  1, c  3, and so on), where we need to look at what number a given letter is associated with in our application...
lock icon
The rest of the chapter is locked
You have been reading a chapter from
Advanced Python Programming - Second Edition
Published in: Mar 2022Publisher: PacktISBN-13: 9781801814010
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at £13.99/month. Cancel anytime

Author (1)

author image
Quan Nguyen

Quan Nguyen is a Python programmer and machine learning enthusiast. He is interested in solving decision-making problems under uncertainty. Quan has authored several books on Python programming and scientific computing. He is currently pursuing a Ph.D. degree in computer science at Washington University in St. Louis, researching Bayesian methods in machine learning.
Read more about Quan Nguyen