You're reading from Learn Python by Building Data Science Applications

Product type: Book
Published in: Aug 2019
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781789535365
Edition: 1st Edition
Authors (2):
Philipp Kats

Philipp Kats is a researcher at the Urban Complexity Lab, NYU CUSP, a research fellow at Kazan Federal University, and a data scientist at StreetEasy, with many years of experience in software development. His interests include data analysis, urban studies, data journalism, and visualization. Having a bachelor's degree in architectural design and having followed the (at first) rocky path of a self-taught developer, Philipp knows the pain points of learning programming and is eager to share his experience.

David Katz

David Katz is a researcher and holds a Ph.D. in mathematics. As a mathematician at heart, he sees code as a tool to express his questions. David believes that code literacy is essential as it applies to most disciplines and professions. David is passionate about sharing his knowledge and has 6 years of experience teaching college and high school students.

Best Practices and Python Performance

After going through the preceding chapters and learning various things about Python, we have come to the last chapter. Here, we want to discuss some general strategies for writing code that runs faster, is cleaner, and is easier to maintain. These approaches apply to data-oriented code, or to any other type of code, for that matter.

This chapter is split into three parts. The first section will discuss how you can analyze and speed up your code, the second section will cover best practices for maintaining your code so that you'll code faster and cleaner, and in the third and final section, we'll go through a brief overview of the non-Python technologies that you might find useful for your projects.

The following topics will be covered in this chapter:

  • Ways to monitor performance and identify...

Technical requirements

Speeding up your Python code

In the previous chapter, we talked about different best practices, approaches, and ways to boost code performance. As a toy example for performance tuning, we'll build our own KNN model, which we used in Chapter 13, Training a Machine Learning Model. As a reminder, KNN is a simple ML model that predicts the target variable by identifying the K closest records in the training set, then taking the mode (for classification) or a weighted average (for regression) of their target values. Of course, many implementations of KNN already exist; we build our own purely as an example.
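The two aggregation rules mentioned above (mode for classification, weighted average for regression) can be sketched as follows. This is not the book's code: the function names `knn_classify` and `knn_regress` are hypothetical, and the choice of inverse-distance weights is an assumption, since the text only says "weighted average".

```python
from collections import Counter

import numpy as np


def knn_classify(neighbor_labels):
    """Hypothetical helper: return the mode (most common label) of the K neighbors."""
    return Counter(neighbor_labels).most_common(1)[0][0]


def knn_regress(neighbor_values, neighbor_distances, eps=1e-9):
    """Hypothetical helper: inverse-distance weighted average of neighbor targets.

    Assumption: closer neighbors get larger weights (1 / distance).
    eps avoids division by zero when a neighbor coincides with the query point.
    """
    values = np.asarray(neighbor_values, dtype=float)
    weights = 1.0 / (np.asarray(neighbor_distances, dtype=float) + eps)
    return float(np.sum(weights * values) / np.sum(weights))


print(knn_classify(["cat", "dog", "cat"]))    # cat
print(knn_regress([10.0, 20.0], [1.0, 1.0]))  # approximately 15.0
```

With equal distances, the weighted average reduces to a plain mean; as one neighbor gets closer, its target value dominates the prediction.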

For starters, let's write a naive implementation; it is already fairly optimized, since it relies on NumPy operations. First, let's import the functions we need to measure Euclidean distance and define a function that returns the N closest records. Take a look at the following...
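The original listing is cut off here, so the following is only a minimal sketch of what a NumPy-based "N closest records" function could look like. The name `closest_n` is hypothetical, and the book may use a different distance helper (for example, one from SciPy or scikit-learn).

```python
import numpy as np


def closest_n(X_train, point, n=5):
    """Hypothetical helper: indices and distances of the n rows of X_train closest to point."""
    # Broadcasting subtracts `point` from every row, then we take the Euclidean norm per row.
    distances = np.sqrt(((X_train - point) ** 2).sum(axis=1))
    # A full argsort is O(m log m); acceptable for a naive baseline.
    idx = np.argsort(distances)[:n]
    return idx, distances[idx]


X = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.5, 0.0]])
idx, dist = closest_n(X, np.array([0.0, 0.0]), n=2)
print(idx)   # [0 3]
print(dist)  # [0.  0.5]
```

A natural first optimization is to replace the full sort with `np.argpartition(distances, n)[:n]`, which only guarantees that the first n entries are the n smallest (unordered), and which is cheaper for large training sets.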

Using best practices for coding in your project

In this section, we'll switch to another, although adjacent, topic: best practices for maintaining good quality code. Here, we will define "good" in a broad way: DRY (don't repeat yourself), concise, expressive, and easy to read, change, and build upon. To illustrate this topic, we will review the wikiwwii package we built in Chapter 15, Packaging and Testing with Poetry and PyTest.

All the changes we make to the package throughout this chapter are stored on the best-practices branch in this book's GitHub repository.

Code formatting with Black

First of all, let's talk about formatting. It may sound like a minor issue, and it generally is, but formatting...
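One reason automatic formatting is safe is that formatting changes layout, not meaning. By default, Black checks that the reformatted code still parses to an equivalent abstract syntax tree (this check is skipped with `--fast`). The sketch below does not call Black; it only illustrates the same idea with the standard library `ast` module. The helper name `same_ast` is hypothetical.

```python
import ast

# The same function written twice: once carelessly, once in the style
# Black typically produces (double quotes, spaces around operators, 4-space indent).
messy = "def f(a,b):\n  return {'x':a+b}\n"
clean = 'def f(a, b):\n    return {"x": a + b}\n'


def same_ast(src_a: str, src_b: str) -> bool:
    """Hypothetical helper: True if both snippets parse to the same AST."""
    # ast.dump omits line/column attributes by default, so only structure is compared.
    return ast.dump(ast.parse(src_a)) == ast.dump(ast.parse(src_b))


print(same_ast(messy, clean))  # True
```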

Beyond this book – packages and technologies to look out for

Throughout this book, we've shared a wide range of Python frameworks and libraries for data-driven development. However, there are some tools we couldn't fit in but that you should be aware of. We'll discuss some of them here. In particular, we want to cover three somewhat connected topics: Python flavors, Docker containers, and Kubernetes.

Different Python flavors

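In the Numba section, we showed you how to use Numba to speed up Python code. To do so, Numba uses a modern compilation engine, LLVM, to compile Python functions (a subset of Python and NumPy) into machine code at runtime. This works because the reference interpreter, CPython, is itself written in C and can call compiled code. Another project, Cython, pursues the same goal: it compiles Python code into C using a somewhat different...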
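As a reminder of what the Numba approach looks like, here is a minimal sketch using Numba's real `njit` decorator on an explicit loop. The function `sum_of_squares` is a hypothetical example, and the fallback branch is an addition so the snippet still runs (uncompiled) when Numba is not installed.

```python
import numpy as np

try:
    from numba import njit  # compiles the decorated function to machine code via LLVM
except ImportError:
    # Fallback for illustration only: without Numba, the function runs as plain Python.
    def njit(func):
        return func


@njit
def sum_of_squares(arr):
    # An explicit loop: slow in pure Python, fast once Numba compiles it.
    total = 0.0
    for x in arr:
        total += x * x
    return total


data = np.arange(4, dtype=np.float64)
print(sum_of_squares(data))  # 14.0
```

Note that with Numba the first call includes compilation time, so benchmarks should time the second and later calls.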

Summary

In this final chapter, we covered multiple topics on code performance and quality and discussed a few important technologies beyond Python. In particular, we discussed how the combination of efficient code, a better understanding of requirements, and smart use of appropriate data structures can significantly speed up code: in our case, making it a hundred times faster! Then, we discussed how to deal with big data by computing in parallel on multiple CPUs, or on multiple machines in a cluster.

In the second part of this chapter, we discussed a few ways to keep code quality under control: running sophisticated, non-deterministic test suites, automating code formatting, and tracking code maintainability.

Both code performance and quality are important. Knowing how to measure and improve both is a necessary skill for a professional...

Questions

  1. How can we measure which line in the code took the most time to complete?
  2. Does NumPy run faster than Pandas?
  3. When should we use Numba? What are the challenges and benefits of using Numba?
  4. When should we use Dask?
  5. Does code formatting matter? Why is Black better than linters?
  6. How does Hypothesis help you test your code?

Further reading
