You're reading from Learn Python by Building Data Science Applications

Product type: Book

Published in: Aug 2019

Reading Level: Intermediate

Publisher: Packt

ISBN-13: 9781789535365

Edition: 1st Edition

Languages

Python

Tools

Pygame

Concepts

Application Development

Authors (2):

Philipp Kats

David Katz

Packaging and Testing with Poetry and PyTest

Until now, all our code has lived in either notebooks or Python files. While that is totally fine, with the growth in volume and complexity of our code, it is increasingly becoming a good idea to form one or more go-to sources for the code we use most frequently, as well as sources for the complex code that we don't want to risk adding mistakes to.

In this chapter, we will learn how to build our own packages for use in multiple projects or to be easily shared with others, using the poetry package. A package can work as a deliverable—something you can pass to or share with your client! Building and testing packages is a vital skill that increases your productivity and allows you to save time and reduce stress by enabling you to reuse the same properly tested body of code again and again.

Building packages is also likely to...

Technical requirements

For this chapter, we will need the following packages (as always, they are included in our base environment):

poetry
pytest
sphinx

As we're creating an independent package, all the code is stored on GitHub at https://github.com/PacktPublishing/Learn-Python-by-Building-Data-Science-Applications.

Building a package

So far in this book, we have been either using third-party packages, such as requests and pandas, or writing raw code as .py scripts or notebooks. While using Python files directly is absolutely fine for certain projects, it makes it hard for code to be reused and built upon; it is not sustainable for complex algorithms and tools that can be used over and over again. Such code is also hard to share as it has no overall structure, tends to decay over time, and doesn't have a robust dependency system; the code may not work on other systems with other packages (or other versions of packages) installed. Last but not least, this kind of practice affects the quality of our code, as we tend to write and use the code as a one-time solution. The best way to mitigate all those issues at once is to form your code into a package.

But what is a package? In Python, packages...

A few ways to build your package

The structure of a Python package is defined by a few specifications (https://packaging.python.org/specifications/) and PEPs (short for Python Enhancement Proposals), such as PEP 517 (https://www.python.org/dev/peps/pep-0517/), PEP 518 (https://www.python.org/dev/peps/pep-0518/), and PEP 427 (https://www.python.org/dev/peps/pep-0427/); the overall definition comes from the Python Packaging Authority (PyPA). In essence, a package is required to have, in addition to the actual code, a special file with metadata, including the package name, the description, the version, the tags, Python version support details, the authors, and the dependencies. This file could be a Python setup.py file, which was the standard solution for a long time, or a pyproject.toml file. The latter is a new, safer approach, but does not have as well-designed...

Testing the code so far

How would we know whether the code is good, anyway? The only good way is to rigorously test your code. While it may sound like a lot of somewhat unnecessary work, it is a practice that will repay you many times over in the future: once you're sure your code behaves as intended, it is much easier to add new features and be sure that they didn't break any of the existing ones. Furthermore, you can upgrade dependencies or compare different implementations while remaining confident that your code still behaves as intended.

As for many other things, Python has a standard library for testing—unittest. In contrast to most of the standard libraries, however, unittest is fairly unpopular. Instead, another library, pytest, is considered the de facto industry standard for Python testing, as it provides a clean and reusable pattern of code and has support...
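As an illustration of the pytest style, the following is a minimal sketch of a test file. The normalize function is a hypothetical stand-in for the code under test, not part of the book's package; the tests themselves are ordinary functions that use plain assert statements.

```python
def normalize(values):
    """Scale numbers to the 0..1 range (hypothetical function under test)."""
    low, high = min(values), max(values)
    if low == high:
        # Avoid division by zero when all values are identical.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# pytest collects functions whose names start with "test_" and treats
# a failing plain assert as a test failure.
def test_normalize_range():
    assert normalize([2, 4, 6]) == [0.0, 0.5, 1.0]


def test_normalize_constant_input():
    assert normalize([3, 3, 3]) == [0.0, 0.0, 0.0]


if __name__ == "__main__":
    # Fallback so the sketch also runs without pytest installed.
    test_normalize_range()
    test_normalize_constant_input()
    print("all checks passed")
```

Saved under a name such as test_normalize.py, the file would be picked up by running pytest in the same folder; in a real package, the function under test would live in the package itself and be imported by the test module.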

Automating the process with CI services

Now, as you may recall, we are working on a tests branch of our repository. If you go to GitHub, it may offer to create a pull request—a procedure meant to merge your branch into the master branch or any other branch, as in the yellow section of the following screenshot. Even if the interface does not offer this (it won't if there was already a pull request a few minutes before), you can create a pull request yourself, via the New pull request button. See the following screenshot:

Using GitHub, you can request other people to review your changes, comment on them, and more; GitHub will also confirm whether merging is possible or whether you'll need to resolve conflicts first.

While, in our case, we did run our tests locally and we know it is safe to merge, there is no way for others to check that easily. In order to make...

Generating documentation with sphinx

Documentation is king when it comes to supporting consumers of your code and convincing newcomers that it actually makes sense to buy in and use your package. For most people, a documentation website is the first place they go to learn about the package. It is, by definition, assumed to be the single source of truth on the code in its current version.

The role of documentation is usually threefold:

Explain how to install your package and what the general requirements are (for example, which Python versions are supported)
Show how to use the package (preferably with a quick example showing its immediate value)
Express the general idea and philosophy of the package

A documentation website does benefit from having tutorials, example cases, and a roadmap. With that being said, the core of any documentation website is, obviously, documentation...
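One common source of such documentation is the docstrings in the code itself, which Sphinx's autodoc extension can pull into the generated pages. The sketch below shows a docstring written with reStructuredText field lists; the moving_average function is a hypothetical example, not part of the book's package.

```python
def moving_average(values, window=3):
    """Compute a simple moving average.

    :param values: sequence of numbers to smooth
    :param window: number of consecutive items in each average
    :returns: list of averages, one per full window
    :raises ValueError: if ``window`` is not positive
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# The docstring is available at runtime, which is what documentation
# tools rely on when they import the module.
print(moving_average.__doc__)
print(moving_average([1, 2, 3, 4, 5]))
```

Keeping the description next to the code makes it more likely that the two stay in sync as the package evolves.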

Installing a package in editable mode

As we have mentioned, you can install a package from GitHub and it will behave the same as any other installed package—it can be upgraded or uninstalled.

Often, however, you will want to use a package while developing it. It would be hard to do both in the normal installation routine; you'd have to either update or re-install the package every time you made any developmental changes, just to reflect those changes. To get around this, there is a great feature that keeps the advantages of both worlds—your code is treated as a package but can be easily modified in place. This feature is called editable mode. Essentially, it means the folder on your filesystem is registered as a package, and so the imported package will always reflect all the changes that you've made.
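A quick way to see which folder an import actually resolves to is to ask Python where it would load the package from. This is a minimal sketch: package_location is a hypothetical helper, and the standard-library json package is only a stand-in for your own package. After an editable install (for example, with pip install -e . run in the project folder), the reported path would be expected to point into your working folder rather than into site-packages.

```python
import importlib.util


def package_location(name):
    """Return the file a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


# "json" is only a stand-in from the standard library; substitute the
# name of your own package once it is installed.
print(package_location("json"))

# A package that is not installed has no import spec at all.
print(package_location("surely_not_installed_pkg"))  # None
```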

In order to reap these benefits, you have to have a...

Summary

In this chapter, we went over the whole process of packaging code. In particular, we created a GitHub repository, generated a template via poetry, and added all the dependencies, meaning everyone can now install the package from GitHub using pip. We then went further, adding a few tests to make sure our package works as expected throughout future development. To simplify the process and make it transparent, we integrated a CI service, Azure Pipelines, to run tests on each pull request in order to prevent us from merging failing code into production.

In the next chapter, we will review another case, building a robust, secure, production-ready data pipeline using luigi.

Questions

What are the benefits of packaging code?
What is the main difference between conda and pip as package managers?
What is dependency resolution, and why is it hard?
What are the benefits of poetry over standard setuptools?
Why do we need tests?
What is the purpose of CI?

Philipp Kats is a researcher at the Urban Complexity Lab, NYU CUSP, a research fellow at Kazan Federal University, and a data scientist at StreetEasy, with many years of experience in software development. His interests include data analysis, urban studies, data journalism, and visualization. Having a bachelor's degree in architectural design and having followed the (at first) rocky path of being a self-taught developer, Philipp knows the pain points of learning programming and is eager to share his experience.

David Katz

David Katz is a researcher and holds a Ph.D. in mathematics. As a mathematician at heart, he sees code as a tool to express his questions. David believes that code literacy is essential as it applies to most disciplines and professions. David is passionate about sharing his knowledge and has 6 years of experience teaching college and high school students.
