
You're reading from  Python Real-World Projects

Product type: Book
Published in: Sep 2023
Publisher: Packt
ISBN-13: 9781803246765
Edition: 1st Edition
Author (1)
Steven F. Lott

Steven Lott has been programming since computers were large, expensive, and rare. Working for decades in high tech has given him exposure to a lot of ideas and techniques; some are bad, but most are helpful to others. Since the 1990s, Steven has been engaged with Python, crafting an array of indispensable tools and applications. His profound expertise has led him to contribute significantly to Packt Publishing, penning notable titles like "Mastering Object-Oriented Python," "The Modern Python Cookbook," and "Functional Python Programming." A self-proclaimed technomad, Steven's unconventional lifestyle sees him residing on a boat, often anchored along the vibrant east coast of the US. He tries to live by the words “Don't come home until you have a story.”

Chapter 3
Project 1.1: Data Acquisition Base Application

The beginning of the data pipeline is acquiring the raw data from various sources. This chapter has a single project to create a command-line application (CLI) that extracts relevant data from files in CSV format. This initial application will restructure the raw data into a more useful form. Later projects (starting in Chapter 9, Project 3.1: Data Cleaning Base Application) will add features for cleaning and validating the data.

This chapter’s project covers the following essential skills:

  • Application design in general. This includes an object-oriented design and the SOLID design principles, as well as functional design.

  • A few CSV file processing techniques. This is a large subject area, and the project focuses on restructuring source data into a more usable form.

  • CLI application construction.

  • Creating acceptance tests using the Gherkin language and behave step definitions.

  • Creating unit tests with mock objects...

3.1 Description

Analysts and decision-makers need to acquire data for further analysis. In many cases, the data is available in CSV-formatted files. These files may be extracts from databases or downloads from web services.

For testing purposes, it’s helpful to start with something relatively small. Some of the Kaggle data sets are very, very large, and require sophisticated application design. One of the most fun small data sets to work with is Anscombe’s Quartet. This can serve as a test case to understand the issues and concerns in acquiring raw data.

We’re interested in a few key features of an application to acquire data:

  • When gathering data from multiple sources, it’s imperative to convert it to a common format. Data sources vary, and will often change with software upgrades. The acquisition process needs to be flexible with respect to data sources and avoid assumptions about formats.

  • A CLI application permits a variety of automation possibilities....

3.2 Architectural approach

We’ll take some guidance from the C4 model (https://c4model.com) when looking at our approach.

  • Context: For this project, a context diagram would show a user extracting data from a source. You may find it helpful to draw this diagram.

  • Containers: This project will run on the user’s personal computer. As with the context, the diagram is small, but some readers may find it helpful to take the time to draw it.

  • Components: We’ll address these below.

  • Code: We’ll touch on this to provide some suggested directions.

We can decompose the software architecture into these two important components:

  • model: This module has definitions of target objects. In this project, there’s only a single class here.

  • extract: This module reads the source document and creates model objects.
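
As a minimal sketch of how these two components might fit together, the model can be a single dataclass and the extract logic a function that turns CSV rows into model objects. The names `XYPair` and `extract_pairs`, the column names `x` and `y`, and the sample values are assumptions made for illustration; they are not taken from the project's code.

```python
import csv
import io
from collections.abc import Iterable, Iterator
from dataclasses import dataclass


# --- model: the definition of the target object (hypothetical name) ---
@dataclass(frozen=True)
class XYPair:
    """One (x, y) sample, kept as text; conversion can happen later."""

    x: str
    y: str


# --- extract: read source rows and create model objects ---
def extract_pairs(
    rows: Iterable[dict[str, str]], x_col: str, y_col: str
) -> Iterator[XYPair]:
    """Build an XYPair from the named columns of each source row."""
    for row in rows:
        yield XYPair(x=row[x_col], y=row[y_col])


# A tiny in-memory stand-in for a source file (made-up values).
source = io.StringIO("x,y\n1.0,2.5\n3.0,4.5\n")
pairs = list(extract_pairs(csv.DictReader(source), "x", "y"))
print(pairs)
```

Passing the column names as parameters is one possible design choice: the same function can then be pointed at a different pair of columns without touching the model.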

We’ll also need these additional functions:

  • A function for parsing the command-line options.

  • A main() function to parse options and do...
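
The two functions listed above could be sketched with the standard library's `argparse` module. The option names (`-o`/`--output` and a positional `source`), the logger name, and the file name in the sample call are hypothetical; what `main()` does after parsing is left as a placeholder comment.

```python
import argparse
import logging
from pathlib import Path

logger = logging.getLogger("acquire")  # hypothetical logger name


def get_options(argv: list[str]) -> argparse.Namespace:
    """Parse the command-line options (option names are illustrative)."""
    parser = argparse.ArgumentParser(description="Acquire data from CSV files")
    parser.add_argument("-o", "--output", type=Path, default=Path("output"))
    parser.add_argument("source", type=Path, nargs="+")
    return parser.parse_args(argv)


def main(argv: list[str]) -> int:
    """Parse options, then do the work; logging is configured by the caller."""
    options = get_options(argv)
    for source in options.source:
        logger.info("extracting %s into %s", source, options.output)
        # The call into the extract module would go here.
    return 0


# Logging initialization stays outside main(). A real script would more
# likely pass sys.argv[1:] here instead of a literal list.
logging.basicConfig(level=logging.INFO)
exit_code = main(["-o", "target_dir", "anscombe.csv"])
```

Taking `argv` as a parameter, rather than reading `sys.argv` inside the function, makes both functions easy to call from a unit test.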

3.3 Deliverables

This project has the following deliverables:

  • Documentation in the docs folder.

  • Acceptance tests in the tests/features and tests/steps folders.

  • Unit tests for model module classes in the tests folder.

  • Mock objects for the csv_extract module tests will be part of the unit tests.

  • Unit tests for the csv_extract module components in the tests folder.

  • Application to acquire data from a CSV file in the src folder.
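
To illustrate the role of mock objects in the unit tests, here is a minimal sketch using `unittest.mock`. The `Extract` class and its `from_row()` collaborator are hypothetical stand-ins, not the project's actual csv_extract components; the point is only that a mock isolates the class under test from the objects it depends on.

```python
from unittest.mock import Mock, sentinel


class Extract:
    """Hypothetical stand-in for a csv_extract component under test."""

    def __init__(self, builder) -> None:
        self.builder = builder

    def build_pairs(self, row: list[str]) -> list:
        # Delegates the row-to-object conversion to the builder.
        return [self.builder.from_row(row)]


def test_extract_uses_builder() -> None:
    # The mock replaces the collaborator, so only Extract is exercised.
    mock_builder = Mock()
    mock_builder.from_row.return_value = sentinel.pair
    extract = Extract(mock_builder)

    result = extract.build_pairs(["1.0", "2.5"])

    assert result == [sentinel.pair]
    mock_builder.from_row.assert_called_once_with(["1.0", "2.5"])


test_extract_uses_builder()
```

Under pytest the final call is unnecessary, since the test function would be discovered by name.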

An easy way to start this project is by cloning the project zero directory. Be sure to update the pyproject.toml and README.md when cloning; the author has often been confused by out-of-date copies of old projects’ metadata.

We’ll look at a few of these deliverables in a little more detail. We’ll start with some suggestions for creating the acceptance tests.

3.3.1 Acceptance tests

The acceptance tests need to describe the overall application’s behavior from the user’s point of view. The scenarios will follow the UX...

3.4 Summary

This chapter introduced the first project, the Data Acquisition Base Application. This application extracts data from a CSV file with a complex structure, creating four separate series of data points from a single file.

To make the application complete, we included a command-line interface and logging. This will make sure the application behaves well in a controlled production environment.

An important part of the process is designing an application that can be extended to handle data from a variety of sources and in a variety of formats. The base application contains modules with very small implementations that serve as a foundation for making subsequent extensions.

Perhaps the most difficult part of this project is creating a suite of acceptance tests to describe the proper behavior. It’s common for developers to compare the volume of test code with the application code and claim testing is taking up “too much” of their time.

Pragmatically, a program...

3.5 Extras

Here are some ideas for you to add to this project.

3.5.1 Logging enhancements

We skimmed over logging, suggesting only that it’s important and that the initialization for logging should be kept separate from the processing within the main() function.

The logging module has a great deal of sophistication, however, and it can help to explore this. We’ll start with logging “levels”.

Many of our logging messages will be created with the INFO level of logging. For example:

logger.info("%d rows processed", input_count)

This application has a number of possible error situations that are best reflected with error-level logging.
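
As a hedged illustration of mixing the two levels, the sketch below reports a malformed row at the error level and the final count at the info level. The `count_rows()` function, the two-column rule, and the logger name are assumptions for this example only.

```python
import logging

logger = logging.getLogger("acquire")  # hypothetical logger name


def count_rows(lines: list[str]) -> int:
    """Count well-formed rows, logging problems at the error level."""
    input_count = 0
    for line_number, line in enumerate(lines, start=1):
        if len(line.split(",")) != 2:
            # A bad row is an error situation, not routine progress.
            logger.error("line %d: expected 2 columns in %r", line_number, line)
            continue
        input_count += 1
    logger.info("%d rows processed", input_count)
    return input_count


logging.basicConfig(level=logging.INFO)
count_rows(["1.0,2.5", "oops", "3.0,4.5"])
```

Raising the threshold to `logging.ERROR` would suppress the progress message while still showing the report about the bad row.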

Additionally, there is a tree of named loggers. The root logger, named "", has settings that apply to all the lower-level loggers. This tree tends to parallel the way object inheritance is often used to create classes and subclasses. This can make it advantageous to create loggers for each class...
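
The tree of loggers can be explored with a short sketch. The dotted name `acquire.Extract` and the `Extract` class are hypothetical; the dots in a logger's name are what place it in the tree, and a logger with no level of its own takes its effective level from the nearest ancestor that has one.

```python
import logging

# Settings on the root logger, reached with getLogger(""), apply to
# every logger below it unless a lower-level logger overrides them.
logging.getLogger("").setLevel(logging.INFO)


class Extract:
    """Hypothetical class that owns a logger named after itself."""

    def __init__(self) -> None:
        self.logger = logging.getLogger(f"acquire.{self.__class__.__name__}")


extract = Extract()
parent = logging.getLogger("acquire")

print(extract.logger.name)
print(extract.logger.parent is parent)
print(extract.logger.getEffectiveLevel() == logging.INFO)

# Overriding one branch of the tree leaves the root setting untouched.
parent.setLevel(logging.DEBUG)
print(extract.logger.getEffectiveLevel() == logging.DEBUG)
```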

