You're reading from Python Real-World Projects
Published in Sep 2023 by Packt
ISBN-13: 9781803246765, 1st Edition

Author: Steven F. Lott

Steven Lott has been programming since computers were large, expensive, and rare. Working for decades in high tech has given him exposure to a lot of ideas and techniques, some bad, but most helpful to others. Since the 1990s, Steven has been engaged with Python, crafting an array of indispensable tools and applications. His expertise has led him to contribute significantly to Packt Publishing, penning notable titles like "Mastering Object-Oriented Python," "The Modern Python Cookbook," and "Functional Python Programming." A self-proclaimed technomad, Steven's unconventional lifestyle sees him residing on a boat, often anchored along the east coast of the US. He tries to live by the words “Don't come home until you have a story.”
Chapter 11
Project 3.7: Interim Data Persistence

Our goal is to create files of clean, converted data we can then use for further analysis. To an extent, the goal of creating a file of clean data has been a part of all of the previous chapters. We’ve avoided looking deeply at the interim results of acquisition and cleaning. This chapter formalizes some of the processing that was quietly assumed in those earlier chapters. In this chapter, we’ll look more closely at two topics:

  • File formats and data persistence

  • The architecture of applications

11.1 Description

In the previous chapters, particularly those starting with Chapter 9, Project 3.1: Data Cleaning Base Application, the question of “persistence” was dealt with casually. The previous chapters all wrote the cleaned samples to a file in ND JSON format. This spared us from delving into the alternatives and the various choices available. It’s time to review the previous projects and reconsider the choice of file format for persistence.
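The ND JSON interim format mentioned above needs nothing beyond the standard library: each sample becomes one JSON document on its own line. This is a minimal sketch; the function names and the record fields are illustrative, not taken from the book's projects.

```python
import json
from pathlib import Path


def write_ndjson(path: Path, samples: list[dict]) -> None:
    """Write each sample as one JSON document per line (ND JSON)."""
    with path.open("w", encoding="utf-8") as target:
        for sample in samples:
            print(json.dumps(sample), file=target)


def read_ndjson(path: Path) -> list[dict]:
    """Read one JSON document per line back into dictionaries."""
    with path.open(encoding="utf-8") as source:
        return [json.loads(line) for line in source if line.strip()]
```

Because each record is a complete document on a single line, a partially written file remains readable up to the last complete line, which matters later when we consider restarting an interrupted run.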

What’s important is the overall flow of data from acquisition to analysis. The conceptual flow of data is shown in Figure 11.1.

Figure 11.1: Data Analysis Pipeline

This differs from the diagram shown in Chapter 2, Overview of the Projects, where the stages were not quite as well defined. Some experience with acquiring and cleaning data helps to clarify the considerations around saving and working with data.

The diagram shows a few of the many choices for persisting interim data. A more complete list of...

11.2 Overall approach

For reference, see Chapter 9, Project 3.1: Data Cleaning Base Application, specifically the Approach section. This suggests that the clean module should have minimal changes from the earlier version.

A cleaning application will have several separate views of the data. There are at least four viewpoints:

  • The source data. This is the original data as managed by the upstream applications. In an enterprise context, this may be a transactional database with business records that are precious and part of day-to-day operations. The data model reflects considerations of those day-to-day operations.

  • Data acquisition interim data, usually in a text-centric format. We’ve suggested using ND JSON for this because it allows a tidy dictionary-like collection of name-value pairs, and supports quite complex Python data structures. In some cases, we may perform some summarization of this raw data to standardize scores. This data may be used to diagnose and debug problems with upstream...

11.3 Deliverables

The refactoring of existing applications to formalize the interim file formats leads to changes in existing projects. These changes will ripple through to unit test changes. There should not be any acceptance test changes when refactoring the data model modules.

Adding a “pick up where you left off” feature, on the other hand, will lead to changes in the application behavior. This will be reflected in the acceptance test suite, as well as unit tests.

The deliverables depend on which projects you’ve completed, and which modules need revision. We’ll look at some of the considerations for these deliverables.

11.3.1 Unit test

A function that creates an output file will need to have test cases with two distinct fixtures. One fixture will have a version of the output file, and the other fixture will have no output file. These fixtures can be built on top of pytest’s tmp_path fixture. This fixture provides a unique temporary directory that...
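The two-fixture pattern described above might be sketched as follows. The file name clean.ndjson and the seeded content are hypothetical; any function under test would receive these paths and either skip, append to, or create the output file.

```python
from pathlib import Path

import pytest


def seed_output(directory: Path) -> Path:
    """Create a plausible prior output file in the given directory."""
    output = directory / "clean.ndjson"
    output.write_text('{"x": 1}\n', encoding="utf-8")
    return output


@pytest.fixture
def existing_output(tmp_path: Path) -> Path:
    """Fixture where a previous run already wrote an output file."""
    return seed_output(tmp_path)


@pytest.fixture
def missing_output(tmp_path: Path) -> Path:
    """Fixture where no output file exists yet; only the path is provided."""
    return tmp_path / "clean.ndjson"
```

Each test function then requests one fixture or the other, and pytest supplies a fresh temporary directory for every test, so the two scenarios never interfere.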

11.4 Summary

In this chapter, we looked at two important parts of the data acquisition pipeline:

  • File formats and data persistence

  • The architecture of applications

There are many file formats available for Python data. Newline-delimited (ND) JSON is perhaps the best way to handle large files of complex records. It fits well with Pydantic’s capabilities, and the data can be processed readily by Jupyter Notebook applications.
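The fit with model classes can be shown with a small sketch. A plain dataclass stands in here for a Pydantic model (Pydantic's BaseModel would add field validation on top of the same pattern); the Sample class and its fields are invented for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class Sample:
    """Stand-in for a Pydantic model describing one cleaned record."""
    x: float
    y: float


def load_samples(text: str) -> list[Sample]:
    """Parse ND JSON text, one record per line, into model instances."""
    return [Sample(**json.loads(line)) for line in text.splitlines() if line]
```

Because every line is an independent document, a notebook can also stream records one at a time instead of loading the whole file into memory.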

The capability to retry a failed operation without losing existing data can be helpful when working with large data extractions and slow processing. It can be very helpful to be able to re-run the data acquisition without having to wait while previously processed data is processed again.
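One way to sketch this restart capability is to skip any source whose cleaned output already exists and is newer than the source. The function names and the copy-as-cleaning placeholder are assumptions for illustration, not the book's actual acquisition logic.

```python
from pathlib import Path


def needs_processing(source: Path, target: Path) -> bool:
    """True if the target is missing or older than its source."""
    if not target.exists():
        return True
    return target.stat().st_mtime < source.stat().st_mtime


def acquire_all(sources: list[Path], target_dir: Path) -> list[Path]:
    """Process only sources whose cleaned output is absent or stale."""
    processed = []
    for source in sources:
        target = target_dir / source.with_suffix(".ndjson").name
        if needs_processing(source, target):
            # Placeholder for the real clean-and-convert step.
            target.write_text(source.read_text(encoding="utf-8"), encoding="utf-8")
            processed.append(source)
    return processed
```

A second run over the same sources then does no work at all, which is exactly the behavior the acceptance tests for a “pick up where you left off” feature would check.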

11.5 Extras

Here are some ideas for you to add to these projects.

11.5.1 Using a SQL database

Using a SQL database for cleaned analytical data can be part of a comprehensive database-centric data warehouse. The implementation, when based on Pydantic, requires the native Python classes as well as the ORM classes that map to the database.

It also requires some care in handling repeated queries for enterprise data. In the ordinary file system, file names can have processing dates. In the database, this is more commonly assigned to an attribute of the data. This means multiple time periods of data occupy a single table, distinguished by the “as-of” date for the rows.
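The as-of pattern can be sketched with the standard library's sqlite3 module. The table name, columns, and sample values here are invented; the point is that batches from different periods share one table and are filtered by their as-of date.

```python
import datetime
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE clean_samples (as_of DATE, x REAL, y REAL)"
)


def save_batch(as_of: datetime.date, samples: list[tuple[float, float]]) -> None:
    """Insert one time period's rows, tagged with their as-of date."""
    connection.executemany(
        "INSERT INTO clean_samples VALUES (?, ?, ?)",
        [(as_of.isoformat(), x, y) for x, y in samples],
    )


def rows_for(as_of: datetime.date) -> list[tuple]:
    """Fetch only the rows belonging to one as-of date."""
    return connection.execute(
        "SELECT x, y FROM clean_samples WHERE as_of = ?",
        [as_of.isoformat()],
    ).fetchall()


save_batch(datetime.date(2023, 9, 1), [(1.0, 2.0)])
save_batch(datetime.date(2023, 9, 2), [(3.0, 4.0), (5.0, 6.0)])
```

Re-running an acquisition for one date then means deleting and re-inserting only that date's rows, leaving the other periods untouched.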

A common database optimization is to provide a “time dimension” table. For each date, the associated day of the week, fiscal week, month, quarter, and year are provided as attributes. Using this table saves computing any attributes of a date. It also allows the enterprise fiscal calendar to...
