You're reading from Python Real-World Projects

Product type: Book
Published in: Sep 2023
Publisher: Packt
ISBN-13: 9781803246765
Edition: 1st Edition

Concepts

Programming Language

Author (1)

Steven F. Lott

Chapter 15
Project 5.1: Modeling Base Application

The next step in the pipeline from acquisition to clean-and-convert is the analysis and some preliminary modeling of the data. This may lead us to use the data for a more complex model or perhaps machine learning. This chapter will guide you through creating another application in the three-stage pipeline to acquire, clean, and model a collection of data. This first project will create the application with placeholders for more detailed and application-specific modeling components. This makes it easier to insert small statistical models that can be replaced with more elaborate processing if needed.

In this chapter, we’ll look at two parts of data analysis:

CLI architecture and how to design a more complex pipeline of processes for gathering and analyzing data
The core concepts of creating a statistical model of the data

Viewed from a distance, all analytical work can be considered to be creating a simplified model of important...

15.1 Description

This application creates a report that presents a number of statistics for a dataset. It automates the ongoing monitoring aspect of an analysis notebook, reducing the manual steps and producing reproducible results. The automated computations stem from a statistical model of the data, often created in an analysis notebook where alternative models are explored. The model describes variables whose values are expected to fall within a given range.

For industrial monitoring, this is part of an activity called Gage repeatability and reproducibility. The activity seeks to confirm that measurements are repeatable and reproducible. This is described as looking at a “measurement instrument.” While we often think of an instrument as being a machine or a device, the definition is actually very broad. A survey or questionnaire is a measurement instrument focused on people’s responses to questions.

When these computed statistics deviate from expectations, it suggests something...

15.2 Approach

We’ll take some guidance from the C4 model (https://c4model.com) when looking at our approach:

Context: For this project, a context diagram would show a user creating analytical reports. You may find it helpful to draw this diagram.
Containers: There only seems to be one container: the user’s personal computer.
Components: We’ll address the components below.
Code: We’ll touch on this to provide some suggested directions.

The heart of this application is a module to summarize data in a way that lets us test whether it fits the expectations of a model. The statistical model is a simplified reflection of the underlying real-world processes that created the source data. The model’s simplifications include assumptions about events, measurements, internal state changes, and other details of the processing being observed.
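As a minimal sketch of such a summary module, the following uses the built-in statistics library to reduce one variable to a few numbers and then compares the result with what a model expects. The names Summary, summarize(), and fits_model() are hypothetical, as are the expected mean and tolerance; the sample values are the x values of the first series in Anscombe’s Quartet.

```python
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    """Hypothetical container for the summary statistics of one variable."""

    count: int
    mean: float
    median: float
    stdev: float


def summarize(values: list[float]) -> Summary:
    """Reduce a list of measurements to a handful of descriptive statistics."""
    return Summary(
        count=len(values),
        mean=statistics.mean(values),
        median=statistics.median(values),
        stdev=statistics.stdev(values),  # sample standard deviation (N - 1)
    )


def fits_model(summary: Summary, expected_mean: float, tolerance: float) -> bool:
    """Assumed expectation: the mean lies within `tolerance` of the model's mean."""
    return abs(summary.mean - expected_mean) <= tolerance


# x values of Anscombe's Quartet, series I.
x_values = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
summary = summarize(x_values)
print(summary)
print(fits_model(summary, expected_mean=9.0, tolerance=0.5))
```

Keeping the expectation check separate from the computation means the model’s assumptions can change without touching the code that computes the statistics.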

For very simple cases — like Anscombe’s Quartet data — there are only two variables, which leaves...

15.3 Deliverables

This project has the following deliverables:

Documentation in the docs folder.
Acceptance tests in the tests/features and tests/steps folders.
Unit tests for model module classes in the tests folder.
Mock objects for the csv_extract module tests will be part of the unit tests.
Unit tests for the csv_extract module components that are in the tests folder.
An application to summarize the cleaned data in a TOML file.
An application secondary feature to transform the TOML file to an HTML page or PDF file with the summary.
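The TOML summary can be sketched with nothing more than string formatting. The standard library can read TOML (the tomllib module, Python 3.11 and later) but has no writer, so this hand-written emitter is only an illustration; a third-party TOML writer may be a better choice in practice. The functions toml_value() and to_toml() are hypothetical names, the sketch assumes one table per variable with simple keys, and the values shown are illustrative.

```python
def toml_value(value: object) -> str:
    """Format a scalar as a TOML value. Control characters in strings are not handled."""
    if isinstance(value, bool):  # check bool first: it is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    raise TypeError(f"unsupported TOML value: {value!r}")


def to_toml(summary: dict[str, dict[str, object]]) -> str:
    """Emit one TOML table per variable. Assumes names are valid bare keys."""
    lines: list[str] = []
    for table_name, table in summary.items():
        lines.append(f"[{table_name}]")
        for key, value in table.items():
            lines.append(f"{key} = {toml_value(value)}")
        lines.append("")  # blank line after each table
    return "\n".join(lines)


# Illustrative values only.
text = to_toml({"x": {"count": 11, "mean": 9.0, "label": "series I"}})
print(text)
```

A file in this form can be read back by any TOML parser, which makes it a convenient intermediate step before rendering an HTML page or PDF file.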

In some cases, especially for particularly complicated applications, the summary statistics may be best implemented as a separate module. This module can then be expanded and modified without making significant changes to the overall application.
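One way to picture this separation is the sketch below. The model function knows nothing about files or command-line arguments, and the CLI functions know nothing about statistics. The input format (one number per line), the option names, and the function names model_summary(), get_options(), and main() are all assumptions made for illustration.

```python
import argparse
import statistics
import tempfile
from pathlib import Path


def model_summary(values: list[float]) -> dict[str, float]:
    """The statistical model: pure computation, no file or argument handling."""
    return {"mean": statistics.mean(values), "stdev": statistics.stdev(values)}


def get_options(argv: list[str]) -> argparse.Namespace:
    """The CLI: argument parsing and input/output paths only."""
    parser = argparse.ArgumentParser(description="Summarize cleaned data")
    parser.add_argument("source", type=Path, help="input file, one number per line")
    parser.add_argument("-o", "--output", type=Path, required=True, help="report file")
    return parser.parse_args(argv)


def main(argv: list[str]) -> None:
    options = get_options(argv)
    lines = options.source.read_text().splitlines()
    values = [float(line) for line in lines if line.strip()]
    summary = model_summary(values)
    report = "".join(f"{name} = {value!r}\n" for name, value in summary.items())
    options.output.write_text(report)


# Exercise the whole application against a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    source = Path(folder) / "cleaned.txt"
    target = Path(folder) / "summary.txt"
    source.write_text("2.0\n4.0\n6.0\n")
    main([str(source), "--output", str(target)])
    report_text = target.read_text()
print(report_text)
```

Because main() accepts the argument list as a parameter, tests can call it directly, and model_summary() can be replaced with more elaborate processing without changes to the argument parsing.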

The idea is to distinguish between these aspects of this application:

The CLI, which includes argument parsing and sensible handling of input and output paths.
The statistical model, which evolves as our understanding...

15.4 Summary

In this chapter we have created a foundation for building and using a statistical model of source data. We’ve looked at the following topics:

Designing and building a more complex pipeline of processes for gathering and analyzing data.
Some of the core concepts behind creating a statistical model of some data.
Use of the built-in statistics library.
Publishing the results of the statistical measures.

This application tends to be relatively small. The actual computations of the various statistical values leverage the built-in statistics library and tend to be very small. It often seems like there’s far more programming involved in parsing the CLI argument values, and creating the required output file, than doing the “real work” of this application.

This is a consequence of the way we’ve been separating the various concerns in data acquisition, cleaning, and analysis. We’ve partitioned the work into several isolated stages along...

15.5 Extras

Here are some ideas for you to add to this project.

15.5.1 Measures of shape

Measures of shape often involve two computations: skewness and kurtosis. These functions are not part of Python’s built-in statistics library.

It’s important to note that there are a very large number of distinct, well-understood distributions of data. The normal distribution is one of many different ways data can be distributed.

See https://www.itl.nist.gov/div898/handbook/eda/section3/eda366.htm.

One measure of skewness is the following:

\[ g_1 = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^3 / N}{s^3} \]

Here, Ȳ is the mean, s is the standard deviation, and N is the number of data points.

A symmetric distribution will have a skewness, g₁, near zero. Larger numbers indicate a “long tail” opposite a large concentration of data around the mean.
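The skewness measure above can be sketched with the built-in statistics library. The function name skewness() is hypothetical, and the sketch assumes s is the population standard deviation (computed with N rather than N - 1 in the denominator), which is how the NIST handbook defines it for this formula. The sample data are made up for illustration.

```python
import statistics


def skewness(values: list[float]) -> float:
    """g1: the mean cubed deviation divided by the cube of the standard deviation."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.pstdev(values)  # assumption: N in the denominator
    return sum((y - mean) ** 3 for y in values) / n / s ** 3


symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
long_tail = [1.0, 1.0, 1.0, 1.0, 10.0]  # most values are small, one is far to the right

print(skewness(symmetric))  # zero for perfectly symmetric data
print(skewness(long_tail))  # positive: the tail is to the right of the concentration
```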

One measure of kurtosis is the following:

\[ \text{kurtosis} = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^4 / N}{s^4} \]
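This kurtosis measure can be sketched in the same style. The function name kurtosis() is hypothetical, and s is again assumed to be the population standard deviation. The second example draws pseudo-random values from a standard normal distribution; the seed and sample size are arbitrary choices.

```python
import random
import statistics


def kurtosis(values: list[float]) -> float:
    """The mean fourth-power deviation divided by the fourth power of s."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.pstdev(values)  # assumption: N in the denominator
    return sum((y - mean) ** 4 for y in values) / n / s ** 4


evenly_spread = [1.0, 2.0, 3.0, 4.0, 5.0]
print(kurtosis(evenly_spread))  # close to 1.7 for these five evenly spaced values

# A large sample from a standard normal distribution should land near 3.
generator = random.Random(42)
normal_sample = [generator.gauss(0.0, 1.0) for _ in range(10_000)]
print(kurtosis(normal_sample))
```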

The kurtosis for the standard normal distribution is 3. A value larger than 3 suggests more data is in the tails; it’s “flatter” or “...

Author (1)

Steven F. Lott

Steven Lott has been programming since computers were large, expensive, and rare. Working for decades in high tech has given him exposure to a lot of ideas and techniques, some bad, but most helpful to others. Since the 1990s, Steven has been engaged with Python, crafting an array of indispensable tools and applications. His profound expertise has led him to contribute significantly to Packt Publishing, penning notable titles like "Mastering Object-Oriented Python," "Modern Python Cookbook," and "Functional Python Programming." A self-proclaimed technomad, Steven's unconventional lifestyle sees him residing on a boat, often anchored along the vibrant east coast of the US. He tries to live by the words “Don't come home until you have a story.”