
You're reading from Python Real-World Projects

Product type: Book
Published in: Sep 2023
Publisher: Packt
ISBN-13: 9781803246765
Edition: 1st Edition

Author: Steven F. Lott

Steven Lott has been programming since computers were large, expensive, and rare. Working for decades in high tech has given him exposure to a lot of ideas and techniques, some bad, but most are helpful to others. Since the 1990s, Steven has been engaged with Python, crafting an array of indispensable tools and applications. His profound expertise has led him to contribute significantly to Packt Publishing, penning notable titles like "Mastering Object-Oriented," "The Modern Python Cookbook," and "Functional Python Programming." A self-proclaimed technomad, Steven's unconventional lifestyle sees him residing on a boat, often anchored along the vibrant east coast of the US. He tries to live by the words “Don't come home until you have a story.”

Chapter 16
Project 5.2: Simple Multivariate Statistics

Are variables related? If so, what’s the relationship? An analyst tries to answer these two questions. A negative answer (the null hypothesis) doesn’t require many supporting details. A positive answer, on the other hand, suggests that a model can be defined to describe the relationship. In this chapter, we’ll look at simple correlation and linear regression as two elements of modeling a relationship between variables.

In this chapter, we’ll expand on some skills of data analysis:

  • Use of the built-in statistics library to compute correlation measures and linear regression coefficients.

  • Use of the matplotlib library to create images. This means creating plot image files outside a JupyterLab environment.

  • Expansion of the base modeling application with new features.
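Creating images outside JupyterLab mostly means choosing a non-interactive backend and writing each figure to a file. Here is a minimal sketch, assuming matplotlib is installed; the scatter_plot() function and the output file name are hypothetical.

```python
import tempfile
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: renders to files, needs no display
import matplotlib.pyplot as plt


def scatter_plot(x: list[float], y: list[float], target: Path, title: str) -> Path:
    """Write an x-y scatter plot to target; the file format follows the suffix."""
    fig, ax = plt.subplots()
    ax.scatter(x, y)
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_title(title)
    fig.savefig(target)
    plt.close(fig)  # pyplot keeps every open figure alive until it is closed
    return target


if __name__ == "__main__":
    out_dir = Path(tempfile.mkdtemp())
    x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
    y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
    print(scatter_plot(x, y, out_dir / "anscombe_1.png", "Anscombe series I"))
```

The returned path is what a report template would reference.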

This chapter’s project will expand on earlier projects. Look back at Chapter 13, Project 4.1: Visual Analysis...

16.1 Description

In Chapter 15, Project 5.1: Modeling Base Application, we built an application that creates a summary document with some core statistics. In that application, we looked at univariate statistics to characterize the data distributions. These statistics included measurements of the location, spread, and shape of a distribution. Functions like mean, median, mode, variance, and standard deviation were emphasized as ways to understand location and spread. The characterization of shape via skewness and kurtosis was left as an extra exercise for you.
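As a reminder of those univariate measures, here is a small sketch using the built-in statistics module on the x values of the first Anscombe series. The univariate_summary() name is hypothetical, and multimode() is used in place of mode() so that ties are visible.

```python
import statistics


def univariate_summary(values: list[float]) -> dict[str, object]:
    """Location and spread measures for one variable (sample statistics)."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "multimode": statistics.multimode(values),  # every most-common value
        "variance": statistics.variance(values),  # sample variance, n - 1 denominator
        "stdev": statistics.stdev(values),
    }


x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
print(univariate_summary(x))
```

Because every x value occurs exactly once, multimode() returns all eleven values: a hint that the mode says little about this kind of data.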

The base application from the previous chapter needs to be expanded to include the multivariate statistics and diagrams that are essential for clarifying the relationships among variables. There is a vast number of possible functions to describe the relationship between two variables. See https://www.itl.nist.gov/div898/handbook/pmd/section8/pmd8.htm for some insight into the number of choices available.

We’ll limit ourselves...

16.2 Approach

As with the previous project, this application has two distinct parts:

  1. Compute the statistics and create the diagram files.

  2. Create a report file in a simplified markup language from a template with the details interpolated. A tool like Jinja is very helpful for this.
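The second step can be sketched with the standard library's string.Template as a dependency-free stand-in for Jinja (in a Jinja template, the placeholders would be written as {{ series }} and so on). The Markdown layout and the render_report() name are hypothetical.

```python
from string import Template

# Hypothetical Markdown report layout; $name placeholders are filled by substitute().
REPORT = Template("""\
# Series $series

| Statistic | Value |
|---|---|
| correlation | $r |
| slope | $slope |
| intercept | $intercept |

![Scatter plot of series $series]($figure)
""")


def render_report(series: str, r: float, slope: float, intercept: float, figure: str) -> str:
    # Number formatting happens here, keeping presentation logic out of the template.
    return REPORT.substitute(
        series=series,
        r=f"{r:.3f}",
        slope=f"{slope:.3f}",
        intercept=f"{intercept:.2f}",
        figure=figure,
    )


print(render_report("I", 0.8164, 0.5001, 3.0001, "anscombe_1.png"))
```

substitute() raises KeyError when a placeholder has no value, which makes a missing statistic fail loudly instead of producing a report with a silent gap.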

Once the report file is available in a markup language such as Markdown or RST, a tool like Pandoc can convert it to an HTML page or a PDF document. Using a tool like Pandoc permits quite a bit of flexibility in choosing the final format. It also allows style sheets and page templates to be inserted in a tidy, uniform way.

As a markup language, LaTeX provides the most comprehensive capabilities, but it is challenging to work with. Languages like Markdown and RST are designed to offer fewer, easier-to-use capabilities.

This book is written with LaTeX.

We’ll look at three aspects of this application: the statistical computations...

16.3 Deliverables

This project has the following deliverables:

  • Documentation in the docs folder.

  • Acceptance tests in the tests/features and tests/steps folders.

  • Unit tests for model module classes in the tests folder.

  • Mock objects for the csv_extract module tests that will be part of the unit tests.

  • Unit tests for the csv_extract module components that are in the tests folder.

  • An application to extend the summary written to a TOML file, including figures with diagrams.

  • A secondary feature of the application to transform the TOML file into an HTML page or PDF file with the summary.

We’ll look at a few of these deliverables in a little more detail. We’ll start with some suggestions for creating the acceptance tests.

16.3.1 Acceptance tests

As we noted in the Acceptance testing section of the previous chapter, the output TOML document can be parsed and examined by the Then steps of a scenario. Because we’re looking at Anscombe’s Quartet...

16.4 Summary

In this chapter, we’ve extended the automated analysis and reporting to include more use of the built-in statistics library to compute correlation and linear regression coefficients. We’ve also made use of the matplotlib library to create images that reveal relationships among variables.

Automated reporting is designed to reduce the number of manual steps and to avoid places where omissions or errors can lead to unreliable data analysis. Few things are more embarrassing than a presentation that reuses a diagram from the previous period’s data. It’s far too easy to fail to rebuild one important notebook in a series of analysis products.

The level of automation needs to be treated with a great deal of respect. Once a reporting application is built and deployed, it must be actively monitored to be sure it’s working and producing useful, informative results. The analysis job shifts from developing an understanding to monitoring...

16.5 Extras

Here are some ideas for you to add to this project.

16.5.1 Use pandas to compute basic statistics

The pandas package offers a robust set of tools for doing data analysis. The core concept is to create a DataFrame that contains the relevant samples. The pandas package needs to be installed and added to the requirements.txt file.

There are methods for transforming a sequence of SeriesSample objects into a DataFrame. The best approach is often to convert each of the pydantic objects into a dictionary and build the DataFrame from the list of dictionaries.

The idea is something like the following:

import pandas as pd

df = pd.DataFrame([dict(s) for s in series_data])

In this example, the value of series_data is a sequence of SeriesSample instances.

Each column in the resulting dataframe will be one of the variables of the sample. Given this object, methods of the DataFrame object produce useful statistics.

The corr() method, for example, computes the correlation values...
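Here is a sketch of the pandas route, assuming pandas is installed. Plain dictionaries stand in for dict(s) of each SeriesSample, and the x and y column names are assumptions. DataFrame.corr() computes pairwise Pearson correlations by default. Its method parameter also accepts "kendall" and "spearman".

```python
import pandas as pd

# Plain dicts standing in for dict(s) of each SeriesSample (hypothetical x, y fields).
rows = [
    {"x": x, "y": y}
    for x, y in zip(
        [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
        [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    )
]
df = pd.DataFrame(rows)

print(df.describe())  # count, mean, std, min, quartiles, and max per column
matrix = df.corr()  # correlation matrix; the diagonal entries are 1.0
print(matrix.loc["x", "y"])
```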
