
You're reading from Python Real-World Projects

Product type: Book
Published in: Sep 2023
Publisher: Packt
ISBN-13: 9781803246765
Edition: 1st Edition
Author (1)
Steven F. Lott

Steven Lott has been programming since computers were large, expensive, and rare. Working for decades in high tech has given him exposure to a lot of ideas and techniques, some bad, but most helpful to others. Since the 1990s, Steven has been engaged with Python, crafting an array of indispensable tools and applications. His profound expertise has led him to contribute significantly to Packt Publishing, penning notable titles like "Mastering Object-Oriented Python," "The Modern Python Cookbook," and "Functional Python Programming." A self-proclaimed technomad, Steven's unconventional lifestyle sees him residing on a boat, often anchored along the vibrant east coast of the US. He tries to live by the words “Don't come home until you have a story.”

Chapter 17
Next Steps

The journey from raw data to useful information has only begun. There are often many more steps to getting insights that can be used to support enterprise decision-making. From here, the reader needs to take the initiative to extend these projects, or consider other projects. Some readers will want to demonstrate their grasp of Python while others will go more deeply into the area of exploratory data analysis.

Python is used for so many different things that it seems difficult to even suggest a direction for deeper understanding of the language, the libraries, and the various ways Python is used.

In this chapter, we’ll touch on a few more topics related to exploratory data analysis. The projects in this book are only a tiny fraction of the kinds of problems that need to be solved on a daily basis.

Every analyst needs to balance the time between understanding the enterprise data being processed, searching for better ways to model the data, and effective...

17.1 Overall data wrangling

The applications and notebooks are designed around the following multi-stage architecture:

  • Data acquisition

  • Inspection of data

  • Cleaning data; this includes validating, converting, standardizing, and saving intermediate results

  • Summarizing, and the start of modeling data

  • Creating deeper analysis and more sophisticated statistical models
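As a minimal sketch of how the first few stages might chain together, the following uses plain functions over in-memory rows. The function names (`acquire`, `inspect_rows`, `clean`, `summarize`) and the sample data are hypothetical; they are not taken from the book's projects, and the final deeper-analysis stage is omitted.

```python
import csv
import io
import statistics
from collections import Counter

# Hypothetical raw data; a real project would acquire this from a file, API, or database.
RAW = """x,y
1,2.1
2,3.9
3,
4,8.2
"""


def acquire(text: str) -> list[dict[str, str]]:
    """Data acquisition: parse raw CSV text into rows of strings."""
    return list(csv.DictReader(io.StringIO(text)))


def inspect_rows(rows: list[dict[str, str]]) -> Counter:
    """Inspection: count empty values in each column."""
    missing: Counter = Counter()
    for row in rows:
        for name, value in row.items():
            if not value:
                missing[name] += 1
    return missing


def clean(rows: list[dict[str, str]]) -> list[dict[str, float]]:
    """Cleaning: validate and convert, dropping rows that cannot be converted."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append({"x": float(row["x"]), "y": float(row["y"])})
        except ValueError:
            continue  # an empty or non-numeric field fails validation
    return cleaned


def summarize(rows: list[dict[str, float]]) -> dict[str, float]:
    """Summarizing: the starting point for modeling."""
    return {"count": len(rows), "mean_y": statistics.mean(r["y"] for r in rows)}


rows = acquire(RAW)
missing = inspect_rows(rows)
cleaned = clean(rows)
summary = summarize(cleaned)
print(missing)
print(summary)
```

Keeping each stage as a separate function means intermediate results can be saved and examined between stages.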

The stages fit together as shown in Figure 17.1.

Figure 17.1: Data Analysis Pipeline

The last step in this pipeline isn’t — of course — final. In many cases, the project evolves from exploration to monitoring and maintenance. There will be a long tail where the model continues to be confirmed. Some enterprise management oversight is an essential part of this ongoing confirmation.

In some cases, the long tail is interrupted by a change. This may be reflected by a model’s inaccuracy. There may be a failure to pass basic statistical tests. Uncovering the change and the reasons for change is...

17.2 The concept of “decision support”

The core concept behind all data processing, including analytics and modeling, is to help some person make a decision. Ideally, a good decision will be based on sound data.

In many cases, decisions are made by software. Sometimes the decisions are simple rules that identify bad data, incomplete processes, or invalid actions. In other cases, the decisions are more nuanced, and we apply the term “artificial intelligence” to the software making the decision.

While many kinds of software applications make many automated decisions, a person is still — ultimately — responsible for those decisions being correct and consistent. This responsibility may be implemented as a person reviewing a periodic summary of decisions made.
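To make this concrete, here is a small sketch of rule-based automated decisions together with the kind of periodic summary a responsible person might review. The rules, the field names, and the sample records are invented for illustration only.

```python
from collections import Counter


def decide(record: dict) -> str:
    """Apply simple, hypothetical rules and return a decision label."""
    if record.get("weight") is None:
        return "reject: incomplete"
    if record["weight"] <= 0:
        return "reject: invalid value"
    return "accept"


# Hypothetical records flowing through an application.
records = [
    {"id": 1, "weight": 12.5},
    {"id": 2, "weight": None},
    {"id": 3, "weight": -4.0},
    {"id": 4, "weight": 7.0},
]

# The summary a stakeholder could review: the number and type of each decision.
decisions = Counter(decide(r) for r in records)
for label, count in sorted(decisions.items()):
    print(f"{label:25s} {count}")
```

A summary like this shows how many decisions of each kind were made; it does not by itself confirm that the rules are the right ones.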

This responsible stakeholder needs to understand the number and types of decisions being made by application software. They need to confirm the automated decisions reflect sound data as well...

17.3 Concept of metadata and provenance

The description of a dataset includes three important aspects:

  • The syntax or physical format and logical layout of the data

  • The semantics, or meaning, of the data

  • The provenance, or the origin and transformations applied to the data

The physical format of a dataset is often summarized using the name of a well-known file format. For example, the data may be in CSV format. The order of columns in a CSV file may change, leading to a need to have headings or some metadata describing the logical layout of the columns within a CSV file.

Much of this information can be enumerated in JSON schema definitions.
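As an illustration, a JSON Schema for a row with height, weight, and price columns might look like the following sketch. The property names, the title, and the choice of `number` as the type are assumptions. The small `check` function is a hand-written stand-in that covers only `required` and the `number` type; it is not a real JSON Schema validator.

```python
import json

# Hypothetical schema describing the logical layout and meaning of one row.
SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "title": "Measurement",
    "type": "object",
    "properties": {
        "height": {"type": "number", "description": "height in inches"},
        "weight": {"type": "number", "description": "weight in pounds"},
        "price": {"type": "number", "description": "price in dollars"},
    },
    "required": ["height", "weight", "price"],
}


def check(row: dict, schema: dict) -> list[str]:
    """Return a list of problems; handles only 'required' and type 'number'."""
    problems = [f"missing {name}" for name in schema["required"] if name not in row]
    for name, rule in schema["properties"].items():
        if name in row and rule["type"] == "number":
            value = row[name]
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"{name} is not a number")
    return problems


print(json.dumps(SCHEMA, indent=2))
print(check({"height": 70, "weight": "heavy"}, SCHEMA))
```

In practice a full validator would be used in place of `check`; the point here is only that syntax and semantics can be written down in one machine-readable document.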

In some cases, the metadata might be yet another CSV file that has column numbers, preferred data types, and column names. We might have a secondary CSV file that looks like the following example:

1,height,height in inches
2,weight,weight in pounds
3,price,price in dollars

This metadata information describes the contents of a separate CSV file with...
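A minimal sketch of how such a secondary file could be used: read the metadata rows, then use the column numbers to label the values of a data file that has no headings. The data rows, the function names, and the assumption that column numbers are 1-based are illustrative only.

```python
import csv
import io

# The metadata rows from the example above.
METADATA = """1,height,height in inches
2,weight,weight in pounds
3,price,price in dollars
"""

# Hypothetical data file, without headings, described by the metadata.
DATA = """70,150,19.99
64,120,24.50
"""


def load_layout(text: str) -> dict[int, str]:
    """Map 0-based column positions to names (metadata assumed to be 1-based)."""
    reader = csv.reader(io.StringIO(text))
    return {int(number) - 1: name for number, name, _description in reader}


def labeled_rows(text: str, layout: dict[int, str]):
    """Yield each data row as a dictionary keyed by column name."""
    for row in csv.reader(io.StringIO(text)):
        yield {layout[position]: value for position, value in enumerate(row)}


layout = load_layout(METADATA)
rows = list(labeled_rows(DATA, layout))
print(rows[0])
```

The values remain strings here; converting them is a separate cleaning step.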

17.4 Next steps toward machine learning

We can draw a rough boundary between statistical modeling and machine learning. This is a hot topic of debate because — viewed from a suitable distance — all statistical modeling can be described as machine learning.

In this book, we’ve drawn a boundary to distinguish methods based on algorithms that are finite, definite, and effective. For example, the process of using the linear least squares technique to find a function that matches data is generally reproducible with an exact closed-form answer that doesn’t require tuning hyperparameters.
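The closed-form nature of linear least squares can be seen in a short sketch. The sample data are made up; the formulas are the standard ones for the slope and intercept of a line fitted to a single independent variable.

```python
import statistics

# Hypothetical sample data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.2, 4.1, 6.2, 7.9, 10.1]


def least_squares(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (slope, intercept) minimizing the sum of squared vertical errors."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = least_squares(x, y)
print(f"y = {slope:.2f} x + {intercept:.2f}")
```

There is no iteration and nothing to tune: the same data always produce the same line.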

Even within our narrow domain of “statistical modeling,” we can encounter datasets for which linear least squares doesn’t behave well. One notable assumption of the least squares estimates, for example, is that the independent variables are all known exactly. If the x values are subject to observational error, a more sophisticated approach is required.
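One well-known technique for this situation is Deming regression, which allows for error in both variables. It is sketched here only as an example of such an approach and is not necessarily the method this chapter goes on to describe. The parameter `delta` (the assumed ratio of the y-error variance to the x-error variance) and the sample data are assumptions; with `delta=1.0` the fit is orthogonal regression.

```python
import math
import statistics


def deming(x: list[float], y: list[float], delta: float = 1.0) -> tuple[float, float]:
    """Return (slope, intercept) of a Deming regression line.

    delta is the assumed ratio of y-error variance to x-error variance.
    The sample covariance of x and y must be nonzero.
    """
    n = len(x)
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    s_yy = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    diff = s_yy - delta * s_xx
    slope = (diff + math.sqrt(diff ** 2 + 4 * delta * s_xy ** 2)) / (2 * s_xy)
    return slope, y_bar - slope * x_bar


# Hypothetical measurements where both x and y carry observational error.
x_obs = [1.1, 1.9, 3.2, 3.9, 5.1]
y_obs = [2.2, 4.1, 6.2, 7.9, 10.1]
print(deming(x_obs, y_obs))
```

Unlike ordinary least squares, this requires an extra assumption about the relative size of the errors, which is part of what makes the approach more sophisticated.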

The...
