You're reading from  Python Real-World Projects

Product type: Book
Published in: Sep 2023
Publisher: Packt
ISBN-13: 9781803246765
Edition: 1st Edition
Author: Steven F. Lott

Steven Lott has been programming since computers were large, expensive, and rare. Working for decades in high tech has given him exposure to a lot of ideas and techniques; some are bad, but most are helpful to others. Since the 1990s, Steven has been engaged with Python, crafting an array of indispensable tools and applications. His profound expertise has led him to contribute significantly to Packt Publishing, penning notable titles like "Mastering Object-Oriented Python," "The Modern Python Cookbook," and "Functional Python Programming." A self-proclaimed technomad, Steven's unconventional lifestyle sees him residing on a boat, often anchored along the vibrant east coast of the US. He tries to live by the words “Don't come home until you have a story.”

2.8 Summary

This data analysis pipeline moves data from sources through a series of stages to create clean, valid, standardized data. The general flow supports a variety of needs and permits a great deal of customization and extension.

For developers with an interest in data science or machine learning, these projects cover what is sometimes called the “data wrangling” part of the work. Wrangling can be a significant complication: the data must be understood, and the differences among data sources must be explored and resolved. These are the preparatory steps, sometimes difficult, that come before building a model that can be used for AI decision-making.

For readers with an interest in the web, this kind of data processing and extraction is part of presenting data via a web application API or website. Project 3.7, which creates a web server, will be of particular interest. Because the web service requires clean data, the preceding projects are helpful for creating data that can be published.

For folks with an automation or IoT interest, Part 2 explains how to use Jupyter Notebooks to gather and inspect data. This is a common need, and the various steps to clean, validate, and standardize data become all the more important when dealing with real-world devices subject to the vagaries of temperature and voltage.

We’ve looked at the following multi-stage approach to doing data analysis:

  • Data Acquisition

  • Inspection of Data

  • Clean, Validate, Standardize, and Persist

  • Summarize and Analyze

  • Create a Statistical Model
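The stages listed above can be sketched as a chain of small functions, each consuming the output of the previous one. This is only an illustration under our own assumptions: the function names (`acquire`, `clean`, `summarize`), the `Summary` class, and the sample rows are hypothetical, and the inspection and modeling stages are left out for brevity.

```python
from statistics import mean
from typing import Iterable, Iterator, NamedTuple

# Hypothetical raw samples, as an acquisition stage might deliver them.
RAW_ROWS = [
    {"x": "10.0", "y": "8.04"},
    {"x": " 8.0", "y": "6.95"},   # stray whitespace is tolerated by float()
    {"x": "n/a", "y": "7.58"},    # invalid x: rejected by clean()
    {"x": "13.0", "y": ""},       # missing y: rejected by clean()
]


class Summary(NamedTuple):
    count: int
    mean_x: float
    mean_y: float


def acquire(source: Iterable[dict[str, str]]) -> Iterator[dict[str, str]]:
    """Acquisition: pass raw rows along without interpreting them."""
    yield from source


def clean(rows: Iterable[dict[str, str]]) -> Iterator[dict[str, float]]:
    """Clean, validate, standardize: keep only rows that convert to floats."""
    for row in rows:
        try:
            yield {"x": float(row["x"]), "y": float(row["y"])}
        except ValueError:
            continue  # a real project would log or count the rejected row


def summarize(rows: Iterable[dict[str, float]]) -> Summary:
    """Summarize: reduce the clean rows to a few descriptive numbers."""
    data = list(rows)
    return Summary(
        count=len(data),
        mean_x=mean(r["x"] for r in data),
        mean_y=mean(r["y"] for r in data),
    )


summary = summarize(clean(acquire(RAW_ROWS)))
print(summary)
```

Because each stage accepts and returns an iterable, a stage can be replaced or extended without touching the others, which is the kind of customization the general flow is meant to permit.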

This pipeline follows the Extract-Transform-Load (ETL) concept. The terms have been changed because the legacy words are sometimes misleading. Our acquisition stage overlaps with what is understood as the “Extract” operation. For some developers, Extract is limited to database extracts; we’d like to go beyond that to include other data source transformations. Our cleaning, validating, and standardizing stages are usually combined into the “Transform” operation. Saving the clean data is generally the objective of “Load”; we’re not emphasizing a database load, but instead, we’ll use files.
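To make the file-based “Load” idea concrete, here is a minimal sketch that persists clean rows to a file and reads them back. The choice of newline-delimited JSON, the file name `clean.ndjson`, and the helper names `load_to_file` and `read_back` are assumptions made for this illustration; any durable file format would serve the same purpose.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterable


def load_to_file(rows: Iterable[dict[str, float]], target: Path) -> int:
    """Persist clean rows as one JSON document per line; return the row count."""
    count = 0
    with target.open("w", encoding="utf-8") as out:
        for row in rows:
            out.write(json.dumps(row) + "\n")
            count += 1
    return count


def read_back(target: Path) -> list[dict[str, float]]:
    """Read the persisted rows, skipping any blank lines."""
    with target.open(encoding="utf-8") as src:
        return [json.loads(line) for line in src if line.strip()]


# Hypothetical output of the cleaning stages.
clean_rows = [{"x": 10.0, "y": 8.04}, {"x": 8.0, "y": 6.95}]

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "clean.ndjson"
    written = load_to_file(clean_rows, target)
    restored = read_back(target)

print(written, restored)
```

A file like this can be handed to a later stage, or to a different tool entirely, without requiring a database.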

Throughout the book, we’ll describe each project’s objective and provide the foundation of a sound technical approach. The details of the implementation are up to you. We’ll enumerate the deliverables; this may repeat some of the information from Chapter 1, Project Zero: A Template for Other Projects. The book provides a great deal of information on acceptance test cases and unit test cases, which together form the definition of done. By covering the approach, we’ve left room for you to design and implement the needed application software.

In the next chapter, we’ll build the first data acquisition project. This will work with CSV-format files. Later projects will work with database extracts and web services.
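As a small preview of what acquiring data from a CSV-format file involves, the sketch below uses the standard library's `csv.DictReader`. The column names, the sample text, and the `acquire_csv` function are hypothetical; the projects themselves define their own sources and designs.

```python
import csv
import io
from typing import TextIO

# Hypothetical CSV content; a real project would open a file instead.
SAMPLE_CSV = "x,y\n10.0,8.04\n8.0,6.95\n"


def acquire_csv(source: TextIO) -> list[dict[str, str]]:
    """Read every row as a dict keyed by the header line; values stay as text."""
    reader = csv.DictReader(source)
    return list(reader)


rows = acquire_csv(io.StringIO(SAMPLE_CSV))
print(rows)
```

The values arrive as strings; converting them to numbers belongs to the later cleaning and validating stages.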
