Python Real-World Projects


Product type: Book
Published: Sep 2023
Publisher: Packt
ISBN-13: 9781803246765
Pages: 478
Edition: 1st Edition
Author: Steven F. Lott


17.1 Overall data wrangling

The applications and notebooks are designed around the following multi-stage architecture:

  • Data acquisition

  • Inspection of data

  • Cleaning data; this includes validating, converting, standardizing, and saving intermediate results

  • Summarizing data and beginning to model it

  • Creating deeper analysis and more sophisticated statistical models

The stages fit together as shown in Figure 17.1.

Figure 17.1: Data Analysis Pipeline

The last step in this pipeline isn’t, of course, final. In many cases, the project evolves from exploration to monitoring and maintenance. There will be a long tail during which the model continues to be confirmed against new data. Some enterprise management oversight is an essential part of this ongoing confirmation.
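This ongoing confirmation can be pictured as a check that runs each time a new batch of data arrives. The following is a minimal sketch under stated assumptions, not the book's method: it assumes the model recorded the mean and standard deviation of one input feature when it was built, and it uses a two-sided z-test on each new batch's mean as one example of a basic statistical test. `Baseline`, `Confirmation`, `confirm()`, the sample values, and the `alpha=0.01` threshold are all hypothetical.

```python
"""Hypothetical sketch of ongoing model confirmation using a z-test."""
import math
import statistics
from dataclasses import dataclass
from statistics import NormalDist


@dataclass(frozen=True)
class Baseline:
    # Assumed: statistics recorded when the model was first built.
    mean: float
    stdev: float


@dataclass(frozen=True)
class Confirmation:
    z: float
    p_value: float
    passed: bool


def confirm(baseline: Baseline, batch: list[float], alpha: float = 0.01) -> Confirmation:
    # z-test: is the new batch's mean still consistent with the baseline mean?
    n = len(batch)
    if n == 0:
        raise ValueError("batch must not be empty")
    standard_error = baseline.stdev / math.sqrt(n)
    z = (statistics.fmean(batch) - baseline.mean) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return Confirmation(z=z, p_value=p_value, passed=p_value >= alpha)


if __name__ == "__main__":
    training = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
    baseline = Baseline(statistics.fmean(training), statistics.stdev(training))
    for batch in ([10.0, 9.9, 10.1, 10.2], [11.0, 11.2, 10.9, 11.1]):
        result = confirm(baseline, batch)
        print(f"z={result.z:+.2f} p={result.p_value:.4f} passed={result.passed}")
```

A failed check is a prompt for the management oversight described above rather than an automatic fix; the reason the data no longer matches the model still has to be uncovered.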

In some cases, the long tail is interrupted by a change. The change may show up as a loss of the model’s accuracy, or as a failure to pass basic statistical tests. Uncovering the change and the reasons for change is...
