You're reading from Python Real-World Projects

Product type: Book

Published in: Sep 2023

Publisher: Packt

ISBN-13: 9781803246765

Edition: 1st Edition

Concepts: Programming Language

Author: Steven F. Lott

Chapter 6
Project 2.1: Data Inspection Notebook

We often need to do an ad hoc inspection of source data. In particular, the very first time we acquire new data, we need to see the file to be sure it meets expectations. Additionally, debugging and problem-solving also benefit from ad hoc data inspections. This chapter will guide you through using a Jupyter notebook to survey data and find the structure and domains of the attributes.

The previous chapters have focused on a simple dataset where the data types look like obvious floating-point values. For such a trivial dataset, the inspection isn’t going to be very complicated.

It can help to start with a trivial dataset and focus on the tools and how they work together. For this reason, we’ll continue using relatively small datasets to let you learn about the tools without having the burden of also trying to understand the data.

This chapter’s projects cover how to create and use a Jupyter notebook for data inspection...

6.1 Description

When confronted with raw data acquired from a source application, database, or web API, it’s prudent to inspect the data to be sure it really can be used for the desired analysis. It’s common to find that data doesn’t precisely match the given descriptions. It’s also possible to discover that the metadata is out of date or incomplete.
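One way to check data against its description is to tally the raw values seen in each column. The following is a minimal sketch, not the book's own code: the sample text, the `survey()` function, and the `is_float()` helper are all hypothetical names invented for illustration, and the sample stands in for a file produced by the acquisition step.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical sample standing in for an acquired CSV file.
SAMPLE = """x,y
10.0,8.04
8.0,6.95
13.0,
N/A,7.58
"""


def survey(source):
    """Count the distinct raw string values seen in each column."""
    domains = defaultdict(Counter)
    for row in csv.DictReader(source):
        for name, value in row.items():
            domains[name][value] += 1
    return domains


def is_float(text):
    """Return True if the raw text can be converted to a float."""
    try:
        float(text)
    except (TypeError, ValueError):
        return False
    return True


domains = survey(io.StringIO(SAMPLE))

# Values that do not fit the expected floating-point domain.
suspect = {
    name: [value for value in counts if not is_float(value)]
    for name, counts in domains.items()
}
print(suspect)
```

In a notebook, each of these steps would likely sit in its own cell so that the counts can be examined before deciding which values are suspect.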

The foundational principle behind this project is the following:

We don’t always know what the actual data looks like.

Data may have errors because source applications have bugs. There could be “undocumented features,” which are similar to bugs but have better explanations. Users may have taken actions that introduced new codes or status flags. For example, an application may have a “comments” field on an accounts-payable record, and accounting clerks may have invented their own set of coded values, which they put in the last few characters of this field. This...

6.2 Approach

We’ll take some guidance from the C4 model (https://c4model.com) when looking at our approach.

Context: For this project, the context diagram has two use cases: acquire and inspect
Containers: There’s one container for the various applications: the user’s personal computer
Components: There are two significantly different collections of software components: the acquisition program and inspection notebooks
Code: We’ll touch on this to provide some suggested directions

A context diagram for this application is shown in Figure 6.1.

The data analyst will use the CLI to run the data acquisition program. Then, the analyst will use the CLI to start a Jupyter Lab server. Using a browser, the analyst can then use Jupyter Lab to inspect the data.

The components fall into two overall categories. The component diagram is shown in Figure 6.2.

The diagram shows the interfaces...

6.3 Deliverables

This project has the following deliverables:

A pyproject.toml file that identifies the tools used. For this book, we used jupyterlab==3.5.3. Note that while the book was being prepared for publication, version 4.0 was released. This ongoing evolution of components makes it important for you to find the latest version, not the version quoted here.
Documentation in the docs folder.
Unit tests for any new application modules in the tests folder.
Any new application modules in the src folder with code to be used by the inspection notebook.
A notebook to inspect the raw data acquired from any of the sources.

The project directory structure suggested in Chapter 1, Project Zero: A Template for Other Projects mentions a notebooks directory. See List of deliverables for more information. Previous chapters haven’t used any notebooks, so this directory might not have been created in the first place. For this project, the notebooks directory is needed.

Let’...

6.4 Summary

This chapter’s project covered the basics of creating and using a Jupyter Lab notebook for data inspection. This permits tremendous flexibility, something often required when looking at new data for the first time.

We also looked at adding doctest examples to functions and running the doctest tool in the last cell of a notebook. This lets us validate that the code in the notebook is very likely to work properly.
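As a rough illustration of that idea, the sketch below puts a doctest example in a function's docstring and then runs the standard-library `doctest` tool, as the last cell of a notebook might. The `to_float()` function is a hypothetical helper invented here, not code from the chapter.

```python
import doctest


def to_float(text: str) -> float:
    """
    Convert a raw source string to a float, ignoring surrounding spaces.

    >>> to_float(" 10.0 ")
    10.0
    >>> to_float("8.04")
    8.04
    """
    return float(text.strip())


# Last cell of the notebook: run every doctest example defined so far.
results = doctest.testmod()
print(results)
```

When every example passes, `doctest.testmod()` prints nothing of its own and the returned result reports zero failures. A nonzero failure count is a signal to revisit earlier cells.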

Now that we’ve got an initial inspection notebook, we can start to consider the specific kinds of data being acquired. In the next chapter, we’ll add features to this notebook.

6.5 Extras

Here are some ideas for you to add to this project.

6.5.1 Use pandas to examine data

A common tool for interactive data exploration is the pandas package.

See https://pandas.pydata.org for more information.

Also, see https://www.packtpub.com/product/learning-pandas/9781783985128 for resources for learning more about pandas.

The value of using pandas for examining text may be limited. The real value of pandas is for doing more sophisticated statistical and graphical analysis of the data.

We encourage you to load NDJSON documents using pandas and do some preliminary investigation of the data values.
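A possible starting point is sketched below, assuming pandas is installed. The `read_json()` function accepts `lines=True` to treat each line as a separate JSON document. The sample records and the `x` and `y` field names are made up for illustration. With a real file, its path would replace the `StringIO` object.

```python
import io

import pandas as pd

# Hypothetical NDJSON text: one JSON document per line.
NDJSON_TEXT = """{"x": 10.0, "y": 8.04}
{"x": 8.0, "y": 6.95}
{"x": 13.0, "y": 7.58}
"""

# lines=True tells pandas to parse newline-delimited JSON.
df = pd.read_json(io.StringIO(NDJSON_TEXT), lines=True)

# Preliminary investigation: inferred types and summary statistics.
print(df.dtypes)
print(df.describe())
```

Note that pandas infers data types while loading, so the resulting frame shows converted values rather than the raw text of the source.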

Author (1)

Steven F. Lott

Steven Lott has been programming since computers were large, expensive, and rare. Working for decades in high tech has given him exposure to a lot of ideas and techniques, some bad, but most helpful to others. Since the 1990s, Steven has been engaged with Python, crafting an array of indispensable tools and applications. His profound expertise has led him to contribute significantly to Packt Publishing, penning notable titles like "Mastering Object-Oriented Python," "The Modern Python Cookbook," and "Functional Python Programming." A self-proclaimed technomad, Steven's unconventional lifestyle sees him residing on a boat, often anchored along the vibrant east coast of the US. He tries to live by the words “Don't come home until you have a story.”