You're reading from Applied Computational Thinking with Python: Algorithm design for complex real-world problems

Product type: Paperback
Published: Dec 2023
Publisher: Packt
ISBN-13: 9781837632305
Length: 438 pages
Edition: 2nd Edition
Languages: Python
Concepts: Programming Language
Authors (2): Sofía De Jesús, Martinez



Preprocessing data

Preprocessing data is a technique that transforms raw data into a usable and efficient format. It is often considered the most important step in the data mining and machine learning (ML) process.

When we preprocess data, we are cleaning it, transforming it, or reducing it. In this section, we will look at what each of these means.
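As a rough illustration of the three kinds of preprocessing, the following sketch applies each one to a small, made-up list of records using plain Python. The records and the field names `age`, `income`, and `city` are hypothetical. Cleaning drops records with a missing value. Transformation rescales `income` to the range [0, 1] with min-max scaling. Reduction keeps only the fields a model would need. In practice, a library such as pandas is usually used for these steps, so treat this only as a sketch of the ideas.

```python
# Hypothetical raw records; None marks a missing value.
raw = [
    {"age": 25, "income": 30000, "city": "Ponce"},
    {"age": None, "income": 52000, "city": "Arecibo"},
    {"age": 41, "income": 75000, "city": "San Juan"},
    {"age": 33, "income": 48000, "city": "Caguas"},
]

# 1. Cleaning: drop any record that has a missing value.
cleaned = [r for r in raw if all(v is not None for v in r.values())]

# 2. Transformation: min-max scale income into the range [0, 1].
incomes = [r["income"] for r in cleaned]
lo, hi = min(incomes), max(incomes)
span = (hi - lo) or 1  # avoid dividing by zero if every income is the same
transformed = [{**r, "income": (r["income"] - lo) / span} for r in cleaned]

# 3. Reduction: keep only the attributes the model needs (drop "city").
reduced = [{"age": r["age"], "income": r["income"]} for r in transformed]

for row in reduced:
    print(row)
```

The record with the missing `age` is removed, the lowest and highest remaining incomes become 0.0 and 1.0, and `city` no longer appears in the output.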

Data cleaning

Data cleaning refers to the process of detecting and fixing problems in a dataset so that it is more efficient to work with. Cleaning really large datasets can speed up the algorithm, help avoid errors, and produce better results. There are a few things we deal with when cleaning data:

Missing data: Address this by removing, imputing, or using domain-specific methods to handle missing values
Duplicate data: Detect and remove duplicates to ensure each observation is unique
Data types: Use appropriate functions to convert data types as needed
Noisy data: This can be smoothed by using binning, regression...
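The cleaning tasks listed above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the book's implementation. The sensor IDs and readings are hypothetical, and so are several method choices: mean imputation for missing values and equal-frequency binning with smoothing by bin means for noise. Median imputation or other bin sizes would be equally valid. With pandas, the same steps map roughly onto `DataFrame.drop_duplicates`, `pandas.to_numeric` or `Series.astype`, `DataFrame.fillna`, and `pandas.qcut`.

```python
from statistics import mean

# Hypothetical raw readings (sensor id, temperature as text).
# One reading is missing (None) and one row is an exact duplicate.
raw = [
    ("s1", "21.0"),
    ("s2", None),
    ("s3", "23.5"),
    ("s3", "23.5"),  # duplicate row
    ("s4", "22.0"),
    ("s5", "35.0"),  # possibly a noisy reading
    ("s6", "24.5"),
]

# Duplicate data: keep the first occurrence of each row, preserving order.
deduped = list(dict.fromkeys(raw))

# Data types: convert temperature strings to floats (None stays None).
typed = [(sid, float(t) if t is not None else None) for sid, t in deduped]

# Missing data: impute missing temperatures with the mean of the known ones.
known = [t for _, t in typed if t is not None]
fill = mean(known)
imputed = [(sid, t if t is not None else fill) for sid, t in typed]


def smooth_by_bin_means(values, bin_size):
    """Sort values into equal-frequency bins and replace each with its bin mean."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    smoothed = [0.0] * len(values)
    for start in range(0, len(order), bin_size):
        bin_idx = order[start:start + bin_size]
        avg = mean(values[i] for i in bin_idx)
        for i in bin_idx:
            smoothed[i] = avg
    return smoothed


# Noisy data: smooth the temperatures using bins of three values each.
temps = [t for _, t in imputed]
smoothed = smooth_by_bin_means(temps, bin_size=3)
for (sid, _), t in zip(imputed, smoothed):
    print(sid, round(t, 2))
```

The order of the steps matters. Duplicates are removed before the mean is computed, so the repeated reading is not counted twice. Types are converted before imputation, because a mean cannot be taken over strings.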