Python Data Analysis: Master Python Analytics with Machine Learning, Deep Learning, GenAI, LLMs, and Data Engineering

Product type Paperback

Published in Jun 2026

Publisher Packt

ISBN-13 9781806022878

Length 766 pages

Edition 4th Edition

Languages Python

Tools Plotly

Concepts Data Analysis

Authors (2):

Avinash Navlani

Cornellius Yudha Wijaya


Table of Contents (25 Chapters)

Preface

1. Part 1: Foundations for Data Analysis

2. Getting Started with Python Libraries

3. NumPy and Pandas

4. Statistics for Data Insights

5. Linear Algebra

6. Part 2: Exploratory Data Analysis and Data Cleaning

7. Data Visualization

8. Retrieving, Processing, and Storing Data

9. Cleaning Messy Data

10. Time-Series Analysis

11. Part 3: Deep Dive into Machine Learning

12. Supervised Learning: Regression and Classification

13. Unsupervised Learning: Dimensionality Reduction, Clustering, Anomaly Detection

14. Ensemble Methods: Bagging and Boosting Methods

15. Artificial Neural Networks and Deep Learning

16. Part 4: NLP, Image Analytics, and Parallel Computing

17. Analyzing Text Data

18. Analyzing Image Data

19. LLMs and Gen AI

20. Parallel Computing Using Dask, Modin, and Ray

21. Big Data Analytics Using PySpark

22. Unlock Access to the Code Bundle and the PDF Version

Unlock this Book’s Free Benefits in 3 Easy Steps

23. Other Books You May Enjoy

Share Your Thoughts

24. Index

Summary

In this chapter, we showed how parallel computing helps when single-core pandas and scikit-learn become slow or run out of memory. We introduced Dask as both a parallel execution engine and a set of familiar collections, and we explained how the scheduler, workers, and dashboard work together to execute a task graph created by lazy operations. We then applied this foundation through Dask Arrays for numeric workloads, Dask DataFrames for partitioned tabular processing, and Dask Bags for semi-structured data such as JSON Lines.
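The idea of lazy operations building a task graph that a scheduler later runs on workers can be sketched with the standard library alone. This is a toy illustration, not Dask's API: the helper names `build_graph` and `execute` are hypothetical, the graph has only two levels (one task per partition plus a final combine), and a thread pool stands in for the workers.

```python
from concurrent.futures import ThreadPoolExecutor


def build_graph(partitions):
    """Record one sum task per partition plus a final combine task.

    Nothing is computed here; the graph only describes the work.
    (Hypothetical helper for illustration, not part of Dask.)
    """
    graph = {f"partial-{i}": (sum, part) for i, part in enumerate(partitions)}
    # The final task depends on every partial result recorded so far.
    graph["total"] = (sum, list(graph))
    return graph


def execute(graph, target, max_workers=4):
    """Run the target's dependencies in a thread pool, then combine them."""
    func, dep_keys = graph[target]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each dependency is an independent task, so they can run in parallel.
        futures = {key: pool.submit(*graph[key]) for key in dep_keys}
        partials = [futures[key].result() for key in dep_keys]
    return func(partials)


partitions = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
graph = build_graph(partitions)   # lazy: only the plan exists at this point
print(execute(graph, "total"))    # prints 45
```

Separating `build_graph` from `execute` mirrors the split between describing work and running it. A real scheduler would additionally handle arbitrary dependency chains, memory limits, and distribution across machines.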

Next, we demonstrated practical patterns for parallel data loading, preprocessing, and feature preparation at scale, followed by machine learning workflows that either parallelize scikit-learn steps or use dask-ml to train on Dask-backed data. We closed by comparing Dask with Modin and Ray. Modin aims for minimal changes to pandas code, while Ray provides a general runtime for orchestrating many tasks and models with more explicit control over execution...


Authors (2)

Avinash Navlani

Avinash Navlani, PhD in Data Science, is a senior data scientist, researcher, and educator with 14 years of experience in data science, including 9 years in industry, 4 years in academia, and 1 year in research. He has developed machine learning models, optimization solutions, NLP systems, scalable data pipelines, and cloud-based MLOps platforms across healthcare, retail, finance, oil & gas, and manufacturing. His expertise includes Python, PySpark, Airflow, Databricks, Azure ML, MLflow, and Data Engineering. A former lecturer and speaker, he is passionate about applying analytics to solve real-world problems.


Cornellius Yudha Wijaya

Cornellius Yudha Wijaya has over eight years of experience in data science, machine learning, and artificial intelligence. He currently works as a data scientist manager, where he leads AI initiatives, manages team members, and helps drive the development of practical data and AI solutions. Over the course of his career, he has worked across data science, AI product development, and technical education, with experience in building machine learning systems, supporting business decision-making, and making advanced analytics more usable in real-world settings. He has also written extensively on data science, Python, machine learning, and generative AI, with a strong focus on practical learning and applied problem-solving.
