Python Data Analysis: Master Python Analytics with Machine Learning, Deep Learning, GenAI, LLMs, and Data Engineering

Product type Paperback

Published in Jun 2026

Publisher Packt

ISBN-13 9781806022878

Length 766 pages

Edition 4th Edition

Languages

Python

Tools

Plotly

Concepts

Data Analysis

Authors (2):

Avinash Navlani

Cornellius Yudha Wijaya

Table of Contents (25) Chapters

Preface

1. Part 1: Foundations for Data Analysis

2. Getting Started with Python Libraries

3. NumPy and Pandas

4. Statistics for Data Insights

5. Linear Algebra

6. Part 2: Exploratory Data Analysis and Data Cleaning

7. Data Visualization

8. Retrieving, Processing, and Storing Data

9. Cleaning Messy Data

10. Time-Series Analysis

11. Part 3: Deep Dive into Machine Learning

12. Supervised Learning: Regression and Classification

13. Unsupervised Learning: Dimensionality Reduction, Clustering, Anomaly Detection

14. Ensemble Methods: Bagging and Boosting Methods

15. Artificial Neural Networks and Deep Learning

16. Part 4: NLP, Image Analytics, and Parallel Computing

17. Analyzing Text Data

18. Analyzing Image Data

19. LLMs and Gen AI

20. Parallel Computing Using Dask, Modin, and Ray

21. Big Data Analytics Using PySpark

22. Unlock Access to the Code Bundle and the PDF Version

Unlock this Book’s Free Benefits in 3 Easy Steps

23. Other Books You May Enjoy

Share Your Thoughts

24. Index

Apache Spark

Big Data generally refers to large-scale, complex data that must be processed quickly and efficiently. Several tools exist for this purpose, including Hadoop, Spark, and Flink. Spark's defining feature is in-memory computation: it retains intermediate results in RAM whenever possible, which avoids much of the disk I/O that slows down disk-based frameworks such as Hadoop MapReduce. Because of this, Spark can be up to 100 times faster than Hadoop in certain workloads. Spark itself is written primarily in Scala.
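To make the contrast between disk-based and in-memory processing concrete, here is a minimal sketch in plain Python. It does not use Spark or Hadoop; the two-stage pipeline and the function names (`run_disk_based`, `run_in_memory`) are hypothetical and only illustrate where the intermediate result lives between stages.

```python
import json
import os
import tempfile


def square(values):
    """Stage 1: square every value."""
    return [v * v for v in values]


def keep_even(values):
    """Stage 2: keep only the even values."""
    return [v for v in values if v % 2 == 0]


def run_disk_based(values, workdir):
    """Persist the stage 1 output to disk, then read it back for stage 2."""
    stage1_path = os.path.join(workdir, "stage1.json")
    with open(stage1_path, "w") as f:
        json.dump(square(values), f)  # intermediate result written to disk
    with open(stage1_path) as f:
        intermediate = json.load(f)  # intermediate result read back from disk
    return sum(keep_even(intermediate))


def run_in_memory(values):
    """Keep the stage 1 output in RAM and pass it straight to stage 2."""
    intermediate = square(values)
    return sum(keep_even(intermediate))


values = list(range(1, 11))
with tempfile.TemporaryDirectory() as workdir:
    disk_total = run_disk_based(values, workdir)
memory_total = run_in_memory(values)

# Both strategies compute the same answer; they differ only in the extra
# write and read that the disk-based version performs between stages.
print(disk_total, memory_total)
```

Both functions return the same total. The disk-based version simply pays for an extra write and read between stages, and that cost grows with the size of the data and the number of stages. In PySpark itself, keeping a reused dataset in memory is typically requested with `cache()` or `persist()`.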

Spark is another Big Data framework, one built around in-memory processing. Apache Hadoop processes data mainly by reading from and writing to disk, which slows execution because disk operations are comparatively time-consuming. Spark addresses this limitation by keeping intermediate...

Authors (2)

Avinash Navlani

Avinash Navlani, PhD in Data Science, is a senior data scientist, researcher, and educator with 14 years of experience in data science, including 9 years in industry, 4 years in academia, and 1 year in research. He has developed machine learning models, optimization solutions, NLP systems, scalable data pipelines, and cloud-based MLOps platforms across healthcare, retail, finance, oil & gas, and manufacturing. His expertise includes Python, PySpark, Airflow, Databricks, Azure ML, MLflow, and Data Engineering. A former lecturer and speaker, he is passionate about applying analytics to solve real-world problems.

Cornellius Yudha Wijaya

Cornellius Yudha Wijaya has over eight years of experience in data science, machine learning, and artificial intelligence. He currently works as a data scientist manager, where he leads AI initiatives, manages team members, and helps drive the development of practical data and AI solutions. Over the course of his career, he has worked across data science, AI product development, and technical education, with experience in building machine learning systems, supporting business decision-making, and making advanced analytics more usable in real-world settings. He has also written extensively on data science, Python, machine learning, and generative AI, with a strong focus on practical learning and applied problem-solving.
