Python Data Analysis

You're reading from Python Data Analysis: Master Python Analytics with Machine Learning, Deep Learning, GenAI, LLMs, and Data Engineering

Product type: Paperback
Published in: Jun 2026
Publisher: Packt
ISBN-13: 9781806022878
Length: 766 pages
Edition: 4th Edition
Authors (2):
Avinash Navlani
Cornellius Yudha Wijaya
Table of Contents (25 chapters)

Preface
1. Part 1: Foundations for Data Analysis
2. Getting Started with Python Libraries
3. NumPy and Pandas
4. Statistics for Data Insights
5. Linear Algebra
6. Part 2: Exploratory Data Analysis and Data Cleaning
7. Data Visualization
8. Retrieving, Processing, and Storing Data
9. Cleaning Messy Data
10. Time-Series Analysis
11. Part 3: Deep Dive into Machine Learning
12. Supervised Learning: Regression and Classification
13. Unsupervised Learning: Dimensionality Reduction, Clustering, Anomaly Detection
14. Ensemble Methods: Bagging and Boosting Methods
15. Artificial Neural Networks and Deep Learning
16. Part 4: NLP, Image Analytics, and Parallel Computing
17. Analyzing Text Data
18. Analyzing Image Data
19. LLMs and Gen AI
20. Parallel Computing Using Dask, Modin, and Ray
21. Big Data Analytics Using PySpark
22. Unlock Access to the Code Bundle and the PDF Version
23. Other Books You May Enjoy
24. Index

Apache Spark

Big Data generally refers to large-scale, complex data that must be processed quickly and efficiently. Several tools exist for this purpose, including Hadoop, Spark, and Flink. Spark performs computation in memory: it retains intermediate results in RAM whenever possible, which avoids much of the disk I/O that slows down traditional disk-based frameworks such as Hadoop. As a result, Spark can be up to 100 times faster than Hadoop in certain workloads. Spark was developed using Scala.
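The difference between the two execution styles can be illustrated with a toy sketch in plain Python. This is not Spark or Hadoop code: the stage functions and the names `run_with_disk` and `run_in_memory` are hypothetical, chosen only to contrast a pipeline that writes every intermediate result to a file with one that passes intermediates along in RAM.

```python
import json
import os
import tempfile


def square(values):
    """Stage 1: square every number."""
    return [v * v for v in values]


def keep_even(values):
    """Stage 2: keep only the even numbers."""
    return [v for v in values if v % 2 == 0]


# A three-stage pipeline; the built-in sum() is the final aggregation stage.
STAGES = [square, keep_even, sum]


def run_with_disk(data, stages, workdir):
    """Disk-based style: persist each intermediate result, then read it back."""
    current = data
    for i, stage in enumerate(stages):
        path = os.path.join(workdir, f"stage_{i}.json")
        with open(path, "w") as fh:
            json.dump(stage(current), fh)   # write the stage output to disk
        with open(path) as fh:
            current = json.load(fh)         # reload it as input for the next stage
    return current


def run_in_memory(data, stages):
    """In-memory style: hand each intermediate result directly to the next stage."""
    current = data
    for stage in stages:
        current = stage(current)
    return current


data = list(range(10))
with tempfile.TemporaryDirectory() as workdir:
    disk_result = run_with_disk(data, STAGES, workdir)
memory_result = run_in_memory(data, STAGES)
print(disk_result, memory_result)
```

Both functions compute the same answer; the disk-based version simply pays for a file write and a file read at every stage. On real cluster workloads that overhead is what in-memory execution is meant to reduce.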

Spark is thus a Big Data framework built around in-memory processing. Apache Hadoop mainly processes data by reading from and writing to disk, which slows execution because disk operations are comparatively time-consuming. Spark addresses this limitation by keeping intermediate...
