Optimizing Databricks Workloads


Product type Book
Published in Dec 2021
Publisher Packt
ISBN-13 9781801819077
Pages 230 pages
Edition 1st Edition
Authors (3): Anirudh Kala, Anshul Bhatnagar, Sarthak Sarbahi

Table of Contents (13) Chapters

Preface
Section 1: Introduction to Azure Databricks
Chapter 1: Discovering Databricks
Chapter 2: Batch and Real-Time Processing in Databricks
Chapter 3: Learning about Machine Learning and Graph Processing in Databricks
Section 2: Optimization Techniques
Chapter 4: Managing Spark Clusters
Chapter 5: Big Data Analytics
Chapter 6: Databricks Delta Lake
Chapter 7: Spark Core
Section 3: Real-World Scenarios
Chapter 8: Case Studies
Other Books You May Enjoy

Learning about Apache Arrow in Pandas

Apache Arrow is an in-memory columnar data format that helps to efficiently transfer data between the Java Virtual Machines (JVMs) on a cluster and Python processes. This is highly beneficial for data scientists working with Pandas and NumPy in Databricks. Enabling Apache Arrow does not change the results of a computation; it only changes how the data is moved between the two runtimes. It is most helpful when we are converting Spark DataFrames to Pandas DataFrames, and vice versa. Let's try to better understand the utility of Apache Arrow with an analogy.
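Before moving on to the analogy, here is a minimal sketch of what such a conversion might look like in PySpark. It assumes Spark 3.x, where the Arrow setting is named `spark.sql.execution.arrow.pyspark.enabled` (Spark 2.x used `spark.sql.execution.arrow.enabled`). The helper name `pandas_roundtrip` is hypothetical and only serves the illustration; it is not taken from the book.

```python
# Assumption: Spark 3.x configuration keys for Arrow-based conversion.
ARROW_ENABLED = "spark.sql.execution.arrow.pyspark.enabled"
# Assumption: allow Spark to fall back to the non-Arrow path if a
# column type is not supported by Arrow.
ARROW_FALLBACK = "spark.sql.execution.arrow.pyspark.fallback.enabled"


def pandas_roundtrip(spark, pandas_df):
    """Hypothetical helper: Pandas -> Spark -> Pandas with Arrow enabled.

    `spark` is an active SparkSession (predefined in Databricks notebooks)
    and `pandas_df` is a Pandas DataFrame.
    """
    # Turn on Arrow for conversions between Spark and Pandas DataFrames.
    spark.conf.set(ARROW_ENABLED, "true")
    spark.conf.set(ARROW_FALLBACK, "true")

    # Pandas DataFrame -> Spark DataFrame.
    spark_df = spark.createDataFrame(pandas_df)

    # Spark DataFrame -> Pandas DataFrame; this collects all rows to the
    # driver, so it is only suitable for data that fits in driver memory.
    return spark_df.toPandas()
```

In a Databricks notebook, where `spark` is already defined, this could be called as `pandas_roundtrip(spark, pdf)` for some Pandas DataFrame `pdf`. The returned DataFrame should contain the same data whether or not Arrow is enabled; only the conversion time is expected to differ.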

Let's say you were traveling to Europe before the establishment of the European Union (EU). To visit 10 countries in 7 days, you would have had to spend some time at every border for passport control, and you would have lost money on every currency exchange. Similarly, without using Apache Arrow, inefficiencies exist due to serialization and deserialization processes wasting memory and CPU resources (such as converting a Spark DataFrame to a Pandas DataFrame...
