Essential PySpark for Scalable Data Analytics


Product type: Book
Published: Oct 2021
Publisher: Packt
ISBN-13: 9781800568877
Pages: 322
Edition: 1st Edition
Author: Sreeram Nudurupati

Table of Contents (19 chapters)

Preface
Section 1: Data Engineering
Chapter 1: Distributed Computing Primer
Chapter 2: Data Ingestion
Chapter 3: Data Cleansing and Integration
Chapter 4: Real-Time Data Analytics
Section 2: Data Science
Chapter 5: Scalable Machine Learning with PySpark
Chapter 6: Feature Engineering – Extraction, Transformation, and Selection
Chapter 7: Supervised Machine Learning
Chapter 8: Unsupervised Machine Learning
Chapter 9: Machine Learning Life Cycle Management
Chapter 10: Scaling Out Single-Node Machine Learning Using PySpark
Section 3: Data Analysis
Chapter 11: Data Visualization with PySpark
Chapter 12: Spark SQL Primer
Chapter 13: Integrating External Tools with Spark SQL
Chapter 14: The Data Lakehouse
Other Books You May Enjoy

Building analytical data stores using cloud data lakes

In this section, you will explore the advantages that cloud-based data lakes offer big data analytics systems, and then look at some of the challenges those systems face when they build on cloud-based data lakes. You will also write a few PySpark code examples to experience these challenges first-hand.

Challenges with cloud data lakes

Cloud-based data lakes offer virtually unlimited, scalable, and relatively inexpensive data storage. Individual cloud providers offer them as managed services, with availability, scalability, efficiency, and a lower total cost of ownership. This helps organizations accelerate their digital innovation and achieve faster time to market. However, cloud data lakes are object stores that evolved primarily to solve the problem of storage scalability. They weren't designed to store highly structured, strongly typed, analytical data. Given this, there are...
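The point about object stores not being designed for strongly typed data can be illustrated with a minimal sketch. This is not the book's PySpark example. It uses only the Python standard library, with a local temporary directory standing in for an object store bucket. The helper names (`put_object`, `read_table`, `column_types`) and the `sales/` file layout are hypothetical and chosen only for illustration. It shows that a store which only holds files accepts two writes with conflicting column types without complaint, and the conflict surfaces only when the data is read back.

```python
import json
import tempfile
from pathlib import Path


def put_object(bucket: Path, key: str, records: list) -> None:
    """Write records as JSON lines under the given key.

    Like an object store, this only stores bytes: nothing checks the
    records against a schema. (Hypothetical helper for illustration.)
    """
    path = bucket / key
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_table(bucket: Path, prefix: str) -> list:
    """Read every JSON-lines file under a prefix as one logical table."""
    rows = []
    for path in sorted((bucket / prefix).glob("*.json")):
        with path.open() as f:
            rows.extend(json.loads(line) for line in f if line.strip())
    return rows


def column_types(rows: list, column: str) -> set:
    """Return the set of Python type names observed for one column."""
    return {type(row.get(column)).__name__ for row in rows}


with tempfile.TemporaryDirectory() as tmp:
    bucket = Path(tmp)  # assumption: a local directory stands in for a bucket

    # First writer stores numeric values.
    put_object(bucket, "sales/part-0001.json", [{"order_id": 1, "amount": 19.99}])
    # Second writer stores the same columns as strings; the write still succeeds.
    put_object(bucket, "sales/part-0002.json", [{"order_id": "2", "amount": "24.50"}])

    rows = read_table(bucket, "sales")
    amount_types = column_types(rows, "amount")
    print(sorted(amount_types))  # prints ['float', 'str']
```

In this sketch, neither write fails, so the type conflict in the `amount` column is discovered only by whoever reads the table later. Detecting or preventing that at write time is the kind of guarantee a plain file or object layer does not provide on its own.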
