You're reading from  Simplifying Data Engineering and Analytics with Delta

Product type: Book
Published in: Jul 2022
Publisher: Packt
ISBN-13: 9781801814867
Edition: 1st Edition
Author (1)
Anindita Mahapatra

Anindita Mahapatra is a Solutions Architect at Databricks in the data and AI space, helping clients across all industry verticals reap value from their data infrastructure investments. She teaches a data engineering and analytics course at Harvard University as part of its extension school program. She has extensive big data and Hadoop consulting experience from Thinkbig/Teradata, prior to which she managed the development of algorithmic app discovery and promotion for both Nokia and Microsoft AppStores. She holds a Master's degree in Liberal Arts and Management from Harvard Extension School, a Master's in Computer Science from Boston University, and a Bachelor's in Computer Science from BITS Pilani, India.

Summary

As data grows exponentially over time, all stakeholders expect query performance to keep up. Delta is based on the columnar Parquet format, which is highly compressible and so consumes less storage and memory, and Delta automatically creates and maintains indices on the data. Data skipping gives faster access to data: Delta maintains file statistics so that only the relevant files are read, avoiding full scans. Delta caching improves the performance of common queries that repeat. The optimize command compacts smaller files, and zorder colocates related details that are usually queried together, leading to fewer file reads.
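
To make the data skipping idea concrete, here is a minimal pure-Python sketch of pruning files by their minimum and maximum column values. It is a toy model rather than Delta's actual implementation: the FileStats class, the files_to_read function, the file names, and the event_id column are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file statistics for one column (not Delta's real log schema)."""
    path: str
    min_value: int
    max_value: int


def files_to_read(stats, lower, upper):
    """Return the paths whose [min, max] range overlaps the query range [lower, upper]."""
    return [
        s.path
        for s in stats
        if s.max_value >= lower and s.min_value <= upper
    ]


# Three hypothetical data files with the min/max of an illustrative "event_id" column.
table_stats = [
    FileStats("part-0000.parquet", 1, 100),
    FileStats("part-0001.parquet", 101, 200),
    FileStats("part-0002.parquet", 201, 300),
]

# A filter such as: event_id BETWEEN 150 AND 180
selected = files_to_read(table_stats, 150, 180)
print(selected)  # ['part-0001.parquet']
```

Only one of the three files overlaps the filter range, so the other two are never opened. The same model suggests why colocating related values helps: the narrower each file's value range, the fewer files overlap a given filter.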

The Delta architecture pattern has empowered data engineers not only by simplifying many of their daily activities but also by improving query performance for the data analysts who consume the output that these upstream data engineers work hard to produce. In this chapter, we looked at some common techniques to apply to our Delta tables...
