
You're reading from  Simplifying Data Engineering and Analytics with Delta

Product type: Book
Published in: Jul 2022
Publisher: Packt
ISBN-13: 9781801814867
Edition: 1st Edition
Author (1)
Anindita Mahapatra

Anindita Mahapatra is a Solutions Architect at Databricks in the data and AI space, helping clients across all industry verticals reap value from their data infrastructure investments. She teaches a data engineering and analytics course at Harvard University as part of its extension school program. She has extensive big data and Hadoop consulting experience from Thinkbig/Teradata, prior to which she managed the development of algorithmic app discovery and promotion for both Nokia and Microsoft AppStores. She holds a Master's degree in Liberal Arts and Management from Harvard Extension School, a Master's in Computer Science from Boston University, and a Bachelor's in Computer Science from BITS Pilani, India.

Optimizing with Delta

Delta's support for ACID transactions and quality guarantees helps ensure data reliability, which reduces superfluous validation steps and shortens the end-to-end processing time. It also means less downtime and fewer triage cycles. Delta applies fine-grained updates, deletes, and merges at the file level rather than to the entire partition, so less data is rewritten and operations run faster. That in turn requires fewer compute resources and lowers cost.
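To make the file-level versus partition-level distinction concrete, here is a minimal sketch in plain Python. It is a toy model, not Delta's implementation: the `layout` dictionary, the file names, and the `files_to_rewrite` helper are all hypothetical and exist only to count how many files each strategy would have to rewrite when a single record changes.

```python
from typing import Dict, List, Set

# Hypothetical table layout: partition -> file name -> record ids stored in that file.
layout: Dict[str, Dict[str, Set[int]]] = {
    "date=2022-07-01": {"f1": {1, 2, 3}, "f2": {4, 5, 6}, "f3": {7, 8, 9}},
    "date=2022-07-02": {"f4": {10, 11}, "f5": {12, 13}},
}


def files_to_rewrite(
    layout: Dict[str, Dict[str, Set[int]]],
    ids_to_update: Set[int],
    file_level: bool,
) -> List[str]:
    """Return the files that must be rewritten to update the given record ids."""
    touched: List[str] = []
    for files in layout.values():
        # Files in this partition that actually contain an affected record.
        hit = [name for name, ids in files.items() if ids & ids_to_update]
        if not hit:
            continue  # partition is untouched under either strategy
        # File-level: rewrite only the affected files.
        # Partition-level: rewrite every file in the affected partition.
        touched.extend(hit if file_level else files.keys())
    return sorted(touched)


file_level_rewrites = files_to_rewrite(layout, {5}, file_level=True)
partition_level_rewrites = files_to_rewrite(layout, {5}, file_level=False)

print("file-level rewrite:", file_level_rewrites)
print("partition-level rewrite:", partition_level_rewrites)
```

In this toy layout, updating record 5 touches one file under the file-level strategy and all three files of the partition under the partition-level strategy, which is the source of the reduced data manipulation described above.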

Changing the data layout in storage

Optimizing the layout of the data can speed up query performance. There are two ways to do so:

  • Compaction, also known as bin-packing
    • Here, lots of smaller files are combined into fewer large ones.
    • Depending on how many files are involved, this can be an expensive operation, so it is a good idea to run it either during off-peak hours or on a separate cluster from the main pipeline to avoid unnecessary delays to the...
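The idea behind bin-packing can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not how Delta plans compaction internally: the `plan_compaction` function, the sample file sizes, and the 256 MB target are all hypothetical, and the heuristic used is the generic first-fit decreasing strategy.

```python
from typing import List


def plan_compaction(file_sizes_mb: List[int], target_mb: int) -> List[List[int]]:
    """Group file sizes into bins whose totals stay within target_mb.

    Uses first-fit decreasing: sort files from largest to smallest, then
    place each one into the first bin that still has room for it.
    """
    bins: List[List[int]] = []
    for size in sorted(file_sizes_mb, reverse=True):
        for current in bins:
            if sum(current) + size <= target_mb:
                current.append(size)
                break
        else:
            # No existing bin has room, so start a new output file.
            bins.append([size])
    return bins


# Hypothetical sizes (in MB) of many small files in one table.
small_files = [10, 200, 35, 60, 5, 120, 80, 15]
plan = plan_compaction(small_files, target_mb=256)

print(f"{len(small_files)} input files -> {len(plan)} output files")
for group in plan:
    print(group, "=", sum(group), "MB")
```

With these sample sizes, the eight small files collapse into three larger ones, each no bigger than the target. In Delta itself, compaction is normally triggered with the `OPTIMIZE` command rather than hand-written logic like this; the sketch only shows why the work grows with the number of files involved.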
