Reader small image

You're reading from  Data Engineering with Apache Spark, Delta Lake, and Lakehouse

Product typeBook
Published inOct 2021
PublisherPackt
ISBN-139781801077743
Edition1st Edition
Right arrow
Author (1)
Manoj Kukreja
Manoj Kukreja
author image
Manoj Kukreja

Manoj Kukreja is a Principal Architect at Northbay Solutions who specializes in creating complex Data Lakes and Data Analytics Pipelines for large-scale organizations such as banks, insurance companies, universities, and US/Canadian government agencies. Previously, he worked for Pythian, a large managed service provider where he was leading the MySQL and MongoDB DBA group and supporting large-scale data infrastructure for enterprises across the globe. With over 25 years of IT experience, he has delivered Data Lake solutions using all major cloud providers including AWS, Azure, GCP, and Alibaba Cloud. On weekends, he trains groups of aspiring Data Engineers and Data Scientists on Hadoop, Spark, Kafka and Data Analytics on AWS and Azure Cloud.
Read more about Manoj Kukreja

Right arrow

Introducing data lakes

Over the last few years, the markers for effective data engineering and data analytics have shifted. Up to now, organizational data has been dispersed over several internal systems (silos), each system performing analytics over its own dataset.

Additionally, it has been difficult to interface with external datasets for extending the spectrum of analytic workloads. As a result, it has been difficult for these organizations to get a unified view of their data and gain global insights.

In a world where organizations are seeking revenue diversification by fine-tuning existing processes and generating organic growth, a globally unified repository of data has become a core necessity. Data lakes solve this need by providing a unified view of data into the hands of users who can use this data to devise innovative techniques for the betterment of mankind.

The following diagram outlines the characteristics of a data lake:

Figure 2.1 – Characteristics of a data lake

Figure 2...

lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
Data Engineering with Apache Spark, Delta Lake, and Lakehouse
Published in: Oct 2021Publisher: PacktISBN-13: 9781801077743

Author (1)

author image
Manoj Kukreja

Manoj Kukreja is a Principal Architect at Northbay Solutions who specializes in creating complex Data Lakes and Data Analytics Pipelines for large-scale organizations such as banks, insurance companies, universities, and US/Canadian government agencies. Previously, he worked for Pythian, a large managed service provider where he was leading the MySQL and MongoDB DBA group and supporting large-scale data infrastructure for enterprises across the globe. With over 25 years of IT experience, he has delivered Data Lake solutions using all major cloud providers including AWS, Azure, GCP, and Alibaba Cloud. On weekends, he trains groups of aspiring Data Engineers and Data Scientists on Hadoop, Spark, Kafka and Data Analytics on AWS and Azure Cloud.
Read more about Manoj Kukreja