Discovering the enterprise data warehouse era

The Enterprise Data Warehouse (EDW) pattern, popularized by Ralph Kimball and Bill Inmon, was predominant in the 1990s and 2000s. The needs of this era were relatively straightforward (at least compared to the current context). The focus was predominantly on optimizing database structures to satisfy reporting requirements. Analytics was synonymous with reporting. Machine learning was a specialized field and was not ubiquitous in enterprises.

A typical EDW pattern is depicted in the following figure:

Figure 1.1 – A typical EDW pattern

As shown in Figure 1.1, the pattern entails source systems composed of databases or flat-file structures. These data sources are predominantly structured, that is, organized into rows and columns. A process called Extract-Transform-Load (ETL) first extracts the data from the source systems, then transforms it into a shape and form that is conducive to analysis, and finally loads it into the EDW. From there, subsets of the data are populated into downstream data marts. A data mart can be thought of as a mini data warehouse that caters to the business requirements of a specific department.
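To make the ETL flow concrete, here is a minimal, hypothetical sketch of the extract, transform, and load steps in Python, using an in-memory SQLite database to stand in for the warehouse. The table and column names (fact_orders, customer, amount) and the sample rows are illustrative assumptions, not part of the figure:

```python
import sqlite3

# Extract: rows pulled from a hypothetical operational source system.
source_rows = [
    {"order_id": 1, "customer": "acme corp", "amount": "1200.50", "order_date": "2021-03-14"},
    {"order_id": 2, "customer": "globex",    "amount": "89.99",   "order_date": "2021-03-15"},
]

# Transform: clean and conform the data into a reporting-friendly shape.
def transform(row):
    return (
        row["order_id"],
        row["customer"].title(),          # standardize casing
        round(float(row["amount"]), 2),   # cast amounts to numeric
        row["order_date"],
    )

# Load: write the conformed rows into a warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_orders (order_id INTEGER, customer TEXT, amount REAL, order_date TEXT)"
)
warehouse.executemany(
    "INSERT INTO fact_orders VALUES (?, ?, ?, ?)",
    (transform(r) for r in source_rows),
)
warehouse.commit()

# A downstream data mart could now be populated from a subset of this table.
print(warehouse.execute(
    "SELECT customer, SUM(amount) FROM fact_orders GROUP BY customer"
).fetchall())
```

In a real EDW, the transform step is far more elaborate (conforming dimensions, handling slowly changing attributes, and so on), and the load target is a dedicated warehouse platform rather than SQLite; the sketch only illustrates the shape of the process.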

As you can imagine, this pattern was primarily focused on the following:

  • Creating a data structure that is optimized for storage and modeled for reporting (a minimal sketch of such a dimensional model follows this list)
  • Focusing on the reporting requirements of the business
  • Harnessing the structured data into actionable insights
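As a sketch of the first point above, the following shows the kind of dimensional (star schema) structure a typical EDW was modeled around: a central fact table joined to dimension tables that reporting queries aggregate over. The schema and names (fact_sales, dim_date, dim_customer) are illustrative assumptions rather than a model taken from this book:

```python
import sqlite3

# Hypothetical star schema: one fact table referencing two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (
    date_key      INTEGER PRIMARY KEY,   -- e.g. 20210314
    calendar_date TEXT,
    month         TEXT,
    year          INTEGER
);
CREATE TABLE dim_customer (
    customer_key  INTEGER PRIMARY KEY,
    customer_name TEXT,
    segment       TEXT
);
CREATE TABLE fact_sales (
    date_key      INTEGER REFERENCES dim_date(date_key),
    customer_key  INTEGER REFERENCES dim_customer(customer_key),
    quantity      INTEGER,
    amount        REAL
);
""")

# Reporting queries join the fact table to its dimensions, for example:
report_sql = """
SELECT d.year, c.segment, SUM(f.amount) AS revenue
FROM fact_sales f
JOIN dim_date d     ON f.date_key = d.date_key
JOIN dim_customer c ON f.customer_key = c.customer_key
GROUP BY d.year, c.segment;
"""
print(conn.execute(report_sql).fetchall())   # empty until the ETL process loads rows
```

This kind of model is optimized for predictable, structured reporting queries, which is precisely the strength and the limitation discussed in the rest of this section.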

Every coin has two sides, and the EDW pattern is no exception: it has its pros and its cons. The pattern survived the test of time and was widely adopted because of the following key advantages:

  • Since most analytical requirements of the era were related to reporting, this pattern effectively addressed many organizations' needs.
  • Large enterprise data models structured an organization's data into logical and physical models, giving organizations a modular and efficient way to manage their data.
  • Since this pattern catered only to structured data, the technology required to harness it was mature and readily available. Relational Database Management Systems (RDBMSes) had evolved steadily, and their features were well suited to reporting workloads.

However, it also had its own set of challenges that surfaced as the data volumes grew and new data formats started emerging. A few challenges associated with the EDW pattern are as follows:

  • The pattern was not as agile as changing business requirements demanded. Any change in reporting requirements had to go through a long-winded process of data model changes, ETL code changes, and corresponding changes to the reporting system. ETL development was often a specialized skill and became a bottleneck in turning data into insights. Analytics also feeds on itself: the more output you see, the more you demand. Many EDW projects were consequently deemed failures, not from a technical perspective but from a business one; operationally, the design changes required to cater to these fast-evolving requirements were too difficult to handle.
  • As data volumes grew, the pattern proved cost-prohibitive. Massively Parallel Processing (MPP) database technologies specializing in data warehouse workloads started to evolve, but the cost of maintaining these databases was prohibitive as well: expensive software licenses, frequent hardware refreshes, and substantial staffing costs. The return on investment was no longer justifiable.
  • As data formats evolved, the challenges associated with the EDW became more evident. Database technologies were developed to cater to semi-structured data such as JSON, but the fundamental concept was still RDBMS-based, and the underlying technology could not effectively handle these new types of data. Increasingly, there was more value in analyzing data that was not structured, and the sheer variety of data was too complex for EDWs to handle.
  • The EDW was focused predominantly on Business Intelligence (BI). It facilitated the creation of scheduled reports, ad hoc data analysis, and self-service BI. Although it catered to most of the personas who performed analysis, it was not conducive to AI/ML use cases. The data in the EDW was already cleansed and structured with a razor-sharp focus on reporting, which left little room for data scientists (the statistical modelers of that time) to explore data and form new hypotheses.

While the EDW pattern was becoming mainstream, a perfect storm was brewing that would change the landscape. The following section focuses on five different factors that came together to change the data architecture pattern for good.
