
You're reading from Optimizing Databricks Workloads

Product type: Book
Published in: Dec 2021
Publisher: Packt
ISBN-13: 9781801819077
Edition: 1st Edition
Authors (3):
Anirudh Kala

Anirudh Kala is an expert in machine learning techniques, artificial intelligence, and natural language processing. He has helped multiple organizations to run their large-scale data warehouses with quantitative research, natural language generation, data science exploration, and big data implementation. He has worked in every aspect of data analytics using the Azure data platform. Currently, he works as the director of Celebal Technologies, a data science boutique firm dedicated to large-scale analytics. Anirudh holds a computer engineering degree from the University of Rajasthan and his work history features the likes of IBM and ZS Associates.

Anshul Bhatnagar

Anshul Bhatnagar is an experienced, hands-on data architect involved in the architecture, design, and implementation of data platform architectures, and distributed systems. He has worked in the IT industry since 2015 in a range of roles such as Hadoop/Spark developer, data engineer, and data architect. He has also worked in many other sectors including energy, media, telecoms, and e-commerce. He is currently working for a data and AI boutique company, Celebal Technologies, in India. He is always keen to hear about new ideas and technologies in the areas of big data and AI, so look him up on LinkedIn to ask questions or just to say hi.

Sarthak Sarbahi

Sarthak Sarbahi is a certified data engineer and analyst with a wide technical breadth and a deep understanding of Databricks. His background has led him to a variety of cloud data services with an eye toward data warehousing, big data analytics, robust data engineering, data science, and business intelligence. Sarthak graduated with a degree in mechanical engineering.


Chapter 8: Case Studies

Data teams across the world are using Databricks to solve the toughest data problems. Every Databricks success story brings a unique set of challenges and new learning for architects and data professionals. Databricks can be used as a transformation layer, a real-time streaming engine, or a solution for machine learning and advanced analytics. In this chapter, we will look at several real-world case study examples and learn how Databricks is used to help drive innovation across various industries around the world.

In this chapter, we will cover the following topics:

  • Learning case studies from the manufacturing industry
  • Learning case studies from the media and entertainment industry
  • Learning case studies from the retail and FMCG industry
  • Learning case studies from the pharmaceutical industry
  • Learning case studies from the e-commerce industry
  • Learning case studies from the logistics and supply chain...

Learning case studies from the manufacturing industry

Data and statistical analysis help manufacturing organizations make accurate decisions and streamline processes. This makes manufacturing processes more efficient and prevents avoidable losses for these organizations.

Case study 1 – leading automobile manufacturing company

An organization was looking for a cloud-scale analytics platform to support growing online analytical processing (OLAP) requirements, a modernized visualization capability to support business intelligence needs, and advanced analytical and artificial intelligence (AI) solutions for existing data.

The proposed solution architecture was as follows:

  • Data from the Oracle database and flat files was extracted using Azure Data Factory and loaded into Azure Data Lake.
  • Azure Databricks was used to transform the historical data. The transformed data was then loaded into the Azure Synapse data warehouse.
  • A lead scoring system was built using...
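The extract, transform, and load steps in the first two bullets can be sketched in a few lines. This is a minimal illustration only, not the architecture's actual code: plain Python stands in for Azure Data Factory and Spark so that it runs anywhere, and the file layout, column names, and cleaning rules are all assumptions made for the example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical flat-file extract; the columns are illustrative, not from the case study.
RAW_CSV = """dealer_id,model,sale_date,amount
D01,Sedan,2021-01-05,21000
D01,Sedan,2021-01-05,21000
D02,SUV,2021-01-07,
D02,SUV,2021-02-11,35000
D01,SUV,2021-02-15,34000
"""


def extract(text):
    """Read the landed file into a list of dict rows (the 'data lake' copy)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Drop exact duplicates and rows with no amount, then cast the amount to float."""
    seen = set()
    clean = []
    for row in rows:
        key = tuple(row.values())
        if key in seen or not row["amount"]:
            continue
        seen.add(key)
        clean.append({**row, "amount": float(row["amount"])})
    return clean


def aggregate_for_warehouse(rows):
    """Sum sales per (month, model), the shape a warehouse fact table might take."""
    totals = defaultdict(float)
    for row in rows:
        month = row["sale_date"][:7]  # 'YYYY-MM'
        totals[(month, row["model"])] += row["amount"]
    return dict(totals)


clean_rows = transform(extract(RAW_CSV))
facts = aggregate_for_warehouse(clean_rows)
```

On Databricks, the same cleaning and aggregation would normally be written against Spark DataFrames so that it scales across a cluster; the structure of the pipeline stays the same.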

Learning case studies from the media and entertainment industry

Data plays a crucial role for media and entertainment organizations as it helps them understand viewer behavior and identify the true market value of the content being shared. This helps in improving the quality of content being delivered and at the same time opens up new monetization avenues for the production houses.

Case study 5 – HDInsight to Databricks migration for a media giant

In this case study, the organization's primary requirement was to process datasets of 2-3 TB every day. This was needed to run analytics on user data from an on-demand advertising video service and to generate reports and dashboards for the marketing team. The organization was also unable to automate the extract, transform, and load (ETL) process for the viewer data from its web and mobile platforms. This ETL process was being executed using Azure HDInsight. Moreover, managing HD...

Learning case studies from the retail and FMCG industry

Data is more important than ever for the retail and FMCG industry. It helps retailers maintain a lean inventory and is critical for optimizing product prices based on demand. A data-driven approach can also strengthen relationships with business partners, which helps keep the supply chain running smoothly.

Case study 6 – real-time analytics using IoT Hub for a retail giant

An organization wanted to build an end-to-end solution in which edge devices gathered metrics at a fixed frequency from all the instruments on the shop floor. These metrics were to be used for edge analytics to detect issues in real time. From there, the data would be pushed to a cloud platform, where near-real-time transformations would be applied and the results delivered to a dashboard for visualization. The same data would be persisted for batch processing and used for machine learning to gain insights.
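The two processing stages described above, an immediate check at the edge and a near-real-time aggregation in the cloud, can be illustrated with a small sketch. Everything specific here is an assumption made for the example: the device names, the temperature metric, the 60-second window, and the alert threshold. Plain Python is used in place of a streaming engine so that the logic is easy to follow.

```python
from collections import defaultdict
from statistics import mean

WINDOW_SECONDS = 60      # assumed tumbling-window size
ALERT_THRESHOLD = 80.0   # assumed limit for an edge alert

# Hypothetical readings: (device_id, seconds since start, temperature)
readings = [
    ("press-1", 5, 71.0),
    ("press-1", 35, 73.0),
    ("press-1", 65, 85.0),
    ("oven-2", 10, 60.0),
    ("oven-2", 70, 62.0),
    ("oven-2", 110, 64.0),
]


def edge_alerts(events, threshold):
    """Edge-side check: flag any single reading above the threshold."""
    return [(device, ts) for device, ts, value in events if value > threshold]


def tumbling_window_mean(events, window):
    """Cloud-side step: average each device's readings per fixed, non-overlapping window."""
    buckets = defaultdict(list)
    for device, ts, value in events:
        window_start = (ts // window) * window
        buckets[(device, window_start)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


alerts = edge_alerts(readings, ALERT_THRESHOLD)
windowed = tumbling_window_mean(readings, WINDOW_SECONDS)
```

The windowed averages are the kind of result a dashboard would display, while the raw readings would be persisted separately for batch processing.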

The proposed solution architecture...

Learning case studies from the pharmaceutical industry

Data analytics and AI in the pharmaceutical industry play a crucial role in optimizing clinical trials, analyzing patients' behavior, improving logistics, and reducing costs.

Case study 7 – pricing analytics for a pharmaceutical company

The organization required a pricing decision support framework to get insights on gross margin increment based on historical events, the prioritization of SKUs, review indicators, and more. The framework was to be designed so that the machine learning models could be transferred and scaled while retaining the quality and depth of the information gathered.

A pricing decision framework was developed using machine learning on Azure Databricks, which helped to predict which SKUs should go for pricing review. The system was also capable of predicting the next month's volume, which helped in deciding the correct price for a specific SKU.
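To make the two predictions concrete, here is a deliberately simple sketch. It does not reproduce the machine learning models from the case study, which are not described in detail: a linear trend stands in for the volume model, and the review rule (flag a SKU when its forecast falls more than 10% below last month's volume) is an assumption chosen for illustration, as are the SKU names and figures.

```python
# Hypothetical monthly sales volumes per SKU, oldest first.
history = {
    "SKU-A": [100, 110, 120],
    "SKU-B": [200, 150, 100],
}


def forecast_next_month(volumes):
    """Naive trend forecast: last volume plus the average month-over-month change."""
    changes = [later - earlier for earlier, later in zip(volumes, volumes[1:])]
    return volumes[-1] + sum(changes) / len(changes)


def needs_pricing_review(volumes, drop_ratio=0.10):
    """Assumed rule: review a SKU whose forecast is more than drop_ratio below last month."""
    return forecast_next_month(volumes) < volumes[-1] * (1 - drop_ratio)


forecasts = {sku: forecast_next_month(v) for sku, v in history.items()}
review_list = [sku for sku, v in history.items() if needs_pricing_review(v)]
```

In a real framework, the forecast function would be replaced by a trained model, but the shape of the output is the same: a predicted volume per SKU and a short list of SKUs to send for review.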

The solution architecture...

Learning case studies from the e-commerce industry

Big data analytics in the e-commerce industry helps businesses understand consumer purchase patterns, improve user experience, and increase revenue.

Case study 8 – migrating interactive analytical apps from Redshift to Postgres

An organization in the e-commerce space was using AWS Redshift as its data warehouse and Databricks as its ETL engine. The setup was deployed across different data centers in different regions on Amazon Web Services (AWS) and Google Cloud Platform (GCP). The organization was running into performance bottlenecks and incurring unnecessary egress costs. Data volume was growing faster than the compute needed to process it, but AWS Redshift could not scale storage and compute independently. Hence, the organization decided to migrate its data and analytics landscape to Azure.

AWS Redshift's data was migrated to Azure Database for PostgreSQL Hyperscale (Citus). Citus is an open source...

Learning case studies from the logistics and supply chain industry

Data analytics and machine learning play a crucial role in the functioning of the logistics and supply chain industry. Data can help reduce inefficiencies in the supply chain processes and optimize deliveries at the same time. Machine learning and predictive analytics help in better planning, procurement, and consumer fulfillment.

Case study 9 – accelerating intelligent insights with tailored big data analytics

An organization wanted to create an end-to-end data warehousing platform on Azure. Their original process involved manually collecting data from siloed sources and creating necessary reports from it. There was a need to integrate all the data sources and implement a single source of truth, which would be on the Azure cloud. The proposed solution architecture was as follows:

  • Full load and incremental data pipelines were developed using Azure Data Factory to ingest data into Azure Synapse...
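The difference between a full load and an incremental load can be sketched as follows. This is an illustrative outline under stated assumptions, not the pipeline from the case study: it assumes each source row carries an `id` and a `modified` date, and it tracks progress with a high-water mark, which is one common way to implement incremental ingestion. The function and field names are hypothetical.

```python
def full_load(source_rows):
    """Copy every source row into a fresh target and record the latest modified date."""
    target = {row["id"]: row for row in source_rows}
    watermark = max((row["modified"] for row in source_rows), default=None)
    return target, watermark


def incremental_load(source_rows, target, watermark):
    """Upsert only rows modified after the watermark, then advance the watermark."""
    changed = [
        row for row in source_rows
        if watermark is None or row["modified"] > watermark
    ]
    for row in changed:
        target[row["id"]] = row  # insert new ids, overwrite existing ones
    new_watermark = max((row["modified"] for row in changed), default=watermark)
    return len(changed), new_watermark


# Day 1: the initial full load. ISO dates compare correctly as strings.
day1 = [
    {"id": 1, "modified": "2021-01-01", "status": "open"},
    {"id": 2, "modified": "2021-01-02", "status": "open"},
]
target, watermark = full_load(day1)

# Day 2: row 2 changed, row 3 is new, row 1 is untouched.
day2 = [
    {"id": 1, "modified": "2021-01-01", "status": "open"},
    {"id": 2, "modified": "2021-01-03", "status": "closed"},
    {"id": 3, "modified": "2021-01-03", "status": "open"},
]
rows_copied, watermark = incremental_load(day2, target, watermark)
```

Only two of the three rows are copied on the second run, which is the point of the incremental pattern: the cost of each run depends on how much changed, not on the total size of the source.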

Summary

In this chapter, we learned about several Databricks case studies, ranging from manufacturing and media to logistics and the supply chain. All these solution architectures have employed Databricks in different ways. Irrespective of its role in an organization's data journey, Databricks has always emerged as a game-changer in the world of big data analytics.

This brings us to the end of this book. We learned quite a lot about Spark and Databricks, starting with the fundamentals and quickly moving toward optimization techniques and best practices. We learned how Delta Lake, MLflow, and Koalas help make Databricks a complete, cloud-first data platform for all data engineering and data science needs.

