Simplifying Data Engineering and Analytics with Delta

You're reading from  Simplifying Data Engineering and Analytics with Delta

Product type Book
Published in Jul 2022
Publisher Packt
ISBN-13 9781801814867
Pages 334 pages
Edition 1st Edition
Author (1):
Anindita Mahapatra


Business justification for tech spending

Tech enthusiasts, with their love for bleeding-edge tools, sometimes forget why they are building a data product. Research and exploration are important for innovation, but they need to be disciplined and controlled. Not keeping business counterparts in the loop leads to miscommunication and misunderstandings about where the effort is going. Ego battles hinder project progress and waste money, time, and people, which hurts the business. Tech should always add value and growth to a business rather than being viewed merely as a cost. So, it is important to demonstrate the value of tech investment.

A joint business-technology strategy clarifies the role of technology in driving business value and provides a transformation agenda. Key performance indicators (KPIs) and metrics such as growth, return on investment (ROI), profitability, market share, earnings per share, margins, and revenue help quantify this investment.

The execution time of these projects is usually significant, so it is important to reach the end goal in an agile manner, through well-articulated baby steps. Some of the benefits may not be realized immediately, so it is important to balance infrastructure gains with productivity and capability gains, and to weigh capital expenditure on the initial infrastructure investment (CAPEX) against ongoing operating expenses (OPEX) over a given period. In addition, it is always good to do frequent risk assessments and have backup plans. Despite the best projections, costs can escalate to uncomfortable and unpredictable heights, so it is important to invest in a platform with tunable costs that can easily be monitored and adjusted when needed. Data is an asset and must be governed and protected from inappropriate access or breaches. Not only are such incidents expensive, but they also damage the reputation of the organization:

Figure 1.20 – Mapping the impact of technology on business outcomes
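
The CAPEX versus OPEX trade-off described above can be made concrete with a little arithmetic. The following is a minimal sketch, assuming a flat monthly operating cost; the `CostPlan` class, the `break_even_month` helper, and all of the figures are hypothetical and serve only as an illustration:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CostPlan:
    """Hypothetical spending plan: one-off CAPEX plus a flat monthly OPEX."""
    name: str
    capex: float         # upfront infrastructure investment
    monthly_opex: float  # recurring operating expense

    def cumulative_cost(self, months: int) -> float:
        """Total spend after the given number of months."""
        return self.capex + self.monthly_opex * months


def break_even_month(a: CostPlan, b: CostPlan, horizon: int) -> Optional[int]:
    """Return the first month within the horizon at which plan `a` costs
    no more than plan `b`, or None if that never happens."""
    for month in range(1, horizon + 1):
        if a.cumulative_cost(month) <= b.cumulative_cost(month):
            return month
    return None


# Illustrative numbers only (assumptions, not real pricing).
on_prem = CostPlan("on-premises cluster", capex=500_000, monthly_opex=10_000)
cloud = CostPlan("pay-as-you-go cloud", capex=20_000, monthly_opex=30_000)

month = break_even_month(on_prem, cloud, horizon=60)
print(f"{on_prem.name} catches up with {cloud.name} in month {month}")
```

With these assumed numbers, the heavier upfront investment only pays off after two years, which is why the evaluation period matters as much as the headline cost. Real workloads rarely have flat monthly costs, so a tunable platform would replace the constant `monthly_opex` with observed usage.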


Strategy for business transformation to use data as an asset

Data-driven organizations exhibit a culture of analytics. This culture cannot be confined to a few premier groups; it must extend to the entire organization. There are both cultural and technical challenges to overcome, and this is where people, processes, and tools need to come together to bring about sustainable change. Every business needs a strategy for business transformation. Here are some best practices for managing a big data initiative:

  • Understand the objectives and goals to come up with an overall enterprise strategy.
  • Assess the current state and document all the use cases and data sources.
  • Develop a roadmap that can be shared for collaborating and deciding on the main tools and frameworks to leverage organization-wide.
  • Design for data democratization so that people can readily reach all the data they are authorized to use.
  • Establish rules around data governance so that workflows can be automated correctly without fear of data exfiltration.
  • Manage change as a continuous cycle of improvement. This means that there should be a center of excellence team that acts as the hub in a hub-and-spoke model and interfaces with the individual lines of business. Adequate emphasis should be placed on training to engage and educate the team.

Big data trends and best practices

"The old order changeth, yielding place to new…"

(The Passing of Arthur, Alfred Lord Tennyson, 1809–1892)

We are living in an age of fast innovation, where technology changes in the blink of an eye. We can learn from history and from the mistakes of those before us. We don't have the luxury of analyzing everything around us, but understanding the top trends will give us a better appreciation of the landscape and help us gravitate toward the right technology for our needs.

The following trends stand out:

  • Increased adoption of cloud infrastructure:
      • Affordable and scalable storage.
      • Elastic, distributed compute infrastructure with pay-as-you-go flexibility.
      • A multi-cloud strategy, with some on-premises presence, to hedge risks.
  • Increased data consolidation to break down individual data silos in data lakes:
      • Other data stores such as data warehouses continue to live on, while newer ones such as lakehouses and data meshes are being introduced.
      • Unstructured data usage is on the rise.
  • Improved speed to insights:
      • Convergence of big data and ML.
      • Detecting and responding to pattern signals in real time as opposed to batch.
      • Analytics has moved from simple BI reporting to ML and AI as industries move from descriptive analytics to predictive and finally prescriptive analytics.
  • Improved governance and security:
      • Data discovery using business and operational enterprise-level metastores.
      • Data governance to control who has access to what data.
      • Data lineage and data quality to determine how reliable the data is.

Let's summarize some of the best practices for building robust and reliable data platforms:

  • Build systems that decouple storage from compute. Storage is far cheaper than compute, so the ability to turn off compute when it's not in use is a big cost saving. A microservices architecture helps manage such changes.
  • Leverage cloud storage, preferably in an open format.
  • Use the right tool for the right job.
  • Break down the data silos and create a single view of the data so that multiple use cases can leverage the same data with different tools.
  • Design data solutions with due consideration to use case-specific trade-offs such as latency, throughput, and access patterns.
  • Use log-based design patterns that maintain immutable logs for audit, compliance, and traceability requirements.
  • Expose multiple views of the data for consumers with different access privileges instead of copying the datasets multiple times to accommodate slight differences in access requirements.
  • Every team will eventually have to decide whether to build or buy. Speed to insights should guide this decision, irrespective of how smart the team is. If there is a window of opportunity, do not lose it in the pursuit of tech pleasures. The cost of building a solution to cater to an immediate need should be compared with the cost of a missed opportunity.
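
Two of the practices above, immutable logs and multiple views over a single copy of the data, can be sketched in a few lines. This is a minimal in-memory illustration under stated assumptions, not how Delta or any particular platform implements them; `LogEntry`, `AppendOnlyLog`, and the choice of `email` as the sensitive field are all hypothetical:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Iterator, List


@dataclass(frozen=True)
class LogEntry:
    """One immutable record; frozen=True blocks attribute reassignment."""
    sequence: int
    recorded_at: datetime
    user: str
    action: str
    email: str  # treated as the sensitive field in this sketch


class AppendOnlyLog:
    """Hypothetical log that exposes appends and reads, but no update or delete."""

    def __init__(self) -> None:
        self._entries: List[LogEntry] = []

    def append(self, user: str, action: str, email: str) -> LogEntry:
        entry = LogEntry(len(self._entries), datetime.now(timezone.utc),
                         user, action, email)
        self._entries.append(entry)
        return entry

    def full_view(self) -> Iterator[LogEntry]:
        """View for privileged consumers such as auditors: every field."""
        return iter(self._entries)

    def masked_view(self) -> Iterator[Dict[str, object]]:
        """View for general consumers: same rows, sensitive field masked.
        Rows are derived on read, so no second copy of the data is stored."""
        for e in self._entries:
            yield {"sequence": e.sequence, "user": e.user,
                   "action": e.action, "email": "***"}


log = AppendOnlyLog()
log.append("alice", "login", "alice@example.com")
log.append("bob", "export", "bob@example.com")

auditor_rows = list(log.full_view())    # includes e-mail addresses
analyst_rows = list(log.masked_view())  # same events, e-mail masked
```

Both consumers read the same underlying entries, so a change in access requirements means adding or adjusting a view rather than duplicating the dataset.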