You're reading from Modern Data Architecture on AWS
Why does a performant and cost-effective data platform matter?
One of the key pillars of a modern data architecture on AWS is the performance and cost of the data platform being built. Users of the platform will not wait 5 minutes for a report to load. Likewise, if an organization measures the return on investment of its data platform, getting a dollar's worth of benefit is not sustainable when the result costs two dollars to produce.
The performant and cost-effective pillar of modern data architecture on AWS matters for several reasons:
- Cost-efficiency: Optimizing costs is crucial for any organization. By implementing a cost-optimized data architecture, you can minimize unnecessary expenses and achieve a better return on investment. AWS provides a wide range of services and tools to help you control and optimize your data-related costs.
- Scalability: AWS offers highly scalable services that allow you to scale your data infrastructure based...
Data storage optimizations
In any data platform, the storage layer is the foundation: every system in the platform persists its data in some type of storage. Even though storage cost is often not the dominant part of the overall expenditure on the data platform, it can start to creep up if best practices are not followed.
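One of those best practices is tiering aging data into cheaper storage classes with S3 lifecycle rules. The following is a minimal sketch that builds such a configuration; the prefix, day thresholds, and storage-class choices are illustrative assumptions, not recommendations for every workload.

```python
# Sketch: building an S3 lifecycle configuration that moves aging data to
# cheaper storage classes. Thresholds and classes are illustrative assumptions.

def build_lifecycle_rules(prefix="raw/"):
    """Return lifecycle rules in the shape expected by S3's
    PutBucketLifecycleConfiguration API."""
    return {
        "Rules": [
            {
                "ID": "tier-aging-data",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Move objects to Infrequent Access after 30 days,
                # then to Glacier Flexible Retrieval after 90 days.
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                # Clean up failed multipart uploads -- a common source
                # of invisible storage cost.
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    }

# Applying it would look like this (requires boto3 and AWS credentials):
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-data-lake-bucket",
#       LifecycleConfiguration=build_lifecycle_rules())
```

Keeping the rules in code (rather than clicking them together in the console) makes them reviewable and repeatable across the many buckets a petabyte-scale platform accumulates.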
Let’s consider a scenario that requires a deep dive into storage optimization.
Use case for storage optimization
GreatFin has established a data platform on AWS and uses many of the data and analytics services provided by AWS to operate different areas of the platform. After onboarding data from a variety of sources, the combined platform storage across all LOBs has grown to a petabyte scale. GreatFin’s storage infrastructure on AWS lacks optimization, leading to potential challenges such as high storage costs, performance bottlenecks, limited scalability, and inadequate data protection.
The...
Compute resource optimizations
In any typical modern data platform built using AWS data and analytics services, the platform's infrastructure expenses will be dominated by compute. Take any service we discussed in this book – DMS for data ingestion, Glue and EMR for data processing, Kinesis and MSK for streaming data, Redshift for data warehousing, Athena for ad hoc analytics on the data lake, the various SageMaker tools for ML, OpenSearch Service for operational analytics, QuickSight for business intelligence, and many other supporting services – and you will find that the vast majority of its cost comes from the compute resources backing it. The reason is simple – CPUs/GPUs are significantly more expensive than storage, memory, and networking.
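A quick back-of-envelope calculation illustrates how lopsided this split can be. The prices below are assumed figures for the sake of the sketch (roughly in the ballpark of a Glue DPU-hour and an S3 Standard GB-month), not current AWS list prices:

```python
# Back-of-envelope illustration of compute vs. storage cost dominance.
# All prices are illustrative assumptions, not current AWS list prices.

GLUE_DPU_HOUR = 0.44   # assumed $ per Glue DPU-hour
S3_GB_MONTH = 0.023    # assumed $ per GB-month in S3 Standard

def monthly_costs(dpu_hours_per_day, stored_gb):
    """Return (compute, storage) dollar cost for a 30-day month."""
    compute = dpu_hours_per_day * 30 * GLUE_DPU_HOUR
    storage = stored_gb * S3_GB_MONTH
    return compute, storage

compute, storage = monthly_costs(dpu_hours_per_day=100, stored_gb=5_000)
# 100 DPU-hours/day comes to $1,320/month of compute, versus $115/month
# to keep 5 TB in S3 -- compute dominates by an order of magnitude.
```

Even with the assumed prices varied considerably, the conclusion holds: shaving compute hours usually pays off far faster than shaving stored bytes.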
Compute resources are also one of the most important dimensions regarding the optimal...
Cost optimization tools
AWS provides several cost optimization tools that can help you manage and optimize your AWS spending. The following sections show some key cost optimization tools offered by AWS.
AWS Cost Explorer
AWS Cost Explorer is a built-in cost management tool that provides visibility into your AWS costs and usage. It allows you to analyze your costs, view historical spending patterns, and forecast future costs. You can drill down into specific cost categories, services, or regions to identify areas where cost optimizations can be made.
Cost Explorer allows you to view your spend per service for each month, as shown in the following screenshot. This gives you a good understanding of rising costs that might warrant an optimization review:
Figure 16.14 – AWS Cost Explorer
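The same review can be done programmatically. The sketch below flags services whose cost grew sharply month over month; the input is a hard-coded sample in a simplified shape (service name mapped to monthly totals) rather than a live call to Cost Explorer's GetCostAndUsage API, and all the dollar figures are made up.

```python
# Sketch: spotting rising service costs from monthly totals. The data here
# is a hard-coded sample standing in for Cost Explorer API output.

def rising_services(monthly_costs, threshold=0.2):
    """Return services whose most recent month's cost grew by more than
    `threshold` (as a fraction) over the previous month."""
    flagged = []
    for service, costs in monthly_costs.items():
        prev, last = costs[-2], costs[-1]
        if prev > 0 and (last - prev) / prev > threshold:
            flagged.append(service)
    return sorted(flagged)

sample = {  # service -> cost per month, oldest first (sample numbers)
    "Amazon Redshift": [4_000, 4_100, 5_500],   # +34% last month
    "AWS Glue":        [1_200, 1_250, 1_300],   # +4%
    "Amazon S3":       [800,   900,   1_150],   # +28%
}
# rising_services(sample) -> ['Amazon Redshift', 'Amazon S3']
```

Wired to a scheduled job, a check like this turns the monthly screenshot review into an automatic early-warning signal.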
AWS Budgets
AWS Budgets enables you to set cost and usage budgets for your AWS resources. You can define spending thresholds and receive alerts when your...
Tool-specific performance tuning
We covered a lot about optimizing service infrastructure in this chapter. Often, however, optimizations happen at the service or tool level, by changing a service's configuration or fine-tuning the logic that runs on it. Typically, tuning is done to improve performance, which in turn helps save costs. It is not possible to cover every aspect of performance tuning in this section, but we will cover some of the most common measures for the key services that help build a data platform on AWS.
Performance tuning measures on Amazon Redshift
Many aspects of performance tuning depend on root cause analysis, so we cannot cover every tunable setting in Redshift. Also, recent advancements in Redshift's autonomics have made many settings automatic; data distribution, sorting, and analyze and vacuum operations can all be handled automatically by the service. However, some common tunable...
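The automatic distribution and sorting mentioned above can be requested explicitly in DDL via the AUTO keyword. A minimal sketch that renders such a statement follows; the table and column names are hypothetical, and the helper only builds SQL text (running it would require a Redshift connection):

```python
# Sketch: generating Redshift DDL that delegates distribution and sort-key
# choices to the service via DISTSTYLE AUTO / SORTKEY AUTO.
# Table and column names are hypothetical.

def create_table_auto(table, columns):
    """Render a CREATE TABLE statement with automatic distribution
    and sort-key management. `columns` is a list of (name, type) pairs."""
    cols = ",\n    ".join(f"{name} {ctype}" for name, ctype in columns)
    return (
        f"CREATE TABLE {table} (\n    {cols}\n) "
        "DISTSTYLE AUTO SORTKEY AUTO;"
    )

ddl = create_table_auto("sales", [("sale_id", "BIGINT"),
                                  ("sale_ts", "TIMESTAMP")])
```

Starting new tables with AUTO and only pinning explicit distribution or sort keys after a measured root-cause analysis keeps the tuning surface small.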
Summary
This chapter was all about ensuring that the data platform that’s built is performant as well as cost-effective. We started by understanding the need for a data platform that operates optimally. If any part of the data platform is either not performing well or is very expensive, it often creates a snowball effect and affects the business negatively.
A lot of cost optimization can be achieved by optimizing the infrastructure used by the AWS services under the covers. By optimizing storage and compute resources, we can save significant costs. We also looked at some of the tools AWS provides that help in the cost optimization process.
Finally, we looked at some of the service-specific tuning settings that can help with performance improvements. The list of such improvements can be quite long for each service, but the key message was to leverage the best practices for each service and always perform a Well-Architected Review (WAR) before deploying workloads in production.
In the next and...
References
AWS Well-Architected Framework – Data Analytics Lens: https://docs.aws.amazon.com/wellarchitected/latest/analytics-lens/analytics-lens.html