You're reading from Modern Data Architecture on AWS

Product type: Book
Published in: Aug 2023
Publisher: Packt
ISBN-13: 9781801813396
Edition: 1st Edition
Author: Behram Irani

Behram Irani is currently a technology leader with Amazon Web Services (AWS) specializing in data, analytics, and AI/ML. He has spent over 18 years in the tech industry helping organizations, from start-ups to large-scale enterprises, modernize their data platforms. Over the last 6 years at AWS, Behram has been a thought leader in the data, analytics, and AI/ML space, publishing multiple papers and leading digital transformation efforts for many organizations across the globe. Behram holds a Bachelor of Engineering in Computer Science from the University of Pune and an MBA from the University of Florida.

Performant and Cost-Effective Data Platform

In this chapter, we will look at the following key topics:

  • Why does a performant and cost-effective data platform matter?
  • Data storage optimizations
  • Compute resource optimizations
  • Cost optimization tools
  • Tool-specific performance tuning

Why does a performant and cost-effective data platform matter?

One of the key pillars of a modern data architecture on AWS is the performance and cost of the data platform being built. Users of the platform are not going to wait 5 minutes for a report to load. And if an organization measures the return on investment from the data platform, getting a dollar’s worth of benefit is not sustainable if it costs two dollars to get that result.

The performant and cost-effective pillar of modern data architecture on AWS matters for several reasons:

  • Cost-efficiency: Optimizing costs is crucial for any organization. By implementing a cost-optimized data architecture, you can minimize unnecessary expenses and achieve a better return on investment. AWS provides a wide range of services and tools to help you control and optimize your data-related costs.
  • Scalability: AWS offers highly scalable services that allow you to scale your data infrastructure based...

Data storage optimizations

In any data platform, the storage layer is the foundation, since all the data across the different systems inside the platform is persisted in some type of storage. Even though storage is often not the largest part of the overall spend on a data platform, its cost can start to creep up if best practices are not followed.

Let’s bring up a scenario that requires a deep dive into storage optimization.

Use case for storage optimization

GreatFin has established a data platform on AWS and uses many of the data and analytics services provided by AWS to operate different areas of the platform. After onboarding data from a variety of sources, the combined platform storage across all LOBs has grown to petabyte scale. GreatFin’s storage infrastructure on AWS lacks optimization, leading to potential challenges such as high storage costs, performance bottlenecks, limited scalability, and inadequate data protection.

The...
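One common storage best practice for a data lake on Amazon S3, such as GreatFin's, is to move rarely accessed data into cheaper storage classes with a lifecycle configuration. The following Python sketch builds such a configuration as a plain dictionary in the shape accepted by the `LifecycleConfiguration` parameter of boto3's `put_bucket_lifecycle_configuration` call. The rule ID, the `raw/` prefix, the bucket name, and the 30/90/730-day thresholds are illustrative assumptions, not recommendations from this chapter. In practice, thresholds should follow your actual access patterns.

```python
from typing import Any, Dict, List, Optional


def build_lifecycle_rule(
    rule_id: str,
    prefix: str,
    ia_after_days: int = 30,
    archive_after_days: int = 90,
    expire_after_days: Optional[int] = None,
) -> Dict[str, Any]:
    """Build one S3 lifecycle rule that tiers objects under `prefix`.

    Objects move to S3 Standard-IA after `ia_after_days`, to S3 Glacier
    Flexible Retrieval (storage class "GLACIER") after `archive_after_days`,
    and are optionally deleted after `expire_after_days`.
    """
    # S3 lifecycle rules cannot move objects to STANDARD_IA before they are 30 days old.
    if ia_after_days < 30:
        raise ValueError("STANDARD_IA transitions require at least 30 days")
    # Sanity checks for this sketch: each step must happen after the previous one.
    if archive_after_days <= ia_after_days:
        raise ValueError("archive_after_days must be greater than ia_after_days")
    if expire_after_days is not None and expire_after_days <= archive_after_days:
        raise ValueError("expire_after_days must be greater than archive_after_days")

    rule: Dict[str, Any] = {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": archive_after_days, "StorageClass": "GLACIER"},
        ],
    }
    if expire_after_days is not None:
        rule["Expiration"] = {"Days": expire_after_days}
    return rule


def build_lifecycle_configuration(rules: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap rules in the top-level structure that S3 expects."""
    return {"Rules": list(rules)}


if __name__ == "__main__":
    config = build_lifecycle_configuration(
        [build_lifecycle_rule("raw-zone-tiering", "raw/", expire_after_days=730)]
    )
    print(config)
    # To apply it (requires boto3 and credentials; the bucket name is hypothetical):
    # import boto3
    # boto3.client("s3").put_bucket_lifecycle_configuration(
    #     Bucket="greatfin-datalake-raw", LifecycleConfiguration=config
    # )
```

Keeping the builder free of AWS calls makes the rules easy to unit test and review before they are applied to a bucket.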

Compute resource optimizations

In a typical modern data platform built with AWS data and analytics services, infrastructure spend is dominated by the compute resources behind those services. Take any service we discussed in this book: DMS for data ingestion, Glue and EMR for data processing, Kinesis and MSK for streaming data, Redshift for data warehousing, Athena for ad hoc analytics on the data lake, the various SageMaker tools for ML, OpenSearch Service for operational analytics, QuickSight for business intelligence, and many other supporting services. If you look at the overall cost of each of these services, you will find that the vast majority comes from the compute resources that support them. The reason is simple: CPUs and GPUs are significantly more expensive than storage, memory, and networking.

Compute resources are also one of the most important dimensions regarding the optimal...

Cost optimization tools

AWS provides several cost optimization tools that can help you manage and optimize your AWS spending. The following sections show some key cost optimization tools offered by AWS.

AWS Cost Explorer

AWS Cost Explorer is a built-in cost management tool that provides visibility into your AWS costs and usage. It allows you to analyze your costs, view historical spending patterns, and forecast future costs. You can drill down into specific cost categories, services, or regions to identify areas where cost optimizations can be made.

Cost Explorer lets you view the spend for each service by month, as shown in the following screenshot. This helps you spot rising costs that might require an optimization review:

Figure 16.14 – AWS Cost Explorer
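The same monthly per-service view is also available programmatically through the Cost Explorer API. The following sketch shows one way to flag services whose spend is rising, assuming boto3 is installed and the caller has `ce:GetCostAndUsage` permission. `fetch_monthly_cost_by_service` wraps the real `get_cost_and_usage` call. `costs_by_month` and `rising_services` are hypothetical helpers written for this example, and the 20% growth threshold is an arbitrary assumption. The sketch also ignores `NextPageToken` pagination, which a production version would need to handle.

```python
from typing import Any, Dict


def fetch_monthly_cost_by_service(start: str, end: str) -> Dict[str, Any]:
    """Query Cost Explorer for monthly unblended cost grouped by service.

    `start` and `end` are "YYYY-MM-DD" strings, and `end` is exclusive.
    Pagination via NextPageToken is not handled in this sketch.
    """
    import boto3  # imported here so the parsing helpers below work without boto3

    return boto3.client("ce").get_cost_and_usage(
        TimePeriod={"Start": start, "End": end},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
    )


def costs_by_month(response: Dict[str, Any]) -> Dict[str, Dict[str, float]]:
    """Flatten a get_cost_and_usage response into {month_start: {service: cost}}."""
    monthly: Dict[str, Dict[str, float]] = {}
    for period in response["ResultsByTime"]:
        month = period["TimePeriod"]["Start"]
        monthly[month] = {
            group["Keys"][0]: float(group["Metrics"]["UnblendedCost"]["Amount"])
            for group in period.get("Groups", [])
        }
    return monthly


def rising_services(
    monthly: Dict[str, Dict[str, float]], threshold_pct: float = 20.0
) -> Dict[str, float]:
    """Return services whose cost grew more than `threshold_pct` in the latest month.

    Only the last two months are compared. Services with no spend in the
    earlier month are skipped because a percentage change is undefined.
    """
    months = sorted(monthly)  # ISO dates sort chronologically
    if len(months) < 2:
        return {}
    previous, latest = monthly[months[-2]], monthly[months[-1]]
    flagged: Dict[str, float] = {}
    for service, cost in latest.items():
        before = previous.get(service, 0.0)
        if before > 0:
            growth = (cost - before) / before * 100
            if growth > threshold_pct:
                flagged[service] = round(growth, 1)
    return flagged


# Example usage (requires AWS access):
# monthly = costs_by_month(fetch_monthly_cost_by_service("2023-05-01", "2023-07-01"))
# print(rising_services(monthly))
```

Separating the API call from the parsing logic lets the growth check be tested against a saved or mocked response, without incurring API requests.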

AWS Budgets

AWS Budgets enables you to set cost and usage budgets for your AWS resources. You can define spending thresholds and receive alerts when your...
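Budgets can also be created programmatically. The following sketch builds the keyword arguments for the boto3 `budgets` client's `create_budget` call: a monthly cost budget with email alerts on actual and forecasted spend. The helper `build_budget_request` is hypothetical, and the account ID, budget name, amount, thresholds, and email address are placeholders, not values from this chapter.

```python
from typing import Any, Dict, List, Tuple

VALID_NOTIFICATION_TYPES = {"ACTUAL", "FORECASTED"}


def build_budget_request(
    account_id: str,
    budget_name: str,
    monthly_limit_usd: float,
    thresholds: List[Tuple[str, float]],
    email: str,
) -> Dict[str, Any]:
    """Return keyword arguments for boto3's budgets `create_budget` call.

    Each (notification_type, percent) pair in `thresholds` becomes an email
    alert that fires when actual or forecasted spend exceeds that percentage
    of the monthly limit.
    """
    notifications = []
    for notification_type, percent in thresholds:
        if notification_type not in VALID_NOTIFICATION_TYPES:
            raise ValueError(f"unsupported notification type: {notification_type}")
        if percent <= 0:
            raise ValueError("threshold percentage must be positive")
        notifications.append(
            {
                "Notification": {
                    "NotificationType": notification_type,
                    "ComparisonOperator": "GREATER_THAN",
                    "Threshold": float(percent),
                    "ThresholdType": "PERCENTAGE",
                },
                "Subscribers": [{"SubscriptionType": "EMAIL", "Address": email}],
            }
        )
    return {
        "AccountId": account_id,
        "Budget": {
            "BudgetName": budget_name,
            # The Budgets API expects the amount as a string.
            "BudgetLimit": {"Amount": f"{monthly_limit_usd:.2f}", "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": notifications,
    }


if __name__ == "__main__":
    request = build_budget_request(
        account_id="123456789012",  # placeholder account ID
        budget_name="data-platform-monthly",
        monthly_limit_usd=50000,
        thresholds=[("ACTUAL", 80), ("FORECASTED", 100)],
        email="finops@example.com",
    )
    print(request)
    # To create the budget (requires boto3 and credentials):
    # import boto3
    # boto3.client("budgets").create_budget(**request)
```

Alerting on forecasted spend as well as actual spend gives teams a chance to react before the budget is exceeded, not after.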

Tool-specific performance tuning

We covered a lot about optimizing the service infrastructure in this chapter. However, optimizations often happen at the service or tool level, by changing a service's configuration or fine-tuning the logic that runs on it. Typically, tuning is done to improve performance, which also helps save costs. It is not possible to cover every aspect of performance tuning in this section, but we will cover some of the most common tuning measures for key services used to build a data platform on AWS.

Performance tuning measures on Amazon Redshift

Many aspects of performance tuning depend on root cause analysis, so we may not be able to cover every tunable setting in Redshift. Also, recent advances in Redshift autonomics have made many formerly manual settings automatic: data distribution, sorting, and ANALYZE and VACUUM operations can all be handled automatically by the service. However, some common tunable...
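To illustrate handing distribution and sorting over to the service, the following sketch generates Redshift `ALTER TABLE ... ALTER DISTSTYLE AUTO` and `ALTER TABLE ... ALTER SORTKEY AUTO` statements for a list of tables. It also includes a query against the `SVV_TABLE_INFO` system view that lists tables not yet using an automatic distribution style. The helper name and the table names are hypothetical. The statements are meant to be reviewed and then run through your usual SQL client.

```python
import re
from typing import Iterable, List

# Accepts unquoted identifiers such as "fact_orders" or "sales.fact_orders".
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")

# Lists tables whose distribution style is not yet managed automatically.
NON_AUTO_TABLES_QUERY = """
SELECT "schema", "table", diststyle, sortkey1
FROM svv_table_info
WHERE diststyle NOT LIKE 'AUTO%'
ORDER BY "schema", "table";
""".strip()


def auto_optimization_statements(tables: Iterable[str]) -> List[str]:
    """Generate ALTER TABLE statements that let Redshift manage dist and sort keys."""
    statements: List[str] = []
    for table in tables:
        # Reject anything that is not a plain identifier to avoid SQL injection.
        if not _IDENTIFIER.fullmatch(table):
            raise ValueError(f"unexpected table name: {table!r}")
        statements.append(f"ALTER TABLE {table} ALTER DISTSTYLE AUTO;")
        statements.append(f"ALTER TABLE {table} ALTER SORTKEY AUTO;")
    return statements


if __name__ == "__main__":
    print(NON_AUTO_TABLES_QUERY)
    for statement in auto_optimization_statements(["sales.fact_orders", "sales.dim_customer"]):
        print(statement)
```

The identifier check in this sketch is deliberately strict and rejects quoted identifiers. Adjust it if your schemas use mixed case or special characters.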

Summary

This chapter was all about ensuring that the data platform that’s built is performant as well as cost-effective. We started by understanding the need for a data platform that operates optimally. If any part of the data platform is either not performing well or is very expensive, it often creates a snowball effect and affects the business negatively.

A lot of cost optimization can be achieved by optimizing the infrastructure used by the AWS services under the covers. By optimizing storage and compute resources, we can save significant costs. We also looked at some of the tools AWS provides that help in the cost optimization process.

Finally, we looked at some of the service-specific tuning settings that can help with performance improvements. The list of such improvements can be quite long for each service, but the key message was to leverage the best practices for each service and always perform a Well-Architected Review (WAR) before deploying workloads in production.

In the next and...

References

AWS Well-Architected Framework – Data Analytics Lens: https://docs.aws.amazon.com/wellarchitected/latest/analytics-lens/analytics-lens.html
