You're reading from Modern Data Architecture on AWS

Product type: Book
Published in: Aug 2023
Publisher: Packt
ISBN-13: 9781801813396
Edition: 1st Edition
Author (1)
Behram Irani

Behram Irani is currently a technology leader with Amazon Web Services (AWS) specializing in data, analytics, and AI/ML. He has spent over 18 years in the tech industry helping organizations, from start-ups to large-scale enterprises, modernize their data platforms. In the last 6 years working at AWS, Behram has been a thought leader in the data, analytics, and AI/ML space, publishing multiple papers and leading the digital transformation efforts for many organizations across the globe. Behram completed his Bachelor of Engineering in Computer Science at the University of Pune and has an MBA degree from the University of Florida.

Automate, Operationalize, and Monetize

In this chapter, we will look at the following key topics:

  • The need for automation
  • The DevOps process
  • The DataOps process
  • The MLOps process
  • Data monetization
  • Wrap-up

The need for automation

We have come to the last chapter of the book, but one topic remains: a data platform cannot be sustainable in the long run if a large number of teams manually manage all the day-to-day operations. In a mature organization, the personas who help build and operate the data platform do not get access to the AWS console in production. So, the main question arises: how do they manage and operate a modern data platform? The answer is simple: every aspect of the data platform is managed and operated through automation scripts and pipelines.

Before we dive into what automation entails, let’s quickly highlight why automation is needed in the first place.

Automation plays a crucial role in an analytics platform on AWS for several reasons:

  • Efficiency: Automation eliminates manual, repetitive tasks, allowing analytics processes to run more efficiently. It reduces the time and effort required to perform data ingestion, transformation, modeling, and...

The DevOps process

DevOps, short for development and operations, is an approach to software development and deployment that aims to bridge the gap between development teams, which are responsible for creating the data platform, and operations teams, which are responsible for deploying and managing the service in production environments. DevOps emphasizes collaboration, communication, and automation to streamline the software development life cycle and improve the speed, efficiency, and quality of software delivery.
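
One common form of the automation described above is infrastructure as code, where resources are declared in a version-controlled template rather than created by hand in the console. The following is a minimal sketch, assuming AWS CloudFormation as the template format (this excerpt does not name a specific tool). The logical ID `RawDataBucket`, the bucket name, and the helper `build_template` are hypothetical names chosen for illustration.

```python
import json


def build_template(bucket_name: str) -> dict:
    """Return a CloudFormation template (as a dict) for one versioned S3 bucket."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative raw-data bucket for a data platform",
        "Resources": {
            # "RawDataBucket" is a hypothetical logical ID used only in this sketch.
            "RawDataBucket": {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    "BucketName": bucket_name,
                    "VersioningConfiguration": {"Status": "Enabled"},
                },
            }
        },
    }


# The bucket name is an assumption; real bucket names must be globally unique.
template = build_template("example-raw-data-bucket")

# Serializing with sorted keys gives a stable file that diffs cleanly in version control.
template_json = json.dumps(template, indent=2, sort_keys=True)
print(template_json)
```

Because the template is plain data generated by a function, the same definition can be reviewed in a pull request and deployed by a pipeline to every environment, which is the repeatability that DevOps is after.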

DevOps aims to balance the priorities of two competing forces in the business. The following figure highlights this friction between the development and operations teams.

Figure 17.1 – Competing forces between the development and operations teams

Before we get to the use cases and the tools and services used for DevOps, let’s first understand the key principles of DevOps:

  • Collaboration and communication: DevOps process...

The DataOps process

DataOps on AWS refers to the application of DevOps principles and practices to data-related workflows and processes. It focuses on optimizing the development, deployment, and management of data pipelines, data integration, and data analytics solutions.

DataOps aims to improve the speed, quality, and reliability of data operations by fostering collaboration, automation, and repeatability across the data life cycle. It combines data engineering, data integration, data governance, and data analytics with the principles of CI/CD, version control, and IaC.

On AWS, several services and tools can be leveraged to implement DataOps practices:

  • AWS Glue: The AWS Glue ETL service simplifies data preparation and integration. It allows you to create and manage data pipelines using workflows, perform data transformations, and automate ETL jobs.
  • AWS Lake Formation: AWS Lake Formation is a service that simplifies the process of building, securing, and managing...
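
To make the idea of an automated, repeatable pipeline concrete, here is a hedged sketch that generates an orchestration definition for a sequence of AWS Glue jobs. It assumes AWS Step Functions as the orchestrator and emits an Amazon States Language document; the job names and the helper `build_pipeline_definition` are hypothetical, and the sketch assumes each job name appears only once.

```python
import json


def build_pipeline_definition(job_names: list[str]) -> dict:
    """Chain Glue jobs into a sequential Amazon States Language definition."""
    if not job_names:
        raise ValueError("at least one Glue job name is required")

    states = {}
    for index, job_name in enumerate(job_names):
        state = {
            "Type": "Task",
            # The ".sync" service integration waits for the Glue job run to finish.
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": job_name},
        }
        if index + 1 < len(job_names):
            # Hand over to the next job in the list.
            state["Next"] = f"Run-{job_names[index + 1]}"
        else:
            # The last job ends the state machine.
            state["End"] = True
        states[f"Run-{job_name}"] = state

    return {
        "Comment": "Illustrative sequential ETL pipeline",
        "StartAt": f"Run-{job_names[0]}",
        "States": states,
    }


# Hypothetical Glue job names, used only for illustration.
definition = build_pipeline_definition(["ingest-raw", "transform-curated"])
print(json.dumps(definition, indent=2))
```

Keeping the definition in code means the pipeline itself goes through version control and CI/CD like any other artifact, instead of being edited by hand in the console.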

The MLOps process

Machine Learning Operations (MLOps) on AWS refers to the practices and tools employed to manage and operationalize ML workflows and models on the AWS platform. MLOps aims to streamline and automate the deployment, monitoring, and management of ML models, ensuring their reliability, scalability, and reproducibility.

MLOps has a direct impact in the following ways:

  • It boosts data scientists’ productivity by simplifying the ML process
  • It helps maintain high model accuracy
  • It helps enhance the security and compliance of the ML platform

ML is an iterative process, and without MLOps, creating an end-to-end ML process would be a challenge. Every stage in the ML life cycle has its own set of activities, and specific tools in Amazon SageMaker assist at every stage.
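
As one illustration of how automation helps maintain model accuracy, the sketch below shows a simple quality gate that a pipeline could run on freshly labeled data. It is plain Python and not a SageMaker API; the names `accuracy`, `check_model`, and `AccuracyCheck`, as well as the `max_drop` threshold of 0.05, are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccuracyCheck:
    baseline: float
    current: float
    needs_retraining: bool


def accuracy(predictions: list, labels: list) -> float:
    """Fraction of predictions that match the ground-truth labels."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def check_model(predictions: list, labels: list, baseline: float,
                max_drop: float = 0.05) -> AccuracyCheck:
    """Flag the model for retraining if accuracy fell more than max_drop below baseline."""
    current = accuracy(predictions, labels)
    return AccuracyCheck(baseline, current, (baseline - current) > max_drop)


# Hypothetical predictions and labels: two of four are correct, so accuracy is 0.5.
result = check_model([1, 0, 1, 1], [1, 0, 0, 0], baseline=0.9)
print(result)
```

In an automated pipeline, a check like this would decide whether to trigger retraining rather than waiting for someone to notice that the model has degraded.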

The following figure highlights the stages of the ML life cycle.

Figure 17.16 – ML life cycle

Using DevOps tools...

Data monetization

All the time and effort spent by organizations to build a modern data platform on AWS is for a reason: to get the best return on investment (ROI). Typically, ROI can be measured in monetary terms and, most of the time, we think of external monetization, where we profit from data sold outside the organization. However, data monetization has many other forms, including direct, indirect, and internal monetization.

All organizations want to treat data as a product, that is, as a valuable asset that can be packaged, managed, and monetized. AWS provides various services and tools that enable organizations to leverage their data and create data-driven products for internal as well as external use.

There are several ways to monetize data using the data platform built on AWS. Here are some common data monetization types:

  • Selling data products on AWS Marketplace: AWS Marketplace allows you to package and sell data products...

Wrap-up

We will wrap up this book with a final reference architecture for a data platform on AWS. Not all the services are represented here, but the most commonly used ones are shown in their own sections. The Data Consumption section represents a variety of purpose-built stores, ML platforms, as well as query and visualization services. You can add many more services depending on the use case being solved and can also leverage third-party partner solutions.

The following figure represents the data and analytics reference architecture built on AWS.

Figure 17.24 – Reference architecture of the data platform on AWS

Finally, I want to leave you with the following thoughts. The future evolution of data and analytics platforms is expected to be driven by several key trends. These include the following:

  • Increased adoption of cloud: Cloud-based data and analytics platforms will continue to gain prominence, offering scalability, agility...

Summary

In this chapter, we concluded the book by providing you with options for automating your data platform. We looked at DevOps, DataOps, and MLOps as the three ways to completely automate and operationalize your data platform.

In the DevOps process, we looked at how CI/CD and IaC give organizations an automated, repeatable, and organized way to operationalize their AWS infrastructure, services, and the features inside those services. DataOps focuses on simplifying the data pipelines by leveraging orchestration services such as Amazon MWAA and AWS Step Functions. MLOps, on the other hand, helps to manage the entire life cycle of the ML process, and Amazon SageMaker provides capabilities to make MLOps a seamless process.

Finally, we looked at how organizations can monetize their data by using DaaS, insights-as-a-service, or API-as-a-service. All organizations have the common goal of deriving value from their data platform, either directly by monetizing the data or...
