You're reading from Modern Data Architecture on AWS

Product type: Book

Published in: Aug 2023

Publisher: Packt

ISBN-13: 9781801813396

Edition: 1st Edition

Concepts

Data Science

Author (1)

Behram Irani

Automate, Operationalize, and Monetize

In this chapter, we will look at the following key topics:

The need for automation
The DevOps process
The DataOps process
The MLOps process
Data monetization
Wrap-up

The need for automation

We have reached the last chapter of the book, but one topic remains: a data platform is not sustainable in the long run if many teams manage all of its day-to-day operations manually. In a mature organization, the personas who build and operate the data platform do not get access to the AWS console in production. So how do they manage and operate a modern data platform? The answer is simple: every aspect of the data platform is managed and operated through automation scripts and pipelines.

Before we dive into what automation entails, let’s quickly highlight why automation is needed in the first place.

Automation plays a crucial role in an analytics platform on AWS for several reasons:

Efficiency: Automation eliminates manual, repetitive tasks, allowing analytics processes to run more efficiently. It reduces the time and effort required to perform data ingestion, transformation, modeling, and...

The DevOps process

DevOps, short for development and operations, is an approach to software development and deployment that aims to bridge the gap between development teams, which are responsible for creating the data platform, and operations teams, which are responsible for deploying and managing the service in production environments. DevOps emphasizes collaboration, communication, and automation to streamline the software development life cycle and improve the speed, efficiency, and quality of software delivery.

DevOps aims to balance the priorities of two competing forces in the business. The following figure highlights this friction between the development and operations teams.

Figure 17.1 – Competing forces between the development and operations teams

Before we get to the use cases and the tools and services used for DevOps, let’s first understand the key principles of DevOps:

Collaboration and communication: DevOps process...

The DataOps process

DataOps in AWS refers to the application of DevOps principles and practices to data-related workflows and processes. It focuses on optimizing the development, deployment, and management of data pipelines, data integration, and data analytics solutions.

DataOps aims to improve the speed, quality, and reliability of data operations by fostering collaboration, automation, and repeatability across the data life cycle. It combines data engineering, data integration, data governance, and data analytics with the principles of CI/CD, version control, and IaC.
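To make the IaC principle concrete, here is a minimal sketch that generates an AWS CloudFormation template for a single AWS Glue ETL job from Python, so the job definition can live in version control and be deployed by a CI/CD pipeline. The function names, the logical ID `EtlJob`, the job name, the role ARN, and the S3 path are all hypothetical placeholders, and a real template would normally carry more properties (workers, timeouts, tags).

```python
import json


def glue_job_template(job_name: str, role_arn: str, script_s3_uri: str) -> dict:
    """Return a minimal CloudFormation template declaring one AWS Glue ETL job.

    All argument values are supplied by the caller; nothing here is read
    from a real AWS account.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": f"Glue ETL job {job_name}, managed as code",
        "Resources": {
            # "EtlJob" is a hypothetical logical ID chosen for this sketch.
            "EtlJob": {
                "Type": "AWS::Glue::Job",
                "Properties": {
                    "Name": job_name,
                    "Role": role_arn,
                    "GlueVersion": "4.0",
                    "Command": {
                        "Name": "glueetl",  # Spark ETL job type
                        "ScriptLocation": script_s3_uri,
                    },
                },
            }
        },
    }


def render(template: dict) -> str:
    """Serialize deterministically so version-control diffs stay small."""
    return json.dumps(template, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(
        render(
            glue_job_template(
                "nightly-sales-etl",
                "arn:aws:iam::123456789012:role/example-glue-role",
                "s3://example-bucket/scripts/sales_etl.py",
            )
        )
    )
```

Because the output is deterministic, the rendered file can be committed and reviewed like any other code change before a pipeline deploys it.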

On AWS, several services and tools can be leveraged to implement DataOps practices:

AWS Glue: The AWS Glue ETL service simplifies data preparation and integration. It allows you to create and manage data pipelines using workflows, perform data transformations, and automate ETL jobs.
AWS Lake Formation: AWS Lake Formation is a service that simplifies the process of building, securing, and managing...
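
As an illustration of automating an ETL job run, the following sketch starts an AWS Glue job and polls it until it reaches a terminal state. It assumes a client object that exposes `start_job_run` and `get_job_run` in the way the boto3 Glue client does; the helper name `run_glue_job`, its parameters, and the polling defaults are hypothetical choices for this example, not part of any AWS SDK.

```python
import time
from typing import Any, Callable, Dict, Optional

# Glue job run states after which the run will not change any further.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR", "EXPIRED"}


def run_glue_job(
    glue: Any,
    job_name: str,
    arguments: Optional[Dict[str, str]] = None,
    poll_seconds: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Start a Glue job run and return its final state.

    `glue` is expected to behave like boto3.client("glue"). The `sleep`
    parameter is injectable so the function can be exercised without waiting.
    """
    kwargs: Dict[str, Any] = {"JobName": job_name}
    if arguments:
        kwargs["Arguments"] = arguments
    run_id = glue.start_job_run(**kwargs)["JobRunId"]

    for _ in range(max_polls):
        run = glue.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]
        state = run["JobRunState"]
        if state in TERMINAL_STATES:
            return state
        sleep(poll_seconds)
    raise TimeoutError(f"Glue job {job_name} run {run_id} did not finish in time")


# Hypothetical usage from a pipeline step (requires boto3 and AWS credentials):
#   import boto3
#   state = run_glue_job(boto3.client("glue"), "nightly-sales-etl")
```

Passing the client in, rather than creating it inside the function, keeps the helper easy to test with a stub and leaves credential handling to the calling pipeline.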

The MLOps process

Machine Learning Operations (MLOps) in AWS refers to the practices and tools employed to manage and operationalize ML workflows and models on the AWS platform. MLOps aims to streamline and automate the deployment, monitoring, and management of ML models, ensuring their reliability, scalability, and reproducibility.

MLOps has a direct impact in the following ways:

It boosts data scientists’ productivity by simplifying the ML process
It helps maintain high model accuracy
It helps enhance the security and compliance of the ML platform

ML is an iterative process, and without MLOps, creating an end-to-end ML process would be a challenge. Every stage in the ML life cycle has its own set of activities, and specific tools in Amazon SageMaker assist at every stage.
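
One way to picture how automation helps maintain model accuracy is a simple gate that compares recent prediction accuracy against a baseline and flags the model for retraining when it drops too far. The sketch below is plain Python under stated assumptions: the names `AccuracyCheck` and `check_accuracy` and the 0.05 tolerance are hypothetical, and this is not a SageMaker API. In practice, a managed capability such as Amazon SageMaker Model Monitor would typically fill this role.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class AccuracyCheck:
    baseline: float  # accuracy recorded when the model was approved
    recent: float    # accuracy on the latest labeled predictions
    retrain: bool    # True when recent accuracy fell below the allowed band


def check_accuracy(
    labels: Sequence[int],
    predictions: Sequence[int],
    baseline: float,
    tolerance: float = 0.05,
) -> AccuracyCheck:
    """Compare recent accuracy with a baseline and decide whether to retrain."""
    if not labels or len(labels) != len(predictions):
        raise ValueError("labels and predictions must be non-empty and equal in length")
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    recent = correct / len(labels)
    return AccuracyCheck(baseline, recent, recent < baseline - tolerance)


if __name__ == "__main__":
    # Two of four predictions are correct, so recent accuracy is 0.5.
    print(check_accuracy([1, 0, 1, 1], [1, 0, 0, 0], baseline=0.9))
```

A pipeline could run a check like this on a schedule and start a retraining workflow whenever `retrain` is true.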

The following figure highlights all the different stages the whole ML process goes through.

Figure 17.16 – ML life cycle

Using DevOps tools...

Data monetization

All the time and effort that organizations spend building a modern data platform on AWS is for a reason: to get the best return on investment (ROI). Typically, ROI is measured in monetary terms and, most of the time, we think of external monetization, where profit comes from data sold outside the organization. However, data monetization takes many other forms, including direct, indirect, and internal monetization.

All organizations want to treat data as a product, that is, as a valuable asset that can be packaged, managed, and monetized. AWS provides various services and tools that enable organizations to leverage their data and create data-driven products for internal as well as external use.

There are several ways to monetize data using the data platform built on AWS. Here are some common data monetization types:

Selling data products on AWS Marketplace: AWS Marketplace allows you to package and sell data products...

Wrap-up

We will wrap up this book with a final reference architecture for a data platform on AWS. Not all the services are represented here, but the most commonly used ones are shown in their own sections. The Data Consumption section represents a variety of purpose-built stores, ML platforms, and query and visualization services. You can add many more services depending on the use case being solved, and you can also leverage third-party partner solutions.

The following figure represents the data and analytics reference architecture built on AWS.

Figure 17.24 – Reference architecture of the data platform on AWS

Finally, I want to leave you with the following thoughts. The future evolution of data and analytics platforms is expected to be driven by several key trends. These include the following:

Increased adoption of cloud: Cloud-based data and analytics platforms will continue to gain prominence, offering scalability, agility...

Summary

In this chapter, we concluded the book by providing you with options for automating your data platform. We looked at DevOps, DataOps, and MLOps as the three ways to completely automate and operationalize your data platform.

In the DevOps process, we looked at how CI/CD and IaC give organizations an automated, repeatable, and organized way to operationalize their AWS infrastructure, services, and the features inside those services. DataOps focuses on simplifying data pipelines by leveraging orchestration services such as Amazon MWAA and AWS Step Functions. MLOps, on the other hand, helps manage the entire life cycle of the ML process, and Amazon SageMaker provides capabilities to make MLOps a seamless process.

Finally, we looked at how organizations can monetize their data by using DaaS, insights-as-a-service, or API-as-a-service. All organizations have the common goal of deriving value from their data platform, either directly by monetizing the data or...

References

CI/CD on AWS workshop: https://catalog.workshops.aws/cicdonaws/en-US
AWS CloudFormation workshop: https://catalog.workshops.aws/cfn101/en-US
AWS CDK workshop: https://catalog.us-east-1.prod.workshops.aws/workshops/10141411-0192-4021-afa8-2436f3c66bd8/en-US
AWS Step Functions workshop: https://catalog.workshops.aws/stepfunctions/en-US
Amazon Managed Workflows for Apache Airflow workshop: https://catalog.workshops.aws/amazon-mwaa-for-analytics/en-US
Modern data architecture workshop: https://catalog.workshops.aws/modern-data-architecture/en-US


Author (1)

Behram Irani

Behram Irani is currently a technology leader with Amazon Web Services (AWS) specializing in data, analytics and AI/ML. He has spent over 18 years in the tech industry helping organizations, from start-ups to large-scale enterprises, modernize their data platforms. In the last 6 years working at AWS, Behram has been a thought leader in the data, analytics and AI/ML space; publishing multiple papers and leading the digital transformation efforts for many organizations across the globe. Behram has completed his Bachelor of Engineering in Computer Science from the University of Pune and has an MBA degree from the University of Florida.