
You're reading from The Machine Learning Solutions Architect Handbook - Second Edition

Product type: Book
Published in: Apr 2024
Publisher: Packt
ISBN-13: 9781805122500
Edition: 2nd Edition
Author: David Ping

David Ping is an accomplished author and industry expert with over 28 years of experience in the field of data science and technology. He currently leads a team of highly skilled data scientists and AI/ML solutions architects at AWS. In this role, he assists organizations worldwide in designing and implementing impactful AI/ML solutions to drive business success. David's extensive expertise spans a range of technical domains, including data science, ML solution and platform design, data management, AI risk, and AI governance. Prior to joining AWS, David held positions at renowned organizations such as JPMorgan, Credit Suisse, and Intel Corporation, where he contributed to the advancement of science and technology through engineering and leadership roles. With his wealth of experience and diverse skill set, David brings a unique perspective and invaluable insights to the field of AI/ML.

Designing an Enterprise ML Architecture with AWS ML Services

Many organizations opt to build enterprise ML platforms to support numerous fast-moving initiatives. These platforms are designed to facilitate the entire ML lifecycle and accommodate various usage patterns, all while emphasizing automation and scalability. As a practitioner, I often get asked to provide architectural guidance for creating such enterprise ML platforms. In this chapter, we will explore the fundamental requirements for designing enterprise ML platforms. We will cover a range of topics, such as workflow automation, infrastructure scalability, and system monitoring.

Throughout the discussion, you will gain insights into architecture patterns that enable the development of technology solutions to automate the end-to-end ML workflow and ensure seamless deployment at a large scale. Additionally, we will delve deep into essential components of enterprise ML architecture, such as model training, model hosting...

Technical requirements

We will continue to use the AWS environment for the hands-on portion of this chapter. All the source code mentioned in this chapter can be found at https://github.com/PacktPublishing/The-Machine-Learning-Solutions-Architect-and-Risk-Management-Handbook-Second-Edition/tree/main/Chapter09.

Key considerations for ML platforms

Designing, building, and operating ML platforms are complex endeavors as there are many different considerations, including the personas, key ML process workflows, and various technical capability requirements for the different personas and workflows. In this section, we will delve into each of these key considerations in depth. Let’s dive in!

The personas of ML platforms and their requirements

In the previous chapter, we talked about building a data science environment for the data scientists and ML engineers who mainly focus on experimentation and model development. In an enterprise setting where an ML platform is needed, there are other personas involved, each with their own specific requirements. At a high level, there are two types of personas associated with the ML platform: ML platform builders and ML platform users.

ML platform builders

ML platform builders have the crucial responsibility of constructing the infrastructure...

Key requirements for an enterprise ML platform

To deliver business benefits through ML at scale, organizations must have the capability to rapidly experiment with diverse scientific approaches, ML technologies, and extensive datasets. Once ML models are trained and validated, they need to seamlessly transition to production deployment. While some similarities exist between a traditional enterprise software system and an ML platform, such as scalability and security concerns, an enterprise ML platform presents distinctive challenges. These include the need to integrate with the data platform and high-performance computing infrastructure to facilitate large-scale model training.

Let’s delve into some specific core requirements of an enterprise ML platform to meet the needs of different users and operators:

  • Support for the end-to-end ML lifecycle: An enterprise ML platform must cater to both data science experimentation and production-grade operations and deployments...

Enterprise ML architecture pattern overview

Building an enterprise ML platform on AWS starts with creating different environments to enable different data science and operation functions. The following diagram shows the core environments that normally make up an enterprise ML platform. From an isolation perspective, in the context of the AWS cloud, each environment in the following diagram is a separate AWS account:

Figure 9.1 – Enterprise ML architecture environments

As we discussed in Chapter 8, Building a Data Science Environment Using AWS ML Services, data scientists utilize the data science environment for experimentation, model building, and tuning. Once these experiments are completed, the data scientists commit their work to the proper code and data repositories. The next step is to train and tune the ML models in a controlled and automated environment using the algorithms, data, and training scripts that were created by the data scientists. This controlled and...
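The promotion flow described above, in which work moves from the data science environment into a controlled training environment and onward toward production, can be sketched as a simple state transition over the environments in Figure 9.1. The environment names and registry structure below are illustrative assumptions for this sketch, not AWS APIs:

```python
# Illustrative sketch: a model artifact is promoted one environment
# at a time along the chain of separate accounts shown in Figure 9.1.
# The environment names and registry layout are assumptions, not AWS APIs.

ENV_ORDER = ["data-science", "training", "production"]

def promote(registry, model_name):
    """Move a model one step along the environment chain,
    e.g. from the training account to the production account."""
    current = registry[model_name]["environment"]
    idx = ENV_ORDER.index(current)
    if idx == len(ENV_ORDER) - 1:
        raise ValueError(f"{model_name} is already in production")
    registry[model_name]["environment"] = ENV_ORDER[idx + 1]
    return registry[model_name]["environment"]

registry = {"churn-model": {"version": 3, "environment": "data-science"}}
promote(registry, "churn-model")   # -> "training"
promote(registry, "churn-model")   # -> "production"
```

In a real platform, each promotion step would be gated by automated tests and approvals rather than a direct function call, but the one-way progression through isolated environments is the same.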

Adopting MLOps for ML workflows

Similar to DevOps, which has been widely adopted for traditional software development and deployment, the MLOps practice is intended to streamline the building and deployment of ML pipelines while improving collaboration between data scientists/ML engineers, data engineering, and operations teams. Specifically, the MLOps practice aims to yield the following main benefits throughout the entire ML lifecycle:

  • Process consistency: The MLOps practice aims to create consistency in the ML model-building and deployment process. A consistent process improves the efficiency of the ML workflow and ensures a high degree of certainty in the input and output of the ML workflow.
  • Tooling and process reusability: One of the core objectives of the MLOps practice is to create reusable technology tooling and templates for faster adoption and deployment of new ML use cases. These can include...
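As a minimal sketch of what a reusable pipeline template might look like, the following composes use-case-specific steps in a consistent order, so that each new ML use case supplies only its own step functions. The step names and functions here are hypothetical stand-ins, not the API of any specific MLOps tool:

```python
# Minimal sketch of a reusable ML pipeline template: the template fixes
# the wiring and ordering of steps, while each use case plugs in its own
# step functions. Names and steps are illustrative assumptions.

def make_pipeline(steps):
    """Compose (name, function) steps into a single callable pipeline."""
    def run(data):
        for name, fn in steps:
            data = fn(data)   # each step transforms the previous output
        return data
    return run

# A new use case reuses the same template with its own steps.
pipeline = make_pipeline([
    ("preprocess", lambda xs: [x / 10 for x in xs]),
    ("train",      lambda xs: sum(xs) / len(xs)),   # stand-in for training
])
result = pipeline([10, 20, 30])   # -> 2.0
```

The consistency benefit comes from the template, not the steps: every use case built this way runs through the same ordered stages, which is what makes the workflow's inputs and outputs predictable.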

Best practices in building and operating an ML platform

Constructing an enterprise ML platform is a multifaceted undertaking. It often requires significant time, with organizations taking six months or more to implement the initial phase of their ML platform. Continuous efforts are needed to incorporate new functionalities and enhancements for many years to come. Onboarding users and ML projects onto the new platform is another demanding aspect, involving extensive education for the user base and providing direct technical support.

In some cases, platform adjustments might be necessary to ensure smooth onboarding and successful utilization. Having collaborated with many customers in building their enterprise ML platform, I have identified some best practices for the construction and adoption of an ML platform.

ML platform project execution best practices

  • Assemble cross-functional teams: Bring together data engineers, ML researchers, DevOps engineers, application...

Summary

In this chapter, we explored the key requirements and best practices for building an enterprise ML platform. We discussed how to design a platform that supports the end-to-end ML lifecycle, process automation, and separation of environments. Architectural patterns were reviewed, including how to leverage AWS services to build a robust ML platform on the cloud.

The core capabilities of the different ML environments were covered, such as training, hosting, and shared services. Best practices around platform design, operations, governance, and integration were also discussed. You should now have a solid understanding of what an enterprise-grade ML platform entails and the key considerations for building one on AWS using proven patterns.

In the next chapter, we will dive deeper into advanced ML engineering topics. This includes distributed training techniques to scale model development and low-latency serving methods for optimizing inference.
