
You're reading from The Machine Learning Solutions Architect Handbook - Second Edition

Product type: Book
Published in: Apr 2024
Publisher: Packt
ISBN-13: 9781805122500
Edition: 2nd Edition
Author: David Ping
David Ping is an accomplished author and industry expert with over 28 years of experience in the field of data science and technology. He currently serves as the leader of a team of highly skilled data scientists and AI/ML solutions architects at AWS. In this role, he assists organizations worldwide in designing and implementing impactful AI/ML solutions to drive business success. David's extensive expertise spans a range of technical domains, including data science, ML solution and platform design, data management, AI risk, and AI governance. Prior to joining AWS, David held positions in renowned organizations such as JPMorgan, Credit Suisse, and Intel Corporation, where he contributed to the advancement of science and technology through engineering and leadership roles. With his wealth of experience and diverse skill set, David brings a unique perspective and invaluable insights to the field of AI/ML.

Designing Generative AI Platforms and Solutions

Deploying generative AI at scale in an enterprise introduces new complexities around infrastructure, tooling, and operational processes required to harness its potential while managing risks. This chapter explores the essential components for building robust generative AI platforms and examines Retrieval-Augmented Generation (RAG), an effective architecture pattern for generative applications. Additionally, we highlight near-term generative AI solution opportunities ripe for business adoption across industries. With the right platform foundations and a pragmatic approach focused on delivering tangible value, enterprises can start realizing benefits from generative AI today while paving the way for increasing innovation as this technology matures. Readers will gain insight into the practical building blocks and strategies that accelerate generative AI adoption.

Specifically, this chapter covers the following topics:

...

Operational considerations for generative AI platforms and solutions

In an enterprise, deploying generative AI solutions at scale requires robust infrastructure, tools, and operations. Organizations should consider establishing a dedicated generative AI platform to meet these evolving project demands.

Architecturally and operationally, a generative AI platform builds on top of an ML platform, adding new and enhanced technology infrastructure for large-scale model training, large model hosting, model evaluation, guardrails, and model monitoring. As such, the core operation and automation requirements for a generative AI platform are similar to those of a traditional MLOps practice. However, the unique aspects of generative AI projects, such as model selection, model tuning, and integration with external data sources, require several new process workflows to be established, and as a result, new technology components need to be incorporated into the operation and automation...

The retrieval-augmented generation pattern

Foundation models are frozen in time and limited to the knowledge they were trained on, lacking access to an organization’s private data or changing public domain information. To enhance the accuracy of responses, especially when using proprietary or up-to-date data, we require a mechanism to integrate external information into the model’s response generation process.

This is where retrieval-augmented generation (RAG) steps in. RAG is an architecture pattern introduced to support generative AI-based solutions, such as enterprise knowledge search and document question answering, where external data sources are required. There are two main stages to RAG:

  1. The indexing stage for preparing a knowledge base with data ingestion and indexes.
  2. The query stage for retrieving relevant context from the knowledge base and passing it to the LLM to generate a response.
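
The two stages above can be sketched end to end in a few lines. This is a minimal illustration, not a production design: the hashed bag-of-words "embedding", the stopword list, and the in-memory index are stand-ins for a real embedding model and vector database, and the sample documents are invented for the example.

```python
import math
import re
from collections import Counter

# Common words ignored so retrieval focuses on meaningful terms (illustrative).
STOPWORDS = {"the", "is", "a", "an", "our", "in", "what", "for", "to"}

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words term-frequency vector."""
    return Counter(t for t in re.findall(r"\w+", text.lower()) if t not in STOPWORDS)

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Stage 1: indexing -- ingest documents into a searchable knowledge base.
documents = [
    "Our refund policy allows returns within 30 days.",
    "The data center is located in Frankfurt.",
]
index = [(doc, embed(doc)) for doc in documents]

# Stage 2: query -- retrieve relevant context and pass it to the LLM.
def retrieve(query: str, top_k: int = 1) -> list:
    ranked = sorted(index, key=lambda item: cosine(embed(query), item[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]

question = "What is the refund policy?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
# `prompt` would now be sent to an LLM to generate a grounded response.
print(context)  # Our refund policy allows returns within 30 days.
```

In a real deployment, the indexing stage would also handle document chunking and metadata, and the query stage would typically retrieve several chunks and rerank them before constructing the prompt.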

Architecturally, RAG consists...

Choosing an LLM adaptation method

We have covered various LLM adaptation methods, including prompt engineering, domain adaptation pre-training, fine-tuning, and RAG. All of these methods aim to elicit better responses from pre-trained LLMs. With so many options, one might wonder: how do we choose which method to use?

Let’s break down some of the considerations when choosing these different methods.
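
As a rough illustration, the trade-offs among the methods already covered can be encoded as a simple decision helper. This is a hypothetical rule of thumb, not a prescriptive recipe: the function name, its inputs, and the ordering of the checks are illustrative assumptions, and real decisions also weigh cost, data availability, and latency.

```python
# Hypothetical rule-of-thumb for choosing an LLM adaptation method.
def suggest_adaptation_method(
    needs_private_or_fresh_data: bool,
    needs_deep_domain_knowledge: bool,
    has_labeled_task_data: bool,
) -> str:
    if needs_private_or_fresh_data:
        # Ground responses in external sources the model never saw.
        return "RAG"
    if needs_deep_domain_knowledge:
        # Teach the model domain terminology and knowledge at scale.
        return "domain adaptation pre-training"
    if has_labeled_task_data:
        # Specialize the model on a well-defined task.
        return "fine-tuning"
    # Cheapest option and usually the first thing to try.
    return "prompt engineering"

print(suggest_adaptation_method(True, False, False))   # RAG
print(suggest_adaptation_method(False, False, False))  # prompt engineering
```

Note that the methods are not mutually exclusive; for example, a fine-tuned model is often combined with RAG in the same application.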

Response quality

Response quality measures how accurately the LLM's response aligns with the intent of the user's query. Evaluating response quality can be intricate, as different use cases involve different considerations, such as knowledge domain affinity, task accuracy, up-to-date data, source data transparency, and hallucination.

For knowledge domain affinity, domain adaptation pre-training can be used to effectively teach an LLM domain-specific knowledge and terminology. RAG is efficient in retrieving...

Bringing it all together

Having delved into the various technical components separately within the generative AI technical stack, let’s now consolidate them into a unified perspective.

Figure 16.8: Generative AI tech stack

In summary, a generative AI platform extends an ML platform with additional capabilities such as prompt management, input/output filtering, and tools for FM evaluation and RLHF workflows. To accommodate these enhancements, the ML platform’s pipeline capability will need to include new generative AI workflows. The new RAG infrastructure will form the foundational backbone of RAG-based LLM applications and will be closely integrated with the underlying generative AI platform.
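
As an illustration of the input/output filtering capability mentioned above, the sketch below screens a prompt before it reaches the model and redacts sensitive strings from a completion before it reaches the user. The pattern lists and function names are hypothetical placeholders; production guardrails typically use trained classifiers and policy engines rather than regexes.

```python
import re

# Illustrative placeholder patterns, not a real guardrail policy.
BLOCKED_INPUT_PATTERNS = [r"ignore (all|previous) instructions"]  # naive injection check
OUTPUT_PII_PATTERNS = [r"\b\d{3}-\d{2}-\d{4}\b"]                  # US SSN-like strings

def filter_input(prompt: str) -> str:
    """Screen a user prompt before it is sent to the model."""
    for pattern in BLOCKED_INPUT_PATTERNS:
        if re.search(pattern, prompt, re.IGNORECASE):
            raise ValueError("prompt rejected by input guardrail")
    return prompt

def filter_output(completion: str) -> str:
    """Redact sensitive strings from a completion before it reaches the user."""
    for pattern in OUTPUT_PII_PATTERNS:
        completion = re.sub(pattern, "[REDACTED]", completion)
    return completion

safe_prompt = filter_input("Summarize our Q3 results.")
print(filter_output("Contact John at 123-45-6789."))  # Contact John at [REDACTED].
```

In a platform context, these checks would sit in front of the model endpoint so that every application inherits the same guardrails rather than implementing its own.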

The development of generative AI applications will continue to leverage other core application architecture components, including streaming, batch processing, message queuing, and workflow tools.

Although many of the core components will...

Considerations for deploying generative AI applications in production

Deploying generative AI applications in production environments introduces a new set of challenges that go beyond the considerations for traditional software and ML deployments. While aspects such as functional correctness, system/application security, security scanning of artifacts such as model files and code, infrastructure scalability, documentation, and operational readiness (e.g., observability, change management, incident management, and audit) remain essential, there are additional factors to consider when deploying generative AI models.

The following are some of the key additional considerations when deciding on the production deployment of generative AI applications.

Model readiness

When deciding whether a generative AI model is ready for production deployment, the focus should be on its accuracy for the target use cases. These models can solve a wide range of problems, but attempting to test...

Practical generative AI business solutions

In the previous chapter, we talked about the business potential of generative AI and potential use cases in various industries. We then followed that with a detailed discussion of the lifecycle of a generative AI project, from business use case identification to deployment. In this chapter, we have covered operational considerations, building enterprise generative AI platforms, and one of the most important architecture patterns for building generative AI applications, RAG.

In this section, we will highlight some of the more practical generative AI solution opportunities ready for business adoption in the near term. While research continues on aspirational applications, prudent enterprises should evaluate proven pilot use cases to drive measurable impact from generative AI’s rapid advances. With these examples, we will present the recommended approach to identify generative AI opportunities by understanding challenges associated with...

Are we close to having artificial general intelligence?

Artificial General Intelligence (AGI) is a field within theoretical AI research that aims to create AI systems with cognitive functions comparable to human capabilities. AGI remains a theoretical concept that is not well defined, and opinions on both its definition and its eventual realization vary. Nevertheless, loosely speaking, AGI involves AI systems/agents equipped with a broad capacity to understand and learn across many diverse domains and to address diverse problems in various contexts, rather than narrow expertise in one field. These systems should be able to generalize the knowledge they gain, transfer learning from one domain to another, and apply knowledge and skills to novel situations and problems as humans do.

The impressive capabilities displayed by LLMs and diffusion models have generated a lot of excitement about the potential to achieve AGI. Their ability to perform reasonably well across a wide variety of natural...

Summary

We are now coming to the end of this book, which has spanned the breadth of machine learning – from foundational concepts to cutting-edge generative AI. We started the book by covering core ML techniques, algorithms, and industry applications to provide a strong base. We then progressed to data architectures, ML tools like TensorFlow and PyTorch, and engineering best practices to put skills into practice. Architecting robust ML infrastructure on AWS and optimization methods prepared you for real-world systems.

Securing and governing AI responsibly is critical, so we delved into risk management. To guide organizations on the ML journey, we discussed maturity models and evolutionary steps.

Closing the chapter with a look at generative AI and AGI, we explored the immense possibilities of today’s most disruptive new capability. Specifically, we delved into the intricacies of generative AI platforms, RAG architecture, and considerations for generative AI production deployment...
