Looking Ahead – Designing for LLM Applications

Imagine a digital solution that can comprehend, generate, and manipulate human language with great fluency and accuracy, expanding the world of natural language processing (NLP) and artificial intelligence (AI). Enter large language models (LLMs): algorithms that have evolved to transform the way we interact with systems by harnessing the power of language.

So, what are these LLMs? An LLM is a type of AI model that can generate text, translate between languages, create different kinds of content, and answer your questions informatively. It is trained on massive amounts of data, from which it learns the statistical relationships between words and phrases; this allows it to generate text that is similar to the text it was trained on. LLMs are still under development, but they have the potential to revolutionize the way we interact with computers.

In this final chapter, we will set the stage for data modeling for LLM applications...

Capturing the evolution of LLMs

LLMs are not mere algorithms; they are innovations fueled by decades of research and breakthroughs. From their humble origins to today’s awe-inspiring models, LLMs have surpassed expectations, powered by unparalleled computational resources and vast datasets. They are gateways to a new era of language processing, enabling machines to comprehend, generate, and manipulate text like never before. Although LLMs have evolved dramatically, their origins date back to as early as the 1950s.

The following diagram takes us on a high-level journey through the evolution of LLMs:

Figure 10.1: Evolution of LLMs

Some key research contributions have shaped the evolution of LLMs, such as the backpropagation algorithm (for training neural networks), the transformer architecture (for deriving context and meaning from sequential data), and the availability of very large training datasets. The evolution of LLMs is an ongoing process. As LLMs become more powerful, they will be...

Getting started with LLMs

Throughout this chapter, we will cover the components, terminology, and concepts around LLMs that are crucial for data modeling for LLM-based applications. However, the detailed architecture involved in creating LLM-based applications is outside the scope of this chapter. Here is an overview of the architecture and functioning of LLMs, which are typically composed of three main components (a brief illustrative sketch follows this list):

  • Encoder: The encoder is responsible for converting the input text into a sequence of numbers. This is done by representing each word in the input text as a vector of numbers.
  • Decoder: The decoder is responsible for generating the output text from the sequence of numbers. This is done by predicting the next word in the output text, given the previous words.
  • Transformer: The transformer is the neural network architecture that underpins the encoder and decoder. Its attention mechanism allows it to learn long-range dependencies between words.
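
To make these components concrete, here is a minimal, purely illustrative Python sketch of the encoder idea: turning text into a sequence of numbers and then into vectors. The vocabulary, dimensions, and sentence are made up for illustration; real LLMs use learned tokenizers and embeddings with tens of thousands of tokens:

```python
# Toy illustration of "text in, numbers out" -- not a real LLM encoder.
import numpy as np

# Hypothetical tiny vocabulary; real models learn one with ~30k+ tokens.
vocab = {"<unk>": 0, "databases": 1, "store": 2, "vectors": 3, "for": 4, "llms": 5}

def encode(text: str) -> list[int]:
    """Convert input text into a sequence of token IDs (the 'numbers')."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

# An embedding table maps each token ID to a dense vector, which is what
# the transformer layers actually operate on.
rng = np.random.default_rng(seed=42)
embedding_table = rng.normal(size=(len(vocab), 8))  # 8-dim toy embeddings

token_ids = encode("Databases store vectors for LLMs")
token_vectors = embedding_table[token_ids]

print(token_ids)            # [1, 2, 3, 4, 5]
print(token_vectors.shape)  # (5, 8)
```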

To give a high-level summary, LLMs work by learning...

Comparing real-world applications of LLMs and traditional analytics

To understand the applications of LLMs in the real world, let’s compare them with those of traditional analytics systems.

Here are some examples of traditional analytical applications:

  • Customer segmentation is the process of dividing customers into groups based on their shared characteristics. This can be done to target marketing campaigns or to develop new products and services.
  • Risk assessment is the process of identifying and assessing the potential risks to an organization. This can be done to develop mitigation strategies or to make informed decisions.
  • Fraud detection is the process of identifying and preventing fraudulent transactions. This is implemented to protect users and reduce financial losses.

Now, let’s discuss some real-world LLM-based applications:

  • Chatbots are computer programs that can simulate conversations with humans....

Understanding the differences in data modeling for traditional analytics and LLMs

Data modeling for traditional analytical applications focuses on creating models that can be used to understand and predict trends in data. This type of modeling typically involves creating tables and relationships between tables to represent the data in a way that is easy to understand and query. Data modeling for LLM-based applications, on the other hand, focuses on preparing data for applications that generate text, translate languages, answer questions, and create different kinds of content.

There are some key differences between data modeling for traditional analytical applications and data modeling for applications that utilize LLMs, as depicted in Table 10.1, which compares each consideration (starting with data structure) for traditional analytical applications versus LLM applications:

Data model design considerations for applications that use LLMs

The best-suited data modeling techniques and principles for data model design for applications that use LLMs will vary depending on the specific application. However, some general considerations include (but are not limited to) the following:

  • The type of data that will be used: The data modeling technique that is chosen will need to be able to represent the different types of data that will be used in the application.
  • The scalability of the app: The data modeling technique that is chosen will need to be able to scale as the app grows. For example, if the app is expected to have a large number of users and growing attributes, then a NoSQL database may be a better choice than a relational database (see the short sketch after this list).
  • Data security and privacy: The data modeling technique that is chosen will need to be able to protect the data from unauthorized access. For example, the data may need to be encrypted or stored in a secure location...
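
To illustrate the scalability point above, here is a minimal sketch, assuming the google-cloud-firestore client library and an authenticated Google Cloud project, of why a document database suits "growing attributes": new fields can be added per document without a schema migration. The collection and field names are hypothetical:

```python
# Minimal sketch: schema-flexible documents in Firestore (names are made up).
from google.cloud import firestore

db = firestore.Client()

# An early version of the app stores just a name.
db.collection("users").document("user_123").set({"name": "Asha"})

# Later, the app starts tracking new attributes; merge=True adds fields to
# the existing document -- no ALTER TABLE or migration required.
db.collection("users").document("user_123").set(
    {"preferences": {"language": "en"}, "llm_profile_version": 2},
    merge=True,
)
```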

Learning about data modeling principles and techniques

Data modeling techniques don’t just help you with technology choices and frameworks – they also help you lay the groundwork for the industry-specific use case that you are going to address with your dataset. The following data modeling techniques and principles can be effective in maximizing the potential of LLMs in such applications:

  • Data quality and preprocessing: Ensure data quality by performing rigorous preprocessing steps, including data cleaning, normalization, and deduplication (a small sketch follows this list). High-quality data improves the performance and reliability of LLMs and prevents them from learning spurious patterns.
  • Fine-tuning: Leverage pre-trained LLMs as a starting point and fine-tune them on domain-specific or task-specific data. Fine-tuning allows the model to adapt and specialize for specific applications, reducing the need for extensive training from scratch.
  • Data augmentation...
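
As a small sketch of the data quality point above, the following Python snippet cleans, normalizes, and deduplicates a list of raw text records. The cleaning rules are illustrative, not a production pipeline:

```python
# Minimal preprocessing sketch: clean, normalize, and deduplicate raw text.
import re
import unicodedata

def preprocess(records: list[str]) -> list[str]:
    seen: set[str] = set()
    cleaned: list[str] = []
    for text in records:
        # Normalize Unicode and collapse whitespace.
        text = unicodedata.normalize("NFKC", text)
        text = re.sub(r"\s+", " ", text).strip()
        key = text.lower()  # case-insensitive deduplication key
        if text and key not in seen:  # drop empty strings and duplicates
            seen.add(key)
            cleaned.append(text)
    return cleaned

raw = ["  Hello\tworld ", "hello world", "Großartig!", ""]
print(preprocess(raw))  # ['Hello world', 'Großartig!']
```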

Ethical and responsible practices

In today’s world, where AI plays a pivotal role, it’s crucial to incorporate ethical practices to ensure that businesses and developers use LLMs in a way that benefits society and minimizes harm. Ethics in AI involves making responsible and morally sound choices when developing and deploying AI systems. In this section, we will discuss some of the core ethical and responsible data model design considerations for applications that use LLMs:

  • Ensure that data that’s used for training LLMs is collected and stored following ethical and legal guidelines, respecting user privacy rights. Implement robust security measures to protect sensitive data from unauthorized access:
    • It is imperative to obtain informed consent from users when collecting their data, clearly stating how their information will be used. Transparency in data collection practices builds trust with users and safeguards their privacy.
    • Regularly update data handling...

Hands-on time – building an LLM application

All the databases we have discussed in this book so far support Generative AI in some form or another: they either store, manage, and process the data that you end up using for your LLM application, or they provide remote methods and APIs that work with LLMs directly on your data. There are also vector databases, which I explain briefly toward the end of this chapter.

In this section, we will take a hands-on approach to creating an LLM application in BigQuery with BigQuery Machine Learning (BQML), using only SQL queries directly on the data stored in a BigQuery table.

Note:

Please be advised that certain features and services described in the following sections may have undergone modifications since the time of drafting. The screenshots may look different from what you see in the book. APIs and versions may have been updated by the time you are reading this. As such, kindly exercise flexibility and adapt your steps accordingly...
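
With that caveat in mind, here is a hedged sketch of the flow, run through the google-cloud-bigquery Python client. The dataset, connection, table, and model names are placeholders, and the BQML remote-model syntax shown reflects the API around the time of writing, so check the current BigQuery ML documentation before running it:

```python
# Sketch of the BQML flow: register a remote LLM, then query it with SQL.
from google.cloud import bigquery

client = bigquery.Client()

# 1. Register a remote LLM as a BigQuery ML model via an existing Cloud
#    resource connection to Vertex AI (all names here are placeholders).
create_model_sql = """
CREATE OR REPLACE MODEL `my_dataset.llm_model`
REMOTE WITH CONNECTION `us.my_vertex_connection`
OPTIONS (REMOTE_SERVICE_TYPE = 'CLOUD_AI_LARGE_LANGUAGE_MODEL_V1')
"""
client.query(create_model_sql).result()

# 2. Call the model with ML.GENERATE_TEXT directly over rows in a table.
generate_sql = """
SELECT ml_generate_text_result
FROM ML.GENERATE_TEXT(
  MODEL `my_dataset.llm_model`,
  (SELECT CONCAT('Summarize this review: ', review_text) AS prompt
   FROM `my_dataset.reviews` LIMIT 5),
  STRUCT(0.2 AS temperature, 256 AS max_output_tokens)
)
"""
for row in client.query(generate_sql).result():
    print(row.ml_generate_text_result)
```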

Vector databases

As promised, let me brief you a little on vector databases. I have taken the approach of explaining them as I would to a 10-year-old; you can skip to the last paragraph of this section if you’d prefer a more direct explanation.

Imagine that you have a big box of toys. Each toy has many different features, such as its shape, color, size, and material. You could describe each toy in words, but that would be very time-consuming and difficult to search through. A vector database is a way to represent each toy as a set of numbers, called a vector. Each number in the vector represents a different feature of the toy.

For example, the first number might represent the toy’s shape, the second number might represent its color, and so on. Vector databases are very efficient for searching. For example, you could search for all the toys that are red and ball-shaped. The vector database would simply compare the vectors of all of the toys to find the ones that match your...
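
To make the toy-box analogy concrete, here is a minimal sketch in which each toy is a small vector of made-up feature scores and "search" is just comparing vectors. Real vector databases index millions of high-dimensional embeddings, but the core idea, nearest-neighbor search over vectors, is the same:

```python
# Toy-box vector search: compare feature vectors with cosine similarity.
import numpy as np

# Hypothetical features per toy: [roundness, redness, size]
toys = {
    "red ball":    np.array([0.9, 0.9, 0.3]),
    "blue cube":   np.array([0.1, 0.0, 0.4]),
    "red frisbee": np.array([0.8, 0.8, 0.5]),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# "Find toys that are red and ball-shaped" becomes a query vector.
query = np.array([1.0, 1.0, 0.3])

for name, vec in sorted(toys.items(),
                        key=lambda kv: cosine_similarity(query, kv[1]),
                        reverse=True):
    print(f"{name}: {cosine_similarity(query, vec):.2f}")
# The red ball and red frisbee rank highest: the closest vectors win.
```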

Summary

In this chapter, we walked through the phenomenon that has taken the world by storm – LLMs. We covered some fundamental topics around the evolution, basics, and principles of LLMs, examined the differences between data model design for traditional analytical applications and LLM applications, and looked at some real-world use cases. Then, we discussed data model considerations, techniques, best practices, and ethical considerations for data modeling design for LLM use cases. We also extended our learning to a simple hands-on LLM application development exercise.

LLMs and Generative AI services such as BigQuery and Vertex AI open up countless opportunities and potential business insights across industries by enabling us to work with diverse data formats and sources. I hope this chapter was able to equip you with a foundational insight into the world of LLMs and their applications while helping you prepare your data and design to handle the scaling use cases, challenges,...

Onward and upward!

I hope you enjoyed learning about cloud database design and modeling, hands-on, through these chapters. Try to apply this learning and awareness to projects at work, in your studies, or in business applications. If you end up getting ground-breaking ideas or solving complex day-to-day data problems with the ideas and concepts you’ve learned, feel free to reach out to me on my socials at https://abirami.dev. I would be thrilled to feature your experience in some of our developer community programs. You can learn about this at https://codevipassana.dev.

