Looking Ahead – Designing for LLM Applications

Imagine a digital solution that can comprehend, generate, and manipulate human language with great fluency and accuracy, expanding the world of natural language processing (NLP) and artificial intelligence (AI). Enter large language models (LLMs): algorithms that have evolved to transform the way we interact with systems by harnessing the power of language.

So, what are these LLMs? An LLM is a type of AI model that can generate text, translate between languages, create different kinds of content, and answer your questions informatively. It is trained on massive amounts of data, from which it learns the statistical relationships between words and phrases; this allows it to generate text that is similar to the text it was trained on. LLMs are still under development, but they have the potential to revolutionize the way we interact with computers.

In this final chapter, we will set the stage for data modeling for LLM applications...

Capturing the evolution of LLMs

LLMs are not mere algorithms; they are innovations fueled by decades of research and breakthroughs. From their humble origins to today’s awe-inspiring models, LLMs have surpassed expectations, powered by unparalleled computational resources and vast datasets. They are gateways to a new era of language processing, enabling machines to comprehend, generate, and manipulate text like never before. Although LLMs have evolved dramatically, their origins date back to as early as the 1950s.

The following diagram takes us on a high-level journey through the evolution of LLMs:

Figure 10.1: Evolution of LLMs

Some key research contributions have shaped the evolution of LLMs, such as the backpropagation algorithm (for training neural networks), the transformer architecture (for deriving context and meaning from sequential data), and the availability of very large training datasets. The evolution of LLMs is an ongoing process. As LLMs become more powerful, they will be...

Getting started with LLMs

Throughout this chapter, we will cover the components, terminology, and concepts around LLMs that are crucial for data modeling for LLM-based applications. However, the detailed architecture involved in creating LLM-based applications is outside the scope of this chapter. Here is an overview of the architecture and functioning of LLMs, which are typically composed of three main components (a brief illustrative sketch follows this list):

  • Encoder: The encoder is responsible for converting the input text into a sequence of numbers. This is done by representing each word in the input text as a vector of numbers.
  • Decoder: The decoder is responsible for generating the output text from the sequence of numbers. This is done by predicting the next word in the output text, given the previous words.
  • Transformer: The transformer is the neural network architecture that underpins the encoder and decoder. Its attention mechanism allows it to learn long-range dependencies between words.
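
To make these components concrete, here is a minimal, purely illustrative Python sketch of the encoder idea: turning text into a sequence of numbers and then into vectors. The vocabulary, dimensions, and sentence are made up for illustration; real LLMs use learned tokenizers and embeddings with tens of thousands of tokens:

```python
# Toy illustration of "text in, numbers out" -- not a real LLM encoder.
import numpy as np

# Hypothetical tiny vocabulary; real models learn one with ~30k+ tokens.
vocab = {"<unk>": 0, "databases": 1, "store": 2, "vectors": 3, "for": 4, "llms": 5}

def encode(text: str) -> list[int]:
    """Convert input text into a sequence of token IDs (the 'numbers')."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

# An embedding table maps each token ID to a dense vector, which is what
# the transformer layers actually operate on.
rng = np.random.default_rng(seed=42)
embedding_table = rng.normal(size=(len(vocab), 8))  # 8-dim toy embeddings

token_ids = encode("Databases store vectors for LLMs")
token_vectors = embedding_table[token_ids]

print(token_ids)            # [1, 2, 3, 4, 5]
print(token_vectors.shape)  # (5, 8)
```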

To give a high-level summary, LLMs work by learning...

Comparing real-world applications of LLMs and traditional analytics

To understand the applications of LLMs in the real world, let’s compare them with those of traditional analytics systems.

Here are some examples of traditional analytical applications:

  • Customer segmentation is the process of dividing customers into groups based on their shared characteristics. This can be done to target marketing campaigns or to develop new products and services.
  • Risk assessment is the process of identifying and assessing the potential risks to an organization. This can be done to develop mitigation strategies or to make informed decisions.
  • Fraud detection is the process of identifying and preventing fraudulent transactions. This is implemented to protect users and reduce financial losses.

Now, let’s discuss some real-world LLM-based applications:

  • Chatbots are computer programs that can simulate conversations with humans....

Understanding the differences in data modeling for traditional analytics and LLMs

Data modeling for traditional analytical applications focuses on creating models that can be used to understand and predict trends in data. This type of modeling typically involves creating tables and relationships between tables to represent the data in a way that is easy to understand and query. Data modeling for LLM-based applications, on the other hand, focuses on preparing data for applications that generate text, translate languages, answer questions, and create different kinds of content.

There are some key differences between data modeling for traditional analytical applications and data modeling for applications that utilize LLMs, as depicted in Table 10.1, which compares each consideration (starting with data structure) for traditional analytical applications versus LLM applications:

Data model design considerations for applications that use LLMs

The best-suited data modeling techniques and principles for data model design for applications that use LLMs will vary depending on the specific application. However, some general considerations include (but are not limited to) the following:

  • The type of data that will be used: The data modeling technique that is chosen will need to be able to represent the different types of data that will be used in the application.
  • The scalability of the app: The data modeling technique that is chosen will need to be able to scale as the app grows. For example, if the app is expected to have a large number of users and growing attributes, then a NoSQL database may be a better choice than a relational database (see the short sketch after this list).
  • Data security and privacy: The data modeling technique that is chosen will need to be able to protect the data from unauthorized access. For example, the data may need to be encrypted or stored in a secure location...
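
To illustrate the scalability point above, here is a minimal sketch, assuming the google-cloud-firestore client library and an authenticated Google Cloud project, of why a document database suits "growing attributes": new fields can be added per document without a schema migration. The collection and field names are hypothetical:

```python
# Minimal sketch: schema-flexible documents in Firestore (names are made up).
from google.cloud import firestore

db = firestore.Client()

# An early version of the app stores just a name.
db.collection("users").document("user_123").set({"name": "Asha"})

# Later, the app starts tracking new attributes; merge=True adds fields to
# the existing document -- no ALTER TABLE or migration required.
db.collection("users").document("user_123").set(
    {"preferences": {"language": "en"}, "llm_profile_version": 2},
    merge=True,
)
```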

Learning about data modeling principles and techniques

Data modeling techniques don’t just help you with technology choices and frameworks – they also help you lay the groundwork for the industry-specific use case that you are going to address with your dataset. The following data modeling techniques and principles can be effective in maximizing the potential of LLMs in such applications:

  • Data quality and preprocessing: Ensure data quality by performing rigorous preprocessing steps, including data cleaning, normalization, and deduplication (a small sketch follows this list). High-quality data improves the performance and reliability of LLMs and prevents them from learning spurious patterns.
  • Fine-tuning: Leverage pre-trained LLMs as a starting point and fine-tune them on domain-specific or task-specific data. Fine-tuning allows the model to adapt and specialize for specific applications, reducing the need for extensive training from scratch.
  • Data augmentation...
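
As a small sketch of the data quality point above, the following Python snippet cleans, normalizes, and deduplicates a list of raw text records. The cleaning rules are illustrative, not a production pipeline:

```python
# Minimal preprocessing sketch: clean, normalize, and deduplicate raw text.
import re
import unicodedata

def preprocess(records: list[str]) -> list[str]:
    seen: set[str] = set()
    cleaned: list[str] = []
    for text in records:
        # Normalize Unicode and collapse whitespace.
        text = unicodedata.normalize("NFKC", text)
        text = re.sub(r"\s+", " ", text).strip()
        key = text.lower()  # case-insensitive deduplication key
        if text and key not in seen:  # drop empty strings and duplicates
            seen.add(key)
            cleaned.append(text)
    return cleaned

raw = ["  Hello\tworld ", "hello world", "Großartig!", ""]
print(preprocess(raw))  # ['Hello world', 'Großartig!']
```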

Ethical and responsible practices

In today’s world, where AI plays a pivotal role, it’s crucial to incorporate ethical practices to ensure that businesses and developers use LLMs in a way that benefits society and minimizes harm. Ethics in AI involves making responsible and morally sound choices when developing and deploying AI systems. In this section, we will discuss some of the core ethical and responsible data model design considerations for applications that use LLMs:

  • Ensure that data that’s used for training LLMs is collected and stored following ethical and legal guidelines, respecting user privacy rights. Implement robust security measures to protect sensitive data from unauthorized access:
    • It is imperative to obtain informed consent from users when collecting their data, clearly stating how their information will be used. Transparency in data collection practices builds trust with users and safeguards their privacy.
    • Regularly update data handling...

Hands-on time – building an LLM application

All the databases we have discussed in this book so far support Generative AI in some form or another: they either store, manage, and process the data that you end up using for your LLM application, or they provide remote methods and APIs that work with LLMs directly on your data. There are also vector databases, which I explain briefly toward the end of this chapter.

In this section, we will take a hands-on approach to creating an LLM application in BigQuery with BigQuery Machine Learning (BQML), using only SQL queries directly on the data stored in a BigQuery table.

Note:

Please be advised that certain features and services described in the following sections may have undergone modifications since the time of drafting. The screenshots may look different from what you see in the book. APIs and versions may have been updated by the time you are reading this. As such, kindly exercise flexibility and adapt your steps accordingly...
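
With that caveat in mind, here is a hedged sketch of the flow, run through the google-cloud-bigquery Python client. The dataset, connection, table, and model names are placeholders, and the BQML remote-model syntax shown reflects the API around the time of writing, so check the current BigQuery ML documentation before running it:

```python
# Sketch of the BQML flow: register a remote LLM, then query it with SQL.
from google.cloud import bigquery

client = bigquery.Client()

# 1. Register a remote LLM as a BigQuery ML model via an existing Cloud
#    resource connection to Vertex AI (all names here are placeholders).
create_model_sql = """
CREATE OR REPLACE MODEL `my_dataset.llm_model`
REMOTE WITH CONNECTION `us.my_vertex_connection`
OPTIONS (REMOTE_SERVICE_TYPE = 'CLOUD_AI_LARGE_LANGUAGE_MODEL_V1')
"""
client.query(create_model_sql).result()

# 2. Call the model with ML.GENERATE_TEXT directly over rows in a table.
generate_sql = """
SELECT ml_generate_text_result
FROM ML.GENERATE_TEXT(
  MODEL `my_dataset.llm_model`,
  (SELECT CONCAT('Summarize this review: ', review_text) AS prompt
   FROM `my_dataset.reviews` LIMIT 5),
  STRUCT(0.2 AS temperature, 256 AS max_output_tokens)
)
"""
for row in client.query(generate_sql).result():
    print(row.ml_generate_text_result)
```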

Vector databases

As promised, let me brief you a little on vector databases. I have taken the approach of explaining them as I would to a 10-year-old; you can skip to the last paragraph of this section if you’d prefer a more direct explanation.

Imagine that you have a big box of toys. Each toy has many different features, such as its shape, color, size, and material. You could describe each toy in words, but that would be very time-consuming and difficult to search through. A vector database is a way to represent each toy as a set of numbers, called a vector. Each number in the vector represents a different feature of the toy.

For example, the first number might represent the toy’s shape, the second number might represent its color, and so on. Vector databases are very efficient for searching. For example, you could search for all the toys that are red and ball-shaped. The vector database would simply compare the vectors of all of the toys to find the ones that match your...
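
To make the toy-box analogy concrete, here is a minimal sketch in which each toy is a small vector of made-up feature scores and "search" is just comparing vectors. Real vector databases index millions of high-dimensional embeddings, but the core idea, nearest-neighbor search over vectors, is the same:

```python
# Toy-box vector search: compare feature vectors with cosine similarity.
import numpy as np

# Hypothetical features per toy: [roundness, redness, size]
toys = {
    "red ball":    np.array([0.9, 0.9, 0.3]),
    "blue cube":   np.array([0.1, 0.0, 0.4]),
    "red frisbee": np.array([0.8, 0.8, 0.5]),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# "Find toys that are red and ball-shaped" becomes a query vector.
query = np.array([1.0, 1.0, 0.3])

for name, vec in sorted(toys.items(),
                        key=lambda kv: cosine_similarity(query, kv[1]),
                        reverse=True):
    print(f"{name}: {cosine_similarity(query, vec):.2f}")
# The red ball and red frisbee rank highest: the closest vectors win.
```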

Summary

In this chapter, we walked through the phenomenon that has taken the world by storm – LLMs. We covered some fundamental topics around the evolution, basics, and principles of LLMs, examined the differences between data model design for traditional analytical applications and LLM applications, and looked at some real-world use cases. Then, we discussed data model considerations, techniques, best practices, and ethical considerations for data modeling design for LLM use cases. We also extended our learning to a simple hands-on LLM application development exercise.

LLMs and Generative AI services such as BigQuery and Vertex AI open up countless opportunities and potential business insights across industries by enabling us to work with diverse data formats and sources. I hope this chapter was able to equip you with a foundational insight into the world of LLMs and their applications while helping you prepare your data and design to handle the scaling use cases, challenges,...

Onward and upward!

I hope you enjoyed learning about cloud database design and modeling, hands-on, through these chapters. Try to apply this learning and awareness to projects at work, in your studies, or in business applications. If you end up getting ground-breaking ideas or solving complex day-to-day data problems with the ideas and concepts you’ve learned, feel free to reach out to me on my socials at https://abirami.dev. I would be thrilled to feature your experience in some of our developer community programs. You can learn about this at https://codevipassana.dev.

