Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Events
Videos
Audiobooks
Packt Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds

How-To Tutorials - LLM

79 Articles
article-image-using-langchain-for-large-language-model-powered-applications
Avratanu Biswas
15 Jun 2023
5 min read
Save for later

Using LangChain for Large Language Model — Powered Applications

Avratanu Biswas
15 Jun 2023
5 min read
This article is the second part of a series of articles, please refer to Part 2 for learning how to Get to grips with LangChain framework and how to utilize it for building LLM-powered AppsIntroductionLangChain is a powerful and open-source Python library specifically designed to enhance the usability, accessibility, and versatility of Large Language Models (LLMs) such as GPT-3 (Generative Pre-trained Transformer 3), BERT(Bidirectional Encoder Representations from Transformers), BLOOM (BigScience Large Open-science Open-access Multilingual Language Model). It provides developers with a comprehensive set of tools to seamlessly combine multiple prompts, creating a harmonious orchestra for working with LLMs effortlessly. The project was initiated by Harrison Chase, with the first commit made in late October 2022. In just a few weeks, LangChain gained immense popularity within the open-source community. Image 1: The popularity of the LangChain Python libraryLangChain for LLMsTo fully grasp the fundamentals of LangChain and utilize it effectively — understanding the fundamentals of LLMs is essential. In simple terms, LLMs are sophisticated language models or AI systems that have been extensively trained on massive amounts of text data to comprehend and generate human-like language. Albeit their powerful capabilities, LLMs are generic in nature i.e. lacking domain-specific knowledge or expertise. For instance, when addressing queries in fields like medicine or law, while an LLM can provide general insights, it, however, may struggle to offer in-depth or nuanced responses that require specialized expertise. Alongside such limitations, LLMs are susceptible to biases and inaccuracies present in training data which can yield contextually plausible, yet incorrect outputs. This is where LangChain shines — serving as an open-source library that leverages the power of LLMs and mitigates their drawbacks by providing abstractions and a diverse range of modules, akin to Lego blocks, thus facilitating intuitive integration with other tools and knowledge bases.In brief, LangChain presents a useful approach for handling text data, wherein the initial step involves preprocessing of the large corpus by segmenting it into smaller chunks or summaries. These chunks are then transformed into vector representations, enabling efficient comparisons and retrieval of similar chunks when questions are posed. This approach of preprocessing, real-time data collection, and interaction with the LLM is not only applicable to the specific context but can also be effectively utilized in other scenarios like code and semantic search.Image 2 - Typical workflow of Langchain ( Image created by Author)A typical workflow of LangChain involves several steps that enable efficient interaction between the user, the preprocessed text corpus, and the LLM. Notably, the strengths of LangChain lie in its provision of an abstraction layer, streamlining the intricate process of composing and integrating these text components, thereby enhancing overall efficiency and effectiveness.Key Attributes offered by LangChainThe core concept behind LangChain is its ability to connect a “Chain of thoughts” around LLMs, as evident from its name. However, LangChain is not limited to just a few LLMs —  it provides a wide range of components that work together as building blocks for advanced use cases involving LLMs. Now, let’s delve into the various components that the LangChain library offers, making our work with LLMs easier and more efficient.Image 3:  LangChain features at a glance. (Image created by Author)Prompts and Prompt Templates: Prompts refer to the inputs or queries we send to LLMs. As we have experienced with ChatGPT, the quality of the response depends heavily on the prompt. LangChain provides several functionalities to simplify the construction and handling of prompts. A prompt template consists of multiple parts, including instructions, content, and queries.Models: While LangChain itself does not provide LLMs, it leverages various Language Models (such as GPT3 and BLOOM, discussed earlier), Chat Models (like get-3.5-turbo), and Text Embedding Models (offered by CohereAI, HuggingFace, OpenAI).Chains: Chains are an end-to-end wrapper around multiple individual components, playing a major role in LangChain. The two most common types of chains are LLM chains and vector index chains.Memory: By default, Chains in LangChain are stateless, treating each incoming query or input independently without retaining context (i.e., lacking memory). To overcome this limitation, LangChain assists in both short-term memory (using previous conversational messages or summarised messages) and long-term memory (managing the retrieval and updating of information between conversations).Indexes: Index modules provide various document loaders to connect with different data resources and utility functions to seamlessly integrate with external vector databases like Pinecone, ChromoDB, and Weaviate, enabling smooth handling of large arrays of vector embeddings. The types of vector indexes include Document Loaders, Text Splitters, Retriever, and Vectorstore.Agents: While the sequence of chains is often deterministic, in certain applications, the sequence of calls may not be deterministic, with the next step depending on the user input and previous responses. Agents utilize LLMs to determine the appropriate actions and their orders. Agents perform these tasks using a suite of tools.Limitations on LangChain usageAbstraction challenge for debugging: The comprehensive abstraction provided by LangChain poses challenges for debugging as it becomes difficult to comprehend the underlying processes.Higher token consumption due to prompt coupling: Coupling a chain of prompts when executing multiple chains for a specific task often leads to higher token consumption, making it less cost-effective. Increased latency and slower performance: The latency period experienced when using LangChain in applications with agents or tools is higher, resulting in slower performance.Overall, LangChain provides a broad spectrum of features and modules that greatly enhance our interaction with LLMs. In the subsequent sections, we will explore the practical usage of LangChain and demonstrate how to build simple demo web applications using its capabilities.Referenceshttps://docs.langchain.com/docs/https://github.com/hwchase17/langchain https://medium.com/databutton/getting-started-with-langchain-a-powerful-tool-for-working-with-large-language-models-286419ba0842https://medium.com/@avra42/how-to-build-a-personalized-pdf-chat-bot-with-conversational-memory-965280c160f8AuthorAvratanu Biswas, Ph.D. Student ( Biophysics ), Educator, and Content Creator, ( Data Science, ML & AI ).Twitter    YouTube    Medium     GitHub
Read more
  • 0
  • 0
  • 19384

article-image-optimized-llm-serving-with-tgi
Louis Owen
25 Sep 2023
7 min read
Save for later

Optimized LLM Serving with TGI

Louis Owen
25 Sep 2023
7 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights and books. Don't miss out – sign up today!IntroductionThe release of LLaMA-2 LLM to the public has been a tremendous contribution by Meta to the open-source community. It’s very powerful that many developers are fine-tuning it for so many different use cases. Closed-source LLMs such as GPT3.5, GPT4, or Claude are really convenient to utilize since we just need to send API requests to power up our application. On the other hand, utilizing open-source LLMs such as LLaMA-2 in production is not an easy task. Although there are several algorithms that support deploying an LLM with CPU, most of the time we still need to utilize GPU for better throughput.At a high level, there are two main things that need to be taken care of when deploying your LLM in production: memory and throughput. LLM is very big in size, the smallest version of LLaMA-2 has 7 billion parameters, which requires around 28GB GPU RAM for loading the model. Google Colab offers free NVIDIA T4 GPU to be used with 16GB memory, meaning that we can’t load even the smallest version of LLaMA-2 freely using the GPU provided by Google Colab.There are two popular ways to solve this memory issue: half-precision and 4/8-bit quantization. Half-precision is a very simple optimization method that we can adopt. In PyTorch, we just need to add the following line and we can load the 7B model by using only around 13GB GPU RAM. Basically, half-precision means that we’re representing the weights of the model with 16-bit floating points (fp16) instead of 32-bit floating points (fp32).torch.set_default_tensor_type(torch.cuda.HalfTensor)Another way to solve the memory issue is by performing 4/8-bits quantization. There are two quantization algorithms that are widely adopted by the developers: GPT-Q and BitsandBytes. These two algorithms are also supported within the `transformers` package. Quantization is a way to reduce the model size by representing the model weights in lower precision data types, such as 8-bit integers (int8) or 4-bit integers (int4) instead of 32-bit floating point (fp32) or 16-bit (fp16).As for comparison, after applying the 4-bit quantization with the GPT-Q algorithm, we can load the 7B model by using only around 4GB GPU RAM! This is indeed a huge drop - from 28GB to 4GB!Once we’re able to load the model, the simplest way to serve the model is by using the native HuggingFace workflow along with Flask or FastAPI. However, serving our model with this simple solution is not scalable and reliable enough to be used in production since it can’t handle parallel requests decently. This is related to the throughput problem mentioned earlier. There are many ways to solve this throughput problem, starting from continuous batching, tensor parallelism, prompt caching, paged attention, prefill parallel computing, KV cache, and many more. Discussing each of the algorithms will result in a very long article. However, luckily there’s an inference serving library for LLM that applies all of these techniques to ensure the reliability of serving our LLM in production. This library is called Text Generation Inference (TGI).Throughout this article, we’ll discuss in more detail what TGI is and how to utilize it to serve your LLM. We’ll see what are the top features of TGI and the step-by-step tutorial on how to spin up TGI as the inference server.Without wasting any more time, let’s take a deep breath, make yourselves comfortable, and be ready to learn how to utilize TGI as an inference server!What’s TGI?Text Generation Inference (TGI) is like the `transformers` package for inference servers. In fact, TGI is used in production at HuggingFace to power many applications. It provides a lot of useful features:Provide a straightforward launcher for the most popular Large Language Models.Implement Tensor Parallelism to enhance inference speed across multiple GPUs.Enable token streaming through Server-Sent Events (SSE).Implement continuous batching of incoming requests to boost overall throughput.Enhance transformer code for inference using flash-attention and Paged Attention on leading architectures.Apply quantization techniques with bitsandbytes and GPT-Q.Ensure fast and safe persistence format of weight loading with Safetensors.Incorporate watermarking using A Watermark for Large Language Models.Integrate a logit warper for various operations like temperature scaling, top-p, top-k, and repetition penalty.Implement stop sequences and log probabilities.Ensure production readiness with distributed tracing through Open Telemetry and Prometheus metrics.Offer Custom Prompt Generation, allowing users to easily generate text by providing custom prompts for guiding the model's output.Provide support for fine-tuning, enabling the use of fine-tuned models for specific tasks to achieve higher accuracy and performance.Implement RoPE scaling to extend the context length during inferenceTGI is surely a one-stop solution for serving LLMs. However, one important note about TGI that you need to know is regarding the project license. Starting from v1.0, HuggingFace has updated the license for TGI using the HFOIL 1.0 license. For more details, please refer to this GitHub issue. If you’re using TGI only for research purposes, then there’s nothing to worry about. If you’re using TGI for commercial purposes, please make sure to read the license carefully since there are cases where it can and can’t be used for commercial purposes.How to Use TGI?Using TGI is relatively easy and straightforward. We can use Docker to spin up the server. Note that you need to install the NVIDIA Container Toolkit to use GPU.docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.0.3 --model-id $modelOnce the server is up, we can easily send API requests to the server using the following command.curl 127.0.0.1:8080/generate \    -X POST \    -d '{"inputs":"What is LLM?","parameters":{"max_new_tokens":120}}' \    -H 'Content-Type: application/json'For more details regarding the API documentation, please refer to this page. Another way to send API requests to the TGI server is by sending it from Python. To do that, we need to install the Python library firstpip install text-generationOnce the library is installed, we can send API requests like the following.from text_generation import Client client = Client("http://127.0.0.1:8080") print(client.generate("What is LLM?", max_new_tokens=120).generated_text) text = "" for response in client.generate_stream("What is Deep Learning?", max_new_tokens=20):    if not response.token.special:        text += response.token.text print(text)ConclusionCongratulations on keeping up to this point! Throughout this article, you have learned what are the important things to be considered when you’re trying to serve your own LLM. You have also learned about TGI, starting from what is it, what’s the features, and also step-by-step examples on how to get the TGI server up. Hope the best for your LLM deployment journey and see you in the next article!Author BioLouis Owen is a data scientist/AI engineer from Indonesia who is always hungry for new knowledge. Throughout his career journey, he has worked in various fields of industry, including NGOs, e-commerce, conversational AI, OTA, Smart City, and FinTech. Outside of work, he loves to spend his time helping data science enthusiasts to become data scientists, either through his articles or through mentoring sessions. He also loves to spend his spare time doing his hobbies: watching movies and conducting side projects.Currently, Louis is an NLP Research Engineer at Yellow.ai, the world’s leading CX automation platform. Check out Louis’ website to learn more about him! Lastly, if you have any queries or any topics to be discussed, please reach out to Louis via LinkedIn.
Read more
  • 0
  • 0
  • 18816

article-image-building-and-deploying-web-app-using-langchain
Avratanu Biswas
26 Jun 2023
12 min read
Save for later

Building and deploying Web App using LangChain

Avratanu Biswas
26 Jun 2023
12 min read
So far, we've explored the LangChain modules and how to use them (refer to the earlier blog post on LangChain Modules here). In this section, we'll focus on the LangChain Indexes and Agent module and also walk through the process of creating and launching a web application that everyone can access. To make things easier, we'll be using Databutton, an all-in-one online workspace to build and deploy web apps, integrated with Streamlit, a Python web- development framework known for its support in building interactive web applications.What are LangChain Agents?In simpler terms, LangChain Agents are tools that enable Large Language Models (LLMs) to perform various actions, such as accessing Google search, executing Python calculations, or making SQL queries, thereby empowering LLMs to make informed decisions and interact with users by using tools and observing their outputs. The official documentation of LangChain describes Agents as:" …there is an agent which has access to a suite of tools. Depending on the user input, the agent can then decide which, if any, of these tools to call… In building agents, there are several abstractions involved. The Agent abstraction contains the application logic, receiving user input and previous steps to return either an AgentAction (tool and input) or AgentFinish (completion information). Agent covers another aspect, called Tools, which represents the actions agents can take, while Toolkits group tools for specific use cases (e.g., SQL querying). Lastly, the Agent Executor manages the iterative execution of the agent with the available tools. Thus, in this section, we will briefly explore such abstractions while using the Agent functionality to integrate tools and primarily focus on building a real-world easily deployable web application.IndexesThis module provides utility functions for structuring documents using indexes and allowing LLMs to interact with them effectively. We will focus on one of the most commonly used retrieval systems, where indexes are used to find the most relevant documents based on a user's query. Additionally, LangChain supports various index and retrieval types, with a focus on vector databases for unstructured data. We will explore this component in detail as it can be leveraged in a wide number of real-world applications.Image 1 Langchain workflow by AuthorWorkflow of a question & answer generation interface using Retrieval index, where we leverage all types of Indexes which LangChain provides. Indexes are primarily of four types, namely : Document Loaders, Text Splitters, VectorStores, and Retrievers. Briefly, (a) the documents fetched from any datasource is split into chunks using text splitter modules (b) Embeddings are created (c)Stored over a vector store index ( vector databases such as chromadb / pinecone / weaviate, etc ) (d) Queries from the user is retrieved via retrieval QA chain We will use the  WikipediaLoader to load Wikipedia documents related to our query "LangChain" and retrieve the metadata and a portion of the page content of the first document.from langchain.document_loaders import WikipediaLoader docs = WikipediaLoader(query='LangChain', load_max_docs=2).load() docs[0].metadata docs[0].page_content[:400]CharacterTextSplitter is used to split the loaded documents into smaller chunks for further processing.from langchain.text_splitter import CharacterTextSplitter text_splitter = CharacterTextSplitter(chunk_size=4000, chunk_overlap=0) texts = text_splitter.split_documents(docs)The OpenAIEmbeddings the module is then employed to generate embeddings for the text chunks.from langchain.embeddings import OpenAIEmbeddings embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)We will use Chroma vector store, which is created from the generated text chunks and embeddings, allowing for efficient storage and retrieval of vectorized data.Next, the RetrievalQA module is instantiated with an OpenAI LLM and the created retriever, setting up a question-answering system.from langchain.vectorstores import Chroma db = Chroma.from_documents(texts, embeddings) retriever = db.as_retriever() from langchain.chains import RetrievalQA from langchain.llms import OpenAI Qa  = RetrievalQA.from_chain_type(llm=OpenAI(openai_api_key=OPENAI_API_KEY), chain_type="stuff", retriever=retriever)At this stage, we can easily seek answers from the stored indexed data. For instance, query = "What is LangChain?" qa.run(query)LangChain is a framework designed to simplify the creation of applications using large language models (LLMs).query = "When was LangChain founded?" qa.run(query)LangChain was founded in October 2022.query = "When was LangChain founded?" qa.run(query)LangChain was founded in October 2022.query = "Who is the founder?" qa.run(query) The founder of LangChain is Harrison Chase.The Q&A functionality implemented using the retrieval chain provides reasonable answers to most of our queries. Different types of indexes provided by LangChain, can be leveraged for various real-world use cases involving data structuring and retrieval. Moving forward, we will delve into the next section, where we will focus on the final component called the "Agent." During this section, we will not only gain a hands-on understanding of its usage but also build and deploy a web app using an online workspace called Databutton.Building Web App using DatabuttonPrerequisitesTo begin using Databutton, all that is required is to sign up through their official website. Once logged in, we can either create a blank template app from scratch or choose from the pre-existing templates provided by Databutton.Image by Author | Screen grasp showing on how to start working with a new blank appOnce the blank app is created, we generate our online workspace consisting of several features for building and deploying the app. We can immediately begin writing our code within the online editor. The only requirement at this stage is to include the necessary packages or dependencies that our app requires.Image by Author | Screen grasp showing the different components available within the Databutton App's online workspace. Databutton's workspace initialization includes some essential packages by default. However, for our specific app, we need to add two additional packages - openai and langchain. This can be easily accomplished within the "configuration" workspace of Databutton.Image by Author | Screen grasp of the configuration options within Databutton's online workspace. Here we can add the additional packages which we need for working with our app. The workspace is generated with few pre-installed dependencies.Writing the codeNow that we have a basic understanding of Agents and their abstraction methods, let's put them to use, alongside incorporating some basic Streamlit syntax for the front end.Importing the required modules: For building the web app, we will require the Streamlit library and several LangChain modules. Additionally, we will utilise a helper function that relies on the sys and io libraries for capturing and displaying function outputs. We will discuss the significance of this helper function towards the end to better understand its purpose.# Modules to Import import streamlit as st import sys import io import re from typing import Callable, Any from langchain.agents.tools import Tool from langchain.agents import initialize_agent from langchain.llms import OpenAI from langchain.chains import LLMChain from langchain import LLMMathChain from langchain import PromptTemplateUsing the LangChain modules and building the main user interface: We set the title of the app using st.title() syntax and also enables the user to enter their OpenAI API key using the st.text_input() widget.# Set the title of the app st.title("LangChain `Agent` Module Usage Demo App") # Get the OpenAI API key from the user OPENAI_API_KEY = st.text_input( "Enter your OpenAI API Key to get started", type="password" )As we discussed in the previous sections, we need to define a template for the prompt that incorporates a placeholder for the user's query.# Define a template for the prompt template = """You are a friendly and polite AI Chat Assistant. You must try to provide accurate but concise answers. If you don't know the answer, just say "I don't know." Question: {query} Answer: """ # Create a prompt template object with the template prompt = PromptTemplate(template=template, input_variables=["query"])Next, we implement a conditional loop. If the user has provided an OpenAI API key, we proceed with the flow of the app. The user is asked to enter their query using the st.text_input() widget.# Check if the user has entered an OpenAI API key if OPENAI_API_KEY: # Get the user's query query = st.text_input("Ask me anything")Once the user has the correct API keys inserted, from this point onward, we will proceed with the implementation of LangChain modules. Some of these modules may be new to us, while others may have already been covered in our previous sections.Next, we create instances of the OpenAI language model, OpenAI, the LLMMathChain for maths-related queries, and the LLMChain for general-purpose queries.# Check if the user has entered a query if query: # Create an instance of the OpenAI language model llm = OpenAI(temperature=0, openai_api_key=OPENAI_API_KEY) # Create an instance of the LLMMathChain for math-related queries llm_math_chain = LLMMathChain.from_llm(llm=llm, verbose=True) # Create an instance of the LLMChain for general-purpose queries llm_chain = LLMChain(llm=llm, prompt=prompt)Following that, we create a list of tools that the agent will utilize. Each tool comprises a name, a corresponding function to handle the query, and a brief description.# Create a list of tools for the agent tools = [ Tool( name="Search", func=llm_chain.run, description="Useful for when you need to answer general purpose questions", ), Tool( name="Calculator", func=llm_math_chain.run, description="Useful for when you need to answer questions about math", ), ]Further, we need to initialize a zero-shot agent with the tools and other parameters. This agent employs the ReAct framework to determine which tool to utilize based solely on the description associated with each tool. It is essential to provide a description of each tool.# Initialize the zero-shot agent with the tools and parameters zero_shot_agent = initialize_agent( agent="zero-shot-react-description", tools=tools, llm=llm, verbose=True, max_iterations=3, ) And now finally, we can easily call the zero-shot agent with the user's query using the run(query) method.# st.write(zero_shot_agent.run(query))However, this would only yield the final outcome of the result within our Streamlit UI, without providing access to the underlying LangChain thought process (i.e. the verbose ) that we typically observe in a Notebook environment. This information is crucial to understand which tools our agent is opting for based on the user query. To address this, a helper function called capture_and_display_output was created.# Helper function to dump LangChain Verbose / Though Process # Function to capture and display the output of a function def capture_and_display_output(func: Callable[..., Any], args, **kwargs) -> Any: # Redirect stdout to a string buffer original_stdout = sys.stdout sys.stdout = output_catcher = io.StringIO() # Call the function and capture the response response = func(args, *kwargs) # Restore the original stdout and get the captured output sys.stdout = original_stdout output_text = output_catcher.getvalue() # Clean the output text by removing escape sequences cleaned_text = re.sub(r"\x1b\[([0-9]{1,2}(;[0-9]{1,2})?)?[m|K]", "", output_text) # Split the cleaned text into lines and concatenate them with line breaks lines = cleaned_text.split("\n") concatenated_string = "\n".join([s if s else "\n" for s in lines]) # Display the captured output in an expander with st.expander("Thoughts", expanded=True): st.write(concatenated_string)This function allows users to monitor the actions undertaken by the agent. Consequently, the response from the agent is displayed within the UI.# Call the zero-shot agent with the user's query and capture the output response = capture_and_display_output(zero_shot_agent.run, query) Image by Author | Screen grasp of the app in local deployment displays the entire verbose or rather the thought process  Deploy and Testing of the AppThe app can now be easily deployed by clicking the "Deploy" button on the top left-hand side. The deployed app will provide us with a unique URL that can be shared with everyone!Image by Author | Screen grasp of the Databutton online workspace showing the Deploy options. Yay! We have successfully built and deployed a LangChain-based web app from scratch. Here's the link to the app ! The app also consists of a view code page , which can be accessed via this link.To test the web app, we will employ two different types of prompts. One will be a general question that can be answered by any LLMs, while the other will be a maths-related question. Our hypothesis is that the LangChain agents will intelligently determine which agents to execute and provide the most appropriate response. Let's proceed with the testing to validate our assumption.Image by Author | Screen grasped from the deployed web app.Two different prompts were used to validate our assumptions. Based on the thought process ( displayed in the UI under the thoughts expander ), we can easily interpret which Tool has been chosen by the Agent. (Left) Usage of LLMMath chain incorporating Tool (Right) Usage of a simple LLM Chain incorporating Tool.ConclusionTo summarise, we have not only explored various aspects of working with LangChain and LLMs but have also successfully built and deployed a web app powered by LangChain. This demonstrates the versatility and capabilities of LangChain in enabling the development of powerful applications.ReferencesLangChain Agents official documentation : https://python.langchain.com/en/latest/modules/agents.htmlDatabutton : https://www.databutton.io/Streamlit :  https://streamlit.io/ Build a Personal Search Engine Web App using Open AI Text Embeddings : https://medium.com/@avra42/build-a-personal-search-engine-web-app-using-open-ai-text-embeddings-d6541f32892dPart 1: Using LangChain for Large Language Model — powered Applications: https://www.packtpub.com/article-hub/using-langchain-for-large-language-model-powered-applicationsDeployed Web app - https://databutton.com/v/23ks6sem Source code for the app - https://databutton.com/v/23ks6sem/View_CodeAuthor BioAvratanu Biswas, Ph.D. Student ( Biophysics ), Educator, and Content Creator, (Data Science, ML & AI ).Twitter    YouTube    Medium     GitHub
Read more
  • 0
  • 0
  • 18743

article-image-demystifying-azure-openai-service
Olivier Mertens, Breght Van Baelen
15 Sep 2023
16 min read
Save for later

Demystifying Azure OpenAI Service

Olivier Mertens, Breght Van Baelen
15 Sep 2023
16 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!This article is an excerpt from the book, Azure Data and AI Architect Handbook, by Olivier Mertens and Breght Van Baelen. Master core data architecture design concepts and Azure Data & AI services to gain a cloud data and AI architect’s perspective to developing end-to-end solutionsIntroductionOpenAI has risen immensely in popularity with the arrival of ChatGPT. The company, which started as an on-profit organization, has been the driving force behind the GPT and DALL-E model families, with intense research at a massive scale. The speed at which new models get released and become available on Azure has become impressive lately.Microsoft has a close partnership with OpenAI, after heavy investments in the company from Microsoft . The models created by OpenAI use Azure infrastructure for development and deployment. Within this partnership, OpenAI carries the responsibility of research and innovation, coming up with new models and new versions of their existing models. Microsoft manages the enterprise-scale go-to-market. It provides infrastructure and technical guidance, along with reliable SLAs, to get large organizations started with the integrations of these models, fine-tuning them on their own data, and hosting a private deployment of the models.Like the face recognition model in Azure Cognitive Services, powerful LLMs such as the ones in Azure OpenAI Service could be used to cause harm at scale. Therefore, this service is also gated according to Microsoft ’s guidelines on responsible AI.At the time of writing, Azure OpenAI Service offers access to the following models: GPT model family * GPT-3.5   * GPT-3.5-Turbo (the model behind ChatGPT  * GPT-4CodexDALL-E 2Let’s dive deeper into these models.The GPT model familyGPT models, which stands for generative pre-trained transformer models, made their first appearance in 2018, with GPT-1, trained on a dataset of roughly 7,000 books. This made good advancements in performance at the time, but the model was already vastly outdated a couple of years later. GPT-2 followed in 2019, trained on the WebText dataset (a collection of 8 million web pages). In 2020, GPT-3 was released, trained on the WebText dataset, two book corpora, and  English Wikipedia.In these years, there were no major breakthroughs in terms of efficient algorithms , but rather, in the scale of the architecture and datasets. This becomes easily visible when we look at the growing number of parameters used for every new generation of the model, as shown in the following figure.Figure 9.3 – A visual comparison between the sizes of the different generations of GPT models, based on their trainable parametersThe question is often raised of how to interpret this concept of parameters. An easy analogy is the number of neurons in a brain. Although parameters in a neural network are not equivalent to its artificial neurons, the number of parameters and neurons are heavily correlated – more parameters = more neurons. The more neurons there are in the brain, the more knowledge it can grasp.Since the arrival of GPT-3, we have seen two major adaptations of the third-generation model being made. The first one is GPT-3.5. This model has a similar architecture as the GPT-3 model but is trained on text and code, whereas the original GPT-3 only saw text data during training. Therefore, GPT-3.5 is capable of generating and understanding code. GPT-3.5, in turn, became the basis for the next adaptation, the vastly popular ChatGPT model. This model has been fine-tuned for conversational usage while using additional reinforcement learning to get a sense of ethical behavior.GPT model sizesThe OpenAI models are available in different sizes, which are all named after remarkable scientists. The GPT-3.5 model specifically, is  available in four versions:AdaBabbage CurieDavinciThe Ada model is the smallest, most lightweight model, while Davinci is the most complex and most performant model. The larger the model, the more expensive it is to use, host, and fine-tune, as shown in Figure 9.4. As a side note, when you hear about the absurd number of parameters of new GPT models, this usually refers to the Davinci model.Figure 9.4 – A trade-off exists between lightweight, cheap models and highly performant, complex modelsWith a trade-off between costs and performance available, an architect can start thinking about which model size may best fit a solution. In reality, this often comes down to empirical testing. If the cheaper model can perform the job at an acceptable performance, then this is the more cost-effective solution. Note that when talking about performance in this scenario, we mean predictive power, not the speed at which the model makes predictions. The larger models will be slower to output a prediction than the lightweight models.Understanding the difference between GPT-3.5 and GPT-3.5-Turbo (ChatGPT)GPT-3.5 and GPT-3.5-Turbo are both models used to generate natural language text, but they are used in different ways. GPT-3.5 is classified as a text completion model, whereas GPT-3.5-Turbo is referred to as conversational AI.To better understand the contrast between the two models, we first need to introduce the concept of contextual learning. These models are trained to understand the structure of the input prompt to provide a meaningful answer. Contextual learning is often split up into few-shot learning, one-shot learning, and zero-shot learning. Shot, in this context, refers to an example given in the input prompt. With few-shot learning, we provide multiple examples in the input prompt, one-shot learning provides a single example, and zero-shot indicates that no examples are given. In the case of the latter, the model will have to figure out a different way to understand what is being asked of it (such as interpreting the goal of a question).Consider the following example:Figure 9.5 – Few-shot learning takes up the most amount of tokens and requires more effort but often results in model outputs of higher qualityWhile it takes more prompt engineering effort to apply few-shot learning, it will usually yield better results. A text completion model, such as GPT-3.5, will perform vastly better on few-shot learning than one-shot or zero-shot. As the name suggests, the model figures out the structure of the input prompt (i.e., the examples) and completes the text accordingly.Conversational AI, such as ChatGPT, is more performant in zero-shot learning. In the case of the preceding example, both models are able to output the correct answer, but as questions become more and more complex, there will be a noticeable difference in predictive performance. Additionally, GPT-3.5-Turbo will remember information from previous input prompts, whereas GPT-3.5 prompts are handled independently.Innovating with GPT-4With the arrival of GPT-4, the focus has shifted toward multimodality. Multimodality in AI refers to the ability of an AI system to process and interpret information from multiple modalities, such as text, speech, images, and videos. Essentially, it is the capability of AI models to understand and combine data from different sources and formats.GPT-4 is capable of additionally taking images as input and interpreting them. It has stronger reasoning and overall performance than its predecessors. There was a famous example where GPT-4 was able to deduce that balloons would fly upward when asked what would happen if someone cut the balloons' strings, as shown in the following photo.Figure 9.6 – The image in question that was used in the experiment. When asked what would happen if the strings were cut, GPT-4 replied that the balloons would start flying awaySome adaptations of GPT-4, such as the one used in Bing Chat, have the extra feature of citing sources in generated answers. This is a welcome addition, as hallucination was a significant flaw in earlier GPT models.HallucinationHallucination in the context of AI refers to generating wrong predictions with high confidence. It is obvious that this can cause a lot more harm than the model indicating it is not sure how to respond or knowing the answer.Next, we will look at the Codex model.CodexCodex is a model that is architecturally similar to GPT-3, but it fully focuses on code generation and understanding. Furthermore, an adaptation of Codex forms the underlying model for GitHub Copilot, a tool that provides suggestions and auto-completion for code based on context and natural language inputs, available for various integrated development environments (IDEs) such as Visual Studio Code. Instead of a ready-to-use solution, Codex is (like the other models in Azure OpenAI) available as a model endpoint and should be used for integration in custom apps.The Codex model is initially trained on a collection of 54 million code repositories, resulting in billions of lines of code, with the majority of training data written in Python.Codex can generate code in different programming languages based on an input prompt in natural language (text-to-code), explain the function of blocks of code (code-to-text), add comments to code, and debug existing code.Codex is available as a C (cushman) and D (Davinci) model. Lightweight Codex models (A series or B series) currently do not exist.Models such as Codex or GitHub Copilot are a great way to boost the productivity of software engineers, data analysts, data engineers, and data scientists.  They do not replace these roles, as their accuracy is not perfect; rather, they give engineers the opportunity to start editing from a fairly well-written block of code instead of coding from scratch.DALL-E 2The DALL-E model family is used to generate visuals. By providing a  description in natural language in the input prompt, it generates a series of matching images. While other models are often used at scale in large enterprises, DALL-E 2 tends to be more popular in smaller businesses. Organizations that lack an in-house graphic designer can make great use of DALL-E to generate visuals for banners, brochures, emails, web pages, and so on.DALL-E 2 only has a  single model size to choose from, although open-source alternatives exist if a lightweight version is preferred. Fine-tuning and private deploymentsAs a data architect, it is important to understand the cost structure of these models. The first option is to use the base model in a serverless manner. Similar to how we work with Azure Cognitive Services, users will get a key for the model’s endpoint and simply pay per prediction. For DALL-E 2, costs are incurred per 100 images, while the GPT and Codex models are priced per 1,000 tokens. For every request made to a GPT or Codex model, all tokens of the input prompt and the output are added up to determine the cost of the prediction.TokensIn natural language processing, a token refers to a sequence of characters that represents a distinct unit of meaning in a text. These units do not necessarily correspond to words, although for short words, this is mostly the case. Tokens are used as the basic building blocks to process and analyze text data. A good rule of thumb for the English language is that one token is, on average, four characters. Dividing your total character count by four will make a good estimate of the number of tokens.Azure OpenAI Service also grants extensive fine-tuning functionalities. Up to 1 GB of data can be uploaded per Azure OpenAI instance for fine-tuning. This may not sound like a lot, but note that we are not training a new model from scratch. The goal of fine-tuning is to retrain the last few layers of the model to increase performance on specific tasks or company-specific knowledge. For this process, 1 GB of data is more than sufficient.When adding a fine-tuned model to a solution, two additional costs will be incurred. On top of the token-based inference cost, we need to take into account the training and hosting costs. The hourly training cost can be quite high due to the amount of hardware needed, but compared to the inference and hosting costs during a model’s life cycle, it remains a small percentage. Next, since we are not using the base model anymore and, instead, our own “version” of the model, we will need to host the model ourselves, resulting in an hourly hosting cost.Now that we have covered both pre-trained model collections, Azure Cognitive Services, and Azure OpenAI Service, let’s move on to custom development using Azure Machine Learning.Grounding LLMsOne of the most popular use cases for LLMs involves providing our own data as context to the model (often referred to as grounding). The reason for its popularity is partly due to the fact that many business cases can be solved using a consistent technological architecture. We can reuse the same solution, but by providing different knowledge bases, we can serve different end users.For example, by placing an LLM on top of public data such as product manuals or product specifics, it is easy to develop a customer support chatbot. If we swap out this knowledge base of product information with something such as HR documents, we can reuse the same tech stack to create an internal HR virtual assistant.A common misconception regarding grounding is that a model needs to be trained on our own data. This is not the case. Instead, after a user asks a question, the relevant document (or paragraphs) is injected into the prompt behind the scenes and lives in the memory of the model for the duration of the chat session (when working with conversational AI) or for a single prompt. The context, as we call it, is then wiped clean and all information is forgotten. If we wanted to cache this info, it is possible to make use of a framework such as LangChain or Semantic Kernel, but that is out of the scope of this book.The fact that a model does not get retrained on our own data plays a crucial role in terms of data privacy and cost optimization. As shown before in the section on fine-tuning, as soon as a base model is altered, an hourly operating cost is added to run a private deployment of the model. Also, information from the documents cannot be leaked to other users working with the same model.Figure 9.7 visualizes the architectural concepts to ground an LLM.Figure 9.7 – Architecture to ground an LLMThe first thing to do is turn the documents that should be accessible to the model into embeddings. Simply put, embeddings are mathematical representations of natural language text. By turning text into embeddings, it is possible to accurately calculate the similarity (from a semantics perspective) between two pieces of text.To do this, we can leverage Azure Functions, a service that allows pieces of code to run in a serverless function. It often forms the glue between different components by handling interactions. In this case, an Azure function (on the bottom left of Figure 9.7) will grab the relevant documents from the knowledge base, break them up into chunks (to accommodate for the maximum token limits of the model), and generate an embedding for each one. This embedding is then stored, alongside the natural language text, in a vector database. This function should be run for all historic data that will be accessible to the model, as well as triggered for every new, relevant document that is added to the knowledge base.Once the vector database is in place, users can start asking questions. However, the user questions are not directly sent to the model endpoint. Instead, another Azure function (shown at the top of Figure 9.7) will turn the user question into an embedding and check its similarity of it with the embeddings of the documents or paragraphs in the vector database. Then, the top X most relevant text chunks are injected into the prompt as context, and the prompt is sent over to the LLM. Finally, the response is returned to the user.ConclusionAzure OpenAI Service, a collaboration between OpenAI and Microsoft, delivers potent AI models. The GPT model family, from GPT-1 to GPT-4, has evolved impressively, with GPT-3.5-Turbo (ChatGPT) excelling in conversational AI. GPT-4 introduces multimodal capabilities, comprehending text, speech, images, and videos.Codex specializes in code generation, while DALL-E 2 creates visuals from text descriptions. These models empower developers and designers. Customization via fine-tuning offers cost-effective solutions for specific tasks. Leveraging Azure OpenAI Service for your projects enhances productivity.Grounding language models with user data ensures data privacy and cost efficiency. This collaboration holds promise for innovative AI applications across various domains.Author BioOlivier Mertens is a cloud solution architect for Azure data and AI at Microsoft, based in Dublin, Ireland. In this role, he assisted organizations in designing their enterprise-scale data platforms and analytical workloads. Next to his role as an architect, Olivier leads the technical AI expertise for Microsoft EMEA in the corporate market. This includes leading knowledge sharing and internal upskilling, as well as solving highly complex or strategic customer AI cases. Before his time at Microsoft, he worked as a data scientist at a Microsoft partner in Belgium.Olivier is a lecturer for generative AI and AI solution architectures, a keynote speaker for AI, and holds a master’s degree in information management, a postgraduate degree as an AI business architect, and a bachelor’s degree in business management.Breght Van Baelen is a Microsoft employee based in Dublin, Ireland, and works as a cloud solution architect for the data and AI pillar in Azure. He provides guidance to organizations building large-scale analytical platforms and data solutions. In addition, Breght was chosen as an advanced cloud expert for Power BI and is responsible for providing technical expertise in Europe, the Middle East, and Africa. Before his time at Microsoft, he worked as a data consultant at Microsoft Gold Partners in Belgium.Breght led a team of eight data and AI consultants as a data science lead. Breght holds a master’s degree in computer science from KU Leuven, specializing in AI. He also holds a bachelor’s degree in computer science from the University of Hasselt.
Read more
  • 0
  • 0
  • 17954

article-image-deploying-llms-with-amazon-sagemaker-part-2
Joshua Arvin Lat
30 Nov 2023
19 min read
Save for later

Deploying LLMs with Amazon SageMaker - Part 2

Joshua Arvin Lat
30 Nov 2023
19 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!IntroductionIn the first part of this post, we showed how easy it is to deploy large language models (LLMs) in the cloud using a managed machine learning service called Amazon SageMaker. In just a few steps, we were able to deploy a MistralLite model in a SageMaker Inference Endpoint. If you’ve worked on real ML-powered projects in the past, you probably know that deploying a model is just the first step! There are definitely a few more steps before we can consider that our application is ready for use.If you’re looking for the link to the first part, here it is: Deploying LLMs with Amazon SageMaker - Part 1In this post, we’ll build on top of what we already have in Part 1 and prepare a demo user interface for our chatbot application. That said, we will tackle the following sections in this post:● Section I: Preparing the SageMaker Notebook Instance (discussed in Part 1)● Section II: Deploying an LLM using the SageMaker Python SDK to a SageMaker Inference Endpoint (discussed in Part 1)● Section III: Enabling Data Capture with SageMaker Model Monitor●  Section IV: Invoking the SageMaker inference endpoint using the boto3 client●  Section V: Preparing a Demo UI for our chatbot application●  Section VI: Cleaning UpWithout further ado, let’s begin!Section III: Enabling Data Capture with SageMaker Model MonitorIn order to analyze our deployed LLM, it’s essential that we’re able to collect the requests and responses to a central storage location. Instead of building our own solution that collects the information we need, we can just utilize the built-in Model Monitor capability of SageMaker. Here, all we need to do is prepare the configuration details and run the update_data_capture_config() method of the inference endpoint object and we’ll have the data capture setup enabled right away! That being said, let’s proceed with the steps required to enable and test data capture for our SageMaker Inference endpoint:STEP # 01: Continuing where we left off in Part 1 of this post, let’s get the bucket name of the default bucket used by our session:s3_bucket_name = sagemaker_session.default_bucket() s3_bucket_nameSTEP # 02: In addition to this, let’s prepare and define a few prerequisites as well:prefix = "llm-deployment" base = f"s3://{s3_bucket_name}/{prefix}" s3_capture_upload_path = f"{base}/model-monitor"STEP # 03: Next, let’s define the data capture config:from sagemaker.model_monitor import DataCaptureConfig data_capture_config = DataCaptureConfig(    enable_capture = True,    sampling_percentage=100,    destination_s3_uri=s3_capture_upload_path,    kms_key_id=None,    capture_options=["REQUEST", "RESPONSE"],    csv_content_types=["text/csv"],    json_content_types=["application/json"] )Here, we specify that we’ll be collecting 100% of the requests and responses that pass through the deployed model.STEP # 04: Let’s enable data capture so that we’re able to save in Amazon S3 the request and response data:predictor.update_data_capture_config(    data_capture_config=data_capture_config )Note that this step may take about 8-10 minutes to complete. Feel free to grab a cup of coffee or tea while waiting!STEP # 05: Let’s check if we are able to capture the input request and output response by performing another sample request:result = predictor.predict(input_data)[0]["generated_text"] print(result)This should yield the following output:"The meaning of life is a philosophical question that has been debated by thinkers and philosophers for centuries. There is no single answer that can be definitively proven, as the meaning of life is subjective and can vary greatly from person to person.\n\nSome people believe that the meaning of life is to find happiness and fulfillment through personal growth, relationships, and experiences. Others believe that the meaning of life is to serve a greater purpose, such as through a religious or spiritual calling, or by making a positive impact on the world through their work or actions.\n\nUltimately, the meaning of life is a personal journey that each individual must discover for themselves. It may involve exploring different beliefs and perspectives, seeking out new experiences, and reflecting on what brings joy and purpose to one's life."Note that it may take a minute or two before the .jsonl file(s) containing the request and response data appear in our S3 bucket.STEP # 06: Let’s prepare a few more examples:prompt_examples = [    "What is the meaning of life?",    "What is the color of love?",    "How to deploy LLMs using SageMaker",    "When do we use Bedrock and when do we use SageMaker?" ] STEP # 07: Let’s also define the perform_request() function which wraps the relevant lines of code for performing a request to our deployed LLM model:def perform_request(prompt, predictor):    input_data = {        "inputs": f"<|prompter|>{prompt}</s><|assistant|>",        "parameters": {            "do_sample": False,            "max_new_tokens": 2000,            "return_full_text": False,        }    }      response = predictor.predict(input_data)    return response[0]["generated_text"] STEP # 08: Let’s quickly test the perform_request() function:perform_request(prompt_examples[0], predictor=predictor)STEP # 09: With everything ready, let’s use the perform_request() function to perform requests using the examples we’ve prepared in an earlier step:from time import sleep for example in prompt_examples:    print("Input:", example)      generated = perform_request(        prompt=example,        predictor=predictor    )    print("Output:", generated)    print("-"*20)    sleep(1)This should return the following:Input: What is the meaning of life? ... -------------------- Input: What is the color of love? Output: The color of love is often associated with red, which is a vibrant and passionate color that is often used to represent love and romance. Red is a warm and intense color that can evoke strong emotions, making it a popular choice for representing love. However, the color of love is not limited to red. Other colors that are often associated with love include pink, which is a softer and more feminine shade of red, and white, which is often used to represent purity and innocence. Ultimately, the color of love is subjective and can vary depending on personal preferences and cultural associations. Some people may associate love with other colors, such as green, which is often used to represent growth and renewal, or blue, which is often used to represent trust and loyalty. ...Note that this is just a portion of the overall output and you should get a relatively long response for each input prompt.Section IV: Invoking the SageMaker inference endpoint using the boto3 clientWhile it’s convenient to use the SageMaker Python SDK to invoke our inference endpoint, it’s best that we also know how to use boto3 as well to invoke our deployed model. This will allow us to invoke the inference endpoint from an AWS Lambda function using boto3.Image 10 — Utilizing API Gateway and AWS Lambda to invoke the deployed LLMThis Lambda function would then be triggered by an event from an API Gateway resource similar to what we have in Image 10. Note that we’re not planning to complete the entire setup in this post but having a working example of how to use boto3 to invoke the SageMaker inference endpoint should easily allow you to build an entire working serverless application utilizing API Gateway and AWS Lambda.STEP # 01: Let’s quickly check the endpoint name of the SageMaker inference endpoint:predictor.endpoint_nameThis should return the endpoint name with a format similar to what we have below:'MistralLite-HKGKFRXURT'STEP # 02: Let’s prepare our boto3 client using the following lines of code:import boto3 import json boto3_client = boto3.client('runtime.sagemaker')STEP # 03: Now, let’s invoke the endpointbody = json.dumps(input_data).encode() response = boto3_client.invoke_endpoint(    EndpointName=predictor.endpoint_name,    ContentType='application/json',    Body=body )   result = json.loads(response['Body'].read().decode())STEP # 04: Let’s quickly inspect the result:resultThis should give us the following:[{'generated_text': "The meaning of life is a philosophical question that has been debated by thinkers and philosophers for centuries. There is no single answer that can be definitively proven, as the meaning of life is subjective and can vary greatly from person to person..."}] STEP # 05: Let’s try that again and print the output text:result[0]['generated_text']This should yield the following output:"The meaning of life is a philosophical question that has been debated by thinkers and philosophers for centuries..."STEP # 06: Now, let’s define perform_request_2 which uses the boto3 client to invoke our deployed LLM:def perform_request_2(prompt, boto3_client, predictor):    input_data = {        "inputs": f"<|prompter|>{prompt}</s><|assistant|>",        "parameters": {            "do_sample": False,            "max_new_tokens": 2000,            "return_full_text": False,        }    }      body = json.dumps(input_data).encode()    response = boto3_client.invoke_endpoint(        EndpointName=predictor.endpoint_name,        ContentType='application/json',        Body=body    )      result = json.loads(response['Body'].read().decode())    return result[0]["generated_text"]STEP # 07: Next, let’s run the following block of code to have our deployed LLM answer the same set of questions using the perform_request_2() function:for example in prompt_examples:    print("Input:", example)      generated = perform_request_2(        prompt=example,        boto3_client=boto3_client,        predictor=predictor    )    print("Output:", generated)    print("-"*20)    sleep(1)This will give us the following output:Input: What is the meaning of life? ... -------------------- Input: What is the color of love? Output: The color of love is often associated with red, which is a vibrant and passionate color that is often used to represent love and romance. Red is a warm and intense color that can evoke strong emotions, making it a popular choice for representing love. However, the color of love is not limited to red. Other colors that are often associated with love include pink, which is a softer and more feminine shade of red, and white, which is often used to represent purity and innocence. Ultimately, the color of love is subjective and can vary depending on personal preferences and cultural associations. Some people may associate love with other colors, such as green, which is often used to represent growth and renewal, or blue, which is often used to represent trust and loyalty. ... Given that it may take a few minutes before the .jsonl files appear in our S3 bucket, let’s wait for about 3-5 minutes before proceeding to the next section. Feel free to grab a cup of coffee or tea while waiting!STEP # 08: Let’s run the following block of code to list the captured data files stored in our S3 bucket:results = !aws s3 ls {s3_capture_upload_path} --recursive resultsSTEP # 09: In addition to this, let’s store the list inside the processed variable:processed = [] for result in results:    partial = result.split()[-1]    path = f"s3://{s3_bucket_name}/{partial}"    processed.append(path)   processedSTEP # 10: Let’s create a new directory named captured_data using the mkdir command:!mkdir -p captured_dataSTEP # 11: Now, let’s download the .jsonl files from the S3 bucket to the captured_data directory in our SageMaker Notebook Instance:for index, path in enumerate(processed):    print(index, path)    !aws s3 cp {path} captured_data/{index}.jsonlSTEP # 12: Let’s define the load_json_file() function which will help us load files with JSON content:import json def load_json_file(path):    output = []      with open(path) as f:        output = [json.loads(line) for line in f]          return outputSTEP # 13: Using the load_json_file() function we defined in an earlier step, let’s load the .jsonl files and store them inside the all variable for easier viewing:all = [] for i, _ in enumerate(processed):    print(f">: {i}")    new_records = load_json_file(f"captured_data/{i}.jsonl")    all = all + new_records     allRunning this will yield the following response:Image 11 — All captured data points inside the all variableFeel free to analyze the nested structure stored in all variables. In case you’re interested in how this captured data can be analyzed and processed further, you may check Chapter 8, Model Monitoring and Management Solutions of my 2nd book “Machine Learning Engineering on AWS”.Section V: Preparing a Demo UI for our chatbot applicationYears ago, we had to spend a few hours to a few days before we were able to prepare a user interface for a working demo. If you have not used Gradio before, you would be surprised that it only takes a few lines of code to set everything up. In the next set of steps, we’ll do just that and utilize the model we’ve deployed in the previous parts of our demo application:STEP # 01: Continuing where we left off in the previous part, let’s install a specific version of gradio using the following command:!pip install gradio==3.49.0STEP # 02: We’ll also be using a specific version of fastapi as well:!pip uninstall -y fastapi !pip install fastapi==0.103.1STEP # 03: Let’s prepare a few examples and store them in a list:prompt_examples = [    "What is the meaning of life?",    "What is the color of love?",    "How to deploy LLMs using SageMaker",    "When do we use Bedrock and when do we use SageMaker?",    "Try again",    "Provide 10 alternatives",    "Summarize the previous answer into at most 2 sentences" ]STEP # 04: In addition to this, let’s define the parameters using the following block of code:parameters = {    "do_sample": False,    "max_new_tokens": 2000, }STEP # 05: Next, define the process_and_response() function which we’ll use to invoke the inference endpoint:def process_and_respond(message, chat_history):    processed_chat_history = ""    if len(chat_history) > 0:        for chat in chat_history:            processed_chat_history += f"<|prompter|>{chat[0]}</s><|assistant|>{chat[1]}</s>"              prompt = f"{processed_chat_history}<|prompter|>{message}</s><|assistant|>"    response = predictor.predict({"inputs": prompt, "parameters": parameters})    parsed_response = response[0]["generated_text"][len(prompt):]    chat_history.append((message, parsed_response))    return "", chat_historySTEP # 06: Now, let’s set up and prepare the user interface we’ll use to interact with our chatbot:import gradio as gr with gr.Blocks(theme=gr.themes.Monochrome(spacing_size="sm")) as demo:    with gr.Row():        with gr.Column():                      message = gr.Textbox(label="Chat Message Box",                                 placeholder="Input message here",                                 show_label=True,                                 lines=12)            submit = gr.Button("Submit")                      examples = gr.Examples(examples=prompt_examples,                                   inputs=message)        with gr.Column():            chatbot = gr.Chatbot(height=900)      submit.click(process_and_respond,                 [message, chatbot],                 [message, chatbot],                 queue=False)Here, we can see the power of Gradio as we only needed a few lines of code to prepare a demo app.STEP # 07: Now, let’s launch our demo application using the launch() method:demo.launch(share=True, auth=("admin", "replacethis1234!"))This will yield the following logs:Running on local URL:  http://127.0.0.1:7860 Running on public URL: https://123456789012345.gradio.live STEP # 08: Open the public URL in a new browser tab. This will load a login page which will require us to input the username and password before we are able to access the chatbot.Image 12 — Login pageSpecify admin and replacethis1234! in the login form to proceed.STEP # 09: After signing in using the credentials, we’ll be able to access a chat interface similar to what we have in Image 13. Here, we can try out various types of prompts.Image 13 — The chatbot interfaceHere, we have a Chat Message Box where we can input and run our different prompts on the left side of the screen. We would then see the current conversation on the right side.STEP # 10: Click the first example “What is the meaning of life?”. This will auto-populate the text area similar to what we have in Image 14:Image 14 — Using one of the examples to populate the Chat Message BoxSTEP # 11:Click the Submit button afterwards. After a few seconds, we should get the following response in the chat box:Image 15 — Response of the deployed modelAmazing, right? Here, we just asked the AI what the meaning of life is.STEP # 12: Click the last example “Summarize the previous answer into at most 2 sentences”. This will auto-populate the text area with the said example. Click the Submit button afterward.Image 16 — Summarizing the previous answer into at most 2 sentencesFeel free to try other prompts. Note that we are not limited to the prompts available in the list of examples in the interface.Important Note: Like other similar AI/ML solutions, there's the risk of hallucinations or the generation of misleading information. That said, it's critical that we exercise caution and validate the outputs produced by any Generative AI-powered system to ensure the accuracy of the results.Section VI: Cleaning UpWe’re not done yet! Cleaning up the resources we’ve created and launched is a very important step as this will help us ensure that we don’t pay for the resources we’re not planning to use.STEP # 01: Once you’re done trying out various types of prompts, feel free to turn off and clean up the resources launched and created using the following lines of code:demo.close() predictor.delete_endpoint()STEP # 02: Make sure to turn off (or delete) the SageMaker Notebook instance as well. I’ll leave this to you as an exercise!Wasn’t that easy?! As you can see, deploying LLMs with Amazon SageMaker is straightforward and easy. Given that Amazon SageMaker handles most of the heavy lifting to manage the infrastructure, we’re able to focus more on the deployment of our machine learning model. We are just scratching the surface as there is a long list of capabilities and features available in SageMaker. If you want to take things to the next level, feel free to read 2 of my books focusing heavily on SageMaker: “Machine Learning with Amazon SageMaker Cookbook” and “Machine Learning Engineering on AWS”.Author BioJoshua Arvin Lat is the Chief Technology Officer (CTO) of NuWorks Interactive Labs, Inc. He previously served as the CTO of 3 Australian-owned companies and also served as the Director for Software Development and Engineering for multiple e-commerce startups in the past. Years ago, he and his team won 1st place in a global cybersecurity competition with their published research paper. He is also an AWS Machine Learning Hero and he has been sharing his knowledge in several international conferences to discuss practical strategies on machine learning, engineering, security, and management. He is also the author of the books "Machine Learning with Amazon SageMaker Cookbook", "Machine Learning Engineering on AWS", and "Building and Automating Penetration Testing Labs in the Cloud". Due to his proven track record in leading digital transformation within organizations, he has been recognized as one of the prestigious Orange Boomerang: Digital Leader of the Year 2023 award winners.
Read more
  • 0
  • 0
  • 17834

article-image-fine-tuning-large-language-models-llms
Amita Kapoor
11 Sep 2023
12 min read
Save for later

Fine-Tuning Large Language Models (LLMs)

Amita Kapoor
11 Sep 2023
12 min read
IntroductionIn the bustling metropolis of machine learning and natural language processing, Large Language Models (LLMs) such as GPT-4 are the skyscrapers that touch the clouds. From chatty chatbots to prolific prose generators, they stand tall, powering a myriad of applications. Yet, like any grand structure, they're not one-size-fits-all. Sometimes, they need a little nipping and tucking to shine their brightest. Dive in as we unravel the art and craft of fine-tuning these linguistic behemoths, sprinkled with code confetti for the hands-on aficionados out there.What's In a Fine-Tune?In a world where a top chef can make spaghetti or sushi but needs finesse for regional dishes like 'Masala Dosa' or 'Tarte Tatin', LLMs are similar: versatile but requiring specialization for specific tasks. A general LLM might misinterpret rare medical terms or downplay symptoms, but with medical text fine-tuning, it can distinguish nuanced health issues. In law, a misread word can change legal interpretations; by refining the LLM with legal documents, we achieve accurate clause interpretation. In finance, where terms like "bearish" and "bullish" are pivotal, specialized training ensures the model's accuracy in financial analysis and predictions.Whipping Up the Perfect AI RecipeJust as a master chef carefully chooses specific ingredients and techniques to curate a gourmet dish, in the vast culinary world of Large Language Models, we have a delectable array of fine-tuning techniques to concoct the ideal AI delicacy. Before we dive into the details, feast your eyes on the visual smorgasbord below, which provides an at-a-glance overview of these methods. With this flavour-rich foundation, we're all set to embark on our fine-tuning journey, focusing on the PEFT method and the Flan-T5 model on the Hugging Face platform. Aprons on, and let's get cooking!Fine Tuning Flan-T5Google AI's Flan-T5, an advanced version of the T5 model, excels in LLMs with its capability to handle text and code. It specialises in Text generation, Translation, Summarization, Question Answering, and Code Generation. Unlike GPT-3 and LLAMA, Flan-T5 is open-source, benefiting researchers worldwide. With configurations ranging from 60M to 11B parameters, it balances versatility and power, though larger models demand more computational resources.For this article, we will leverage the DialogSum dataset, a robust resource boasting 13,460 dialogues, supplemented with manually labelled summaries and topics (and an additional 100 holdout data entries for topic generation). This dataset will serve as the foundation for fine-tuning our open-source giant, Flan-T5, to specialise it for dialogue summarization tasks. Setting the Stage: Preparing the Tool ChestTo fine-tune effectively, ensure your digital setup is optimized. Here's a quick checklist:Hardware: Use platforms like Google Colab.RAM: Memory depends on model parameters. For example:   Memory (MTotal) = 4 x (Number of Parameters x 4 bytes)For a 247,577,856 parameter model (flan-t5-base), around 3.7GB is needed for parameters, gradients, and optimizer states. Ideally, have at least 8GB RAMGPU: A high-end GPU, such as NVIDIA Tesla P100 or T4, speeds up training and inference. Aim for 12GB or more GPU memory, accounting for overheads.Libraries: Like chefs need the right tools, AI fine-tuning demands specific libraries for algorithms, models, and evaluation tools.Remember, your setup is as crucial as the process itself. Let's conjure up the essential libraries, by running the following command:!pip install \    transformers \    datasets \    evaluate \    rouge_score \    loralib \    peftWith these tools in hand, we're now primed to move deeper into the world of fine-tuning. Let's dive right in! Next, it's essential to set up our environment with the necessary tools:from datasets import load_dataset from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig, TrainingArguments, Trainer import torch import time import evaluate import pandas as pd import numpy as np To put our fine-tuning steps into motion, we first need a dataset to work with. Enter the DialogSum dataset, an extensive collection tailored for dialogue summarization:dataset_name = "knkarthick/dialogsum" dataset = load_dataset(dataset_name)Executing this code, we've swiftly loaded the DialogSum dataset. With our data playground ready, we can take a closer look at its structure and content to better understand the challenges and potentials of our fine-tuning process. DialogSum dataset is neatly structured into three segments:Train: With a generous 12,460 dialogues, this segment is the backbone of our model's learning process.Validation: A set of 500 dialogues, this slice of the dataset aids in fine-tuning the model, ensuring it doesn't merely memorise but truly understandsTest: This final 1,500 dialogue portion stands as the litmus test, determining how well our model has grasped the art of dialogue summarization.Each dialogue entry is accompanied by a unique 'id', a 'summary' of the conversation, and a 'topic' to give context.Before fine-tuning, let's gear up with our main tool: the Flan-T5 model, specifically it's base' variant from Google, which balances performance and efficiency. Using AutoModelForSeq2SeqLM, we effortlessly load the pre-trained Flan-T5, set to use torch.bfloat16 for optimal memory and precision. Alongside, we have the tokenizer, essential for translating text into a model-friendly format. Both are sourced from google/flan-t5-base, ensuring seamless compatibility. Now, let's get this code rolling:model_name='google/flan-t5-base' original_model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.bfloat16) tokenizer = AutoTokenizer.from_pretrained(model_name)Understanding Flan-T5 requires a look at its structure, particularly its parameters. Knowing the number of trainable parameters shows the model's adaptability. The total parameters reflect its complexity. The following code will count these parameters and calculate the ratio of trainable ones, giving insight into the model's flexibility during fine-tuning.Let's now decipher these statistics for our Flan-T5 model:def get_model_parameters_info(model): total_parameters = sum(param.numel() for param in model.parameters()) trainable_parameters = sum(param.numel() for param in model.parameters() if param.requires_grad) trainable_percentage = 100 * trainable_parameters / total_parameters info = ( f"Trainable model parameters: {trainable_parameters}", f"Total model parameters: {total_parameters}", f"Percentage of trainable model parameters: {trainable_percentage:.2f}%" ) return '\n'.join(info) print(get_model_parameters_info(original_model))Trainable model parameters: 247577856 Total model parameters: 247577856 Percentage of trainable model parameters: 100.00%Harnessing PEFT for EfficiencyIn the fine-tuning journey, we seek methods that boost efficiency without sacrificing performance. This brings us to PEFT (Parameter-Efficient Fine-Tuning) and its secret weapon, LORA (Low-Rank Adaptation). LORA smartly adapts a model to new tasks with minimal parameter adjustments, offering a cost-effective solution in computational terms.In the code block that follows, we're initializing LORA's configuration. Key parameters to note include:r: The rank of the low-rank decomposition, which influences the number of adaptable parameters.lora_alpha: A scaling factor determining the initial magnitude of the LORA parameters.target_modules: The neural network components we wish to reparameterize. Here, we're targeting the "q" (query) and "v" (value) modules in the transformer's attention mechanism.lora_dropout:  A regularising dropout is applied to the LORA parameters to prevent overfitting.bias: Specifies the nature of the bias term in the reparameterization. Setting it to "none" means no bias will be added.task_type: Signifying the type of task for which we're employing LORA. In our case, it's sequence-to-sequence language modelling, perfectly aligned with our Flan-T5's capabilities.Using the get_peft_model function, we integrate the LORA configuration into our Flan-T5 model. Now, let's see how this affects the trainable parameters: peft_model = get_peft_model(original_model, lora_config) print(get_model_parameters_info(peft_model))Trainable model parameters: 3538944 Total model parameters: 251116800 Percentage of trainable model parameters: 1.41%Preparing for model training requires setting specific parameters. Directory choice, learning rate, logging frequency, and epochs are vital. A unique output directory segregates results from different training runs, enabling comparison. Our high learning rate signifies aggressive fine-tuning, while allocating 100 epochs ensures ample adaptation time for the model. With these settings, we're poised to initiate the trainer and embark on the training journey.# Set the output directory with a unique name using a timestamp output_dir = f'peft-dialogue-summary-training-{str(int(time.time()))}' # Define the training arguments for PEFT model training peft_training_args = TrainingArguments( output_dir=output_dir, auto_find_batch_size=True, # Automatically find an optimal batch size learning_rate=1e-3, # Use a higher learning rate for fine-tuning num_train_epochs=10, # Set the number of training epochs logging_steps=1000, # Log every 500 steps for more frequent logging max_steps=-1 # Let the number of steps be determined by epochs and dataset size ) # Initialise the trainer with PEFT model and training arguments peft_trainer = Trainer( model=peft_model, args=peft_training_args, train_dataset=formatted_datasets["train"], ) Let the learning begin!peft_trainer.train()To evaluate our models, we'll compare their summaries to a human baseline from our dataset using a `prompt`. With the original and PEFT-enhanced Flan-T5 models, we'll create summaries and contrast them with the human version, revealing AI accuracy and the best-performing model in our summary contest.def generate_summary(model, tokenizer, dialogue, prompt): """ Generate summary for a given dialogue and model. """ input_text = prompt + dialogue input_ids = tokenizer(input_text, return_tensors="pt").input_ids input_ids = input_ids.to(device) output_ids = model.generate(input_ids=input_ids, max_length=200, num_beams=1, early_stopping=True) return tokenizer.decode(output_ids[0], skip_special_tokens=True) index = 270 dialogue = dataset['test'][index]['dialogue'] human_baseline_summary = dataset['test'][index]['summary'] prompt = "Summarise the following conversation:\n\n" # Generate summaries original_summary = generate_summary(original_model, tokenizer, dialogue, prompt) peft_summary = generate_summary(peft_model, tokenizer, dialogue, prompt) # Print summaries print_output('BASELINE HUMAN SUMMARY:', human_baseline_summary) print_output('ORIGINAL MODEL:', original_summary) print_output('PEFT MODEL:', peft_summary)And the output:----------------------------------------------------------------------- BASELINE HUMAN SUMMARY:: #Person1# and #Person1#'s mother are preparing the fruits they are going to take to the picnic. ----------------------------------------------------------------------- ORIGINAL MODEL:: #Person1# asks #Person2# to take some fruit for the picnic. #Person2# suggests taking grapes or apples.. ----------------------------------------------------------------------- PEFT MODEL:: Mom and Dad are going to the picnic. Mom will take the grapes and the oranges and take the oranges.To assess our summarization models, we use the subset of the test dataset. We'll compare the summaries to human-created baselines. Using batch processing for efficiency, dialogues are processed in set group sizes. After processing, all summaries are compiled into a DataFrame for structured comparison and analysis. Below is the Python code for this experiment.dialogues = dataset['test'][0:20]['dialogue'] human_baseline_summaries = dataset['test'][0:20]['summary'] original_model_summaries = [] peft_model_summaries = [] for dialogue in dialogues:    prompt = "Summarize the following conversation:\n\n"      original_summary = generate_summary(original_model, tokenizer, dialogue, prompt)       peft_summary = generate_summary(peft_model, tokenizer, dialogue, prompt)      original_model_summaries.append(original_summary)    peft_model_summaries.append(peft_summary) df = pd.DataFrame({    'human_baseline_summaries': human_baseline_summaries,    'original_model_summaries': original_model_summaries,    'peft_model_summaries': peft_model_summaries }) dfTo evaluate our PEFT model's summaries, we use the ROUGE metric, a common summarization tool. ROUGE measures the overlap between predicted summaries and human references, showing how effectively our models capture key details. The Python code for this evaluation is:rouge = evaluate.load('rouge') original_model_results = rouge.compute( predictions=original_model_summaries, references=human_baseline_summaries[0:len(original_model_summaries)], use_aggregator=True, use_stemmer=True, ) peft_model_results = rouge.compute( predictions=peft_model_summaries, references=human_baseline_summaries[0:len(peft_model_summaries)], use_aggregator=True, use_stemmer=True, ) print('ORIGINAL MODEL:') print(original_model_results) print('PEFT MODEL:') print(peft_model_results) Here is the output:ORIGINAL MODEL: {'rouge1': 0.3870781853986991, 'rouge2': 0.13125454660387353, 'rougeL': 0.2891907205395029, 'rougeLsum': 0.29030342767482775} INSTRUCT MODEL: {'rouge1': 0.3719168722187023, 'rouge2': 0.11574429294744135, 'rougeL': 0.2739614480462256, 'rougeLsum': 0.2751489358330983} PEFT MODEL: {'rouge1': 0.3774164144865605, 'rouge2': 0.13204737323990984, 'rougeL': 0.3030487123408395, 'rougeLsum': 0.30499897454317104}Upon examining the results, it's evident that the original model shines with the highest ROUGE-1 score, adeptly capturing crucial standalone terms. On the other hand, the PEFT Model wears the crown for both ROUGE-L and ROUGE-Lsum metrics. This implies the PEFT Model excels in crafting summaries that string together longer, coherent sequences echoing those in the reference summaries.ConclusionWrapping it all up, in this post we delved deep into the nuances of fine-tuning Large Language Models, particularly spotlighting the prowess of FLAN T5. Through our hands-on venture into the dialogue summarization task, we discerned the intricate dance between capturing individual terms and weaving them into a coherent narrative. While the original model exhibited an impressive knack for highlighting key terms, the PEFT Model emerged as the maestro in crafting flowing, meaningful sequences.It's clear that in the grand arena of language models, knowing the notes is just the beginning; it's how you orchestrate them that creates the magic. Harnessing the techniques illuminated in this post, you too can fine-tune your chosen LLM, crafting your linguistic symphonies with finesse and flair. Here's to you becoming the maestro of your own linguistic ensemble!Author BioAmita Kapoor is an accomplished AI consultant and educator with over 25 years of experience. She has received international recognition for her work, including the DAAD fellowship and the Intel Developer Mesh AI Innovator Award. She is a highly respected scholar with over 100 research papers and several best-selling books on deep learning and AI. After teaching for 25 years at the University of Delhi, Amita retired early and turned her focus to democratizing AI education. She currently serves as a member of the Board of Directors for the non-profit Neuromatch Academy, fostering greater accessibility to knowledge and resources in the field. After her retirement, Amita founded NePeur, a company providing data analytics and AI consultancy services. In addition, she shares her expertise with a global audience by teaching online classes on data science and AI at the University of Oxford. 
Read more
  • 0
  • 0
  • 17691
Unlock access to the largest independent learning library in Tech for FREE!
Get unlimited access to 7500+ expert-authored eBooks and video courses covering every tech area you can think of.
Renews at $19.99/month. Cancel anytime
article-image-text-classification-with-transformers
Saeed Dehqan
28 Aug 2023
9 min read
Save for later

Text Classification with Transformers

Saeed Dehqan
28 Aug 2023
9 min read
IntroductionThis blog aims to implement binary text classification using a transformer architecture. If you're new to transformers, the "Transformer Building Blocks" blog explains the architecture and its text generation implementation. Beyond text generation and translation, transformers serve classification, sentiment analysis, and speech recognition. The transformer model comprises two parts: an encoder and a decoder. The encoder extracts features, while the decoder processes them. Just as a painter with tree features can draw, describe, visualize, categorize, or write about a tree, transformers encode knowledge (encoder) and apply it (decoder). This dual-part process is pivotal for text classification with transformers, allowing them to excel in diverse tasks like sentiment analysis, illustrating their transformative role in NLP.Deep Dive into Text Classification with TransformersWe train the model on the IMDB dataset. The dataset is ready and there’s no preprocessing needed. The model is vocab-based instead of character-based so that the model can converge faster. I limited the dataset vocabs to the 20000 most frequent vocabs. I also reduced the sequence to 200 so we can train faster. I tried to simplify the model and use torch.nn.MultiheadAttention it instead of writing the Multihead-attention ourselves. It makes the model faster since the nn.MultiheadAttention uses scaled_dot_product_attention under the hood. But if you want to know how MultiheadAttention works you can study the transformer building blocks blog or see the code here.Okay, now, let us add the feature extractor part:class transformer_block(nn.Module): def __init__(self):     super(block, self).__init__()     self.attention = nn.MultiheadAttention(embeds_size, num_heads, batch_first=True)     self.ffn = nn.Sequential(         nn.Linear(embeds_size, 4 * embeds_size),         nn.LeakyReLU(),         nn.Linear(4 * embeds_size, embeds_size),     )     self.drop1 = nn.Dropout(drop_prob)     self.drop2 = nn.Dropout(drop_prob)     self.ln1 = nn.LayerNorm(embeds_size, eps=1e-6)     self.ln2 = nn.LayerNorm(embeds_size, eps=1e-6) def forward(self, hidden_state):     attn, _ = self.attention(hidden_state, hidden_state, hidden_state, need_weights=False)     attn = self.drop1(attn)     out = self.ln1(hidden_state + attn)     observed = self.ffn(out)     observed = self.drop2(observed)     return self.ln2(out + observed)●    hidden_state: A tensor with a shape (batch_size, block_size, embeds_size) goes to the transformer_block and a tensor with the same shape goes out of it.●    self.attention: The transformer block tries to combine the information of tokens so that each token is aware of its neighbors or other tokens in the context. We may call this part the communication part. That’s what the nn.MultiheadAttention does. nn.MultiheadAttention is a ready multihead attention layer that can be faster than implementing it from scratch, just like what we did in the “Transformer Building Blocks” blog. The parameters of nn.MultiheadAttention are as follows:     ○    embeds_size: token embedding size     ○    num_heads: multihead, as the name suggests, consists of multiple heads and each head works on different parts of token embeddings. Suppose, your input data has shape (B,T,C) = (10, 32, 16). The token embedding size for this data is 16. If we specify the num_heads parameter to 2(divisible by 16), the multi-head splits data into two parts with shape (10, 32, 8). The first head works on the first part and the second head works on the second part. This is because transforming data into different subspaces can help the model to see different aspects of the data. Please note that the num_heads should be divisible by the embedding size so that at the end we can concatenate the split parts.    ○    batch_first: True means the first dimension is batch.●    Dropout: After the attention layer, the communication between tokens is closed and computations on tokens are done individually. We run a dropout on tokens. Dropout is a method of regularization. Regularization helps the training process to be based on generalization, not memorization. Without regularization, the model tries to memorize the training set and has poor performance on the test set. The dropout method turns off features with a probability of drop_prob.●    self.ln1: Layer normalization normalizes embeddings so that they have zero mean and standard deviation one.●    Residual connection: hidden_state + attn: Observe that before normalization, we added the input to the output of multihead attention, named residual connection. It has two benefits:   ○    It helps the model to have the unchanged embedding information.   ○    It helps to prevent gradient vanishing, which is common in deep networks where we stack multiple transformer layers.●    self.ffn: After dropout, residual connection, and normalization, we forward data into a simple non-linear neural network to adjust the tokens one by one for better representation.●    self.ln2(out + observed): Finally, another dropout, residual connection, and layer normalization.The transformer block is ready. And here is the final piece:class transformer(nn.Module): def __init__(self):     super(transformer, self).__init__()     self.tok_embs = nn.Embedding(vocab_size, embeds_size)     self.pos_embs = nn.Embedding(block_size, embeds_size)     self.block = block()     self.ln1 = nn.LayerNorm(embeds_size)     self.ln2 = nn.LayerNorm(embeds_size)     self.classifier_head = nn.Sequential(         nn.Linear(embeds_size, embeds_size),         nn.LeakyReLU(),         nn.Dropout(drop_prob),         nn.Linear(embeds_size, embeds_size),         nn.LeakyReLU(),         nn.Linear(embeds_size, num_classes),         nn.Softmax(dim=1),     )     print("number of parameters: %.2fM" % (self.num_params()/1e6,)) def num_params(self):     n_params = sum(p.numel() for p in self.parameters())     return n_params def forward(self, seq):     B,T = seq.shape     embedded = self.tok_embs(seq)     embedded = embedded + self.pos_embs(torch.arange(T, device=device))     output = self.block(embedded)     output = output.mean(dim=1)     output = self.classifier_head(output)     return output●    self.tok_embs: nn.Embedding is like a lookup table that receives a sequence of indices, and returns their corresponding embeddings. These embeddings will receive gradients so that the model can update them to make better predictions.●    self.tok_embs: To comprehend a sentence, you not only need words, you also need to have the order of words. Here, we embed positions and add them to the token embeddings. In this way, the model has both words and their order.●    self.block: In this model, we only use one transformer block, but you can stack more blocks to get better results.●    self.classifier_head: This is where we put the extracted information into action to classify the sequence. We call it the transformer head. It receives a fixed-size vector and classifies the sequence. The softmax as the final activation function returns a probability distribution for each class.●    self.tok_embs(seq): Given a sequence of indices (batch_size, block_size), it returns (batch_size, block_size, embeds_size).●    self.pos_embs(torch.arange(T, device=device)): Given a sequence of positions, i.e. [0,1,2], it returns embeddings of each position. Then, we add them to the token embeddings.●    self.block(embedded): The embedding goes to the transformer block to extract features. Given the embedded shape (batch_size, block_size, embeds_size), the output has the same shape (batch_size, block_size, embeds_size).●    output.mean(dim=1): The purpose of using mean is to aggregate the information from the sequence into a compact representation before feeding it into self.classifier_head. It helps in reducing the spatial dimensionality and extracting the most important features from the sequence. Given the input shape (batch_size, block_size, embeds_size), the output shape is (batch_size, embeds_size). So, one fixed-size vector for each batch.●    self.classifier_head(output): And here we classify.The final code can be found here. The remaining code consists of downstream tasks such as the training loop, loading the dataset, setting the hyperparameters, and optimizer. I used RMSprop instead of Adam and AdamW. I also used BCEWithLogitsLoss instead of cross-entropy loss. BCE(Binary Cross Entropy) is for binary classification models and it combines sigmoid with cross entropy and it is numerically more stable. I also empirically got better accuracy. After 30 epochs, the final accuracy is ~84%.ConclusionThis exploration of text classification using transformers reveals their revolutionary potential. Beyond text generation, transformers excel in sentiment analysis. The encoder-decoder model, analogous to a painter interpreting tree feature, propels efficient text classification. A streamlined practical approach and the meticulously crafted transformer block enhance the architecture's robustness. Through optimization methods and loss functions, the model is honed, yielding an empirically validated 84% accuracy after 30 epochs. This journey highlights transformers' disruptive impact on reshaping AI-driven language comprehension, fundamentally altering the landscape of Natural Language Processing.Author BioSaeed Dehqan trains language models from scratch. Currently, his work is centered around Language Models for text generation, and he possesses a strong understanding of the underlying concepts of neural networks. He is proficient in using optimizers such as genetic algorithms to fine-tune network hyperparameters and has experience with neural architecture search (NAS) by using reinforcement learning (RL). He implements models starting from data gathering to monitoring, and deployment on mobile, web, cloud, etc. 
Read more
  • 0
  • 0
  • 17395

article-image-transformer-building-blocks
Saeed Dehqan
29 Aug 2023
22 min read
Save for later

Transformer Building Blocks

Saeed Dehqan
29 Aug 2023
22 min read
IntroductionTransformers employ potent techniques to preprocess tokens before sequentially inputting them into a neural network, aiding in the selection of the next token. At the transformer's apex is a basic neural network, the transformer head. The text generator model processes input tokens and generates a probability distribution for subsequent tokens. Context length, termed context length or block size, is recognized as a hyperparameter denoting input token count. The model's primary aim is to predict the next token based on input tokens (referred to as context tokens or context windows). Our goal with n tokens is to predict the subsequent fitting token following previous ones. Thus, we rely on these n tokens to anticipate the next. As humans, we attempt to grasp the conversation's context - our location and a loose foresight of the path's culmination. Upon gathering pertinent insights, relevant words emerge, while irrelevant ones fade, enabling us to choose the next word with precision. We occasionally err but backtrack, a luxury transformers lack. If they incorrectly predict (an irrelevant token), they persist, though exceptions exist, like beam search. Unlike us, transformers can't forecast. Revisiting n prior tokens, our human assessment involves inspecting them individually, and discerning relationships from diverse angles. By prioritizing pivotal tokens and disregarding superfluous ones, we evaluate tokens within various contexts. We scrutinize all n prior tokens individually, ready to prognosticate. This embodies the essence of the multihead attention mechanism in transformers. Consider a context window with 5 tokens. Each wears a distinct mask, predicting its respective next token:"To discern the void amidst, we must first grasp the fullness within." To understand what token is lacking, we must first identify what we are and possess. We need communication between tokens since tokens don’t know each other yet and in order to predict their own next token, they first need to know each other well and pair together in such a way that tokens with similar characteristics stay near each other (technically having similar vectors). Each token has three vectors that represent:●    What tokens they are looking for (known as query)●    What they really have (known as key)●    What they are (known as value)Each token with its query starts looking for similar keys, finds each other, and starts to know one another by adding up their values:Similar tokens find each other and if a token is somehow dissimilar, here Token 4, other tokens don’t consider it much. But please note that every token (much or less) has its own effect on other tokens. Also, in self-attention, all tokens ask all other tokens with their query and keys to find familiar tokens, but not the future tokens, named masked self-attention. We prohibit tokens from communicating to future tokens. After exchanging information between tokens and mixing up their values, similar tokens become more similar:As you can see, the color of similar tokens becomes more similar(in action, their vectors become more similar). Since tokens in the group wear a mask, we cannot access the true tokens’ values. We just know and distinguish them from their mask(value). This is because every token has different characteristics in different contexts, and they don’t show their true essence.So far so good; we have finished the self-attention process and now, the group is ready to predict their next tokens. This is because individuals are aware of each other very well, and as a result, they can guess the next token better. Now, each token separately needs to go to a nonlinear network and then to the transformer head, to predict its own next token. We ask each one of the tokens separately to tell their opinion about the probability of what token comes next. Finally, we collect the probability distributions of all tokens in the context window. A probability distribution sums up to 100, or actually in action to 1. We give probability to every token the model has in its vocabulary. The simplest method to extract the next token from probability distributions is to select the one with the highest probability:As you can see, each token goes to the neural network and the network returns a probability distribution. The result is the following sentence: “It looks like a bug”.Voila! We managed to go through a simple Transformer model.Let’s recap everything we’ve said. A transformer receives n tokens as input, does some stuff (like self-attention, layer normalization, etc.) and feed-forward them into a neural network to get probability distributions of the next token. Each token goes to the neural network separately; if the number of tokens is 10, there are 10 probability distributions.At this point, you know intuitively how the main building blocks of a transformer work. But let us better understand them by implementing a transformer model.Clone the repository tiny-transformer:git clone https://github.com/saeeddhqan/tiny-transformerExecute simple_model.py in the repository If you simply want to run the model for training.Create a new file, and import the necessary modules:import math import torch import torch.nn as nn import torch.nn.functional as F Load the dataset and write the tokenizer: with open('shakespeare.txt') as fp: text = fp.read() chars = sorted(list(set(text))) vocab_size = len(chars) stoi = {c:i for i,c in enumerate(chars)} itos = {i:c for c,i in stoi.items()} encode = lambda s: [stoi[x] for x in s] decode = lambda e: ''.join([itos[x] for x in e])●    Open the dataset, and define a variable that is a list of all unique characters in the text.●    The set function splits the text character by character and then removes duplicates, just like sets in set theory. list(set(myvar)) is a way of removing duplicates in a list or string.●    vocab_size is the number of unique characters (here 65). ●    stoi is a dictionary where its keys are characters and values are their indices.●    itos is used to convert indices to characters. ●    encode function receives a string and returns indices of characters. ●    decode receives a list of indices and returns a string. Split the dataset into test and train and write a function that returns data for training:device = 'cuda' if torch.cuda.is_available() else 'cpu' torch.manual_seed(1234) data = torch.tensor(encode(text), dtype=torch.long).to(device) train_split = int(0.9 * len(data)) train_data = data[:train_split] test_data = data[train_split:] def get_batch(split='train', block_size=16, batch_size=1) -> 'Create a random batch and returns batch along with targets': data = train_data if split == 'train' else test_data ix = torch.randint(len(data) - block_size, (batch_size,)) x = torch.stack([data[i:i + block_size] for i in ix]) y = torch.stack([data[i+1:i + block_size + 1] for i in ix]) return x, y●    Choose a suitable device.●     Set a seed to make the training reproducible.●    Convert the text into a large list of indices with the encode function.●    Since the character indices are integer, we use torch.long data type to make the data suitable for the model. ●    90% for training and 10% for testing.●    If the batch_size is 10, we select 10 chunks or sequences from the dataset and stack them up to process them simultaneously. ●    If the batch_size is 1, get_batch function selects 1 random chunk (n consequence characters) from the dataset and returns x and y, where x is 16 characters’ indices and y is the target characters for x.The shape, value, and decoded version of the selected chunk are as follows:shape x: torch.Size([1, 16]) shape y: torch.Size([1, 16]) value x: tensor([[41, 43, 6, 1, 60, 47, 50, 50, 39, 47, 52, 2, 1, 52, 43, 60]]) value y: tensor([[43, 6, 1, 60, 47, 50, 50, 39, 47, 52, 2, 1, 52, 43, 60, 43]]) decoded x: ce, villain! nev decoded y: e, villain! neveWe usually process multiple chunks or sequences at once with batching in order to speed up the training. For each character, we have an equivalent target, which is its next token. The target for ‘c’ is ‘e’, for ‘e’ is ‘,’, for ‘v’ is ‘i’, and so on. Let us talk a bit about the input shape and output shape of tensors in a transformer model. The model receives a list of token indices like the above(named a sequence, or chunk) and maps them into their corresponding vectors.●    The input shape is (batch_size, block_size).●    After mapping indices into vectors, the data shape becomes (batch_size, block_size, embed_size).●    Then, through the multihead attention and feed-forward layers, the data shape does not change.●    Finally, the data with shape (batch_size, block_size, embed_size) goes to the transformer head (a simple neural network) and the output shape becomes (batch_size, block_size, vocab_size). vocab_size is the number of unique characters that can come next (for the Shakespeare dataset, the number of unique characters is 65).Self-attentionThe communication between tokens happens in the head class; we define the scores variable to save the similarity between vectors. The higher the score is, the more two vectors have in common. We then utilize these scores to do a weighted sum of all the vectors: class head(nn.Module): def __init__(self, embeds_size=32, block_size=16, head_size=8):     super().__init__()     self.key = nn.Linear(embeds_size, head_size, bias=False)     self.query = nn.Linear(embeds_size, head_size, bias=False)     self.value = nn.Linear(embeds_size, head_size, bias=False)     self.register_buffer('tril', torch.tril(torch.ones(block_size, block_size)))     self.dropout = nn.Dropout(0.1) def forward(self, x):     B,T,C = x.shape     # What am I looking for?     q = self.query(x)     # What do I have?     k = self.key(x)     # What is the representation value of me?     # Or: what's my personality in the group?     # Or: what mask do I have when I'm in a group?     v = self.value(x)     scores = q @ k.transpose(-2,-1) * (1 / math.sqrt(C)) # (B,T,head_size) @ (B,head_size,T) --> (B,T,T)     scores = scores.masked_fill(self.tril[:T, :T] == 0, float('-inf'))     scores = F.softmax(scores, dim=-1)     scores = self.dropout(scores)     out = scores @ v     return outUse three linear layers to transform the vector into key, query, and value, but with a smaller dimension (here same as head_size).Q, K, V: Q and K are for when we want to find similar tokens. We calculate the similarity between vectors with a dot product: q @ k.transpose(-2, -1). The shape of scores is (batch_size, block_size, block_size), which means we have the similarity scores between all the vectors in the block. V is used when we want to do the weighted sum. Scores: Pure dot product scores tend to have very high numbers that are not suitable for softmax since it makes the scores dense. Therefore, we rescale the results with a ratio of (1 / math.sqrt(C)). C is the embedding size. We call this a scaled dot product.Register_buffer: We used register_buffer to register a lower triangular tensor. In this way, when you save and load the model, this tensor also becomes part of the model.Masking: After calculating the scores, we need to replace future scores with -inf to shut them off so that the vectors do not have access to the future tokens. By doing so, these scores effectively become zero after applying the softmax function, resulting in a probability of zero for the future tokens. This process is referred to as masking. Here’s an example of masked scores with a block size 4:[[-0.1710, -inf, -inf, -inf], [ 0.2007, -0.0878, -inf, -inf], [-0.0405, 0.2913, 0.0445, -inf], [ 0.1328, -0.2244, 0.0796, 0.1719]]Softmax: It converts a vector into a probability distribution that sums up to 1. Here’s the scores after softmax:      [[1.0000, 0.0000, 0.0000, 0.0000],      [0.5716, 0.4284, 0.0000, 0.0000],      [0.2872, 0.4002, 0.3127, 0.0000],      [0.2712, 0.1897, 0.2571, 0.2820]]The scores of future tokens are zero; after doing a weighted sum, the future vectors become zero and the vectors receive none data from future vectors(n*0=0)Dropout: Dropout is a regularization technique. It drops some of the numbers in vectors randomly. Dropout helps the model to generalize, not memorize the dataset. We don’t want the model to memorize the Shakespeare model, right? We want it to create new texts like the dataset.Weighted sum: Weighted sum is used to combine different representations or embeddings based on their importance. The scores are calculated by measuring the relevance or similarity between each pair of vectors. The relevance scores are obtained by applying a scaled dot product between the query and key vectors, which are learned during the training process. The resulting weighted sum emphasizes the more important elements and reduces the influence of less relevant ones, allowing the model to focus on the most salient information. We dot product scores with values and the result is the outcome of self-attention.Output: since the embedding size and head size are 32 and 8 respectively, if the input shape is (batch_size, block_size, 32), the output has the shape of (batch_size, block_size, 8).Multihead self-attention“I have multiple personalities(v), tendencies and needs (q), and valuable things (k) in different spaces”. Vectors said.We transform the vectors into small dimensions, and then run self-attention on them; we did this in the previous class. In multihead self-attention, we call the head class four times, and then, concatenate the smaller vectors to have the same input shape. We call this multihead self-attention. For instance, if the shape of input data is (1, 16, 32), we transform it into four (1, 16, 8) tensors and run self-attention on these tensors. Why four times? 4 * 8 = initial shape. By using multihead self-attention and running self-attention multiple times, what we do is consider the different aspects of vectors in different spaces. That’s all!Here is the code:class multihead(nn.Module): def __init__(self, num_heads=4, head_size=8):     super().__init__()     self.multihead = nn.ModuleList([head(head_size) for _ in range(num_heads)])     self.output_linear = nn.Linear(embeds_size, embeds_size)     self.dropout = nn.Dropout(0.1) def forward(self, hidden_state):     hidden_state = torch.cat([head(hidden_state) for head in self.multihead], dim=-1)     hidden_state = self.output_linear(hidden_state)     hidden_state = self.dropout(hidden_state)     return hidden_state●    self.multihead: The variable creates four heads and we do this with nn.ModuleList.●    self.output_linear: Another transformer linear layer we apply at the end of the multihead self-attention process.●    self.dropout: Using dropout on the final results.●    hidden_state 1: Concatenating the output of heads so that we have the same shape as input. Heads transform data into different spaces with smaller dimensions, and then do the self-attention.●    hidden_state 2: After doing communication between tokens with self-attention, we use the self.output_linear projector to let the model adjust vectors further based on the gradients that flow through the layer.●    dropout: Run dropout on the output of the projection with a 10% probability of turning off values (make them zero) in the vectors.Transformer blockThere are two new techniques, including layer normalization and residual connection, that need to be explained:class transformer_block(nn.Module): def __init__(self, embeds_size=32, num_heads=8):     super().__init__()     self.head_count = embeds_size // num_heads     self.n_heads = multihead(num_heads, self.head_count)     self.ffn = nn.Sequential(         nn.Linear(embeds_size, 4 * embeds_size),         nn.ReLU(),         nn.Linear(4 * embeds_size, embeds_size),         nn.Dropout(drop_prob),     )     self.ln1 = nn.LayerNorm(embeds_size)     self.ln2 = nn.LayerNorm(embeds_size) def forward(self, hidden_state):     hidden_state = hidden_state + self.n_heads(self.ln1(hidden_state))     hidden_state = hidden_state + self.ffn(self.ln2(hidden_state))     return hidden_state self.head_count: Calculates the head size. The number of heads should be divisible by the embedding size so that we can concatenate the output of heads.self.n_heads: The multihead self-attention layer. self.ffn: This is the first time that we have non-linearity in our model. Non-linearity helps the model to capture complex relationships and patterns in the data. By introducing non-linearity through ReLU activation functions, or GLUE, the model can make a correlation for the data. As a result, it better models the intricacies of the input data. Non-linearity is like “you go to the next layer”, “You don’t go to the next layer”, or “Create y from x for the next layer”. The recommended hidden layer size is a number four times bigger than the embedding size. That’s why “4 * embeds_size”. You can also try SwiGLU as the activation function instead of ReLU.self.ln1 and self.ln2: Layer normalizers make the model more robust and they also help the model to converge faster. Layer normalization rescales the data in such a way that the mean is zero and the standard deviation is one. hidden_state 1: Normalize the vectors with self.ln1 and forward the vectors to the multihead attention. Next, we add the input to the output of multihead attention. It helps the model in two ways:○    First, the model has some information from the original vectors. ○    Second, when the model becomes deep, during backpropagation, the gradients will be weak for earlier layers and the model will converge too slowly. We recognize this effect as gradient vanishing. Adding the input helps to enrich the gradients and mitigate the gradient vanishing. We recognize it as a residual connection.hidden_state 2: Hidden_state 1 goes to a layer normalization and then to a nonlinear network. The output will be added to the hidden state with the aim of keeping gradients for all layers.The modelAll the necessary parts are ready, let us stack them up to make the full model:class transformer(nn.Module): def __init__(self):     super().__init__()     self.stack = nn.ModuleDict(dict(         tok_embs=nn.Embedding(vocab_size, embeds_size),         pos_embs=nn.Embedding(block_size, embeds_size),         dropout=nn.Dropout(drop_prob),         blocks=nn.Sequential(             transformer_block(),             transformer_block(),             transformer_block(),             transformer_block(),             transformer_block(),         ),         ln=nn.LayerNorm(embeds_size),         lm_head=nn.Linear(embeds_size, vocab_size),     ))●    self.stack: A list of all necessary layers.●    tok_embs: This is a learnable lookup table that receives a list of indices and returns their vectors.●    pos_embs: Just like tok_embs, it is also a learnable look-up table, but for positional embedding. It receives a list of positions and returns their vectors.●    dropout: Dropout layer.●    blocks: We create multiple transformer blocks sequentially.●    ln: A layer normalization.●    lm_heas: Transformer head receives a token and returns probabilities of the next token. To change the model to be a classifier, or a sentimental analysis model, we just need to change this layer and remove masking from the self-attention layer.The forward method of the transformer class:    def forward(self, seq, targets=None):     B, T = seq.shape     tok_emb = self.stack.tok_embs(seq) # (batch, block_size, embed_dim) (B,T,C)     pos_emb = self.stack.pos_embs(torch.arange(T, device=device))     x = tok_emb + pos_emb     x = self.stack.dropout(x)     x = self.stack.blocks(x)     x = self.stack.ln(x)     logits = self.stack.lm_head(x) # (B, block_size, vocab_size)     if targets is None:         loss = None     else:         B, T, C = logits.shape         logits = logits.view(B * T, C)         targets = targets.view(B * T)         loss = F.cross_entropy(logits, targets)     return logits, loss●  tok_emb: Convert token indices into vectors. Given the input (B, T), the output is (B, T, C), where C is the embeds_size.●  pos_emb: Given the number of tokens in the context window or block_size, it returns the positional embedding of each position.●  x 1: Add up token embeddings and position embeddings. A little bit lossy but it works just fine.●  x 2: Run dropout on embeddings.●  x 3: The embeddings go through all the transformer blocks, and multihead self-attention. The input is (B, T, C) and the output is (B, T, C).●  x 4: The outcome of transformer blocks goes to the layer normalization.●  logits: We usually recognize the unnormalized values extracted from the language model head as logits :)●  if-else block: Were the targets specified, we calculated cross-entropy loss. Otherwise, the loss will be None. Before calculating loss in the else block, we change the shape as the cross entropy function expects.●  Output: The method returns logits with shape (batch_size, block_size, vocab_size) and loss if any.For generating a text, add this to the transformer class:    def autocomplete(self, seq, _len=10):        for _ in range(_len):            seq_crop = seq[:, -block_size:] # crop it            logits, _ = self(seq_crop)            logits = logits[:, -1, :] # we care about the last token            probs = F.softmax(logits, dim=-1)            next_char = torch.multinomial(probs, num_samples=1)            seq = torch.cat((seq, next_char), dim=1)        return seq●  autocomplete: Given a tokenized sequence, and the number of tokens that need to be created, this method returns _len tokens.●  seq_crop: Select the last n tokens in the sequence to give it to the model. The sequence length might be larger than the block_size and it causes an error if we don’t crop it.●  logits 1: Forward the sequence into the model to receive the logits.●  logits 2: Select the last logit that will be used to select the next token.●  probs: Run the softmax on logits to get a probability distribution.●  next_char: Multinomial selects one sample from the probs. The higher the probability of a token, the higher the chance of being selected.●  seq: Add the selected character to the sequence.TrainingThe rest of the code is downstream tasks such as training loops, etc. The codes that are provided here are slightly different from the tiny-transformer repository. I trained the model with the following hyperparameters:block_size = 256 learning_rate = 9e-4 eval_interval = 300 # Every n step, we do an evaluation. iterations = 5000 # Like epochs batch_size = 64 embeds_size = 195 num_heads = 5 num_layers = 5 drop_prob = 0.15And here’s the generated text:If you need to improve the quality, increase embeds_size, num_layers, and heads.ConclusionThe article explores transformers' text generation role, detailing token preprocessing through self-attention and neural network heads. Transformers predict tokens using context length as a hyperparameter. Human context comprehension is paralleled, highlighting relevant word emergence and fading of irrelevant words for precise selection. Transformers lack human foresight and backtracking. Key components—self-attention, multihead self-attention, and transformer blocks—are explained, and supported by code snippets. Token and positional embeddings, layer normalization, and residual connections are detailed. The model's text generation is exemplified via the autocomplete method. Training parameters and text quality enhancement are addressed, showcasing transformers' potential.Author BioSaeed Dehqan trains language models from scratch. Currently, his work is centered around Language Models for text generation, and he possesses a strong understanding of the underlying concepts of neural networks. He is proficient in using optimizers such as genetic algorithms to fine-tune network hyperparameters and has experience with neural architecture search (NAS) by using reinforcement learning (RL). He implements models starting from data gathering to monitoring, and deployment on mobile, web, cloud, etc. 
Read more
  • 0
  • 0
  • 17357

article-image-llm-powered-chatbots-for-financial-queries
Alan Bernardo Palacio
12 Sep 2023
27 min read
Save for later

LLM-powered Chatbots for Financial Queries

Alan Bernardo Palacio
12 Sep 2023
27 min read
IntroductionIn the ever-evolving realm of digital finance, the convergence of user-centric design and cutting-edge AI technologies is pushing the boundaries of innovation. But with the influx of data and queries, how can we better serve users and provide instantaneous, accurate insights? Within this context, Large Language Models (LLMs) have emerged as a revolutionary tool, providing businesses and developers with powerful capabilities. This hands-on article will walk you through the process of leveraging LLMs to create a chatbot that can query real-time financial data extracted from the NYSE to address users' queries in real time about the current market state. We will dive into the world of LLMs, explore their potential, and understand how they seamlessly integrate with databases using LangChain. Furthermore, we'll fetch real-time data using the finance package offering the chatbot the ability to answer questions using current data.In this comprehensive tutorial, you'll gain proficiency in diverse aspects of modern software development. You'll first delve into the realm of database interactions, mastering the setup and manipulation of a MySQL database to store essential financial ticker data. Unveil the intricate synergy between Large Language Models (LLMs) and SQL through the innovative LangChain tool, which empowers you to bridge natural language understanding and database operations seamlessly. Moving forward, you'll explore the dynamic fusion of Streamlit and LLMs as you demystify the mechanics behind crafting a user-friendly front. Witness the transformation of your interface using OpenAI's Davinci model, enhancing user engagement with its profound knowledge. As your journey progresses, you'll embrace the realm of containerization, ensuring your application's agility and scalability by harnessing the power of Docker. Grasp the nuances of constructing a potent Dockerfile and orchestrating dependencies, solidifying your grasp on holistic software development practices.By the end of this guide, readers will be equipped with the knowledge to design, implement, and deploy an intelligent, finance-focused chatbot. This isn't just about blending frontend and backend technologies; it's about crafting a digital assistant ready to revolutionize the way users interact with financial data. Let's dive in!The Power of Containerization with Docker ComposeNavigating the intricacies of modern software deployment is simplified through the strategic implementation of Docker Compose. This orchestration tool plays a pivotal role in harmonizing multiple components within a local environment, ensuring they collaborate seamlessly.Docker allows to deploy multiple components seamlessly in a local environment. In our journey, we will use docker-compose to harmonize various components, including MySQL for data storage, a Python script as a data fetcher for financial insights, and a Streamlit-based web application that bridges the gap between the user and the chatbot.Our deployment landscape consists of several interconnected components, each contributing to the finesse of our intelligent chatbot. The cornerstone of this orchestration is the docker-compose.yml file, a blueprint that encapsulates the deployment architecture, coordinating the services to deliver a holistic user experience.With Docker Compose, we can efficiently and consistently deploy multiple interconnected services. Let's dive into the structure of our docker-compose.yml:version: '3' services: db:    image: mysql:8.0    environment:      - MYSQL_ROOT_PASSWORD=root_password      - MYSQL_DATABASE=tickers_db      - MYSQL_USER=my_user   # Replace with your desired username      - MYSQL_PASSWORD=my_pass  # Replace with your desired password    volumes:      - ./db/setup.sql:/docker-entrypoint-initdb.d/setup.sql    ports:      - "3306:3306"  # Maps port 3306 in the container to port 3306 on the host ticker-fetcher:    image: ticker/python    build:      context: ./ticker_fetcher    depends_on:      - db    environment:      - DB_USER=my_user   # Must match the MYSQL_USER from above      - DB_PASSWORD=my_pass   # Must match the MYSQL_PASSWORD from above      - DB_NAME=tickers_db app:    build:      context: ./app    ports:      - 8501:8501    environment:      - OPENAI_API_KEY=${OPENAI_API_KEY}    depends_on:      - ticker-fetcherContained within the composition are three distinctive services:db: A MySQL database container, configured with environmental variables for establishing a secure and efficient connection. This container is the bedrock upon which our financial data repository, named tickers_db, is built. A volume is attached to import a setup SQL script, enabling rapid setup.ticker-fetcher: This service houses the heart of our real-time data acquisition system. Crafted around a custom Python image, it plays the crucial role of fetching the latest stock information from Yahoo Finance. It relies on the db service to persistently store the fetched data, ensuring that our chatbot's insights are real-time data.app: The crown jewel of our user interface is the Streamlit application, which bridges the gap between users and the chatbot. This container grants users access to OpenAI's LLM model. It harmonizes with the ticker-fetcher service to ensure that the data presented to users is not only insightful but also dynamic.Docker Compose's brilliance lies in its capacity to encapsulate these services within isolated, reproducible containers. While Docker inherently fosters isolation, Docker Compose takes it a step further by ensuring that each service plays its designated role in perfect sync.The docker-compose.yml configuration file serves as the conductor's baton, ensuring each service plays its part with precision and finesse. As you journey deeper into the deployment architecture, you'll uncover the intricate mechanisms powering the ticker-fetcher container, ensuring a continuous flow of fresh financial data. Through the lens of Docker Compose, the union of user-centric design, cutting-edge AI, and streamlined deployment becomes not just a vision, but a tangible reality poised to transform the way we interact with financial data.Enabling Real-Time Financial Data AcquisitionAt the core of our innovative architecture lies the pivotal component dedicated to real-time financial data acquisition. This essential module operates as the engine that drives our chatbot's ability to deliver up-to-the-minute insights from the ever-fluctuating financial landscape.Crafted as a dedicated Docker container, this module is powered by a Python script that through the yfinance package, retrieves the latest stock information directly from Yahoo Finance. The result is a continuous stream of the freshest financial intelligence, ensuring that our chatbot remains armed with the most current and accurate market data.Our Python script, fetcher.py, looks as follows:import os import time import yfinance as yf import mysql.connector import pandas_market_calendars as mcal import pandas as pd import traceback DB_USER = os.environ.get('DB_USER') DB_PASSWORD = os.environ.get('DB_PASSWORD') DB_NAME = 'tickers_db' DB_HOST = 'db' DB_PORT = 3306 def connect_to_db():    return mysql.connector.connect(        host=os.getenv("DB_HOST", "db"),        port=os.getenv("DB_PORT", 3306),        user=os.getenv("DB_USER"),        password=os.getenv("DB_PASSWORD"),        database=os.getenv("DB_NAME"),    ) def wait_for_db():    while True:        try:            conn = connect_to_db()            conn.close()            return        except mysql.connector.Error:            print("Unable to connect to the database. Retrying in 5 seconds...")            time.sleep(5) def is_market_open():    # Get the NYSE calendar    nyse = mcal.get_calendar('NYSE')    # Get the current timestamp and make it timezone-naive    now = pd.Timestamp.now(tz='UTC').tz_localize(None)    print("Now its:",now)    # Get the market open and close times for today    market_schedule = nyse.schedule(start_date=now, end_date=now)    # If the market isn't open at all today (e.g., a weekend or holiday)    if market_schedule.empty:        print('market is empty')        return False    # Today's schedule    print("Today's schedule")    # Check if the current time is within the trading hours    market_open = market_schedule.iloc[0]['market_open'].tz_localize(None)    market_close = market_schedule.iloc[0]['market_close'].tz_localize(None)    print("market_open",market_open)    print("market_close",market_close)    market_open_now = market_open <= now <= market_close    print("Is market open now:",market_open_now)    return market_open_now def chunks(lst, n):    """Yield successive n-sized chunks from lst."""    for i in range(0, len(lst), n):        yield lst[i:i + n] if __name__ == "__main__":    wait_for_db()    print("-"*50)    tickers = ["AAPL", "GOOGL"]  # Add or modify the tickers you want      print("Perform backfill once")    # historical_backfill(tickers)    data = yf.download(tickers, period="5d", interval="1m", group_by="ticker", timeout=10) # added timeout    print("Data fetched from yfinance.")    print("Head")    print(data.head().to_string())    print("Tail")    print(data.head().to_string())    print("-"*50)    print("Inserting data")    ticker_data = []    for ticker in tickers:        for idx, row in data[ticker].iterrows():            ticker_data.append({                'ticker': ticker,                'open': row['Open'],                'high': row['High'],                'low': row['Low'],                'close': row['Close'],                'volume': row['Volume'],                'datetime': idx.strftime('%Y-%m-%d %H:%M:%S')            })    # Insert data in bulk    batch_size=200    conn = connect_to_db()    cursor = conn.cursor()    # Create a placeholder SQL query    query = """INSERT INTO ticker_history (ticker, open, high, low, close, volume, datetime)               VALUES (%s, %s, %s, %s, %s, %s, %s)"""    # Convert the data into a list of tuples    data_tuples = []    for record in ticker_data:        for key, value in record.items():            if pd.isna(value):                record[key] = None        data_tuples.append((record['ticker'], record['open'], record['high'], record['low'],                            record['close'], record['volume'], record['datetime']))    # Insert records in chunks/batches    for chunk in chunks(data_tuples, batch_size):        cursor.executemany(query, chunk)        print(f"Inserted batch of {len(chunk)} records")    conn.commit()    cursor.close()    conn.close()    print("-"*50)    # Wait until starting to insert live values    time.sleep(60)    while True:        if is_market_open():            print("Market is open. Fetching data.")            print("Fetching data from yfinance...")            data = yf.download(tickers, period="1d", interval="1m", group_by="ticker", timeout=10) # added timeout            print("Data fetched from yfinance.")            print(data.head().to_string())                      ticker_data = []            for ticker in tickers:                latest_data = data[ticker].iloc[-1]                ticker_data.append({                    'ticker': ticker,                    'open': latest_data['Open'],                    'high': latest_data['High'],                    'low': latest_data['Low'],                    'close': latest_data['Close'],                    'volume': latest_data['Volume'],                    'datetime': latest_data.name.strftime('%Y-%m-%d %H:%M:%S')                })                # Insert the data                conn = connect_to_db()                cursor = conn.cursor()                print("Inserting data")                total_tickers = len(ticker_data)                for record in ticker_data:                    for key, value in record.items():                        if pd.isna(value):                            record[key] = "NULL"                    query = f"""INSERT INTO ticker_history (ticker, open, high, low, close, volume, datetime)                                VALUES (                                    '{record['ticker']}',{record['open']},{record['high']},{record['low']},{record['close']},{record['volume']},'{record['datetime']}')"""                    print(query)                    cursor.execute(query)                print("Data inserted")                conn.commit()                cursor.close()                conn.close()            print("Inserted data, waiting for the next batch in one minute.")            print("-"*50)            time.sleep(60)        else:            print("Market is closed. Waiting...")            print("-"*50)            time.sleep(60)  # Wait for 60 seconds before checking againWithin its code, the script seamlessly navigates through a series of well-defined stages:Database Connectivity: The script initiates by establishing a secure connection to our MySQL database. With the aid of the connect_to_db() function, a connection is created while the wait_for_db() mechanism guarantees the script's execution occurs only once the database service is fully primed.Market Schedule Evaluation: Vital to the script's operation is the is_market_open() function, which determines the market's operational status. By leveraging the pandas_market_calendars package, this function ascertains whether the New York Stock Exchange (NYSE) is currently active.Data Retrieval and Integration: During its maiden voyage, fetcher.py fetches historical stock data from the past five days for a specified list of tickers—typically major entities such as AAPL and GOOGL. This data is meticulously processed and subsequently integrated into the tickers_db database. During subsequent cycles, while the market is live, the script periodically procures real-time data at one-minute intervals.Batched Data Injection: Handling substantial volumes of stock data necessitates an efficient approach. To address this, the script ingeniously partitions the data into manageable chunks and employs batched SQL INSERT statements to populate our database. This technique ensures optimal performance and streamlined data insertion.Now let’s discuss the Dockerfile that defines this container. The Dockerfile is the blueprint for building the ticker-fetcher container. It dictates how the Python environment will be set up inside the container.FROM python:3 WORKDIR /app COPY requirements.txt . RUN pip install --no-cache-dir -r requirements.txt COPY . . CMD ["python", "-u", "fetcher.py"] Base Image: We start with a basic Python 3 image.Working Directory: The working directory inside the container is set to /app.Dependencies Installation: After copying the requirements.txt file into our container, the RUN command installs all necessary Python packages.Starting Point: The script's entry point, fetcher.py, is set as the command to run when the container starts.A list of Python packages needed to run our fetcher script:mysql-connector-python yfinance pandas-market-calendarsmysql-connector-python: Enables the script to connect to our MySQL database.yfinance: Fetches stock data from Yahoo Finance.pandas-market-calendars: Determines the NYSE's operational schedule.As a symphony of technology, the ticker-fetcher container epitomizes precision and reliability, acting as the conduit that channels real-time financial data into our comprehensive architecture.Through this foundational component, the chatbot's prowess in delivering instantaneous, accurate insights comes to life, representing a significant stride toward revolutionizing the interaction between users and financial data.With a continuously updating financial database at our disposal, the next logical step is to harness the potential of Large Language Models. The subsequent section will explore how we integrate LLMs using LangChain, allowing our chatbot to transform raw stock data into insightful conversations.Leveraging Large Language Models with SQL using LangChainThe beauty of modern chatbot systems lies in the synergy between the vast knowledge reservoirs of Large Language Models (LLMs) and real-time, structured data from databases. LangChain is a bridge that efficiently connects these two worlds, enabling seamless interactions between LLMs and databases such as SQL.The marriage between LLMs and SQL databases opens a world of possibilities. With LangChain as the bridge, LLMs can effectively query databases, offering dynamic responses based on stored data. This section delves into the core of our setup, the utils.py file. Here, we knit together the MySQL database with our Streamlit application, defining the agent that stands at the forefront of database interactions.LangChain is a library designed to facilitate the union of LLMs with structured databases. It provides utilities and agents that can direct LLMs to perform database operations via natural language prompts. Instead of a user having to craft SQL queries manually, they can simply ask a question in plain English, which the LLM interprets and translates into the appropriate SQL query.Below, we present the code if utils.py, that brings our LLM-database interaction to life:from langchain import PromptTemplate, FewShotPromptTemplate from langchain.prompts.example_selector import LengthBasedExampleSelector from langchain.llms import OpenAI from langchain.sql_database import SQLDatabase from langchain.agents import create_sql_agent from langchain.agents.agent_toolkits import SQLDatabaseToolkit from langchain.agents.agent_types import AgentType # Database credentials DB_USER = 'my_user' DB_PASSWORD = 'my_pass' DB_NAME = 'tickers_db' DB_HOST = 'db' DB_PORT = 3306 mysql_uri = f"mysql+mysqlconnector://{DB_USER}:{DB_PASSWORD}@{DB_HOST}:{DB_PORT}/{DB_NAME}" # Initialize the SQLDatabase and SQLDatabaseToolkit db = SQLDatabase.from_uri(mysql_uri) toolkit = SQLDatabaseToolkit(db=db, llm=OpenAI(temperature=0)) # Create SQL agent agent_executor = create_sql_agent(    llm=OpenAI(temperature=0),    toolkit=toolkit,    verbose=True,    agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION, ) # Modified the generate_response function to now use the SQL agent def query_db(prompt):    return agent_executor.run(prompt)Here's a breakdown of the key components:Database Credentials: These credentials are required to connect our application to the tickers_db database. The MySQL URI string represents this connection.SQLDatabase Initialization: Using the SQLDatabase.from_uri method, LangChain initializes a connection to our database. The SQLDatabaseToolkit provides a set of tools that help our LLM interact with the SQL database.Creating the SQL Agent: The SQL agent, in our case a ZERO_SHOT_REACT_DESCRIPTION type, is the main executor that takes a natural language prompt and translates it into SQL. It then fetches the data and returns it in a comprehensible manner. The agent uses the OpenAI model (an instance of LLM) and the aforementioned toolkit to accomplish this.The query_db function: This is the interface to our SQL agent. Upon receiving a prompt, it triggers the SQL agent to run and then returns the response.The architecture is now in place: on one end, we have a constant stream of financial data being fetched and stored in tickers_db. On the other, we have an LLM ready to interpret and answer user queries. The user might ask, "What was the closing price of AAPL yesterday?" and our system will seamlessly fetch this from the database and provide a well-crafted response, all thanks to LangChain.In the forthcoming section, we'll discuss how we present this powerful capability through an intuitive interface using Streamlit. This will enable end-users, irrespective of their technical proficiency, to harness the combined might of LLMs and structured databases, all with a simple chat interface.Building an Interactive Chatbot with Streamlit, OpenAI, and LangChainIn the age of user-centric design, chatbots represent an ideal solution for user engagement. But while typical chatbots can handle queries, imagine a chatbot powered by both Streamlit for its front end and a Large Language Model (LLM) for its backend intelligence. This powerful union allows us to provide dynamic, intelligent responses, leveraging our stored financial data to answer user queries.The grand finale of our setup is the Streamlit application. This intuitive web interface allows users to converse with our chatbot in natural language, making database queries feel like casual chats. Behind the scenes, it leverages the power of the SQL agent, tapping into the real-time financial data stored in our database, and presenting users with instant, accurate insights.Let's break down our chatbot's core functionality, designed using Streamlit:import streamlit as st from streamlit_chat import message from streamlit_extras.colored_header import colored_header from streamlit_extras.add_vertical_space import add_vertical_space from utils import * # Now the Streamlit app # Sidebar contents with st.sidebar:    st.title('Financial QnA Engine')    st.markdown('''    ## About    This app is an LLM-powered chatbot built using:    - Streamlit    - Open AI Davinci LLM Model    - LangChain    - Finance    ''')    add_vertical_space(5)    st.write('Running in Docker!') # Generate empty lists for generated and past. ## generated stores AI generated responses if 'generated' not in st.session_state:    st.session_state['generated'] = ["Hi, how can I help today?"] ## past stores User's questions if 'past' not in st.session_state:    st.session_state['past'] = ['Hi!'] # Layout of input/response containers input_container = st.container() colored_header(label='', description='', color_name='blue-30') response_container = st.container() # User input ## Function for taking user provided prompt as input def get_text():    input_text = st.text_input("You: ", "", key="input")    return input_text ## Applying the user input box with input_container:    user_input = get_text() # Response output ## Function for taking user prompt as input followed by producing AI generated responses def generate_response(prompt):    response = query_db(prompt)    return response ## Conditional display of AI generated responses as a function of user provided prompts with response_container:    if user_input:        response = generate_response(user_input)        st.session_state.past.append(user_input)        st.session_state.generated.append(response)          if st.session_state['generated']:        for i in range(len(st.session_state['generated'])):            message(st.session_state['past'][i], is_user=True, key=str(i) + '_user',avatar_style='identicon',seed=123)            message(st.session_state["generated"][i], key=str(i),avatar_style='icons',seed=123)The key features of the application are in general:Sidebar Contents: The sidebar provides information about the chatbot, highlighting the technologies used to power it. With the aid of streamlit-extras, we've added vertical spacing for visual appeal.User Interaction Management: Our chatbot uses Streamlit's session_state to remember previous user interactions. The 'past' list stores user queries, and 'generated' stores the LLM-generated responses.Layout: With Streamlit's container feature, the layout is neatly divided into areas for user input and the AI response.User Input: Users interact with our chatbot using a simple text input box, where they type in their query.AI Response: Using the generate_response function, the chatbot processes user input, fetching data from the database using LangChain and LLM to generate an appropriate response. These responses, along with past interactions, are then dynamically displayed in the chat interface using the message function from streamlit_chat.Now in order to ensure portability and ease of deployment, our application is containerized using Docker. Below is the Dockerfile that aids in this process:FROM python:3 WORKDIR /app COPY requirements.txt . RUN pip install --no-cache-dir -r requirements.txt COPY . . CMD ["streamlit", "run", "streamlit_app.py"]This Dockerfile:Starts with a base Python 3 image.Sets the working directory in the container to /app.Copies the requirements.txt into the container and installs the necessary dependencies.Finally, it copies the rest of the application and sets the default command to run the Streamlit application.Our application depends on several Python libraries which are specified in requirements.txt:streamlit streamlit-chat streamlit-extras mysql-connector-python openai==0.27.8 langchain==0.0.225These include:streamlit: For the main frontend application.streamlit-chat & streamlit-extras: For enhancing the chat interface and adding utility functions like colored headers and vertical spacing.mysql-connector-python: To interact with our MySQL database.openai: For accessing OpenAI's Davinci model.langchain: To bridge the gap between LLMs and our SQL database.Our chatbot application combines now a user-friendly frontend interface with the immense knowledge and adaptability of LLMs.We can verify that the values provided by the chatbot actually reflect the data in the database:In the next section, we'll dive deeper into deployment strategies, ensuring users worldwide can benefit from our financial Q&A engine.ConclusionAs we wrap up this comprehensive guide, we have used from raw financial data to a fully-fledged, AI-powered chatbot, we've traversed a myriad of technologies, frameworks, and paradigms to create something truly exceptional.We initiated our expedition by setting up a MySQL database brimming with financial data. But the real magic began when we introduced LangChain to the mix, establishing a bridge between human-like language understanding and the structured world of databases. This amalgamation ensured that our application could pull relevant financial insights on the fly, using natural language queries. Streamlit stood as the centerpiece of our user interaction. With its dynamic and intuitive design capabilities, we crafted an interface where users could communicate effortlessly. Marrying this with the vast knowledge of OpenAI's Davinci LLM, our chatbot could comprehend, reason, and respond, making financial inquiries a breeze. To ensure that our application was both robust and portable, we harnessed the power of Docker. This ensured that irrespective of the environment, our chatbot was always ready to assist, without the hassles of dependency management.We've showcased just the tip of the iceberg. The bigger picture is captivating, revealing countless potential applications. Deploying such tools in firms could help clients get real-time insights into their portfolios. Integrating such systems with personal finance apps can help individuals make informed decisions about investments, savings, or even daily expenses. The true potential unfolds when you, the reader, take these foundational blocks and experiment, innovate, and iterate. The amalgamation of LLMs, databases, and interactive interfaces like Streamlit opens a realm of possibilities limited only by imagination.As we conclude, remember that in the world of technology, learning is a continuous journey. What we've built today is a stepping stone. Challenge yourself, experiment with new data sets, refine the user experience, or even integrate more advanced features. The horizon is vast, and the opportunities, endless. Embrace this newfound knowledge and craft the next big thing in financial technology. After all, every revolution starts with a single step, and today, you've taken many. Happy coding!Author BioAlan Bernardo Palacio is a data scientist and an engineer with vast experience in different engineering fields. His focus has been the development and application of state-of-the-art data products and algorithms in several industries. He has worked for companies such as Ernst and Young, and Globant, and now holds a data engineer position at Ebiquity Media helping the company to create a scalable data pipeline. Alan graduated with a Mechanical Engineering degree from the National University of Tucuman in 2015, participated as the founder of startups, and later on earned a Master's degree from the faculty of Mathematics at the Autonomous University of Barcelona in 2017. Originally from Argentina, he now works and resides in the Netherlands.LinkedIn
Read more
  • 0
  • 0
  • 16792

article-image-large-language-models-llms-and-knowledge-graphs
Mostafa Ibrahim
15 Nov 2023
7 min read
Save for later

Large Language Models (LLMs) and Knowledge Graphs

Mostafa Ibrahim
15 Nov 2023
7 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!IntroductionHarnessing the power of AI, this article explores how Large Language Models (LLMs) like OpenAI's GPT can analyze data from Knowledge Graphs to revolutionize data interpretation, particularly in healthcare. We'll illustrate a use case where an LLM assesses patient symptoms from a Knowledge Graph to suggest diagnoses, showcasing LLM’s potential to support medical diagnostics with precision.Brief Introduction Into Large Language Models (LLMs)Large Language Models (LLMs), such as OpenAI's GPT series, represent a significant advancement in the field of artificial intelligence. These models are trained on vast datasets of text, enabling them to understand and generate human-like language.LLMs are adept at understanding complex questions and providing appropriate responses, akin to human analysis. This capability stems from their extensive training on diverse datasets, allowing them to interpret context and generate relevant text-based answers.While LLMs possess advanced data processing capabilities, their effectiveness is often limited by the static nature of their training data. Knowledge Graphs step in to fill this gap, offering a dynamic and continuously updated source of information. This integration not only equips LLMs with the latest data, enhancing the accuracy and relevance of their output but also empowers them to solve more complex problems with a greater level of sophistication. As we harness this powerful combination, we pave the way for innovative solutions across various sectors that demand real-time intelligence, such as the ever-fluctuating stock market.Exploring Knowledge Graphs and How LLMs Can Benefit From ThemKnowledge Graphs represent a pivotal advancement in organizing and utilizing data, especially in enhancing the capabilities of Large Language Models (LLMs).Knowledge Graphs organize data in a graph format, where entities (like people, places, and things) are nodes, and the relationships between them are edges. This structure allows for a more nuanced representation of data and its interconnected nature. Take the above Knowledge Graph as an example.Doctor Node: This node represents the doctor. It is connected to the patient node with an edge labeled "Patient," indicating the doctor-patient relationship.Patient Node (Patient123): This is the central node representing a specific patient, known as "Patient123." It serves as a junction point connecting to various symptoms that the patient is experiencing.Symptom Nodes: There are three separate nodes representing individual symptoms that the patient has: "Fever," "Cough," and "Shortness of breath." Each of these symptoms is connected to the patient node by edges labeled "Symptom," indicating that these are the symptoms experienced by "Patient123.          To simplify, the Knowledge Graph shows that "Patient123" is a patient of the "Doctor" and is experiencing three symptoms: fever, cough, and shortness of breath. This type of graph is useful in medical contexts where it's essential to model the relationships between patients, their healthcare providers, and their medical conditions or symptoms. It allows for easy querying of related data—for example, finding all symptoms associated with a particular patient or identifying all patients experiencing a certain symptom.Practical Integration of LLMs and Knowledge GraphsStep 1: Installing and Importing the Necessary LibrariesIn this step, we're going to bring in two essential libraries: rdflib for constructing our Knowledge Graph and openai for tapping into the capabilities of GPT, the Large Language Model.!pip install rdflib !pip install openai==0.28 import rdflib import openaiStep 2: Import your Personal OPENAI API KEYopenai.api_key = "Insert Your Personal OpenAI API Key Here"Step 3: Creating a Knowledge Graph# Create a new and empty Knowledge graph g = rdflib.Graph() # Define a Namespace for health-related data namespace = rdflib.Namespace("http://example.org/health/")Step 4: Adding data to Our GraphIn this part of the code, we will introduce a single entry to the Knowledge Graph pertaining to patient124. This entry will consist of three distinct nodes, each representing a different symptom exhibited by the patient.def add_patient_data(patient_id, symptoms):    patient_uri = rdflib.URIRef(patient_id)      for symptom in symptoms:        symptom_predicate = namespace.hasSymptom        g.add((patient_uri, symptom_predicate, rdflib.Literal(symptom))) # Example of adding patient data add_patient_data("Patient123", ["fever", "cough", "shortness of breath"])Step 5: Identifying the get_stock_price functionWe will utilize a simple query in order to extract the required data from the knowledge graph.def get_patient_symptoms(patient_id):    # Correctly reference the patient's URI in the SPARQL query    patient_uri = rdflib.URIRef(patient_id)    sparql_query = f"""        PREFIX ex: <http://example.org/health/>        SELECT ?symptom        WHERE {{            <{patient_uri}> ex:hasSymptom ?symptom.        }}    """    query_result = g.query(sparql_query)    symptoms = [str(row.symptom) for row in query_result]    return symptomsStep 6: Identifying the generate_llm_response functionThe generate_daignosis_response function takes as input the user’s name along with the list of symptoms extracted from the graph. Moving on, the LLM uses such data in order to give the patient the most appropriate diagnosis.def generate_diagnosis_response(patient_id, symptoms):    symptoms_list = ", ".join(symptoms)    prompt = f"A patient with the following symptoms - {symptoms_list} - has been observed. Based on these symptoms, what could be a potential diagnosis?"      # Placeholder for LLM response (use the actual OpenAI API)    llm_response = openai.Completion.create(        model="text-davinci-003",        prompt=prompt,        max_tokens=100    )    return llm_response.choices[0].text.strip() # Example usage patient_id = "Patient123" symptoms = get_patient_symptoms(patient_id) if symptoms:    diagnosis = generate_diagnosis_response(patient_id, symptoms)    print(diagnosis) else:    print(f"No symptoms found for {patient_id}.")Output: The potential diagnosis could be pneumonia. Pneumonia is a type of respiratory infection that causes symptoms including fever, cough, and shortness of breath. Other potential diagnoses should be considered as well and should be discussed with a medical professional.As demonstrated, the LLM connected the three symptoms—fever, cough, and shortness of breath—to suggest that patient123 may potentially be diagnosed with pneumonia.ConclusionIn summary, the collaboration of Large Language Models and Knowledge Graphs presents a substantial advancement in the realm of data analysis. This article has provided a straightforward illustration of their potential when working in tandem, with LLMs to efficiently extract and interpret data from Knowledge Graphs.As we further develop and refine these technologies, we hold the promise of significantly improving analytical capabilities and informing more sophisticated decision-making in an increasingly data-driven world.Author BioMostafa Ibrahim is a dedicated software engineer based in London, where he works in the dynamic field of Fintech. His professional journey is driven by a passion for cutting-edge technologies, particularly in the realms of machine learning and bioinformatics. When he's not immersed in coding or data analysis, Mostafa loves to travel.Medium
Read more
  • 0
  • 0
  • 16177
article-image-fine-tuning-llama-2
Prakhar Mishra
06 Nov 2023
9 min read
Save for later

Fine-Tuning LLaMA 2

Prakhar Mishra
06 Nov 2023
9 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!IntroductionLarge Language Models have recently become the talk of the town. I am very sure, you must have heard of ChatGPT. Yes, that’s an LLM, and that’s what I am talking about. Every few weeks, we have been witnessing newer, better but not necessarily larger LLMs coming out either as open-source or closed-source. This is probably the best time to learn about them and make these powerful models work for your specific use case.In today’s blog, we will look into one of the recent open-source models called Llama2 and try to fine-tune it on a standard NLP task of recognizing entities from text. We will first look into what are large language models, what are open-source and closed-source models, and some examples of them. We will then move to learning about Llama2 and why is it so special. We then describe our NLP task and dataset. Finally, we get into coding.About Large Language Models (LLMs)Language models are artificial intelligence systems that have been trained to understand and generate human language. Large Language Models (LLMs) like GPT-3, ChatGPT, GPT-4, Bard, and similar can perform diverse sets of tasks out of the box. Often the quality of output from these large language models is highly dependent on the finesse of the prompt given by the user.These Language models are trained on vast amounts of text data from the Internet. Most of the language models are trained in an auto-regressive way i.e. they try to maximize the probability of the next word based on the words they have produced or seen in the past. This data includes a wide range of written text, from books and articles to websites and social media posts. Language models have a wide range of applications, including chatbots, virtual assistants, content generation, and more. They can be used in industries like customer service, healthcare, finance, and marketing.Since these models are trained on enormous data, they are already good at zero-shot inference and can be steered to perform better with few-shot examples. Zero-shot is a setup in which a model can learn to recognize things that it hasn't explicitly seen before in training. In a Few-shot setting, the goal is to make predictions for new classes based on the few examples of labeled data that is provided to it at inference time.Despite their amazing capabilities of generating text, these humongous models come with a few limitations that must be thought of when building an LLM-based production pipeline. Some of these limitations are hallucinations, biases, and more.Closed and Open-source Language ModelsLarge language models from closed-source are those employed by some companies and are not readily accessible to the public. Training data for these models are typically kept private. While they can be highly sophisticated, this limits transparency, potentially leading to concerns about bias, and data privacy.In contrast, open-source projects like GPT-3, are designed to be freely available to researchers and developers. These models are trained on extensive, publicly available datasets, allowing for a degree of transparency and collaboration.The decision between closed- and open-source language models is influenced by several variables, such as the project's goals, the need for openness, and others.About LLama2Meta's open-source LLM is called Llama 2. It was trained with 2 trillion "tokens" from publicly available sources like Wikipedia, Common Crawl, and books from the Gutenberg project. Three different parameter level model versions are available, i.e. 7 billion, 13 billion, and 70 billion parameter models. There are two types of completion models available: Chat-tuned and General. The chat-tuned models that have been fine-tuned for chatbot-like dialogue are denoted by the suffix '-chat'. We will use general Meta's 7b Llama-2 huggingface model as the base model that we fine-tune. Feel free to use any other version of llama2-7b.Also, if you are interested, there are several threads that you can go through to understand how good is Llama family w.r.t GPT family is - source, source, source.About Named Entity RecognitionAs a component of information extraction, named-entity recognition locates and categorizes specific entities inside the unstructured text by allocating them to pre-defined groups, such as individuals, organizations, locations, measures, and more. NER offers a quick way to understand the core idea or content of a lengthy text.There are many ways of extracting entities from a given text, in this blog, we will specifically delve into fine-tuning Llama2-7b using PEFT techniques on Colab Notebook.We will transform the SMSSpamCollection classification data set for NER. Pretty interesting 😀We search through all 10 letter words and tag them as 10_WORDS_LONG. And this is the entity that we want our Llama to extract. But why this bizarre formulation? I did it intentionally to show that this is something that the pre-trained model would not have seen during the pre-training stage. So it becomes essential to fine-tune it and make it work for our use case 👍. But surely we can add logic to our formulation - think of these words as probable outliers/noisy words. The larger the words, the higher the possibility of it being noise/oov. However, you will have to come up with the extract letter count after seeing the word length distribution. Please note that the code is generic enough for fine-tuning any number of entities. It’s just a change in the data preparation step that we will make to slice out only relevant entities.Code for Fine-tuning Llama2-7b# Importing Libraries from transformers import LlamaTokenizer, LlamaForCausalLM import torch from datasets import Dataset import transformers import pandas as pd from peft import get_peft_model, LoraConfig, TaskType, prepare_model_for_int8_training, get_peft_model_state_dict, PeftModel from sklearn.utils import shuffleData Preparation Phasedf = pd.read_csv('SMSSpamCollection', sep='\t', header=None)  all_text = df[1].str.lower().tolist()  input, output = [], []  for text in all_text:               input.append(text)               output.append({word: '10_WORDS_LONG' for word in text.split() if len(word)==10}) df = pd.DataFrame([input, output]).T df.rename({0:'input_text', 1: 'output_text'}, axis=1, inplace=True) print (df.head(5)) total_ds = shuffle(df, random_state=42) total_train_ds = total_ds.head(4000) total_test_ds = total_ds.tail(1500) total_train_ds_hf = Dataset.from_pandas(total_train_ds) total_test_ds_hf = Dataset.from_pandas(total_test_ds) tokenized_tr_ds = total_train_ds_hf.map(generate_and_tokenize_prompt) tokenized_te_ds = total_test_ds_hf.map(generate_and_tokenize_prompt) Fine-tuning Phase# Loading Modelmodel_name = "meta-llama/Llama-2-7b-hf" tokenizer = AutoTokenizer.from_pretrained(model_name) model = AutoModelForCausalLM.from_pretrained(model_name) def create_peft_config(m): peft_cofig = LoraConfig( task_type=TaskType.CAUSAL_LM, inference_mode=False, r=8, lora_alpha=16, lora_dropout=0.05, target_modules=['q_proj', 'v_proj'], ) model = prepare_model_for_int8_training(model) model.enable_input_require_grads() model = get_peft_model(model, peft_cofig) model.print_trainable_parameters() return model, peft_cofig model, lora_config = create_peft_config(model) def generate_prompt(data_point): return f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. ### Instruction: Extract entity from the given input: ### Input: {data_point["input_text"]} ### Response: {data_point["output_text"]}""" tokenizer.pad_token_id = 0 def tokenize(prompt, add_eos_token=True): result = tokenizer( prompt, truncation=True, max_length=128, padding=False, return_tensors=None, ) if ( result["input_ids"][-1] != tokenizer.eos_token_id and len(result["input_ids"]) < 128 and add_eos_token ): result["input_ids"].append(tokenizer.eos_token_id) result["attention_mask"].append(1) result["labels"] = result["input_ids"].copy() return result def generate_and_tokenize_prompt(data_point): full_prompt = generate_prompt(data_point) tokenized_full_prompt = tokenize(full_prompt) return tokenized_full_prompt training_arguments = transformers.TrainingArguments(  per_device_train_batch_size=1, gradient_accumulation_steps=16,  learning_rate=4e-05,  logging_steps=100,  optim="adamw_torch",  evaluation_strategy="steps",  save_strategy="steps",  eval_steps=100,  save_steps=100,  output_dir="saved_models/" ) data_collator = transformers.DataCollatorForSeq2Seq(tokenizer) trainer = transformers.Trainer(model=model, tokenizer=tokenizer, train_dataset=tokenized_tr_ds, eval_dataset=tokenized_te_ds, args=training_arguments, data_collator=data_collator) with torch.autocast("cuda"):       trainer.train()InferenceLoaded_tokenizer = LlamaTokenizer.from_pretrained(model_name) Loaded_model = LlamaForCausalLM.from_pretrained(model_name, load_in_8bit=True, torch.dtype=torch.float16, device_map=’auto’) Model = PeftModel.from_pretrained(Loaded_model, “saved_model_path”, torch.dtype=torch.float16) Model.config.pad_tokeni_id = loaded_tokenizer.pad_token_id = 0 Model.eval() def extract_entity(text):   inp = Loaded_tokenizer(prompt, return_tensor=’pt’).to(“cuda”)   with torch.no_grad():       P_ent = Loaded_tokenizer.decode(model.generate(**inp, max_new_tokens=128)[0], skip_special_tokens=True)       int_idx = P_ent.find(‘Response:’)       P_ent = P_ent[int_idx+len(‘Response:’):]   return P_ent.strip() extracted_entity = extract_entity(text) print (extracted_entity) ConclusionWe covered the process of optimizing the llama2-7b model for the Named Entity Recognition job in this blog post. For that matter, it can be any task that you are interested in. The core concept that one must learn from this blog is PEFT-based training of large language models. Additionally, as pre-trained LLMs might not always perform well in your work, it is best to fine-tune these models.Author BioPrakhar Mishra has a Master’s in Data Science with over 4 years of experience in industry across various sectors like Retail, Healthcare, Consumer Analytics, etc. His research interests include Natural Language Understanding and generation, and has published multiple research papers in reputed international publications in the relevant domain. Feel free to reach out to him on LinkedIn
Read more
  • 0
  • 0
  • 16143

article-image-making-the-best-out-of-hugging-face-hub-using-langchain
Ms. Valentina Alto
17 Jun 2023
6 min read
Save for later

Making the best out of Hugging Face Hub using LangChain

Ms. Valentina Alto
17 Jun 2023
6 min read
Since the launch of ChatGPT in November 2022, everyone is talking about GPT models and OpenAI. There is no doubt that the Generative Pre-trained Transformers (GPT) architecture developed by OpenAI has demonstrated incredible results, also given the investments in training (almost 500 billion tokens) and complexity of the model (175 billion parameters for the GPT-3).Nevertheless, there is an incredible number of open-source Large Language Models (LLMs) that have been widespread in the last months. Below are some examples:Dolly: 12 billion parameters LLM developed by Databricks and trained on their ML platform. Source codeàhttps://github.com/databrickslabs/dollyStableML: This is a series of LLM developed by StabilityAI, the company behind the popular image generation model Stable Diffusion. The series encompasses a variety of LLMs, some of which are fine-tuned on specific use cases. Source codeàhttps://github.com/Stability-AI/StableLMFalcon LLM: A 40 billion parameters LLM developed by the Technology Innovation Institute and trained on a particularly high-quality dataset called RefinedWeb. Plus, as for now (June 2023) ranks 1 globally in the latest Hugging Face independent verification of open-source AI models. Source codeà https://huggingface.co/tiiuaeGPT NeoX and GPT-J: An open-source reproduction of the OpenAI’s GPT series developed by Eulether AI, with respectively 40 and 6 billion parameters. Source codeà https://huggingface.co/EleutherAI/gpt-neox-20b and https://huggingface.co/EleutherAI/gpt-j-6bOpenLLaMa: As for the previous class of models, also this one is an open-source reproduction of Meta AI’s LLaMA and has 3.7 billion parameters. Source codeàhttps://github.com/openlm-research/open_llamaIf you are interested in getting deeper into those models and their performance, you can reference the Hugging Face leaderboard here.Image1: Hugging Face Leaderboard Now, LLMs are great, yet to unlock their real power we need them to be positioned within an applicative logic. In other words, we want our LLMs to infuse intelligence within our applications.For this purpose, we will be using LangChain, a powerful lightweight SDK which makes it easier to integrate and orchestrate LLMs within applications. LangChain is one of the most popular LLMs orchestrators, yet if you want to explore further packages I encourage you to read about Semantic Kernel and Jarvis.One of the nice things about LangChain is its integration with external tools: those might be OpenAI (and other LLMs vendors), data sources, search APIs, and so on. In this article, we are going to explore how LangChain makes it easier to leverage open-source LLMs by leveraging its integration with the Hugging Face Hub.Welcome to the realm of open source LLMsThe Hugging Face Hub serves as a comprehensive platform comprising more than 120k models, 20kdatasets, and 50k demo apps (Spaces), all of which are openly accessible and shared as open-source projects. It provides an online environment where developers can effortlessly collaborate and collectively develop machine learning solutions. Thanks to LangChain, it is way easier to start interacting with open-source LLMs. Plus, you can also surround those models with all the libraries provided by LangChain in terms of prompt design, memory retention, chain management, and so on.Let’s see an implementation with Python. To reproduce the code, make sure to have: Python 3.7.1 or higheràyou can check your Python version running python --version in your terminalLangChain installedàyou can install it via pip install langchainThe huggingface_hub Python package installedà you can install it via pip install huggingface_butHugging Face Hub API keyàto get the API key, you can register into the portal here and then generate your secret key.For this example, I’m going to use the lightest version of Dolly, developed by Databricks and available in three sizes: 3, 7, and 12 billion parameters.from langchain import HuggingFaceHub from getpass import getpass HUGGINGFACEHUB_API_TOKEN = "your-api-key" import os os.environ["HUGGINGFACEHUB_API_TOKEN"] = HUGGINGFACEHUB_API_TOKEN repo_id = " databricks/dolly-v2-3b" llm = HuggingFaceHub(repo_id=repo_id, model_kwargs={"temperature":0, "max_length":64})As you can see from the above code, the only information we need are our Hugging Face Hub API key and the model’s repo ID; then, LangChain will take care of initializing our model thanks to the direct integration with Hugging Face Hub.Now that we have initialized our model, it is time to define the structure of the prompt:from langchain import PromptTemplate, LLMChain template = """Question: {question} Answer: Let's think step by step.""" prompt = PromptTemplate(template=template, input_variables=["question"]) llm_chain = LLMChain(prompt=prompt, llm=llm)Finally, we can feed our model with a first question:question = "In the first movie of Harry Potter, what is the name of the three-headed dog? “ print(llm_chain.run(question)) Output: The name of the three-headed dog in Harry Potter and the Philosopher Stone is Fuffy.Even though I tested the light version of Dolly with “only” 3 billion parameters, it came with pretty accurate results. Of course, for more complex tasks or real-world projects, heavier models might be taken into consideration, like the one emerging as top performers in the Hugging Face leaderboard mentioned at the beginning of this article.ConclusionThe realm of open-source LLM is growing exponentially, and this creates a vibrant environment of experimentation and tuning from which anyone can benefit. Plus, some interesting trends are rising, like the reduction of the number of models’ parameters in favour of an increase in quality of the training dataset. In fact, we saw that the current top performer among the open source model is Falcon LLM, with “only” 40 billion parameters, which gained its strength from the high-quality training dataset. Finally, with the development of orchestration frameworks like LangChain and similar, it’s getting easier and easier to leverage open source LLMs and integrate them into our applications. Referenceshttps://huggingface.co/docs/hub/indexOpen LLM Leaderboard — a Hugging Face Space by HuggingFaceH4Hugging Face Hub — 🦜🔗 LangChain 0.0.189Overview (huggingface.co)stabilityai (Stability AI) (huggingface.co)Stability-AI/StableLM: StableLM: Stability AI Language Models (github.com)Author BioValentina Alto graduated in 2021 in data science. Since 2020, she has been working at Microsoft as an Azure solution specialist, and since 2022, she has been focusing on data and AI workloads within the manufacturing and pharmaceutical industries. She has been working closely with system integrators on customer projects to deploy cloud architecture with a focus on modern data platforms, data mesh frameworks, IoT and real-time analytics, Azure Machine Learning, Azure Cognitive Services (including Azure OpenAI Service), and Power BI for dashboarding. Since commencing her academic journey, she has been writing tech articles on statistics, machine learning, deep learning, and AI in various publications and has authored a book on the fundamentals of machine learning with Python.Author of the book: Modern Generative AI with ChatGPT and OpenAI ModelsLink - Medium  LinkedIn  
Read more
  • 0
  • 0
  • 15487

article-image-detecting-and-mitigating-hallucinations-in-llms
Ryan Goodman
25 Oct 2023
10 min read
Save for later

Detecting and Mitigating Hallucinations in LLMs

Ryan Goodman
25 Oct 2023
10 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!IntroductionIn large language models, the term "hallucination" describes a behavior where an AI model produces results that are not entirely accurate or might sound nonsensical. It's important to understand that large language models are not search engines or databases. They do not search for information from external sources or perform complex computations. Instead, large language models (LLM) belong to a category of generative artificial intelligence.Recap How Generative AI WorksGenerative AI is a technology trained on large volumes of data and, as a result, can " generate" text, images, and even audio. This makes it fundamentally different from search engines and other software tools you might be familiar with. This foundational difference presents challenges, most notably that generative AI can’t cite sources for its responses. Large language models are also not designed to solve computational problems like math. However, generative AI can quickly generate code that might solve complex mathematical challenges. A large language model responds to inputs, most notably the text instruction called a "prompt." As the large language model generates text, it uses its training data as a foundation to extrapolate information.Understanding HallucinationsThe simplest way to understand a hallucination is the old game of telephone. In the same way, a message gets distorted in the game of telephone; information can get "distorted" or "hallucinated" as a language model tries to generate outputs based on patterns it observed in its training data. The model might "misremember" or "misinterpret" certain information, leading to inaccuracies.Let's use another model example to understand the concept of generating unique combinations of words in the context of food recipes. Imagine you want to create new recipes by observing existing ones. If you were to build a Markov model for food ingredients, you would:1.  Compile a comprehensive dataset of recipes and extract individual ingredients.2.   Create pairs of neighboring ingredients, like "tomato-basil" and "chicken-rice," and record how often each pair occurs.For example, if you start with the ingredient "chicken," you might notice it's frequently paired with "broccoli" and "garlic" but less so with "pineapple." If you then choose "broccoli" as the next ingredient, it might be equally likely to be paired with "cheese" or "lemon." By following these ingredient pairings, at some point, the model might suggest creative combinations like "chicken-pineapple-lemon," offering new culinary ideas based on observed patterns.This approach allows the Markov model to generate novel recipe ideas based on the statistical likelihood of ingredient pairings.Hallucinations as a FeatureWhen researching or computing factual information, a hallucination is a bad thing. However, the same concept that gets a bad rap for accurate information or research is what makes large language models demonstrate another human condition of creativity. As a developer, if you want to make your language model creative, OpenAI, for example, has a "temperature" input, a hyperparameter that makes the model's outputs more random. A high temperature of 1 or above will result in hallucinations and randomness. For example, a lower temperature of .2 will make the modern outputs more deterministic to match patterns it was trained on.  As an experiment, try inputting a prompt to any large language model chatbot, including ChatGPT, to provide a plot of any romantic story without copying existing accounts on the internet, a new storyline, and new characters. The LLM will offer a fictitious story with characters, a plot, multiple acts, an arc, and an ending.In specific scenarios, end users or developers might intentionally coax their large language models into a state of "hallucination. When seeking out-of-the-box ideas to think beyond its training, you can get abstract ideas. In this scenario, the model's ability to "hallucinate" isn't a bug but rather a feature. To continue the experiment, you can return to ChatGPT and ask it to pretend you have changed the temperature hyperparameter to 1.1 and re-write the story. Your results will be very “creative.”In creative pursuits, like crafting tales or penning poems, these so-called "hallucinations" aren't just tolerated; they're celebrated. They can add depth, surprise, and innovation layers to the generated content.Types of hallucination One can categorize hallucinations into different forms.Intrinsic hallucination happens to contradict the source material directly. It also offers logical inconsistency and factual inaccuracies.Extrinsic hallucination does not contradict. However, at the same time, it cannot be verified against any source. Hence, it adds elements that are considered to be unconfirmable and speculative. Detecting hallucinations Detecting hallucinations in the large language models is a tricky task. LLMs will deliver information with the same tone and certainty even if the answer is unknown. It puts the responsibility of users and developers to be careful about how information from LLMs is used.The following techniques can be utilized to uncover or measure hallucinations in large language models.Identify the grounding dataGrounding data is the standard against which the Large Language Model (LLM) output is measured. The selection of grounding data depends on the specific application. For example, real job resumes could be grounding data for generating resume-related content. Conversely, search engine outcomes could be employed for web-based inquiries. Especially in language translation, the choice of grounding data is pivotal for accurate translation. For example, official legal documents could serve as grounding data for legal translations, ensuring precision in the translated content.Create a measurement test setA measurement test data set comprises input/output pairs incorporating human interactions and the Large Language Model (LLM). These datasets often include various input conditions and their corresponding program outputs. These sets may involve simulated interactions between users and software systems, depending on the scenario.Ideally, there should be a minimum of two kinds of test sets:1. A standard or randomly generated test set that would be conventional but cater to diverse scenarios.2. An adversarial test set is created through performance in edge cases, high-risk situations, or when presented with deliberately misleading or tricky inputs, even security threats.Extract any claimsFollowing the preparation of test data sets, the subsequent stage involves extracting assertions from the Large Language Model (LLM). This extraction can occur manually, through rule-based methodologies, or even by employing machine learning models.In data analysis, the next step is to extract specific patterns from the data after gathering the datasets. This extraction can be done manually or through predefined rules, basic descriptive analytics, or, for large-scale projects, machine learning algorithms. Each method has its merits and drawbacks, which we will thoroughly investigate.Use validations against any grounding dataValidation guarantees that the content generated by the Large Language Model (LLM) corresponds to the grounding data. Frequently, this stage replicates the techniques employed for data extraction.To support the above, here is the code snippet of the same.# Define grounding data (acceptable sentences) grounding_data = [    "The sky is blue.",    "Python is a popular programming language.",    "ChatGPT provides intelligent responses." ] # List of generated sentences to be validated generated_sentences = [    "The sky is blue.",    "ChatGPT is a popular programming language.",    "Python provides intelligent responses." ] # Validate generated sentences against grounding data valid_sentences = [sentence for sentence in generated_sentences if sentence in grounding_data] # Output valid sentences print("Valid Sentences:") for sentence in valid_sentences:    print("- " + sentence) # Output invalid sentences invalid_sentences = list(set(generated_sentences) - set(valid_sentences)) print("\nInvalid Sentences:") for sentence in invalid_sentences:    print("- " + sentence)Output: Valid Sentences: - The sky is blue. Invalid Sentences: - ChatGPT is a popular programming language. - Python provides intelligent responses. Furthermore, in verifying research findings, validation ensures that the conclusions drawn from the research align with the collected data. This process often mirrors the research methods employed earlier.Metrics reportingThe "Grounding Defect Rate" is a crucial metric that measures the proportion of responses lacking context to the total generated outputs. Further metrics will be explored later for a more detailed assessment. For instance, the "Error Rate" is a vital metric indicating the percentage of mistranslated phrases from the translated text. Additional metrics will be introduced later for a comprehensive evaluation.A Multifaceted Approach to Mitigate Hallucination in the Large Language Model Leveraging product designThe developer needs to employ large language models so that it does not create material issues, even when it hallucinates. For example, you would not design an application that writes your annual report or news articles. Instead, opinion pieces or content summarization within a prompt can immediately lower the risk of problematic hallucination.If an app allows AI-generated outputs to be distributed, end users should be able to review and revise the content. It adds a protective layer of scrutiny and puts the responsibility into the hands of the user.Continuous improvement and loggingPersisting prompts and LLM output are essential for auditing purposes. As models evolve, you cannot count on prompting an LLM and getting the same result. However, regression testing and reviewing user input are critical as long as it adheres to data, security, and privacy practice.Prompt engineeringIt is essential to get the best possible output to use the concept of meta prompts effectively. A meta prompt is a high-level instruction given to a language model to guide its output in a specific direction. Rather than asking a direct question, provide context, structure, and guidance to refine the output.For example, instead of asking, "What is photosynthesis?", you can ask, "Explain photosynthesis in simple terms suitable for a 5th-grade student." This will adjust the complexity and style of the answer you get.Multi-Shot PromptsMulti-shot prompts refer to a series of prompts given to a language model, often in succession. The goal is to guide the model step-by-step toward a desired output instead of asking for a large chunk of information in a single prompt.  This approach is extremely useful when the required information is complex or extensive. Typically, these prompts are best delivered as a chat user experience, allowing the user and model to break down the requests into multiple, manageable parts.ConclusionThe issue of hallucination in Large Language Models (LLMs) presents a significant hurdle for consumers, users, and developers. While overhauling the foundational architecture of these models isn't a feasible solution for most, the good news is that there are strategies to navigate these challenges. But beyond these technical solutions, there's an ethical dimension to consider. As developers and innovators harness the power of LLMs, it's imperative to prioritize disclosure and transparency. Only through openness can we ensure that LLMs integrate seamlessly into our daily lives and gain the trust and acceptance they require to revolutionize our digital interactions truly.Author BioRyan Goodman has dedicated 20 years to the business of data and analytics, working as a practitioner, executive, and entrepreneur. He recently founded DataTools Pro after 4 years at Reliant Funding, where he served as the VP of Analytics and BI. There, he implemented a modern data stack, utilized data sciences, integrated cloud analytics, and established a governance structure. Drawing from his experiences as a customer, Ryan is now collaborating with his team to develop rapid deployment industry solutions. These solutions utilize machine learning, LLMs, and modern data platforms to significantly reduce the time to value for data and analytics teams.
Read more
  • 0
  • 0
  • 15296
article-image-spark-and-langchain-for-data-analysis
Alan Bernardo Palacio
31 Aug 2023
12 min read
Save for later

Spark and LangChain for Data Analysis

Alan Bernardo Palacio
31 Aug 2023
12 min read
IntroductionIn today's data-driven world, the demand for extracting insights from large datasets has led to the development of powerful tools and libraries. Apache Spark, a fast and general-purpose cluster computing system, has revolutionized big data processing. Coupled with LangChain, a cutting-edge library built atop advanced language models, you can now seamlessly combine the analytical capabilities of Spark with the natural language interaction facilitated by LangChain. This article introduces Spark, explores the features of LangChain, and provides practical examples of using Spark with LangChain for data analysis.Understanding Apache SparkThe processing and analysis of large datasets have become crucial for organizations and individuals alike. Apache Spark has emerged as a powerful framework that revolutionizes the way we handle big data. Spark is designed for speed, ease of use, and sophisticated analytics. It provides a unified platform for various data processing tasks, such as batch processing, interactive querying, machine learning, and real-time stream processing.At its core, Apache Spark is an open-source, distributed computing system that excels at processing and analyzing large datasets in parallel. Unlike traditional MapReduce systems, Spark introduces the concept of Resilient Distributed Datasets (RDDs), which are immutable distributed collections of data. RDDs can be transformed and operated upon using a wide range of high-level APIs provided by Spark, making it possible to perform complex data manipulations with ease.Key Components of SparkSpark consists of several components that contribute to its versatility and efficiency:Spark Core: The foundation of Spark, responsible for tasks such as task scheduling, memory management, and fault recovery. It also provides APIs for creating and manipulating RDDs.Spark SQL: A module that allows Spark to work seamlessly with structured data using SQL-like queries. It enables users to interact with structured data through the familiar SQL language.Spark Streaming: Enables real-time stream processing, making it possible to process and analyze data in near real-time as it arrives in the system.MLlib (Machine Learning Library): A scalable machine learning library built on top of Spark, offering a wide range of machine learning algorithms and tools.GraphX: A graph processing library that provides abstractions for efficiently manipulating graph-structured data.Spark DataFrame: A higher-level abstraction on top of RDDs, providing a structured and more optimized way to work with data. DataFrames offer optimization opportunities, enabling Spark's Catalyst optimizer to perform query optimization and code generation.Spark's distributed computing architecture enables it to achieve high performance and scalability. It employs a master/worker architecture where a central driver program coordinates tasks across multiple worker nodes. Data is distributed across these nodes, and tasks are executed in parallel on the distributed data.We will be diving into two different types of interaction with Spark, SparkSQL, and Spark Data Frame. Apache Spark is a distributed computing framework with Spark SQL as one of its modules for structured data processing. Spark DataFrame is a distributed collection of data organized into named columns, offering a programming abstraction similar to data frames in R or Python but optimized for distributed processing. It provides a functional programming API, allowing operations like select(), filter(), and groupBy(). On the other hand, Spark SQL allows users to run unmodified SQL queries on Spark data, integrating seamlessly with DataFrames and offering a bridge to BI tools through JDBC/ODBC.Both Spark DataFrame and Spark SQL leverage the Catalyst optimizer for efficient query execution. While DataFrames are preferred for programmatic APIs and functional capabilities, Spark SQL is ideal for ad-hoc querying and users familiar with SQL. The choice between them often hinges on the specific use case and the user's familiarity with either SQL or functional programming.In the next sections, we will explore how LangChain complements Spark's capabilities by introducing natural language interactions through agents.Introducing Spark Agent to LangChainLangChain, a dynamic library built upon the foundations of modern Language Model (LLM) technologies, is a pivotal addition to the world of data analysis. It bridges the gap between the power of Spark and the ease of human language interaction.LangChain harnesses the capabilities of advanced LLMs like ChatGPT and HuggingFace-hosted Models. These language models have proven their prowess in understanding and generating human-like text. LangChain capitalizes on this potential to enable users to interact with data and code through natural language queries.Empowering Data AnalysisThe introduction of the Spark Agent to LangChain brings about a transformative shift in data analysis workflows. Users are now able to tap into the immense analytical capabilities of Spark through simple daily language. This innovation opens doors for professionals from various domains to seamlessly explore datasets, uncover insights, and derive value without the need for deep technical expertise.LangChain acts as a bridge, connecting the technical realm of data processing with the non-technical world of language understanding. It empowers individuals who may not be well-versed in coding or data manipulation to engage with data-driven tasks effectively. This accessibility democratizes data analysis and makes it inclusive for a broader audience.The integration of LangChain with Spark involves a thoughtful orchestration of components that work in harmony to bring human-language interaction to the world of data analysis. At the heart of this integration lies the collaboration between ChatGPT, a sophisticated language model, and PythonREPL, a Python Read-Evaluate-Print Loop. The workflow is as follows:ChatGPT receives user queries in natural language and generates a Python command as a solution.The generated Python command is sent to PythonREPL for execution.PythonREPL executes the command and produces a result.ChatGPT takes the result from PythonREPL and translates it into a final answer in natural language.This collaborative process can repeat multiple times, allowing users to engage in iterative conversations and deep dives into data analysis.Several keynotes ensure a seamless interaction between the language model and the code execution environment:Initial Prompt Setup: The initial prompt given to ChatGPT defines its behavior and available tooling. This prompt guides ChatGPT on the desired actions and toolkits to employ.Connection between ChatGPT and PythonREPL: Through predefined prompts, the format of the answer is established. Regular expressions (regex) are used to extract the specific command to execute from ChatGPT's response. This establishes a clear flow of communication between ChatGPT and PythonREPL.Memory and Conversation History: ChatGPT does not possess a memory of past interactions. As a result, maintaining the conversation history locally and passing it with each new question is essential to maintaining context and coherence in the interaction.In the upcoming sections, we'll explore practical use cases that illustrate how this integration manifests in the real world, including interactions with Spark SQL and Spark DataFrames.The Spark SQL AgentIn this section, we will walk you through how to interact with Spark SQL using natural language, unleashing the power of Spark for querying structured data.Let's walk through a few hands-on examples to illustrate the capabilities of the integration:Exploring Data with Spark SQL Agent:Querying the dataset to understand its structure and metadata.Calculating statistical metrics like average age and fare.Extracting specific information, such as the name of the oldest survivor.Analyzing Dataframes with Spark DataFrame Agent:Counting rows to understand the dataset size.Analyzing the distribution of passengers with siblings.Computing descriptive statistics like the square root of average age.By interacting with the agents and experimenting with natural language queries, you'll witness firsthand the seamless fusion of advanced data processing with user-friendly language interactions. These examples demonstrate how Spark and LangChain can amplify your data analysis efforts, making insights more accessible and actionable.Before diving into the magic of Spark SQL interactions, let's set up the necessary environment. We'll utilize LangChain's SparkSQLToolkit to seamlessly bridge between Spark and natural language interactions. First, make sure you have your API key for OpenAI ready. You'll need it to integrate the language model.from langchain.agents import create_spark_sql_agent from langchain.agents.agent_toolkits import SparkSQLToolkit from langchain.chat_models import ChatOpenAI from langchain.utilities.spark_sql import SparkSQL import os # Set up environment variables for API keys os.environ['OPENAI_API_KEY'] = 'your-key'Now, let's get hands-on with Spark SQL. We'll work with a Titanic dataset, but you can replace it with your own data. First, create a Spark session, define a schema for the database, and load your data into a Spark DataFrame. We'll then create a table in Spark SQL to enable querying.from pyspark.sql import SparkSession spark = SparkSession.builder.getOrCreate() schema = "langchain_example" spark.sql(f"CREATE DATABASE IF NOT EXISTS {schema}") spark.sql(f"USE {schema}") csv_file_path = "titanic.csv" table = "titanic" spark.read.csv(csv_file_path, header=True, inferSchema=True).write.saveAsTable(table) spark.table(table).show() Now, let's initialize the Spark SQL Agent. This agent acts as your interactive companion, enabling you to query Spark SQL tables using natural language. We'll create a toolkit that connects LangChain, the SparkSQL instance, and the chosen language model (in this case, ChatOpenAI).from langchain.agents import AgentType spark_sql = SparkSQL(schema=schema) llm = ChatOpenAI(temperature=0, model="gpt-4-0613") toolkit = SparkSQLToolkit(db=spark_sql, llm=llm, handle_parsing_errors="Check your output and make sure it conforms!") agent_executor = create_spark_sql_agent(    llm=llm,    toolkit=toolkit,    agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION,    verbose=True,    handle_parsing_errors=True)Now comes the exciting part—querying Spark SQL tables using natural language! With your Spark SQL Agent ready, you can ask questions about your data and receive insightful answers. Let's try a few examples:# Describe the Titanic table agent_executor.run("Describe the titanic table") # Calculate the square root of the average age agent_executor.run("whats the square root of the average age?") # Find the name of the oldest survived passenger agent_executor.run("What's the name of the oldest survived passenger?") With these simple commands, you've tapped into the power of Spark SQL using natural language. The Spark SQL Agent makes data exploration and querying more intuitive and accessible than ever before.The Spark DataFrame AgentIn this section, we'll dive into another facet of LangChain's integration with Spark—the Spark DataFrame Agent. This agent leverages the power of Spark DataFrames and natural language interactions to provide an engaging and insightful way to analyze data.Before we begin, make sure you have a Spark session set up and your data loaded into a DataFrame. For this example, we'll use the Titanic dataset. Replace csv_file_path with the path to your own data if needed.from langchain.llms import OpenAI from pyspark.sql import SparkSession from langchain.agents import create_spark_dataframe_agent spark = SparkSession.builder.getOrCreate() csv_file_path = "titanic.csv" df = spark.read.csv(csv_file_path, header=True, inferSchema=True) df.show()Initializing the Spark DataFrame AgentNow, let's unleash the power of the Spark DataFrame Agent! This agent allows you to interact with Spark DataFrames using natural language queries. We'll initialize the agent by specifying the language model and the DataFrame you want to work with.# Initialize the Spark DataFrame Agent agent = create_spark_dataframe_agent(llm=OpenAI(temperature=0), df=df, verbose=True)With the agent ready, you can explore your data using natural language queries. Let's dive into a few examples:# Count the number of rows in the DataFrame agent.run("how many rows are there?") # Find the number of people with more than 3 siblings agent.run("how many people have more than 3 siblings") # Calculate the square root of the average age agent.run("whats the square root of the average age?")Remember that the Spark DataFrame Agent under the hood uses generated Python code to interact with Spark. While it's a powerful tool for interactive analysis, ensures that the generated code is safe to execute, especially in a sensitive environment.In this final section, let's tie everything together and showcase how Spark and LangChain work in harmony to unlock insights from data. We've covered the Spark SQL Agent and the Spark DataFrame Agent, so now it's time to put theory into practice.In conclusion, the combination of Spark and LangChain transcends the traditional boundaries of technical expertise, enabling data enthusiasts of all backgrounds to engage with data-driven tasks effectively. Through the Spark SQL Agent and Spark DataFrame Agent, LangChain empowers users to interact, explore, and analyze data using the simplicity and familiarity of natural language. So why wait? Dive in and unlock the full potential of your data analysis journey with the synergy of Spark and LangChain.ConclusionIn this article, we've delved into the world of Apache Spark and LangChain, two technologies that synergize to transform how we interact with and analyze data. By bridging the gap between technical data processing and human language understanding, Spark and LangChain enable users to derive meaningful insights from complex datasets through simple, natural language queries. The Spark SQL Agent and Spark DataFrame Agent presented here demonstrate the potential of this integration, making data analysis more accessible to a wider audience. As both technologies continue to evolve, we can expect even more powerful capabilities for unlocking the true potential of data-driven decision-making. So, whether you're a data scientist, analyst, or curious learner, harnessing the power of Spark and LangChain opens up a world of possibilities for exploring and understanding data in an intuitive and efficient manner.Author BioAlan Bernardo Palacio is a data scientist and an engineer with vast experience in different engineering fields. His focus has been the development and application of state-of-the-art data products and algorithms in several industries. He has worked for companies such as Ernst and Young, and Globant, and now holds a data engineer position at Ebiquity Media helping the company to create a scalable data pipeline. Alan graduated with a Mechanical Engineering degree from the National University of Tucuman in 2015, participated as the founder of startups, and later on earned a Master's degree from the faculty of Mathematics at the Autonomous University of Barcelona in 2017. Originally from Argentina, he now works and resides in the Netherlands.LinkedIn 
Read more
  • 0
  • 0
  • 15281

article-image-building-a-containerized-llm-chatbot-application
Alan Bernardo Palacio
21 Aug 2023
19 min read
Save for later

Building a Containerized LLM Chatbot Application

Alan Bernardo Palacio
21 Aug 2023
19 min read
In this hands-on tutorial, we will build a containerized LLM-powered chatbot application that uses examples to create a custom chatbot capable of answering deep philosophical questions and responding with profound questions in return. We will use Streamlit as the web application framework, PostgreSQL as the database to store examples, and OpenAI's GPT-3.5 "text-davinci-003" model for language processing.The application allows users to input philosophical questions, and the AI-powered chatbot will respond with insightful answers based on the provided examples. Additionally, the chatbot will ask thought-provoking questions in response to user input, simulating the behavior of philosophical minds like Socrates and Nietzsche.We'll break down the implementation into several files, each serving a specific purpose:Dockerfile: This file defines the Docker image for our application, specifying the required dependencies and configurations.docker-compose.yml: This file orchestrates the Docker containers for our application, including the web application (Streamlit) and the PostgreSQL database.setup.sql: This file contains the SQL commands to set up the PostgreSQL database and insert example data.streamlit_app.py: This file defines the Streamlit web application and its user interface.utils.py: This file contains utility functions to interact with the database, create the Da Vinci LLM model, and generate responses.requirements.txt: This file lists the Python dependencies required for our application.The DockerfileThe Dockerfile is used to build the Docker image for our application. It specifies the base image, sets up the working directory, installs the required dependencies, and defines the command to run the Streamlit application:FROM python:3 WORKDIR /app COPY requirements.txt . RUN pip install --no-cache-dir -r requirements.txt COPY . . CMD ["streamlit", "run", "streamlit_app.py"]In the Dockerfile, we define the base image to Python 3 using FROM python:3, which enables us to use Python and its packages. Next, we specify the working directory inside the container as /app where we will copy our application files. To ensure all required Python packages are installed, we copy the requirements.txt file, which lists the dependencies, into the container's and then, we run the command pip install --no-cache-dir -r requirements.txt to install the Python dependencies. We proceed to copy all the files from the current directory (containing our application files) into the container's /app directory using COPY . .. Finally, we define the command to run the Streamlit application when the container starts using CMD ["streamlit", "run", "streamlit_app.py"]. This command starts the Streamlit app, enabling users to interact with the philosophical AI assistant through their web browsers once the container is up and running.The requirements.txt file lists the Python dependencies required for our application:streamlit streamlit-chat streamlit-extras psycopg2-binary openai==0.27.8 langchain==0.0.225The requirement file uses the next packages:streamlit: The Streamlit library for creating web applications.streamlit-chat: Streamlit Chat library for adding chat interfaces to Streamlit apps.streamlit-extras: Streamlit Extras library for adding custom components to Streamlit apps.psycopg2-binary: PostgreSQL adapter for Python.openai==0.27.8: The OpenAI Python library for accessing the GPT-3.5 model.langchain==0.0.225: LangChain library for working with language models and prompts.Next, we will define the docker compose file which will also handle the deployment of the Postgres database where we will store our examples.Creating the docker-composeThe docker-compose.yml file orchestrates the Docker containers for our application: the Streamlit web application and the PostgreSQL database:version: '3' services: app:    build:      context: ./app    ports:      - 8501:8501    environment:      - OPENAI_API_KEY=${OPENAI_API_KEY}    depends_on:      - db db:    image: postgres:13    environment:      - POSTGRES_USER=your_username      - POSTGRES_PASSWORD=your_password      - POSTGRES_DB=chatbot_db      - POSTGRES_HOST_AUTH_METHOD=trust    volumes:      - ./db/setup.sql:/docker-entrypoint-initdb.d/setup.sqlThe docker-compose.yml file orchestrates the deployment of our LLM-powered chatbot applicationand defines the services, i.e., the containers, needed for our application.In the services section, we have two distinct services defined: app and db. The app service corresponds to our Streamlit web application, which will serve as the user interface for interacting with the philosophical AI assistant. To build the Docker image for this service, we specify the build context as ./app, where the necessary application files, including the Dockerfile, reside.To ensure seamless communication between the host machine and the app container, we use the ports option to map port 8501 from the host to the corresponding port inside the container. This allows users to access the web application through their web browsers.For the application to function effectively, the environment variable OPENAI_API_KEY must be set, providing the necessary authentication for our LLM model to operate. This is done using the environment section, where we define this variable.One of the critical components of our application is the integration of a PostgreSQL database to store the philosophical question-answer pairs. The db service sets up the PostgreSQL database using the postgres:13 image. We configure the required environment variables, such as the username, password, and database name, to establish the necessary connection.To initialize the database with our predefined examples, we leverage the volumes option to mount the setup.sql file from the host machine into the container's /docker-entrypoint-initdb.d directory. This SQL script contains the commands to create the examples table and insert the example data. By doing so, our PostgreSQL database is ready to handle the profound philosophical interactions with the AI assistant.In conclusion, the docker-compose.yml file provides a streamlined and efficient way to manage the deployment and integration of Language Model Microservices with a PostgreSQL database, creating a cohesive environment for our philosophical AI assistant application.Setting up examplesThe setup.sql file contains the SQL commands to set up the PostgreSQL database and insert example data. We use this file in the volumes section of the docker-compose.yml file to initialize the database when the container starts:-- Create the examples table CREATE TABLE IF NOT EXISTS examples ( id SERIAL PRIMARY KEY, query TEXT, answer TEXT ); -- Insert the examples INSERT INTO examples (query, answer) VALUES ('What is the nature of truth?', 'Truth is a mirror reflecting the depths of our souls.'), ('Is there an objective reality?', 'Reality is an ever-shifting kaleidoscope, molded by our perceptions.'), (' What is the role of reason in human understanding?', 'Reason illuminates the path of knowledge, guiding us towards self-awareness.'), ('What is the nature of good and evil?', 'Good and evil are intertwined forces, dancing in the eternal cosmic tango.'), ('Is there a purpose to suffering?', 'Suffering unveils the canvas of resilience, painting a masterpiece of human spirit.'), ('What is the significance of morality?', 'Morality is the compass that navigates the vast ocean of human conscience.'), ('What is the essence of human existence?', 'Human existence is a riddle wrapped in the enigma of consciousness.'), ('How can we find meaning in a chaotic world?', 'Meaning sprouts from the fertile soil of introspection, blooming in the garden of wisdom.'), ('What is the nature of love and its transformative power?', 'Love is an alchemist, transmuting the mundane into the divine.'), ('What is the relationship between individuality and society?', 'Individuality dances in the grand symphony of society, playing a unique melody of self-expression.'), ('What is the pursuit of knowledge and its impact on the human journey?', 'Knowledge is the guiding star, illuminating the path of human evolution.'), ('What is the essence of human freedom?', 'Freedom is the soaring eagle, embracing the vast expanse of human potential.');The setup.sql script plays a crucial role in setting up the PostgreSQL database for our LLM-powered chatbot application. The SQL commands within this script are responsible for creating the examples table with the necessary columns and adding the example data to this table.In the context of our LLM application, these examples are of great importance as they serve as the foundation for the assistant's responses. The examples table could be a collection of question-answer pairs that the AI assistant has learned from past interactions. Each row in the table represents a specific question (query) and its corresponding insightful answer (answer).When a user interacts with the chatbot and enters a new question, the application leverages these examples to create a custom prompt for the LLM model. By selecting a relevant example based on the length of the user's question, the application constructs a few-shot prompt that incorporates both the user's query and an example from the database.The LLM model uses this customized prompt, containing the user's input and relevant examples, to generate a thoughtful and profound response that aligns with the philosophical nature of the AI assistant. The inclusion of examples in the prompt ensures that the chatbot's responses resonate with the same level of wisdom and depth found in the example interactions stored in the database.By learning from past examples and incorporating them into the prompts, our LLM-powered chatbot can emulate the thought processes of philosophical giants like Socrates and Nietzsche. Ultimately, these examples become the building blocks that empower the AI assistant to engage in the profound realms of philosophical discourse with the users.The Streamlit ApplicationThe streamlit_app.py file defines the Streamlit web application and its user interface. It is the main file where we build the web app and interact with the LLM model:import streamlit as st from streamlit_chat import message from streamlit_extras.colored_header import colored_header from streamlit_extras.add_vertical_space import add_vertical_space from utils import * # Define database credentials here DB_HOST = "db" DB_PORT = 5432 DB_NAME = "chatbot_db" DB_USER = "your_username" DB_PASSWORD = "your_password" # Connect to the PostgreSQL database and retrieve examples examples = get_database_examples(DB_HOST, DB_PORT, DB_NAME, DB_USER, DB_PASSWORD) # Create the Da Vinci LLM model davinci = create_davinci_model() # Create the example selector and few shot prompt template example_selector = create_example_selector(examples) dynamic_prompt_template = create_few_shot_prompt_template(example_selector) # Now the Streamlit app # Sidebar contents with st.sidebar:    st.title('The AI seeker of truth and wisdom')    st.markdown('''    ## About    This app is an LLM-powered chatbot built using:    - Streamlit    - Open AI Davinci LLM Model    - LangChain    - Philosophy    ''')    add_vertical_space(5)    st.write('Running in Docker!') # Generate empty lists for generated and past. ## generated stores AI generated responses if 'generated' not in st.session_state:    st.session_state['generated'] = ["Hi, what questions do you have today?"] ## past stores User's questions if 'past' not in st.session_state:    st.session_state['past'] = ['Hi!'] # Layout of input/response containers input_container = st.container() colored_header(label='', description='', color_name='blue-30') response_container = st.container() # User input ## Function for taking user provided prompt as input def get_text():    input_text = st.text_input("You: ", "", key="input")    return input_text ## Applying the user input box with input_container:    user_input = get_text() # Response output ## Function for taking user prompt as input followed by producing AI generated responses def generate_response(prompt):    response = davinci(        dynamic_prompt_template.format(query=prompt)    )    return response ## Conditional display of AI generated responses as a function of user provided prompts with response_container:    if user_input:        response = generate_response(user_input)        st.session_state.past.append(user_input)       st.session_state.generated.append(response)    if st.session_state['generated']:        for i in range(len(st.session_state['generated'])):            message(st.session_state['past'][i], is_user=True, key=str(i) + '_user',avatar_style='identicon',seed=123)            message(st.session_state["generated"][i], key=str(i),avatar_style='icons',seed=123)In this part of the code, we set up the core components of our LLM-powered chatbot application. We begin by importing the necessary libraries, including Streamlit, Streamlit Chat, and Streamlit Extras, along with utility functions from the utils.py file. Next, we define the database credentials (DB_HOST, DB_PORT, DB_NAME, DB_USER, DB_PASSWORD) required for connecting to the PostgreSQL database.The application then establishes a connection to the database using the get_database_examples function from the utils.py file. This crucial step retrieves profound philosophical question-answer pairs stored in the examples table. These examples are essential as they serve as a knowledge base for the AI assistant and provide the context and wisdom needed to generate meaningful responses.To leverage the OpenAI Da Vinci LLM model, we create the model instance using the create_davinci_model function from utils.py. This model acts as the core engine of our chatbot, enabling it to produce thoughtful and profound responses.In order to create custom prompts for the LLM model, we utilize the create_example_selector and create_few_shot_prompt_template functions from the utils.py file. These functions help select relevant examples based on the length of the user's input and construct dynamic prompts that combine the user's query with relevant examples.The Streamlit web app's sidebar is then set up, providing users with information about the application's purpose and inspiration. Within the application's session state, two lists (generated and past) are initialized to store AI-generated responses and user questions, respectively.To ensure an organized layout, we define two containers (input_container and response_container). The input_container houses the text input box where users can enter their questions. The get_text function is responsible for capturing the user's input.For generating AI responses, the generate_response function takes the user's prompt, processes it through the Da Vinci LLM model, and produces insightful replies. The AI-generated responses are displayed in the response_container using the message function from the Streamlit Chat library, allowing users to engage in profound philosophical dialogues with the AI assistant. Overall, this setup lays the groundwork for an intellectually stimulating and philosophical chatbot experience.Crating the utils fileThe utils.py file contains utility functions for our application, including connecting to the database, creating the Da Vinci LLM model, and generating responses:from langchain import PromptTemplate, FewShotPromptTemplate from langchain.prompts.example_selector import LengthBasedExampleSelector from langchain.llms import OpenAI from langchain import PromptTemplate, LLMChain from langchain.prompts.example_selector import LengthBasedExampleSelector from langchain import FewShotPromptTemplate import psycopg2 def get_database_examples(host, port, dbname, user, password):    try:        conn = psycopg2.connect(            host=host,            port=port,            dbname=dbname,            user=user,            password=password        )        cursor = conn.cursor()        cursor.execute("SELECT query, answer FROM examples")        rows = cursor.fetchall()        examples = [{"query": row[0], "answer": row[1]} for row in rows]        cursor.close()        conn.close()        return examples    except psycopg2.Error as e:        raise Exception(f"Error connecting to the database: {e}") def create_davinci_model():    return OpenAI(model_name='text-davinci-003') def create_example_selector(examples):    example_template = """    User: {query}    AI: {answer}    """    example_prompt = PromptTemplate(        input_variables=["query", "answer"],        template=example_template    )    if not examples:        raise Exception("No examples found in the database.")    return LengthBasedExampleSelector(        examples=examples,        example_prompt=example_prompt,        max_length=50    ) def create_few_shot_prompt_template(example_selector):    prefix = """The following are excerpts from conversations with a philosophical AI assistant.    The assistant is a seeker of truth and wisdom, responding with profound questions to know yourself    in a way that Socrates, Nietzsche, and other great minds would do. Here are some examples:"""    suffix = """    User: {query}    AI: """    return FewShotPromptTemplate(        example_selector=example_selector,        example_prompt=example_selector.example_prompt,        prefix=prefix,        suffix=suffix,        input_variables=["query"],        example_separator="\\\\n"    ) def generate_response(davinci, dynamic_prompt_template, prompt):    response = davinci(dynamic_prompt_template.format(query=prompt))    return responseThe get_database_examples function is responsible for establishing a connection to the PostgreSQL database using the provided credentials (host, port, dbname, user, password). Through this connection, the function executes a query to retrieve the question-answer pairs stored in the examples table. The function then organizes this data into a list of dictionaries, with each dictionary representing an example containing the query (question) and its corresponding answer.The create_davinci_model function is straightforward, as it initializes and returns the Da Vinci LLM model.To handle the selection of relevant examples for constructing dynamic prompts, the create_example_selector function plays a crucial role. It takes the list of examples as input and creates an example selector. This selector helps choose relevant examples based on the length of the user's query. By using this selector, the AI assistant can incorporate diverse examples that align with the user's input, leading to more coherent and contextually appropriate responses.The create_few_shot_prompt_template function is responsible for building the few-shot prompt template. This template includes a custom prefix and suffix to set the tone and style of the philosophical AI assistant. The prefix emphasizes the assistant's role as a "seeker of truth and wisdom" while the suffix provides the formatting for the user's query and AI-generated response. The custom template ensures that the AI assistant's interactions are profound and engaging, resembling the thought-provoking dialogues of historical philosophers like Socrates and Nietzsche.Finally, the generate_response function is designed to generate the AI's response based on the user's prompt. It takes the Da Vinci LLM model, dynamic prompt template, and the user's input as input parameters. The function uses the LLM model to process the dynamic prompt, blending the user's query with the selected examples, and returns the AI-generated response.Starting the applicationTo launch our philosophical AI assistant application with all its components integrated seamlessly, we can use Docker Compose. By executing the command docker-compose --env-file .env up, the Docker Compose tool will orchestrate the entire application deployment process.The --env-file .env option allows us to specify the environment variables from the .env file, which holds sensitive credentials and configuration details. This ensures that the necessary environment variables, such as the OpenAI API key and database credentials, are accessible to the application without being explicitly exposed in the codebase.When the docker-compose up command is initiated, Docker Compose will first build the application's Docker image using the Dockerfile defined in the ./app directory. This image will contain all the required dependencies and configurations for our Streamlit web application and the integration with the Da Vinci LLM model.Next, Docker Compose will create two services: the app service, which represents our Streamlit web application, and the db service, representing the PostgreSQL database. The app service is configured to run on port 8501, making it accessible through http://localhost:8501 in the browser.Once the services are up and running, the Streamlit web application will be fully operational, and users can interact with the philosophical AI assistant through the user-friendly interface. When a user enters a philosophical question, the application will use the Da Vinci LLM model, together with the selected examples, to generate insightful and profound responses in the style of great philosophers:With Docker Compose, our entire application, including the web server, LLM model, and database, will be containerized, enabling seamless deployment across different environments. This approach ensures that the application is easily scalable and portable, allowing users to experience the intellectual exchange with the philosophical AI assistant effortlessly.ConclusionIn this tutorial, we've built a containerized LLM-powered chatbot application capable of answering deep philosophical questions and responding with profound questions, inspired by philosophers like Socrates and Nietzsche. We used Streamlit as the web application framework, PostgreSQL as the database, and OpenAI's GPT-3.5 model for language processing.By combining Streamlit, PostgreSQL, and OpenAI's GPT-3.5 model, you've crafted an intellectually stimulating user experience. Your chatbot can answer philosophical inquiries with deep insights and thought-provoking questions, providing users with a unique and engaging interaction.Feel free to experiment further with the chatbot, add more examples to the database, or explore different prompts for the LLM model to enrich the user experience. As you continue to develop your AI assistant, remember the immense potential these technologies hold for solving real-world challenges and fostering intelligent conversations.Author Bio:Alan Bernardo Palacio is a data scientist and an engineer with vast experience in different engineering fields. His focus has been the development and application of state-of-the-art data products and algorithms in several industries. He has worked for companies such as Ernst and Young, Globant, and now holds a data engineer position at Ebiquity Media helping the company to create a scalable data pipeline. Alan graduated with a Mechanical Engineering degree from the National University of Tucuman in 2015, participated as the founder in startups, and later on earned a Master's degree from the faculty of Mathematics in the Autonomous University of Barcelona in 2017. Originally from Argentina, he now works and resides in the Netherlands.LinkedIn
Read more
  • 0
  • 0
  • 15111
Modal Close icon
Modal Close icon