How-To Tutorials - LLM

81 Articles

Evaluating Large Language Models

Vivekanandan Srinivasan
27 Oct 2023
8 min read
Introduction

A large language model (LLM) is an advanced artificial intelligence model, typically trained on vast amounts of text data, that can generate human-like language. These models can also perform a range of language-related tasks, including translation, text completion, question answering, and more.

In this era of rapid technological advancement, new large language models appear constantly. Despite this, there is no standardized measure for comparing or evaluating their quality.

In this article, we look at the existing evaluation and comparison frameworks for large language models and analyze the factors on which these models should be evaluated.

Evaluating Large Language Models

Need for a comprehensive evaluation framework

Identifying areas of improvement is relatively easy during the early stages of development. However, as the technology advances and new alternatives become available, determining which model is best becomes increasingly tricky. It is therefore essential to have a reliable evaluation framework for judging the quality of large language models accurately. Such a framework can be used in the following ways:

- A proper framework helps authorities and agencies assess the accuracy, safety, usability, and reliability of a model.
- The race among big technology companies to release large language models keeps accelerating; a comprehensive evaluation framework helps stakeholders release models more responsibly.
- A comprehensive evaluation framework helps users of large language models determine how and where to fine-tune a model for practical deployment.

Issues with the existing frameworks

Every large language model has its advantages, but the existing evaluation frameworks fall short in several respects:

- Safety: Some frameworks do not consider safety as an evaluation factor. Although the OpenAI moderation API addresses safety to some extent, it is insufficient.
- Self-sufficiency: The frameworks are scattered in terms of the factors on which they evaluate models; none of them is comprehensive enough to be self-sufficient.

Factors to be considered while evaluating large language models

Only after reviewing the existing evaluation frameworks can one determine the factors that must be considered while assessing the quality of large language models. Here are the key factors.

Model Size and Complexity

The primary factors to evaluate in LLMs are their size and complexity, often indicated by the number of parameters. Generally, larger models have a greater capacity to understand context and generate nuanced responses. However, very large models can require substantial computational resources, making them impractical for some applications. Evaluators must balance model size and computational efficiency based on the use case.
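As a quick illustration of how parameter counts can be inspected in practice, here is a minimal sketch using Hugging Face Transformers. GPT-2 is used only because it is small and public; it is not one of the models discussed here.

# Count the trainable parameters of a pre-trained model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
n_params = sum(p.numel() for p in model.parameters())
print(f"gpt2 has ~{n_params / 1e6:.0f}M parameters")  # roughly 124M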
Training Data Quality and Diversity

The quality and diversity of the training data significantly influence an LLM's performance. Models trained on diverse and representative datasets from varied sources tend to have a broader understanding of language nuances. Evaluators should scrutinize the sources and types of data used for training to ensure the model's reliability across different contexts and domains.

Bias and Fairness

Bias in LLMs is a critical concern, as it can lead to discriminatory or unfair content. Evaluators must assess bias both in the training data and in the generated output, and implement strategies to mitigate it. Ethical considerations demand continuous efforts to improve fairness, ensuring that models do not reinforce societal biases.

Ethical Considerations and Responsible Use

Evaluating LLMs extends beyond technical aspects to ethical considerations. Responsible deployment requires a thorough assessment of potential misuse scenarios. Evaluators must devise guidelines and ethical frameworks to prevent the generation of harmful or malicious content, emphasizing the responsible use of LLMs in applications such as content moderation and chatbots.

Fine-Tuning and Transfer Learning

LLMs are often fine-tuned on specific datasets to adapt them to particular tasks or domains. The fine-tuning process should be scrutinized to ensure the model maintains its integrity and performance while being customized. Additionally, assessing the effectiveness of transfer learning, where models trained on one task are applied to related tasks, is crucial for understanding adaptability and generalizability.

Explainability and Interpretability

Understanding how LLMs arrive at specific conclusions is essential, especially in applications such as legal document analysis and decision-making processes. Evaluators must assess a model's explainability and interpretability. Transparent models enable users to trust the generated output and comprehend the reasoning behind responses, fostering accountability and reliability.

Robustness and Adversarial Attacks

Evaluating the robustness of LLMs involves assessing their performance under various conditions, including noisy input, ambiguous queries, and adversarial attacks. Rigorous testing against adversarial inputs helps identify vulnerabilities and weaknesses in the model, guiding the implementation of robustness-enhancing techniques.

Continuous Monitoring and Improvement

The landscape of language understanding is ever-evolving, so continuous monitoring and improvement are vital. Regular updates, addressing emerging challenges, and incorporating user feedback contribute to a model's ongoing enhancement, ensuring its relevance and reliability over time.

Step-by-Step Guide: Comparing LLMs Using Perplexity

1. Load language model: Load the pre-trained LLM using a library such as Hugging Face Transformers.
2. Prepare dataset: Tokenize and preprocess your dataset for the language model.
3. Train/test split: Split the dataset into training and testing sets.
4. Train LLM: Fine-tune the LLM on the training dataset.
5. Calculate perplexity: Use the testing dataset to calculate perplexity.
Code example:

# Calculate perplexity
from math import exp

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

input_text = "Example input text for perplexity calculation."
input_ids = tokenizer.encode(input_text, return_tensors="pt")

with torch.no_grad():
    # Passing labels makes the model return the language-modeling loss
    output = model(input_ids, labels=input_ids)
    loss = output.loss

perplexity = exp(loss.item())
print("Perplexity:", perplexity)

Methods of evaluation

Quantitative Performance Metrics and Benchmarking

Evaluating LLMs requires rigorous quantitative assessment using industry-standard metrics. BLEU, METEOR, and ROUGE scores are pivotal in assessing text generation quality by comparing generated text with human references. For translation tasks, BLEU (Bilingual Evaluation Understudy) calculates the overlap of n-grams between machine-generated text and human reference translations. METEOR evaluates precision, recall, and synonymy, providing a more nuanced view of translation quality. ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is geared toward summarization and emphasizes recall. These metrics offer quantitative benchmarks, enabling direct comparison between different LLMs. Additionally, perplexity, a measure of how well a language model predicts a sample text, provides insight into model efficiency: lower perplexity values indicate better prediction accuracy, reflecting the model's coherence and understanding of the input text. Often applied to large-scale datasets such as WMT (Workshop on Machine Translation) or COCO (Common Objects in Context), these quantitative metrics offer a robust foundation for comparing LLMs' performance.
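As a complement to the perplexity snippet above, here is a rough sketch of computing a sentence-level BLEU score with NLTK. This is one possible tooling choice, not the only way to obtain the metric, and real evaluations aggregate scores over full test sets rather than single sentences.

# Sentence-level BLEU between a reference translation and a model output
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "the cat is sitting on the mat".split()
candidate = "the cat sits on the mat".split()

score = sentence_bleu(
    [reference],                                     # BLEU accepts multiple references
    candidate,
    smoothing_function=SmoothingFunction().method1,  # avoid zero scores on short sentences
)
print(f"BLEU: {score:.3f}")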
Diversity Analysis and Bias Detection

Diversity and bias analysis are paramount in evaluating LLMs, ensuring equitable and inclusive performance across diverse demographics and contexts. One critical approach involves employing word embedding techniques, such as the Word Embedding Association Test (WEAT), to quantify biases. WEAT assesses associations between word embeddings and predefined categories, revealing biases present in LLMs. By evaluating gender, racial, or cultural biases, organizations can ensure fair and unbiased responses, in line with ethical considerations.

Furthermore, demographic diversity analysis measures the model's performance across different demographic groups. Assessing demographic parity ensures that LLMs provide consistent, unbiased results across various user segments. This comprehensive evaluation approach, rooted in fairness and inclusivity, is pivotal in selecting socially responsible LLMs.

Real-World User Studies and Interaction Analysis

Incorporating real-world user studies and interaction analysis is indispensable for evaluating LLMs in practical scenarios. User tests and surveys provide qualitative insights into user satisfaction, comprehension, and trust. These studies consider how well LLM-generated content aligns with users' expectations and domain-specific contexts.

Additionally, analyzing user interactions with LLM-generated content through techniques such as eye-tracking studies and click-through rates provides valuable behavioral data. Heatmap analysis, which captures user attention patterns, offers insight into the effectiveness of LLM-generated text elements. User feedback and interaction analysis inform iterative improvements, ensuring that LLMs are technically robust, user-centric, and aligned with real-world application requirements.

Conclusion

The development of large language models has revolutionized natural language processing, but a standardized and comprehensive evaluation framework for assessing their quality is still missing. Though the existing frameworks offer valuable insights, they lack standardization and comprehensiveness, and they largely ignore safety as an evaluation factor. Collaborating with relevant experts is therefore imperative to build a comprehensive and authentic evaluation framework for large language models.

Author Bio

Vivekanandan, a seasoned Data Specialist with over a decade of expertise in Data Science and Big Data, excels in intricate projects spanning diverse domains. Proficient in cloud analytics and data warehouses, he holds degrees in Industrial Engineering, Big Data Analytics from IIM Bangalore, and Data Science from Eastern University.

As a Certified SAFe Product Manager and Practitioner, Vivekanandan ranks in the top 1 percentile on Kaggle globally. Beyond corporate excellence, he shares his knowledge as a Data Science guest faculty and advisor for educational institutes.

Build Your LLM-Powered Personal Website

Louis Owen
11 Sep 2023
8 min read
Introduction

Since ChatGPT shocked the world with its capabilities, AI has been adopted in numerous fields: customer service assistants, marketing content creation, code assistants, travel itinerary planning, investment analysis, and more.

However, have you ever thought about utilizing AI, or more specifically a Large Language Model (LLM) like ChatGPT, as your own personal AI assistant on your website?

A personal website, often called a personal web portfolio, showcases everything we want to present to the world: a short biography, work experience, projects, achievements, paper publications, and anything else related to our professional work. We put this website live on the internet, and people come and browse its content by scrolling and surfing the pages.

What if we changed the user experience a bit, from scrolling and surfing to asking a query? What if we added a small search bar or widget where visitors can directly ask anything they want to know about us and get the answer immediately? Imagine a head-hunter or hiring manager opening your personal website. They already have specific criteria for the candidates they want to hire. With a search bar or similar widget on our website, they can directly ask what they want to know about us, improving our chances of being approached. It also signals that we are adopting the latest technology available in the market, which will surely earn extra points on their scoring board.

In this article, I'll guide you through building your own LLM-powered personal website. We'll start by discussing several freely available LLMs that we can utilize. Then we'll go through the step-by-step process of building our personal AI assistant by exploiting an LLM's capability as a Question and Answering (QnA) module. As a hint, we'll use one of the task-specific models provided by AI21 Labs as our LLM; they provide a 3-month free trial worth $90, or 18,000 free calls for the QnA model. Finally, we'll see how to put our personal AI assistant on our website.

Without wasting any more time, let's take a deep breath, make ourselves comfortable, and be ready to learn how to build your own LLM-powered personal website!

Freely Available LLMs

The main engine of our personal AI assistant is an LLM. The question is: which LLM should we use?

There are many LLMs available in the market right now, from open source to closed source. There are two main differences between them: open-source LLMs are free, but you need to host them yourself; closed-source LLMs are not free, but you don't need to host them yourself, you just call an API.

As for open-source LLMs, the go-to model for many use cases is LLaMA-2 by Meta AI. Since LLMs consume a large amount of GPU memory, in practice we usually perform 4-bit quantization to reduce memory usage. Thanks to the open-source community, you can now directly use the quantized versions of LLaMA-2 released by TheBloke on the Hugging Face Hub. To host the LLM, we can also utilize a very powerful inference server called Text Generation Inference (TGI).

The next question is whether there are freely available GPU machines that we can use to host the LLM. We can't use Google Colab, since we need a server that the personal website can send API requests to. Luckily, there are two free options: Google Cloud Platform and SaturnCloud. Both offer free trial accounts for renting GPU machines.
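If you do go the open-source route, here is a minimal sketch of what 4-bit loading can look like with the transformers and bitsandbytes libraries. The model ID, prompt, and generation settings are just illustrative assumptions, exact arguments may differ between library versions, and you can skip this entirely if you follow the closed-source route used in the rest of this article.

# A rough sketch: load a causal LM in 4-bit to fit it on a small GPU
# (assumes transformers, bitsandbytes, and accelerate are installed, a CUDA GPU is
# available, and you have access to the chosen checkpoint on the Hugging Face Hub)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-chat-hf"  # example choice, not a requirement

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # store weights in 4-bit to cut GPU memory
    bnb_4bit_compute_dtype=torch.float16,  # run the matmuls in fp16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",                     # let accelerate place layers on the GPU
)

prompt = "Summarize my work experience in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))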
An open-source LLM like LLaMA-2 is free, but it comes with the additional hassle of hosting it ourselves. In this article, we'll use a closed-source LLM as our personal AI assistant instead. However, most closed-source LLMs that can be accessed via API are not free: GPT-3.5 and GPT-4 by OpenAI, Claude by Anthropic, Jurassic by AI21 Labs, and so on.

Luckily, AI21 Labs offers a free trial worth $90! Moreover, they also provide task-specific models that are charged per API call rather than per token, as with most other closed-source LLMs. This is very suitable for our use case, since we'll have long input prompts!

Let's dive deeper into the AI21 Labs LLM, specifically the QnA model, which we'll be using as our personal AI assistant.

AI21 Labs QnA LLM

AI21 Labs provides numerous task-specific models that offer out-of-the-box reading and writing capabilities. The LLM we'll be using is fine-tuned specifically for the QnA task, or what they call the "Contextual Answers" model. We just need to provide the context and a query, and it returns an answer based solely on the information available in the context. This model is priced at $0.005 per API request, which means that with our $90 free trial account, we can send 18,000 API calls! Isn't that amazing?

Without further ado, let's start building our personal AI assistant.

1. Create an AI21 Labs Free Trial Account

To use the QnA model, we just need to create a free trial account on the AI21 Labs website. You can follow the steps on the website; it's as easy as creating an account on most other sites.

2. Enter the Playground

Once you have the free trial account, go to the AI21 Studio page and select "Contextual Answers" under the "Task-Specific Models" section in the left bar. Then go to the Playground to test the model. Inside the Playground of the QnA model, there are two input fields and one output field. As input, we pass the context (the knowledge list) and the query. As output, we get the answer to the given query based on the provided context. What if the answer doesn't exist in the context? The model returns "Answer not in documents." as the fallback.

3. Create the Knowledge List

The next and main task is to create the knowledge list that forms the context input. Think of this knowledge list as the Knowledge Base (KB) for the model: the model can only answer a query based on the information available in this KB.

4. Test with Several Queries

Most likely, our first set of knowledge is not exhaustive. We therefore need several iterations of testing to keep expanding the list while maintaining the quality of the returned answers. We can start by listing the queries our web visitors are likely to ask, then add answers for each of them to the knowledge list; a scripted sketch of this kind of testing follows below. Pro tip: once our assistant is deployed on our website, we can also add a logger to store all queries and responses we receive. Using that log data, we can further expand our knowledge list, making our AI assistant "smarter".
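If you prefer scripting these tests instead of clicking through the Playground, the sketch below shows the general idea using Python's requests library. Note that the endpoint path and payload field names are assumptions modeled on AI21 Studio's REST conventions at the time of writing, and the knowledge text is a made-up example; check the current AI21 documentation for the exact request format.

# Hedged sketch: loop a list of test questions through the Contextual Answers endpoint
import requests

AI21_API_KEY = "<your-ai21-api-key>"
KNOWLEDGE = (
    "Louis is a data scientist with 5 years of experience. "
    "He has worked on NLP, conversational AI, and FinTech projects."
)

def ask(question: str) -> str:
    resp = requests.post(
        "https://api.ai21.com/studio/v1/answer",             # assumed endpoint path
        headers={"Authorization": f"Bearer {AI21_API_KEY}"},
        json={"context": KNOWLEDGE, "question": question},   # assumed field names
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("answer", "Answer not in documents.")

for q in ["What does Louis do?", "Where did Louis study?"]:
    print(q, "->", ask(q))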
5. Embed the AI Assistant on Our Website

Until now, we have only played with the LLM in the Playground. However, our goal is to put it inside our web portfolio. Thanks to AI21 Labs, we can do this easily by adding a snippet of JavaScript code to our website. Click the three-dots button at the top right of the context input and choose the "Code" option. A pop-up page will be shown, and you can copy and paste the JavaScript code directly into your personal website. That's it!

Conclusion

Congratulations on making it this far! Hopefully, many new LLM-powered portfolios will be developed after this article is published. Throughout this article, you have learned how to build your own LLM-powered personal website: the motivation, freely available LLMs with their pros and cons, AI21 Labs' task-specific models, creating your own knowledge list along with some tips, and finally how to embed your AI assistant in your personal website. See you in the next article!

Author Bio

Louis Owen is a data scientist/AI engineer from Indonesia who is always hungry for new knowledge. Throughout his career journey, he has worked in various fields of industry, including NGOs, e-commerce, conversational AI, OTA, Smart City, and FinTech. Outside of work, he loves to spend his time helping data science enthusiasts to become data scientists, either through his articles or through mentoring sessions. He also loves to spend his spare time doing his hobbies: watching movies and conducting side projects.

Currently, Louis is an NLP Research Engineer at Yellow.ai, the world's leading CX automation platform. Check out Louis' website to learn more about him! Lastly, if you have any queries or any topics to be discussed, please reach out to Louis via LinkedIn.

Using LangChain for Large Language Model — Powered Applications

Avratanu Biswas
15 Jun 2023
5 min read
This article is the first part of a series; please refer to Part 2 to learn how to build and deploy LLM-powered apps with the LangChain framework.

Introduction

LangChain is a powerful, open-source Python library specifically designed to enhance the usability, accessibility, and versatility of Large Language Models (LLMs) such as GPT-3 (Generative Pre-trained Transformer 3), BERT (Bidirectional Encoder Representations from Transformers), and BLOOM (BigScience Large Open-science Open-access Multilingual Language Model). It provides developers with a comprehensive set of tools to seamlessly combine multiple prompts, creating a harmonious orchestra for working with LLMs effortlessly. The project was initiated by Harrison Chase, with the first commit made in late October 2022. In just a few weeks, LangChain gained immense popularity within the open-source community.

Image 1: The popularity of the LangChain Python library

LangChain for LLMs

To fully grasp the fundamentals of LangChain and utilize it effectively, understanding the fundamentals of LLMs is essential. In simple terms, LLMs are sophisticated language models or AI systems that have been extensively trained on massive amounts of text data to comprehend and generate human-like language. Despite their powerful capabilities, LLMs are generic in nature, lacking domain-specific knowledge or expertise. For instance, when addressing queries in fields like medicine or law, an LLM can provide general insights but may struggle to offer in-depth or nuanced responses that require specialized expertise. Alongside such limitations, LLMs are susceptible to biases and inaccuracies present in the training data, which can yield contextually plausible yet incorrect outputs. This is where LangChain shines: it serves as an open-source library that leverages the power of LLMs and mitigates their drawbacks by providing abstractions and a diverse range of modules, akin to Lego blocks, thus facilitating intuitive integration with other tools and knowledge bases.

In brief, LangChain presents a useful approach for handling text data, wherein the initial step involves preprocessing the large corpus by segmenting it into smaller chunks or summaries. These chunks are then transformed into vector representations, enabling efficient comparison and retrieval of similar chunks when questions are posed. This approach of preprocessing, real-time data collection, and interaction with the LLM is not only applicable to this specific context but can also be effectively utilized in other scenarios, such as code and semantic search.

Image 2: Typical workflow of LangChain (image created by the author)

A typical workflow of LangChain involves several steps that enable efficient interaction between the user, the preprocessed text corpus, and the LLM. Notably, the strengths of LangChain lie in the abstraction layer it provides, streamlining the intricate process of composing and integrating these text components, thereby enhancing overall efficiency and effectiveness.

Key Attributes Offered by LangChain

The core concept behind LangChain is its ability to connect a "chain of thoughts" around LLMs, as evident from its name. However, LangChain is not limited to just a few LLMs; it provides a wide range of components that work together as building blocks for advanced use cases involving LLMs.
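Before looking at the individual components, here is a minimal sketch of the kind of composition LangChain enables, using the classic 2023-era API that appears elsewhere in this series: a prompt template chained to an LLM. The API key placeholder and the example topic are assumptions for illustration.

# A prompt template chained to an OpenAI LLM
from langchain import PromptTemplate
from langchain.chains import LLMChain
from langchain.llms import OpenAI

prompt = PromptTemplate(
    template="Explain {topic} to a beginner in two sentences.",
    input_variables=["topic"],
)
llm = OpenAI(temperature=0, openai_api_key="<your-openai-api-key>")

chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run(topic="vector databases"))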
Now, let's delve into the various components that the LangChain library offers, making our work with LLMs easier and more efficient.

Image 3: LangChain features at a glance (image created by the author)

Prompts and Prompt Templates: Prompts refer to the inputs or queries we send to LLMs. As we have experienced with ChatGPT, the quality of the response depends heavily on the prompt. LangChain provides several functionalities to simplify the construction and handling of prompts. A prompt template consists of multiple parts, including instructions, content, and queries.

Models: While LangChain itself does not provide LLMs, it leverages various Language Models (such as GPT-3 and BLOOM, discussed earlier), Chat Models (like gpt-3.5-turbo), and Text Embedding Models (offered by CohereAI, HuggingFace, and OpenAI).

Chains: Chains are an end-to-end wrapper around multiple individual components, playing a major role in LangChain. The two most common types of chains are LLM chains and vector index chains.

Memory: By default, chains in LangChain are stateless, treating each incoming query or input independently without retaining context (i.e., lacking memory). To overcome this limitation, LangChain supports both short-term memory (using previous conversational messages or summarized messages) and long-term memory (managing the retrieval and updating of information between conversations).

Indexes: Index modules provide various document loaders to connect with different data sources, along with utility functions to seamlessly integrate with external vector databases like Pinecone, ChromaDB, and Weaviate, enabling smooth handling of large arrays of vector embeddings. The index types include Document Loaders, Text Splitters, Retrievers, and Vectorstores.

Agents: While the sequence of chains is often deterministic, in certain applications the sequence of calls may not be, with the next step depending on the user input and previous responses. Agents utilize LLMs to determine the appropriate actions and their order, and they perform these tasks using a suite of tools.

Limitations of LangChain usage

Abstraction challenge for debugging: The comprehensive abstraction provided by LangChain poses challenges for debugging, as it becomes difficult to comprehend the underlying processes.

Higher token consumption due to prompt coupling: Coupling a chain of prompts when executing multiple chains for a specific task often leads to higher token consumption, making it less cost-effective.

Increased latency and slower performance: The latency experienced when using LangChain in applications with agents or tools is higher, resulting in slower performance.

Overall, LangChain provides a broad spectrum of features and modules that greatly enhance our interaction with LLMs. In the subsequent sections, we will explore the practical usage of LangChain and demonstrate how to build simple demo web applications using its capabilities.

References

https://docs.langchain.com/docs/
https://github.com/hwchase17/langchain
https://medium.com/databutton/getting-started-with-langchain-a-powerful-tool-for-working-with-large-language-models-286419ba0842
https://medium.com/@avra42/how-to-build-a-personalized-pdf-chat-bot-with-conversational-memory-965280c160f8

Author

Avratanu Biswas, Ph.D. Student (Biophysics), Educator, and Content Creator (Data Science, ML & AI).

Twitter    YouTube    Medium    GitHub

Optimized LLM Serving with TGI

Louis Owen
25 Sep 2023
7 min read
Introduction

The release of the LLaMA-2 LLM to the public has been a tremendous contribution by Meta to the open-source community. It is so powerful that many developers are fine-tuning it for many different use cases. Closed-source LLMs such as GPT-3.5, GPT-4, or Claude are very convenient to utilize, since we just need to send API requests to power our application. On the other hand, utilizing open-source LLMs such as LLaMA-2 in production is not an easy task. Although there are several algorithms that support deploying an LLM on a CPU, most of the time we still need a GPU for better throughput.

At a high level, there are two main things to take care of when deploying an LLM in production: memory and throughput. LLMs are very big; the smallest version of LLaMA-2 has 7 billion parameters, which requires around 28 GB of GPU RAM just to load the model. Google Colab offers a free NVIDIA T4 GPU with 16 GB of memory, meaning that we can't even load the smallest version of LLaMA-2 on the GPU provided for free by Google Colab.

There are two popular ways to solve this memory issue: half-precision and 4/8-bit quantization.

Half-precision is a very simple optimization method. In PyTorch, we just need to add the following line, and we can load the 7B model using only around 13 GB of GPU RAM. Essentially, half-precision means representing the weights of the model with 16-bit floating-point numbers (fp16) instead of 32-bit floating-point numbers (fp32).

torch.set_default_tensor_type(torch.cuda.HalfTensor)

Another way to solve the memory issue is to perform 4/8-bit quantization. Two quantization algorithms are widely adopted by developers: GPT-Q and bitsandbytes. Both are supported by the `transformers` package. Quantization reduces the model size by representing the model weights in lower-precision data types, such as 8-bit integers (int8) or 4-bit integers (int4), instead of 32-bit floating point (fp32) or 16-bit floating point (fp16).

For comparison, after applying 4-bit quantization with the GPT-Q algorithm, we can load the 7B model using only around 4 GB of GPU RAM. This is indeed a huge drop, from 28 GB to 4 GB!
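A quick back-of-the-envelope calculation makes these numbers concrete. This is only a rough sketch that counts weight storage and ignores activations, the KV cache, and framework overhead, which is why the real figures quoted above are slightly higher.

# Approximate GPU memory needed just to hold LLaMA-2-7B's weights at different precisions
params = 7_000_000_000

for name, bytes_per_param in [("fp32", 4), ("fp16 (half precision)", 2), ("int4 (4-bit quantized)", 0.5)]:
    gb = params * bytes_per_param / 1e9
    print(f"{name:>24}: ~{gb:.1f} GB")
# prints roughly 28 GB, 14 GB, and 3.5 GB, in line with the figures above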
Once we're able to load the model, the simplest way to serve it is with the native Hugging Face workflow along with Flask or FastAPI. However, serving the model with this simple solution is not scalable and reliable enough for production, since it can't handle parallel requests decently. This is the throughput problem mentioned earlier. There are many ways to address it: continuous batching, tensor parallelism, prompt caching, paged attention, prefill parallel computing, KV caching, and more. Discussing each of these techniques would make for a very long article. Luckily, there is an inference serving library for LLMs that applies all of these techniques to ensure reliable serving in production: Text Generation Inference (TGI).

Throughout this article, we'll discuss in more detail what TGI is and how to utilize it to serve your LLM. We'll look at TGI's top features and walk through a step-by-step tutorial on how to spin up TGI as an inference server. Without wasting any more time, let's take a deep breath, make ourselves comfortable, and be ready to learn how to utilize TGI as an inference server!

What's TGI?

Text Generation Inference (TGI) is like the `transformers` package for inference servers. In fact, TGI is used in production at Hugging Face to power many applications. It provides a lot of useful features:

- A straightforward launcher for the most popular large language models.
- Tensor parallelism to enhance inference speed across multiple GPUs.
- Token streaming through Server-Sent Events (SSE).
- Continuous batching of incoming requests to boost overall throughput.
- Optimized transformer inference code using flash attention and paged attention on leading architectures.
- Quantization with bitsandbytes and GPT-Q.
- Fast and safe weight loading with the Safetensors format.
- Watermarking using A Watermark for Large Language Models.
- A logits warper for operations such as temperature scaling, top-p, top-k, and repetition penalty.
- Stop sequences and log probabilities.
- Production readiness with distributed tracing through OpenTelemetry and Prometheus metrics.
- Custom prompt generation, allowing users to guide the model's output with custom prompts.
- Support for fine-tuned models, enabling the use of task-specific models for higher accuracy and performance.
- RoPE scaling to extend the context length during inference.

TGI is surely a one-stop solution for serving LLMs. However, one important note about TGI concerns the project license. Starting from v1.0, Hugging Face has updated the TGI license to the HFOIL 1.0 license. For more details, please refer to this GitHub issue. If you're using TGI only for research purposes, there's nothing to worry about. If you're using TGI for commercial purposes, make sure to read the license carefully, since there are cases where it can and can't be used commercially.

How to Use TGI?

Using TGI is relatively easy and straightforward. We can use Docker to spin up the server. Note that you need to install the NVIDIA Container Toolkit to use the GPU.

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.0.3 --model-id $model

Once the server is up, we can easily send API requests to it using the following command.

curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is LLM?","parameters":{"max_new_tokens":120}}' \
    -H 'Content-Type: application/json'

For more details regarding the API, please refer to this page. Another way to send API requests to the TGI server is from Python. To do that, we first need to install the Python client library.

pip install text-generation

Once the library is installed, we can send API requests like the following.

from text_generation import Client

client = Client("http://127.0.0.1:8080")
print(client.generate("What is LLM?", max_new_tokens=120).generated_text)

text = ""
for response in client.generate_stream("What is Deep Learning?", max_new_tokens=20):
    if not response.token.special:
        text += response.token.text
print(text)

Conclusion

Congratulations on keeping up to this point!
Throughout this article, you have learned what needs to be considered when serving your own LLM. You have also learned about TGI: what it is, what its features are, and step-by-step examples of how to get a TGI server up. Best of luck on your LLM deployment journey, and see you in the next article!

Author Bio

Louis Owen is a data scientist/AI engineer from Indonesia who is always hungry for new knowledge. Throughout his career journey, he has worked in various fields of industry, including NGOs, e-commerce, conversational AI, OTA, Smart City, and FinTech. Outside of work, he loves to spend his time helping data science enthusiasts to become data scientists, either through his articles or through mentoring sessions. He also loves to spend his spare time doing his hobbies: watching movies and conducting side projects.

Currently, Louis is an NLP Research Engineer at Yellow.ai, the world's leading CX automation platform. Check out Louis' website to learn more about him! Lastly, if you have any queries or any topics to be discussed, please reach out to Louis via LinkedIn.

Building and deploying Web App using LangChain

Avratanu Biswas
26 Jun 2023
12 min read
So far, we've explored the LangChain modules and how to use them (refer to the earlier blog post on LangChain modules here). In this section, we'll focus on the LangChain Indexes and Agent modules and also walk through the process of creating and launching a web application that everyone can access. To make things easier, we'll be using Databutton, an all-in-one online workspace for building and deploying web apps, integrated with Streamlit, a Python web development framework known for its support for building interactive web applications.

What are LangChain Agents?

In simpler terms, LangChain Agents are tools that enable Large Language Models (LLMs) to perform various actions, such as accessing Google search, executing Python calculations, or making SQL queries, thereby empowering LLMs to make informed decisions and interact with users by using tools and observing their outputs. The official LangChain documentation describes Agents as:

"…there is an agent which has access to a suite of tools. Depending on the user input, the agent can then decide which, if any, of these tools to call…"

In building agents, there are several abstractions involved. The Agent abstraction contains the application logic, receiving user input and previous steps to return either an AgentAction (tool and input) or AgentFinish (completion information). Tools represent the actions agents can take, while Toolkits group tools for specific use cases (e.g., SQL querying). Lastly, the Agent Executor manages the iterative execution of the agent with the available tools. In this section, we will briefly explore these abstractions while using the Agent functionality to integrate tools, and primarily focus on building a real-world, easily deployable web application.

Indexes

This module provides utility functions for structuring documents using indexes and allowing LLMs to interact with them effectively. We will focus on one of the most commonly used retrieval systems, where indexes are used to find the most relevant documents based on a user's query. Additionally, LangChain supports various index and retrieval types, with a focus on vector databases for unstructured data. We will explore this component in detail, as it can be leveraged in a wide number of real-world applications.

Image 1: LangChain workflow, by the author

The image shows the workflow of a question and answer generation interface using a retrieval index, where we leverage all of the index types that LangChain provides. Indexes are primarily of four types: Document Loaders, Text Splitters, VectorStores, and Retrievers.
Briefly: (a) the documents fetched from a data source are split into chunks using text splitter modules, (b) embeddings are created, (c) they are stored in a vector store index (vector databases such as ChromaDB, Pinecone, Weaviate, etc.), and (d) the user's queries are answered via a retrieval QA chain.

We will use the WikipediaLoader to load Wikipedia documents related to our query "LangChain" and retrieve the metadata and a portion of the page content of the first document.

from langchain.document_loaders import WikipediaLoader

docs = WikipediaLoader(query='LangChain', load_max_docs=2).load()
docs[0].metadata
docs[0].page_content[:400]

CharacterTextSplitter is used to split the loaded documents into smaller chunks for further processing.

from langchain.text_splitter import CharacterTextSplitter

text_splitter = CharacterTextSplitter(chunk_size=4000, chunk_overlap=0)
texts = text_splitter.split_documents(docs)

The OpenAIEmbeddings module is then employed to generate embeddings for the text chunks.

from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)

We will use the Chroma vector store, which is created from the generated text chunks and embeddings, allowing for efficient storage and retrieval of vectorized data. Next, the RetrievalQA module is instantiated with an OpenAI LLM and the created retriever, setting up a question-answering system.

from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

db = Chroma.from_documents(texts, embeddings)
retriever = db.as_retriever()

qa = RetrievalQA.from_chain_type(llm=OpenAI(openai_api_key=OPENAI_API_KEY), chain_type="stuff", retriever=retriever)

At this stage, we can easily seek answers from the stored, indexed data. For instance:

query = "What is LangChain?"
qa.run(query)

LangChain is a framework designed to simplify the creation of applications using large language models (LLMs).

query = "When was LangChain founded?"
qa.run(query)

LangChain was founded in October 2022.

query = "Who is the founder?"
qa.run(query)

The founder of LangChain is Harrison Chase.

The Q&A functionality implemented using the retrieval chain provides reasonable answers to most of our queries. The different types of indexes provided by LangChain can be leveraged for various real-world use cases involving data structuring and retrieval. Moving forward, we will focus on the final component, the "Agent". In this section, we will not only gain a hands-on understanding of its usage but also build and deploy a web app using an online workspace called Databutton.

Building a Web App using Databutton

Prerequisites

To begin using Databutton, all that is required is to sign up through their official website. Once logged in, we can either create a blank template app from scratch or choose from the pre-existing templates provided by Databutton.

Image by the author | Screenshot showing how to start working with a new blank app

Once the blank app is created, we get an online workspace with several features for building and deploying the app. We can immediately begin writing our code within the online editor. The only requirement at this stage is to include the necessary packages or dependencies that our app requires.

Image by the author | Screenshot showing the different components available within the Databutton app's online workspace
Databutton's workspace initialization includes some essential packages by default. However, for our specific app, we need to add two additional packages: openai and langchain. This can easily be done within the "configuration" workspace of Databutton.

Image by the author | Screenshot of the configuration options within Databutton's online workspace, where we can add the additional packages our app needs; the workspace is generated with a few pre-installed dependencies

Writing the code

Now that we have a basic understanding of Agents and their abstractions, let's put them to use, alongside some basic Streamlit syntax for the front end.

Importing the required modules: For building the web app, we will require the Streamlit library and several LangChain modules. Additionally, we will utilize a helper function that relies on the sys and io libraries for capturing and displaying function outputs. We will discuss the significance of this helper function towards the end.

# Modules to import
import streamlit as st
import sys
import io
import re
from typing import Callable, Any
from langchain.agents.tools import Tool
from langchain.agents import initialize_agent
from langchain.llms import OpenAI
from langchain.chains import LLMChain
from langchain import LLMMathChain
from langchain import PromptTemplate

Using the LangChain modules and building the main user interface: We set the title of the app using st.title() and let the user enter their OpenAI API key using the st.text_input() widget.

# Set the title of the app
st.title("LangChain `Agent` Module Usage Demo App")

# Get the OpenAI API key from the user
OPENAI_API_KEY = st.text_input(
    "Enter your OpenAI API Key to get started", type="password"
)

As discussed in the previous sections, we need to define a template for the prompt that incorporates a placeholder for the user's query.

# Define a template for the prompt
template = """You are a friendly and polite AI Chat Assistant.
You must try to provide accurate but concise answers.
If you don't know the answer, just say "I don't know."

Question: {query}

Answer: """

# Create a prompt template object with the template
prompt = PromptTemplate(template=template, input_variables=["query"])

Next, we implement a conditional flow: if the user has provided an OpenAI API key, we proceed with the app and ask the user to enter their query using the st.text_input() widget.

# Check if the user has entered an OpenAI API key
if OPENAI_API_KEY:
    # Get the user's query
    query = st.text_input("Ask me anything")

Once the user has inserted the correct API key, we proceed with the implementation of the LangChain modules. Some of these modules may be new to us, while others have already been covered in previous sections.

Next, we create instances of the OpenAI language model, the LLMMathChain for math-related queries, and the LLMChain for general-purpose queries.

# Check if the user has entered a query
if query:
    # Create an instance of the OpenAI language model
    llm = OpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)

    # Create an instance of the LLMMathChain for math-related queries
    llm_math_chain = LLMMathChain.from_llm(llm=llm, verbose=True)

    # Create an instance of the LLMChain for general-purpose queries
    llm_chain = LLMChain(llm=llm, prompt=prompt)

Following that, we create a list of tools that the agent will utilize.
Each tool comprises a name, a corresponding function to handle the query, and a brief description.

# Create a list of tools for the agent
tools = [
    Tool(
        name="Search",
        func=llm_chain.run,
        description="Useful for when you need to answer general purpose questions",
    ),
    Tool(
        name="Calculator",
        func=llm_math_chain.run,
        description="Useful for when you need to answer questions about math",
    ),
]

Next, we initialize a zero-shot agent with the tools and other parameters. This agent employs the ReAct framework to determine which tool to utilize based solely on the description associated with each tool, so it is essential to provide a description for every tool.

# Initialize the zero-shot agent with the tools and parameters
zero_shot_agent = initialize_agent(
    agent="zero-shot-react-description",
    tools=tools,
    llm=llm,
    verbose=True,
    max_iterations=3,
)

And now, finally, we could call the zero-shot agent directly with the user's query using the run(query) method.

# st.write(zero_shot_agent.run(query))

However, this would only display the final outcome within our Streamlit UI, without providing access to the underlying LangChain thought process (i.e., the verbose output) that we typically observe in a notebook environment. This information is crucial for understanding which tool our agent chooses based on the user query. To address this, a helper function called capture_and_display_output was created.

# Helper function to capture and display the LangChain verbose output (thought process)
def capture_and_display_output(func: Callable[..., Any], *args, **kwargs) -> Any:
    # Redirect stdout to a string buffer
    original_stdout = sys.stdout
    sys.stdout = output_catcher = io.StringIO()

    # Call the function and capture the response
    response = func(*args, **kwargs)

    # Restore the original stdout and get the captured output
    sys.stdout = original_stdout
    output_text = output_catcher.getvalue()

    # Clean the output text by removing escape sequences
    cleaned_text = re.sub(r"\x1b\[([0-9]{1,2}(;[0-9]{1,2})?)?[m|K]", "", output_text)

    # Split the cleaned text into lines and concatenate them with line breaks
    lines = cleaned_text.split("\n")
    concatenated_string = "\n".join([s if s else "\n" for s in lines])

    # Display the captured output in an expander
    with st.expander("Thoughts", expanded=True):
        st.write(concatenated_string)

    return response

This function allows users to monitor the actions undertaken by the agent. The response from the agent is then displayed within the UI.

# Call the zero-shot agent with the user's query and capture the output
response = capture_and_display_output(zero_shot_agent.run, query)

Image by the author | Screenshot of the app running locally, displaying the full verbose output (the agent's thought process)

Deploying and Testing the App

The app can now be easily deployed by clicking the "Deploy" button at the top left-hand side. The deployed app will provide us with a unique URL that can be shared with everyone!

Image by the author | Screenshot of the Databutton online workspace showing the Deploy options

Yay! We have successfully built and deployed a LangChain-based web app from scratch. Here's the link to the app! The app also has a view-code page, which can be accessed via this link.

To test the web app, we will use two different types of prompts: a general question that any LLM can answer, and a math-related question.
Our hypothesis is that the LangChain agent will intelligently determine which tool to execute and provide the most appropriate response. Let's proceed with the testing to validate this assumption.

Image by the author | Screenshots from the deployed web app: two different prompts were used to validate our assumptions. Based on the thought process displayed in the UI under the "Thoughts" expander, we can easily see which tool the agent has chosen. (Left) The LLMMath chain tool is used. (Right) The simple LLM chain tool is used.

Conclusion

To summarize, we have not only explored various aspects of working with LangChain and LLMs but have also successfully built and deployed a web app powered by LangChain. This demonstrates the versatility and capability of LangChain in enabling the development of powerful applications.

References

LangChain Agents official documentation: https://python.langchain.com/en/latest/modules/agents.html
Databutton: https://www.databutton.io/
Streamlit: https://streamlit.io/
Build a Personal Search Engine Web App using Open AI Text Embeddings: https://medium.com/@avra42/build-a-personal-search-engine-web-app-using-open-ai-text-embeddings-d6541f32892d
Part 1: Using LangChain for Large Language Model — powered Applications: https://www.packtpub.com/article-hub/using-langchain-for-large-language-model-powered-applications
Deployed web app: https://databutton.com/v/23ks6sem
Source code for the app: https://databutton.com/v/23ks6sem/View_Code

Author Bio

Avratanu Biswas, Ph.D. Student (Biophysics), Educator, and Content Creator (Data Science, ML & AI).

Twitter    YouTube    Medium    GitHub

Demystifying Azure OpenAI Service

Olivier Mertens, Breght Van Baelen
15 Sep 2023
16 min read
This article is an excerpt from the book Azure Data and AI Architect Handbook, by Olivier Mertens and Breght Van Baelen. Master core data architecture design concepts and Azure Data & AI services to gain a cloud data and AI architect's perspective on developing end-to-end solutions.

Introduction

OpenAI has risen immensely in popularity with the arrival of ChatGPT. The company, which started as a non-profit organization, has been the driving force behind the GPT and DALL-E model families, with intense research at a massive scale. The speed at which new models are released and become available on Azure has become impressive lately.

Microsoft has a close partnership with OpenAI, after heavy investments in the company by Microsoft. The models created by OpenAI use Azure infrastructure for development and deployment. Within this partnership, OpenAI carries the responsibility of research and innovation, coming up with new models and new versions of their existing models. Microsoft manages the enterprise-scale go-to-market. It provides infrastructure and technical guidance, along with reliable SLAs, to get large organizations started with integrating these models, fine-tuning them on their own data, and hosting private deployments of the models.

Like the face recognition model in Azure Cognitive Services, powerful LLMs such as the ones in Azure OpenAI Service could be used to cause harm at scale. Therefore, this service is also gated according to Microsoft's guidelines on responsible AI.

At the time of writing, Azure OpenAI Service offers access to the following models:

- The GPT model family:
  - GPT-3.5
  - GPT-3.5-Turbo (the model behind ChatGPT)
  - GPT-4
- Codex
- DALL-E 2

Let's dive deeper into these models.

The GPT model family

GPT models, which stands for generative pre-trained transformer models, made their first appearance in 2018 with GPT-1, trained on a dataset of roughly 7,000 books. This was a good advance in performance at the time, but the model was already vastly outdated a couple of years later. GPT-2 followed in 2019, trained on the WebText dataset (a collection of 8 million web pages). In 2020, GPT-3 was released, trained on the WebText dataset, two book corpora, and English Wikipedia.

In these years, there were no major breakthroughs in terms of more efficient algorithms, but rather in the scale of the architecture and datasets. This becomes easily visible when we look at the growing number of parameters used for every new generation of the model, as shown in the following figure.

Figure 9.3 – A visual comparison between the sizes of the different generations of GPT models, based on their trainable parameters

The question is often raised of how to interpret this concept of parameters. An easy analogy is the number of neurons in a brain. Although parameters in a neural network are not equivalent to its artificial neurons, the number of parameters and the number of neurons are heavily correlated: more parameters means more neurons. The more neurons there are in the brain, the more knowledge it can grasp.

Since the arrival of GPT-3, we have seen two major adaptations of the third-generation model. The first is GPT-3.5. This model has a similar architecture to the GPT-3 model but is trained on text and code, whereas the original GPT-3 only saw text data during training.
Therefore, GPT-3.5 is capable of generating and understanding code. GPT-3.5, in turn, became the basis for the next adaptation, the vastly popular ChatGPT model. This model has been fine-tuned for conversational usage, with additional reinforcement learning to instill a sense of ethical behavior.

GPT model sizes

The OpenAI models are available in different sizes, which are all named after remarkable scientists. The GPT-3.5 model, specifically, is available in four versions:

- Ada
- Babbage
- Curie
- Davinci

The Ada model is the smallest, most lightweight model, while Davinci is the most complex and most performant model. The larger the model, the more expensive it is to use, host, and fine-tune, as shown in Figure 9.4. As a side note, when you hear about the absurd number of parameters of new GPT models, this usually refers to the Davinci model.

Figure 9.4 – A trade-off exists between lightweight, cheap models and highly performant, complex models

With a trade-off between cost and performance available, an architect can start thinking about which model size best fits a solution. In reality, this often comes down to empirical testing: if the cheaper model can do the job at an acceptable performance, it is the more cost-effective solution. Note that performance here means predictive power, not the speed at which the model makes predictions; the larger models will be slower to output a prediction than the lightweight models.

Understanding the difference between GPT-3.5 and GPT-3.5-Turbo (ChatGPT)

GPT-3.5 and GPT-3.5-Turbo are both models used to generate natural language text, but they are used in different ways. GPT-3.5 is classified as a text completion model, whereas GPT-3.5-Turbo is referred to as conversational AI.

To better understand the contrast between the two models, we first need to introduce the concept of contextual learning. These models are trained to understand the structure of the input prompt in order to provide a meaningful answer. Contextual learning is often split into few-shot learning, one-shot learning, and zero-shot learning. A "shot", in this context, refers to an example given in the input prompt. With few-shot learning, we provide multiple examples in the input prompt; one-shot learning provides a single example; and zero-shot means that no examples are given. In the latter case, the model has to figure out a different way to understand what is being asked of it (such as interpreting the goal of a question).

Consider the following example:

Figure 9.5 – Few-shot learning takes up the most tokens and requires more effort, but often results in model outputs of higher quality

While it takes more prompt engineering effort to apply few-shot learning, it will usually yield better results. A text completion model, such as GPT-3.5, performs vastly better with few-shot learning than with one-shot or zero-shot learning. As the name suggests, the model figures out the structure of the input prompt (i.e., the examples) and completes the text accordingly.

Conversational AI, such as ChatGPT, is more performant at zero-shot learning. In the preceding example, both models are able to output the correct answer, but as questions become more and more complex, there will be a noticeable difference in predictive performance. Additionally, GPT-3.5-Turbo will remember information from previous input prompts, whereas GPT-3.5 prompts are handled independently.
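To make the idea concrete, here is a minimal sketch of a few-shot classification prompt sent to an Azure OpenAI chat deployment, using the pre-1.0 openai Python SDK (0.28-style). The endpoint, API version, deployment name, and example reviews are placeholders rather than values from the book, and the current SDK exposes a different interface.

# Few-shot prompting against an Azure OpenAI chat deployment (legacy openai SDK)
import openai

openai.api_type = "azure"
openai.api_base = "https://<your-resource>.openai.azure.com/"
openai.api_version = "2023-05-15"
openai.api_key = "<your-key>"

few_shot_messages = [
    {"role": "system", "content": "You classify movie reviews as positive or negative."},
    # The two examples below are the "shots" showing the expected format
    {"role": "user", "content": "Review: 'A wonderful, heartfelt film.'"},
    {"role": "assistant", "content": "positive"},
    {"role": "user", "content": "Review: 'Two hours I will never get back.'"},
    {"role": "assistant", "content": "negative"},
    # The actual query
    {"role": "user", "content": "Review: 'The plot dragged, but the acting saved it.'"},
]

response = openai.ChatCompletion.create(
    engine="gpt-35-turbo",      # Azure uses the deployment name, not the model name
    messages=few_shot_messages,
    temperature=0,
)
print(response["choices"][0]["message"]["content"])

Dropping the two example exchanges from the messages list turns the same call into a zero-shot prompt, which is where conversational models tend to hold up better than pure text completion models.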
Additionally, GPT-3.5-Turbo will remember information from previous input prompts, whereas GPT-3.5 prompts are handled independently.Innovating with GPT-4With the arrival of GPT-4, the focus has shifted toward multimodality. Multimodality in AI refers to the ability of an AI system to process and interpret information from multiple modalities, such as text, speech, images, and videos. Essentially, it is the capability of AI models to understand and combine data from different sources and formats.GPT-4 is capable of additionally taking images as input and interpreting them. It has stronger reasoning and overall performance than its predecessors. There was a famous example where GPT-4 was able to deduce that balloons would fly upward when asked what would happen if someone cut the balloons' strings, as shown in the following photo.Figure 9.6 – The image in question that was used in the experiment. When asked what would happen if the strings were cut, GPT-4 replied that the balloons would start flying awaySome adaptations of GPT-4, such as the one used in Bing Chat, have the extra feature of citing sources in generated answers. This is a welcome addition, as hallucination was a significant flaw in earlier GPT models.HallucinationHallucination in the context of AI refers to generating wrong predictions with high confidence. It is obvious that this can cause a lot more harm than the model indicating it is not sure how to respond or knowing the answer.Next, we will look at the Codex model.CodexCodex is a model that is architecturally similar to GPT-3, but it fully focuses on code generation and understanding. Furthermore, an adaptation of Codex forms the underlying model for GitHub Copilot, a tool that provides suggestions and auto-completion for code based on context and natural language inputs, available for various integrated development environments (IDEs) such as Visual Studio Code. Instead of a ready-to-use solution, Codex is (like the other models in Azure OpenAI) available as a model endpoint and should be used for integration in custom apps.The Codex model is initially trained on a collection of 54 million code repositories, resulting in billions of lines of code, with the majority of training data written in Python.Codex can generate code in different programming languages based on an input prompt in natural language (text-to-code), explain the function of blocks of code (code-to-text), add comments to code, and debug existing code.Codex is available as a C (cushman) and D (Davinci) model. Lightweight Codex models (A series or B series) currently do not exist.Models such as Codex or GitHub Copilot are a great way to boost the productivity of software engineers, data analysts, data engineers, and data scientists.  They do not replace these roles, as their accuracy is not perfect; rather, they give engineers the opportunity to start editing from a fairly well-written block of code instead of coding from scratch.DALL-E 2The DALL-E model family is used to generate visuals. By providing a  description in natural language in the input prompt, it generates a series of matching images. While other models are often used at scale in large enterprises, DALL-E 2 tends to be more popular in smaller businesses. Organizations that lack an in-house graphic designer can make great use of DALL-E to generate visuals for banners, brochures, emails, web pages, and so on.DALL-E 2 only has a  single model size to choose from, although open-source alternatives exist if a lightweight version is preferred. 
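Since all of these models are exposed as endpoints rather than ready-made applications, integrating one into a custom app usually comes down to a single API call. The following is a minimal sketch using the openai Python package in the Azure-flavoured style that was current around the time of writing (the pre-1.0 SDK); the resource URL, API version, key, deployment name, and prompt are placeholders and assumptions, not values from the book.

```python
import openai

# Placeholder connection details for an Azure OpenAI resource (assumptions for illustration).
openai.api_type = "azure"
openai.api_base = "https://<your-resource>.openai.azure.com/"
openai.api_version = "2023-05-15"
openai.api_key = "<your-key>"

# "engine" is the name you gave the model deployment inside your Azure OpenAI resource.
response = openai.Completion.create(
    engine="<your-deployment-name>",
    prompt="# Python\n# Write a function that reverses a string.",
    max_tokens=150,
    temperature=0,
)

print(response["choices"][0]["text"])
```

The same pattern applies whether the deployment behind the endpoint is a GPT, Codex, or fine-tuned model; only the deployment name and the prompt change.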
Fine-tuning and private deploymentsAs a data architect, it is important to understand the cost structure of these models. The first option is to use the base model in a serverless manner. Similar to how we work with Azure Cognitive Services, users will get a key for the model’s endpoint and simply pay per prediction. For DALL-E 2, costs are incurred per 100 images, while the GPT and Codex models are priced per 1,000 tokens. For every request made to a GPT or Codex model, all tokens of the input prompt and the output are added up to determine the cost of the prediction.TokensIn natural language processing, a token refers to a sequence of characters that represents a distinct unit of meaning in a text. These units do not necessarily correspond to words, although for short words, this is mostly the case. Tokens are used as the basic building blocks to process and analyze text data. A good rule of thumb for the English language is that one token is, on average, four characters. Dividing your total character count by four will make a good estimate of the number of tokens.Azure OpenAI Service also grants extensive fine-tuning functionalities. Up to 1 GB of data can be uploaded per Azure OpenAI instance for fine-tuning. This may not sound like a lot, but note that we are not training a new model from scratch. The goal of fine-tuning is to retrain the last few layers of the model to increase performance on specific tasks or company-specific knowledge. For this process, 1 GB of data is more than sufficient.When adding a fine-tuned model to a solution, two additional costs will be incurred. On top of the token-based inference cost, we need to take into account the training and hosting costs. The hourly training cost can be quite high due to the amount of hardware needed, but compared to the inference and hosting costs during a model’s life cycle, it remains a small percentage. Next, since we are not using the base model anymore and, instead, our own “version” of the model, we will need to host the model ourselves, resulting in an hourly hosting cost.Now that we have covered both pre-trained model collections, Azure Cognitive Services, and Azure OpenAI Service, let’s move on to custom development using Azure Machine Learning.Grounding LLMsOne of the most popular use cases for LLMs involves providing our own data as context to the model (often referred to as grounding). The reason for its popularity is partly due to the fact that many business cases can be solved using a consistent technological architecture. We can reuse the same solution, but by providing different knowledge bases, we can serve different end users.For example, by placing an LLM on top of public data such as product manuals or product specifics, it is easy to develop a customer support chatbot. If we swap out this knowledge base of product information with something such as HR documents, we can reuse the same tech stack to create an internal HR virtual assistant.A common misconception regarding grounding is that a model needs to be trained on our own data. This is not the case. Instead, after a user asks a question, the relevant document (or paragraphs) is injected into the prompt behind the scenes and lives in the memory of the model for the duration of the chat session (when working with conversational AI) or for a single prompt. The context, as we call it, is then wiped clean and all information is forgotten. 
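To make the mechanics of that prompt injection more tangible, here is a small, library-free sketch of the idea: score the stored chunks against the question, keep the most relevant ones, and prepend them to the prompt for a single request. The scoring function, chunk structure, and prompt wording are illustrative assumptions rather than the book's implementation.

```python
# Illustrative sketch of grounding: retrieved context lives only inside the
# prompt for this one request; the model itself is never retrained.

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

def retrieve_top_chunks(question_embedding, indexed_chunks, top_k=2):
    # indexed_chunks: list of (chunk_text, chunk_embedding) pairs from the vector store
    ranked = sorted(
        indexed_chunks,
        key=lambda pair: cosine_similarity(question_embedding, pair[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]

def build_grounded_prompt(question, context_chunks):
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Once the response is returned, the retrieved context is discarded along with the rest of the prompt.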
If we wanted to cache this information, it is possible to make use of a framework such as LangChain or Semantic Kernel, but that is out of the scope of this book.

The fact that a model does not get retrained on our own data plays a crucial role in terms of data privacy and cost optimization. As shown before in the section on fine-tuning, as soon as a base model is altered, an hourly operating cost is added to run a private deployment of the model. Also, information from the documents cannot be leaked to other users working with the same model.

Figure 9.7 visualizes the architectural concepts to ground an LLM.

Figure 9.7 – Architecture to ground an LLM

The first thing to do is turn the documents that should be accessible to the model into embeddings. Simply put, embeddings are mathematical representations of natural language text. By turning text into embeddings, it is possible to accurately calculate the similarity (from a semantics perspective) between two pieces of text.

To do this, we can leverage Azure Functions, a service that allows pieces of code to run as serverless functions. It often forms the glue between different components by handling interactions. In this case, an Azure function (on the bottom left of Figure 9.7) will grab the relevant documents from the knowledge base, break them up into chunks (to accommodate the maximum token limits of the model), and generate an embedding for each one. This embedding is then stored, alongside the natural language text, in a vector database. This function should be run for all historic data that will be accessible to the model, as well as triggered for every new, relevant document that is added to the knowledge base.

Once the vector database is in place, users can start asking questions. However, the user questions are not directly sent to the model endpoint. Instead, another Azure function (shown at the top of Figure 9.7) will turn the user question into an embedding and check its similarity with the embeddings of the documents or paragraphs in the vector database. Then, the top X most relevant text chunks are injected into the prompt as context, and the prompt is sent over to the LLM. Finally, the response is returned to the user.

Conclusion

Azure OpenAI Service, a collaboration between OpenAI and Microsoft, delivers potent AI models. The GPT model family, from GPT-1 to GPT-4, has evolved impressively, with GPT-3.5-Turbo (ChatGPT) excelling in conversational AI. GPT-4 introduces multimodal capabilities, such as interpreting images in addition to text.

Codex specializes in code generation, while DALL-E 2 creates visuals from text descriptions. These models empower developers and designers. Customization via fine-tuning offers cost-effective solutions for specific tasks. Leveraging Azure OpenAI Service for your projects enhances productivity.

Grounding language models with user data ensures data privacy and cost efficiency. This collaboration holds promise for innovative AI applications across various domains.

Author Bio

Olivier Mertens is a cloud solution architect for Azure data and AI at Microsoft, based in Dublin, Ireland. In this role, he assists organizations in designing their enterprise-scale data platforms and analytical workloads. Next to his role as an architect, Olivier leads the technical AI expertise for Microsoft EMEA in the corporate market. This includes leading knowledge sharing and internal upskilling, as well as solving highly complex or strategic customer AI cases.
Before his time at Microsoft, he worked as a data scientist at a Microsoft partner in Belgium.Olivier is a lecturer for generative AI and AI solution architectures, a keynote speaker for AI, and holds a master’s degree in information management, a postgraduate degree as an AI business architect, and a bachelor’s degree in business management.Breght Van Baelen is a Microsoft employee based in Dublin, Ireland, and works as a cloud solution architect for the data and AI pillar in Azure. He provides guidance to organizations building large-scale analytical platforms and data solutions. In addition, Breght was chosen as an advanced cloud expert for Power BI and is responsible for providing technical expertise in Europe, the Middle East, and Africa. Before his time at Microsoft, he worked as a data consultant at Microsoft Gold Partners in Belgium.Breght led a team of eight data and AI consultants as a data science lead. Breght holds a master’s degree in computer science from KU Leuven, specializing in AI. He also holds a bachelor’s degree in computer science from the University of Hasselt.

Deploying LLMs with Amazon SageMaker - Part 2

Joshua Arvin Lat
30 Nov 2023
19 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!IntroductionIn the first part of this post, we showed how easy it is to deploy large language models (LLMs) in the cloud using a managed machine learning service called Amazon SageMaker. In just a few steps, we were able to deploy a MistralLite model in a SageMaker Inference Endpoint. If you’ve worked on real ML-powered projects in the past, you probably know that deploying a model is just the first step! There are definitely a few more steps before we can consider that our application is ready for use.If you’re looking for the link to the first part, here it is: Deploying LLMs with Amazon SageMaker - Part 1In this post, we’ll build on top of what we already have in Part 1 and prepare a demo user interface for our chatbot application. That said, we will tackle the following sections in this post:● Section I: Preparing the SageMaker Notebook Instance (discussed in Part 1)● Section II: Deploying an LLM using the SageMaker Python SDK to a SageMaker Inference Endpoint (discussed in Part 1)● Section III: Enabling Data Capture with SageMaker Model Monitor●  Section IV: Invoking the SageMaker inference endpoint using the boto3 client●  Section V: Preparing a Demo UI for our chatbot application●  Section VI: Cleaning UpWithout further ado, let’s begin!Section III: Enabling Data Capture with SageMaker Model MonitorIn order to analyze our deployed LLM, it’s essential that we’re able to collect the requests and responses to a central storage location. Instead of building our own solution that collects the information we need, we can just utilize the built-in Model Monitor capability of SageMaker. Here, all we need to do is prepare the configuration details and run the update_data_capture_config() method of the inference endpoint object and we’ll have the data capture setup enabled right away! That being said, let’s proceed with the steps required to enable and test data capture for our SageMaker Inference endpoint:STEP # 01: Continuing where we left off in Part 1 of this post, let’s get the bucket name of the default bucket used by our session:s3_bucket_name = sagemaker_session.default_bucket() s3_bucket_nameSTEP # 02: In addition to this, let’s prepare and define a few prerequisites as well:prefix = "llm-deployment" base = f"s3://{s3_bucket_name}/{prefix}" s3_capture_upload_path = f"{base}/model-monitor"STEP # 03: Next, let’s define the data capture config:from sagemaker.model_monitor import DataCaptureConfig data_capture_config = DataCaptureConfig(    enable_capture = True,    sampling_percentage=100,    destination_s3_uri=s3_capture_upload_path,    kms_key_id=None,    capture_options=["REQUEST", "RESPONSE"],    csv_content_types=["text/csv"],    json_content_types=["application/json"] )Here, we specify that we’ll be collecting 100% of the requests and responses that pass through the deployed model.STEP # 04: Let’s enable data capture so that we’re able to save in Amazon S3 the request and response data:predictor.update_data_capture_config(    data_capture_config=data_capture_config )Note that this step may take about 8-10 minutes to complete. 
Feel free to grab a cup of coffee or tea while waiting!STEP # 05: Let’s check if we are able to capture the input request and output response by performing another sample request:result = predictor.predict(input_data)[0]["generated_text"] print(result)This should yield the following output:"The meaning of life is a philosophical question that has been debated by thinkers and philosophers for centuries. There is no single answer that can be definitively proven, as the meaning of life is subjective and can vary greatly from person to person.\n\nSome people believe that the meaning of life is to find happiness and fulfillment through personal growth, relationships, and experiences. Others believe that the meaning of life is to serve a greater purpose, such as through a religious or spiritual calling, or by making a positive impact on the world through their work or actions.\n\nUltimately, the meaning of life is a personal journey that each individual must discover for themselves. It may involve exploring different beliefs and perspectives, seeking out new experiences, and reflecting on what brings joy and purpose to one's life."Note that it may take a minute or two before the .jsonl file(s) containing the request and response data appear in our S3 bucket.STEP # 06: Let’s prepare a few more examples:prompt_examples = [    "What is the meaning of life?",    "What is the color of love?",    "How to deploy LLMs using SageMaker",    "When do we use Bedrock and when do we use SageMaker?" ] STEP # 07: Let’s also define the perform_request() function which wraps the relevant lines of code for performing a request to our deployed LLM model:def perform_request(prompt, predictor):    input_data = {        "inputs": f"<|prompter|>{prompt}</s><|assistant|>",        "parameters": {            "do_sample": False,            "max_new_tokens": 2000,            "return_full_text": False,        }    }      response = predictor.predict(input_data)    return response[0]["generated_text"] STEP # 08: Let’s quickly test the perform_request() function:perform_request(prompt_examples[0], predictor=predictor)STEP # 09: With everything ready, let’s use the perform_request() function to perform requests using the examples we’ve prepared in an earlier step:from time import sleep for example in prompt_examples:    print("Input:", example)      generated = perform_request(        prompt=example,        predictor=predictor    )    print("Output:", generated)    print("-"*20)    sleep(1)This should return the following:Input: What is the meaning of life? ... -------------------- Input: What is the color of love? Output: The color of love is often associated with red, which is a vibrant and passionate color that is often used to represent love and romance. Red is a warm and intense color that can evoke strong emotions, making it a popular choice for representing love. However, the color of love is not limited to red. Other colors that are often associated with love include pink, which is a softer and more feminine shade of red, and white, which is often used to represent purity and innocence. Ultimately, the color of love is subjective and can vary depending on personal preferences and cultural associations. Some people may associate love with other colors, such as green, which is often used to represent growth and renewal, or blue, which is often used to represent trust and loyalty. 
...Note that this is just a portion of the overall output and you should get a relatively long response for each input prompt.Section IV: Invoking the SageMaker inference endpoint using the boto3 clientWhile it’s convenient to use the SageMaker Python SDK to invoke our inference endpoint, it’s best that we also know how to use boto3 as well to invoke our deployed model. This will allow us to invoke the inference endpoint from an AWS Lambda function using boto3.Image 10 — Utilizing API Gateway and AWS Lambda to invoke the deployed LLMThis Lambda function would then be triggered by an event from an API Gateway resource similar to what we have in Image 10. Note that we’re not planning to complete the entire setup in this post but having a working example of how to use boto3 to invoke the SageMaker inference endpoint should easily allow you to build an entire working serverless application utilizing API Gateway and AWS Lambda.STEP # 01: Let’s quickly check the endpoint name of the SageMaker inference endpoint:predictor.endpoint_nameThis should return the endpoint name with a format similar to what we have below:'MistralLite-HKGKFRXURT'STEP # 02: Let’s prepare our boto3 client using the following lines of code:import boto3 import json boto3_client = boto3.client('runtime.sagemaker')STEP # 03: Now, let’s invoke the endpointbody = json.dumps(input_data).encode() response = boto3_client.invoke_endpoint(    EndpointName=predictor.endpoint_name,    ContentType='application/json',    Body=body )   result = json.loads(response['Body'].read().decode())STEP # 04: Let’s quickly inspect the result:resultThis should give us the following:[{'generated_text': "The meaning of life is a philosophical question that has been debated by thinkers and philosophers for centuries. There is no single answer that can be definitively proven, as the meaning of life is subjective and can vary greatly from person to person..."}] STEP # 05: Let’s try that again and print the output text:result[0]['generated_text']This should yield the following output:"The meaning of life is a philosophical question that has been debated by thinkers and philosophers for centuries..."STEP # 06: Now, let’s define perform_request_2 which uses the boto3 client to invoke our deployed LLM:def perform_request_2(prompt, boto3_client, predictor):    input_data = {        "inputs": f"<|prompter|>{prompt}</s><|assistant|>",        "parameters": {            "do_sample": False,            "max_new_tokens": 2000,            "return_full_text": False,        }    }      body = json.dumps(input_data).encode()    response = boto3_client.invoke_endpoint(        EndpointName=predictor.endpoint_name,        ContentType='application/json',        Body=body    )      result = json.loads(response['Body'].read().decode())    return result[0]["generated_text"]STEP # 07: Next, let’s run the following block of code to have our deployed LLM answer the same set of questions using the perform_request_2() function:for example in prompt_examples:    print("Input:", example)      generated = perform_request_2(        prompt=example,        boto3_client=boto3_client,        predictor=predictor    )    print("Output:", generated)    print("-"*20)    sleep(1)This will give us the following output:Input: What is the meaning of life? ... -------------------- Input: What is the color of love? Output: The color of love is often associated with red, which is a vibrant and passionate color that is often used to represent love and romance. 
Red is a warm and intense color that can evoke strong emotions, making it a popular choice for representing love. However, the color of love is not limited to red. Other colors that are often associated with love include pink, which is a softer and more feminine shade of red, and white, which is often used to represent purity and innocence. Ultimately, the color of love is subjective and can vary depending on personal preferences and cultural associations. Some people may associate love with other colors, such as green, which is often used to represent growth and renewal, or blue, which is often used to represent trust and loyalty. ... Given that it may take a few minutes before the .jsonl files appear in our S3 bucket, let’s wait for about 3-5 minutes before proceeding to the next section. Feel free to grab a cup of coffee or tea while waiting!STEP # 08: Let’s run the following block of code to list the captured data files stored in our S3 bucket:results = !aws s3 ls {s3_capture_upload_path} --recursive resultsSTEP # 09: In addition to this, let’s store the list inside the processed variable:processed = [] for result in results:    partial = result.split()[-1]    path = f"s3://{s3_bucket_name}/{partial}"    processed.append(path)   processedSTEP # 10: Let’s create a new directory named captured_data using the mkdir command:!mkdir -p captured_dataSTEP # 11: Now, let’s download the .jsonl files from the S3 bucket to the captured_data directory in our SageMaker Notebook Instance:for index, path in enumerate(processed):    print(index, path)    !aws s3 cp {path} captured_data/{index}.jsonlSTEP # 12: Let’s define the load_json_file() function which will help us load files with JSON content:import json def load_json_file(path):    output = []      with open(path) as f:        output = [json.loads(line) for line in f]          return outputSTEP # 13: Using the load_json_file() function we defined in an earlier step, let’s load the .jsonl files and store them inside the all variable for easier viewing:all = [] for i, _ in enumerate(processed):    print(f">: {i}")    new_records = load_json_file(f"captured_data/{i}.jsonl")    all = all + new_records     allRunning this will yield the following response:Image 11 — All captured data points inside the all variableFeel free to analyze the nested structure stored in all variables. In case you’re interested in how this captured data can be analyzed and processed further, you may check Chapter 8, Model Monitoring and Management Solutions of my 2nd book “Machine Learning Engineering on AWS”.Section V: Preparing a Demo UI for our chatbot applicationYears ago, we had to spend a few hours to a few days before we were able to prepare a user interface for a working demo. If you have not used Gradio before, you would be surprised that it only takes a few lines of code to set everything up. 
In the next set of steps, we’ll do just that and utilize the model we’ve deployed in the previous parts of our demo application:STEP # 01: Continuing where we left off in the previous part, let’s install a specific version of gradio using the following command:!pip install gradio==3.49.0STEP # 02: We’ll also be using a specific version of fastapi as well:!pip uninstall -y fastapi !pip install fastapi==0.103.1STEP # 03: Let’s prepare a few examples and store them in a list:prompt_examples = [    "What is the meaning of life?",    "What is the color of love?",    "How to deploy LLMs using SageMaker",    "When do we use Bedrock and when do we use SageMaker?",    "Try again",    "Provide 10 alternatives",    "Summarize the previous answer into at most 2 sentences" ]STEP # 04: In addition to this, let’s define the parameters using the following block of code:parameters = {    "do_sample": False,    "max_new_tokens": 2000, }STEP # 05: Next, define the process_and_response() function which we’ll use to invoke the inference endpoint:def process_and_respond(message, chat_history):    processed_chat_history = ""    if len(chat_history) > 0:        for chat in chat_history:            processed_chat_history += f"<|prompter|>{chat[0]}</s><|assistant|>{chat[1]}</s>"              prompt = f"{processed_chat_history}<|prompter|>{message}</s><|assistant|>"    response = predictor.predict({"inputs": prompt, "parameters": parameters})    parsed_response = response[0]["generated_text"][len(prompt):]    chat_history.append((message, parsed_response))    return "", chat_historySTEP # 06: Now, let’s set up and prepare the user interface we’ll use to interact with our chatbot:import gradio as gr with gr.Blocks(theme=gr.themes.Monochrome(spacing_size="sm")) as demo:    with gr.Row():        with gr.Column():                      message = gr.Textbox(label="Chat Message Box",                                 placeholder="Input message here",                                 show_label=True,                                 lines=12)            submit = gr.Button("Submit")                      examples = gr.Examples(examples=prompt_examples,                                   inputs=message)        with gr.Column():            chatbot = gr.Chatbot(height=900)      submit.click(process_and_respond,                 [message, chatbot],                 [message, chatbot],                 queue=False)Here, we can see the power of Gradio as we only needed a few lines of code to prepare a demo app.STEP # 07: Now, let’s launch our demo application using the launch() method:demo.launch(share=True, auth=("admin", "replacethis1234!"))This will yield the following logs:Running on local URL:  http://127.0.0.1:7860 Running on public URL: https://123456789012345.gradio.live STEP # 08: Open the public URL in a new browser tab. This will load a login page which will require us to input the username and password before we are able to access the chatbot.Image 12 — Login pageSpecify admin and replacethis1234! in the login form to proceed.STEP # 09: After signing in using the credentials, we’ll be able to access a chat interface similar to what we have in Image 13. Here, we can try out various types of prompts.Image 13 — The chatbot interfaceHere, we have a Chat Message Box where we can input and run our different prompts on the left side of the screen. We would then see the current conversation on the right side.STEP # 10: Click the first example “What is the meaning of life?”. 
This will auto-populate the text area similar to what we have in Image 14:Image 14 — Using one of the examples to populate the Chat Message BoxSTEP # 11:Click the Submit button afterwards. After a few seconds, we should get the following response in the chat box:Image 15 — Response of the deployed modelAmazing, right? Here, we just asked the AI what the meaning of life is.STEP # 12: Click the last example “Summarize the previous answer into at most 2 sentences”. This will auto-populate the text area with the said example. Click the Submit button afterward.Image 16 — Summarizing the previous answer into at most 2 sentencesFeel free to try other prompts. Note that we are not limited to the prompts available in the list of examples in the interface.Important Note: Like other similar AI/ML solutions, there's the risk of hallucinations or the generation of misleading information. That said, it's critical that we exercise caution and validate the outputs produced by any Generative AI-powered system to ensure the accuracy of the results.Section VI: Cleaning UpWe’re not done yet! Cleaning up the resources we’ve created and launched is a very important step as this will help us ensure that we don’t pay for the resources we’re not planning to use.STEP # 01: Once you’re done trying out various types of prompts, feel free to turn off and clean up the resources launched and created using the following lines of code:demo.close() predictor.delete_endpoint()STEP # 02: Make sure to turn off (or delete) the SageMaker Notebook instance as well. I’ll leave this to you as an exercise!Wasn’t that easy?! As you can see, deploying LLMs with Amazon SageMaker is straightforward and easy. Given that Amazon SageMaker handles most of the heavy lifting to manage the infrastructure, we’re able to focus more on the deployment of our machine learning model. We are just scratching the surface as there is a long list of capabilities and features available in SageMaker. If you want to take things to the next level, feel free to read 2 of my books focusing heavily on SageMaker: “Machine Learning with Amazon SageMaker Cookbook” and “Machine Learning Engineering on AWS”.Author BioJoshua Arvin Lat is the Chief Technology Officer (CTO) of NuWorks Interactive Labs, Inc. He previously served as the CTO of 3 Australian-owned companies and also served as the Director for Software Development and Engineering for multiple e-commerce startups in the past. Years ago, he and his team won 1st place in a global cybersecurity competition with their published research paper. He is also an AWS Machine Learning Hero and he has been sharing his knowledge in several international conferences to discuss practical strategies on machine learning, engineering, security, and management. He is also the author of the books "Machine Learning with Amazon SageMaker Cookbook", "Machine Learning Engineering on AWS", and "Building and Automating Penetration Testing Labs in the Cloud". Due to his proven track record in leading digital transformation within organizations, he has been recognized as one of the prestigious Orange Boomerang: Digital Leader of the Year 2023 award winners.

Fine-Tuning Large Language Models (LLMs)

Amita Kapoor
11 Sep 2023
12 min read
IntroductionIn the bustling metropolis of machine learning and natural language processing, Large Language Models (LLMs) such as GPT-4 are the skyscrapers that touch the clouds. From chatty chatbots to prolific prose generators, they stand tall, powering a myriad of applications. Yet, like any grand structure, they're not one-size-fits-all. Sometimes, they need a little nipping and tucking to shine their brightest. Dive in as we unravel the art and craft of fine-tuning these linguistic behemoths, sprinkled with code confetti for the hands-on aficionados out there.What's In a Fine-Tune?In a world where a top chef can make spaghetti or sushi but needs finesse for regional dishes like 'Masala Dosa' or 'Tarte Tatin', LLMs are similar: versatile but requiring specialization for specific tasks. A general LLM might misinterpret rare medical terms or downplay symptoms, but with medical text fine-tuning, it can distinguish nuanced health issues. In law, a misread word can change legal interpretations; by refining the LLM with legal documents, we achieve accurate clause interpretation. In finance, where terms like "bearish" and "bullish" are pivotal, specialized training ensures the model's accuracy in financial analysis and predictions.Whipping Up the Perfect AI RecipeJust as a master chef carefully chooses specific ingredients and techniques to curate a gourmet dish, in the vast culinary world of Large Language Models, we have a delectable array of fine-tuning techniques to concoct the ideal AI delicacy. Before we dive into the details, feast your eyes on the visual smorgasbord below, which provides an at-a-glance overview of these methods. With this flavour-rich foundation, we're all set to embark on our fine-tuning journey, focusing on the PEFT method and the Flan-T5 model on the Hugging Face platform. Aprons on, and let's get cooking!Fine Tuning Flan-T5Google AI's Flan-T5, an advanced version of the T5 model, excels in LLMs with its capability to handle text and code. It specialises in Text generation, Translation, Summarization, Question Answering, and Code Generation. Unlike GPT-3 and LLAMA, Flan-T5 is open-source, benefiting researchers worldwide. With configurations ranging from 60M to 11B parameters, it balances versatility and power, though larger models demand more computational resources.For this article, we will leverage the DialogSum dataset, a robust resource boasting 13,460 dialogues, supplemented with manually labelled summaries and topics (and an additional 100 holdout data entries for topic generation). This dataset will serve as the foundation for fine-tuning our open-source giant, Flan-T5, to specialise it for dialogue summarization tasks. Setting the Stage: Preparing the Tool ChestTo fine-tune effectively, ensure your digital setup is optimized. Here's a quick checklist:Hardware: Use platforms like Google Colab.RAM: Memory depends on model parameters. For example:   Memory (MTotal) = 4 x (Number of Parameters x 4 bytes)For a 247,577,856 parameter model (flan-t5-base), around 3.7GB is needed for parameters, gradients, and optimizer states. Ideally, have at least 8GB RAMGPU: A high-end GPU, such as NVIDIA Tesla P100 or T4, speeds up training and inference. Aim for 12GB or more GPU memory, accounting for overheads.Libraries: Like chefs need the right tools, AI fine-tuning demands specific libraries for algorithms, models, and evaluation tools.Remember, your setup is as crucial as the process itself. 
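As a quick sanity check, the rule of thumb above can be expressed in a couple of lines of Python. This is only a rough estimate under the stated assumption (weights, gradients, and optimizer states together costing about four times the raw parameter memory at 4 bytes per parameter); real usage also depends on batch size, precision, and activation memory.

```python
def estimate_training_memory_gb(num_parameters, bytes_per_param=4, multiplier=4):
    """Rough estimate: parameters, gradients, and optimizer states together
    take roughly `multiplier` times the raw parameter memory."""
    total_bytes = multiplier * num_parameters * bytes_per_param
    return total_bytes / (1024 ** 3)

# flan-t5-base has 247,577,856 trainable parameters
print(f"{estimate_training_memory_gb(247_577_856):.1f} GB")  # ~3.7 GB
```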
Let's conjure up the essential libraries, by running the following command:!pip install \    transformers \    datasets \    evaluate \    rouge_score \    loralib \    peftWith these tools in hand, we're now primed to move deeper into the world of fine-tuning. Let's dive right in! Next, it's essential to set up our environment with the necessary tools:from datasets import load_dataset from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig, TrainingArguments, Trainer import torch import time import evaluate import pandas as pd import numpy as np To put our fine-tuning steps into motion, we first need a dataset to work with. Enter the DialogSum dataset, an extensive collection tailored for dialogue summarization:dataset_name = "knkarthick/dialogsum" dataset = load_dataset(dataset_name)Executing this code, we've swiftly loaded the DialogSum dataset. With our data playground ready, we can take a closer look at its structure and content to better understand the challenges and potentials of our fine-tuning process. DialogSum dataset is neatly structured into three segments:Train: With a generous 12,460 dialogues, this segment is the backbone of our model's learning process.Validation: A set of 500 dialogues, this slice of the dataset aids in fine-tuning the model, ensuring it doesn't merely memorise but truly understandsTest: This final 1,500 dialogue portion stands as the litmus test, determining how well our model has grasped the art of dialogue summarization.Each dialogue entry is accompanied by a unique 'id', a 'summary' of the conversation, and a 'topic' to give context.Before fine-tuning, let's gear up with our main tool: the Flan-T5 model, specifically it's base' variant from Google, which balances performance and efficiency. Using AutoModelForSeq2SeqLM, we effortlessly load the pre-trained Flan-T5, set to use torch.bfloat16 for optimal memory and precision. Alongside, we have the tokenizer, essential for translating text into a model-friendly format. Both are sourced from google/flan-t5-base, ensuring seamless compatibility. Now, let's get this code rolling:model_name='google/flan-t5-base' original_model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.bfloat16) tokenizer = AutoTokenizer.from_pretrained(model_name)Understanding Flan-T5 requires a look at its structure, particularly its parameters. Knowing the number of trainable parameters shows the model's adaptability. The total parameters reflect its complexity. The following code will count these parameters and calculate the ratio of trainable ones, giving insight into the model's flexibility during fine-tuning.Let's now decipher these statistics for our Flan-T5 model:def get_model_parameters_info(model): total_parameters = sum(param.numel() for param in model.parameters()) trainable_parameters = sum(param.numel() for param in model.parameters() if param.requires_grad) trainable_percentage = 100 * trainable_parameters / total_parameters info = ( f"Trainable model parameters: {trainable_parameters}", f"Total model parameters: {total_parameters}", f"Percentage of trainable model parameters: {trainable_percentage:.2f}%" ) return '\n'.join(info) print(get_model_parameters_info(original_model))Trainable model parameters: 247577856 Total model parameters: 247577856 Percentage of trainable model parameters: 100.00%Harnessing PEFT for EfficiencyIn the fine-tuning journey, we seek methods that boost efficiency without sacrificing performance. 
This brings us to PEFT (Parameter-Efficient Fine-Tuning) and its secret weapon, LORA (Low-Rank Adaptation). LORA smartly adapts a model to new tasks with minimal parameter adjustments, offering a cost-effective solution in computational terms.In the code block that follows, we're initializing LORA's configuration. Key parameters to note include:r: The rank of the low-rank decomposition, which influences the number of adaptable parameters.lora_alpha: A scaling factor determining the initial magnitude of the LORA parameters.target_modules: The neural network components we wish to reparameterize. Here, we're targeting the "q" (query) and "v" (value) modules in the transformer's attention mechanism.lora_dropout:  A regularising dropout is applied to the LORA parameters to prevent overfitting.bias: Specifies the nature of the bias term in the reparameterization. Setting it to "none" means no bias will be added.task_type: Signifying the type of task for which we're employing LORA. In our case, it's sequence-to-sequence language modelling, perfectly aligned with our Flan-T5's capabilities.Using the get_peft_model function, we integrate the LORA configuration into our Flan-T5 model. Now, let's see how this affects the trainable parameters: peft_model = get_peft_model(original_model, lora_config) print(get_model_parameters_info(peft_model))Trainable model parameters: 3538944 Total model parameters: 251116800 Percentage of trainable model parameters: 1.41%Preparing for model training requires setting specific parameters. Directory choice, learning rate, logging frequency, and epochs are vital. A unique output directory segregates results from different training runs, enabling comparison. Our high learning rate signifies aggressive fine-tuning, while allocating 100 epochs ensures ample adaptation time for the model. With these settings, we're poised to initiate the trainer and embark on the training journey.# Set the output directory with a unique name using a timestamp output_dir = f'peft-dialogue-summary-training-{str(int(time.time()))}' # Define the training arguments for PEFT model training peft_training_args = TrainingArguments( output_dir=output_dir, auto_find_batch_size=True, # Automatically find an optimal batch size learning_rate=1e-3, # Use a higher learning rate for fine-tuning num_train_epochs=10, # Set the number of training epochs logging_steps=1000, # Log every 500 steps for more frequent logging max_steps=-1 # Let the number of steps be determined by epochs and dataset size ) # Initialise the trainer with PEFT model and training arguments peft_trainer = Trainer( model=peft_model, args=peft_training_args, train_dataset=formatted_datasets["train"], ) Let the learning begin!peft_trainer.train()To evaluate our models, we'll compare their summaries to a human baseline from our dataset using a `prompt`. With the original and PEFT-enhanced Flan-T5 models, we'll create summaries and contrast them with the human version, revealing AI accuracy and the best-performing model in our summary contest.def generate_summary(model, tokenizer, dialogue, prompt): """ Generate summary for a given dialogue and model. 
""" input_text = prompt + dialogue input_ids = tokenizer(input_text, return_tensors="pt").input_ids input_ids = input_ids.to(device) output_ids = model.generate(input_ids=input_ids, max_length=200, num_beams=1, early_stopping=True) return tokenizer.decode(output_ids[0], skip_special_tokens=True) index = 270 dialogue = dataset['test'][index]['dialogue'] human_baseline_summary = dataset['test'][index]['summary'] prompt = "Summarise the following conversation:\n\n" # Generate summaries original_summary = generate_summary(original_model, tokenizer, dialogue, prompt) peft_summary = generate_summary(peft_model, tokenizer, dialogue, prompt) # Print summaries print_output('BASELINE HUMAN SUMMARY:', human_baseline_summary) print_output('ORIGINAL MODEL:', original_summary) print_output('PEFT MODEL:', peft_summary)And the output:----------------------------------------------------------------------- BASELINE HUMAN SUMMARY:: #Person1# and #Person1#'s mother are preparing the fruits they are going to take to the picnic. ----------------------------------------------------------------------- ORIGINAL MODEL:: #Person1# asks #Person2# to take some fruit for the picnic. #Person2# suggests taking grapes or apples.. ----------------------------------------------------------------------- PEFT MODEL:: Mom and Dad are going to the picnic. Mom will take the grapes and the oranges and take the oranges.To assess our summarization models, we use the subset of the test dataset. We'll compare the summaries to human-created baselines. Using batch processing for efficiency, dialogues are processed in set group sizes. After processing, all summaries are compiled into a DataFrame for structured comparison and analysis. Below is the Python code for this experiment.dialogues = dataset['test'][0:20]['dialogue'] human_baseline_summaries = dataset['test'][0:20]['summary'] original_model_summaries = [] peft_model_summaries = [] for dialogue in dialogues:    prompt = "Summarize the following conversation:\n\n"      original_summary = generate_summary(original_model, tokenizer, dialogue, prompt)       peft_summary = generate_summary(peft_model, tokenizer, dialogue, prompt)      original_model_summaries.append(original_summary)    peft_model_summaries.append(peft_summary) df = pd.DataFrame({    'human_baseline_summaries': human_baseline_summaries,    'original_model_summaries': original_model_summaries,    'peft_model_summaries': peft_model_summaries }) dfTo evaluate our PEFT model's summaries, we use the ROUGE metric, a common summarization tool. ROUGE measures the overlap between predicted summaries and human references, showing how effectively our models capture key details. 
The Python code for this evaluation is:rouge = evaluate.load('rouge') original_model_results = rouge.compute( predictions=original_model_summaries, references=human_baseline_summaries[0:len(original_model_summaries)], use_aggregator=True, use_stemmer=True, ) peft_model_results = rouge.compute( predictions=peft_model_summaries, references=human_baseline_summaries[0:len(peft_model_summaries)], use_aggregator=True, use_stemmer=True, ) print('ORIGINAL MODEL:') print(original_model_results) print('PEFT MODEL:') print(peft_model_results) Here is the output:ORIGINAL MODEL: {'rouge1': 0.3870781853986991, 'rouge2': 0.13125454660387353, 'rougeL': 0.2891907205395029, 'rougeLsum': 0.29030342767482775} INSTRUCT MODEL: {'rouge1': 0.3719168722187023, 'rouge2': 0.11574429294744135, 'rougeL': 0.2739614480462256, 'rougeLsum': 0.2751489358330983} PEFT MODEL: {'rouge1': 0.3774164144865605, 'rouge2': 0.13204737323990984, 'rougeL': 0.3030487123408395, 'rougeLsum': 0.30499897454317104}Upon examining the results, it's evident that the original model shines with the highest ROUGE-1 score, adeptly capturing crucial standalone terms. On the other hand, the PEFT Model wears the crown for both ROUGE-L and ROUGE-Lsum metrics. This implies the PEFT Model excels in crafting summaries that string together longer, coherent sequences echoing those in the reference summaries.ConclusionWrapping it all up, in this post we delved deep into the nuances of fine-tuning Large Language Models, particularly spotlighting the prowess of FLAN T5. Through our hands-on venture into the dialogue summarization task, we discerned the intricate dance between capturing individual terms and weaving them into a coherent narrative. While the original model exhibited an impressive knack for highlighting key terms, the PEFT Model emerged as the maestro in crafting flowing, meaningful sequences.It's clear that in the grand arena of language models, knowing the notes is just the beginning; it's how you orchestrate them that creates the magic. Harnessing the techniques illuminated in this post, you too can fine-tune your chosen LLM, crafting your linguistic symphonies with finesse and flair. Here's to you becoming the maestro of your own linguistic ensemble!Author BioAmita Kapoor is an accomplished AI consultant and educator with over 25 years of experience. She has received international recognition for her work, including the DAAD fellowship and the Intel Developer Mesh AI Innovator Award. She is a highly respected scholar with over 100 research papers and several best-selling books on deep learning and AI. After teaching for 25 years at the University of Delhi, Amita retired early and turned her focus to democratizing AI education. She currently serves as a member of the Board of Directors for the non-profit Neuromatch Academy, fostering greater accessibility to knowledge and resources in the field. After her retirement, Amita founded NePeur, a company providing data analytics and AI consultancy services. In addition, she shares her expertise with a global audience by teaching online classes on data science and AI at the University of Oxford. 

Text Classification with Transformers

Saeed Dehqan
28 Aug 2023
9 min read
IntroductionThis blog aims to implement binary text classification using a transformer architecture. If you're new to transformers, the "Transformer Building Blocks" blog explains the architecture and its text generation implementation. Beyond text generation and translation, transformers serve classification, sentiment analysis, and speech recognition. The transformer model comprises two parts: an encoder and a decoder. The encoder extracts features, while the decoder processes them. Just as a painter with tree features can draw, describe, visualize, categorize, or write about a tree, transformers encode knowledge (encoder) and apply it (decoder). This dual-part process is pivotal for text classification with transformers, allowing them to excel in diverse tasks like sentiment analysis, illustrating their transformative role in NLP.Deep Dive into Text Classification with TransformersWe train the model on the IMDB dataset. The dataset is ready and there’s no preprocessing needed. The model is vocab-based instead of character-based so that the model can converge faster. I limited the dataset vocabs to the 20000 most frequent vocabs. I also reduced the sequence to 200 so we can train faster. I tried to simplify the model and use torch.nn.MultiheadAttention it instead of writing the Multihead-attention ourselves. It makes the model faster since the nn.MultiheadAttention uses scaled_dot_product_attention under the hood. But if you want to know how MultiheadAttention works you can study the transformer building blocks blog or see the code here.Okay, now, let us add the feature extractor part:class transformer_block(nn.Module): def __init__(self):     super(block, self).__init__()     self.attention = nn.MultiheadAttention(embeds_size, num_heads, batch_first=True)     self.ffn = nn.Sequential(         nn.Linear(embeds_size, 4 * embeds_size),         nn.LeakyReLU(),         nn.Linear(4 * embeds_size, embeds_size),     )     self.drop1 = nn.Dropout(drop_prob)     self.drop2 = nn.Dropout(drop_prob)     self.ln1 = nn.LayerNorm(embeds_size, eps=1e-6)     self.ln2 = nn.LayerNorm(embeds_size, eps=1e-6) def forward(self, hidden_state):     attn, _ = self.attention(hidden_state, hidden_state, hidden_state, need_weights=False)     attn = self.drop1(attn)     out = self.ln1(hidden_state + attn)     observed = self.ffn(out)     observed = self.drop2(observed)     return self.ln2(out + observed)●    hidden_state: A tensor with a shape (batch_size, block_size, embeds_size) goes to the transformer_block and a tensor with the same shape goes out of it.●    self.attention: The transformer block tries to combine the information of tokens so that each token is aware of its neighbors or other tokens in the context. We may call this part the communication part. That’s what the nn.MultiheadAttention does. nn.MultiheadAttention is a ready multihead attention layer that can be faster than implementing it from scratch, just like what we did in the “Transformer Building Blocks” blog. The parameters of nn.MultiheadAttention are as follows:     ○    embeds_size: token embedding size     ○    num_heads: multihead, as the name suggests, consists of multiple heads and each head works on different parts of token embeddings. Suppose, your input data has shape (B,T,C) = (10, 32, 16). The token embedding size for this data is 16. If we specify the num_heads parameter to 2(divisible by 16), the multi-head splits data into two parts with shape (10, 32, 8). 
The first head works on the first part and the second head works on the second part. This is because transforming data into different subspaces can help the model to see different aspects of the data. Please note that the num_heads should be divisible by the embedding size so that at the end we can concatenate the split parts.    ○    batch_first: True means the first dimension is batch.●    Dropout: After the attention layer, the communication between tokens is closed and computations on tokens are done individually. We run a dropout on tokens. Dropout is a method of regularization. Regularization helps the training process to be based on generalization, not memorization. Without regularization, the model tries to memorize the training set and has poor performance on the test set. The dropout method turns off features with a probability of drop_prob.●    self.ln1: Layer normalization normalizes embeddings so that they have zero mean and standard deviation one.●    Residual connection: hidden_state + attn: Observe that before normalization, we added the input to the output of multihead attention, named residual connection. It has two benefits:   ○    It helps the model to have the unchanged embedding information.   ○    It helps to prevent gradient vanishing, which is common in deep networks where we stack multiple transformer layers.●    self.ffn: After dropout, residual connection, and normalization, we forward data into a simple non-linear neural network to adjust the tokens one by one for better representation.●    self.ln2(out + observed): Finally, another dropout, residual connection, and layer normalization.The transformer block is ready. And here is the final piece:class transformer(nn.Module): def __init__(self):     super(transformer, self).__init__()     self.tok_embs = nn.Embedding(vocab_size, embeds_size)     self.pos_embs = nn.Embedding(block_size, embeds_size)     self.block = block()     self.ln1 = nn.LayerNorm(embeds_size)     self.ln2 = nn.LayerNorm(embeds_size)     self.classifier_head = nn.Sequential(         nn.Linear(embeds_size, embeds_size),         nn.LeakyReLU(),         nn.Dropout(drop_prob),         nn.Linear(embeds_size, embeds_size),         nn.LeakyReLU(),         nn.Linear(embeds_size, num_classes),         nn.Softmax(dim=1),     )     print("number of parameters: %.2fM" % (self.num_params()/1e6,)) def num_params(self):     n_params = sum(p.numel() for p in self.parameters())     return n_params def forward(self, seq):     B,T = seq.shape     embedded = self.tok_embs(seq)     embedded = embedded + self.pos_embs(torch.arange(T, device=device))     output = self.block(embedded)     output = output.mean(dim=1)     output = self.classifier_head(output)     return output●    self.tok_embs: nn.Embedding is like a lookup table that receives a sequence of indices, and returns their corresponding embeddings. These embeddings will receive gradients so that the model can update them to make better predictions.●    self.tok_embs: To comprehend a sentence, you not only need words, you also need to have the order of words. Here, we embed positions and add them to the token embeddings. In this way, the model has both words and their order.●    self.block: In this model, we only use one transformer block, but you can stack more blocks to get better results.●    self.classifier_head: This is where we put the extracted information into action to classify the sequence. We call it the transformer head. It receives a fixed-size vector and classifies the sequence. 
The softmax as the final activation function returns a probability distribution for each class.●    self.tok_embs(seq): Given a sequence of indices (batch_size, block_size), it returns (batch_size, block_size, embeds_size).●    self.pos_embs(torch.arange(T, device=device)): Given a sequence of positions, i.e. [0,1,2], it returns embeddings of each position. Then, we add them to the token embeddings.●    self.block(embedded): The embedding goes to the transformer block to extract features. Given the embedded shape (batch_size, block_size, embeds_size), the output has the same shape (batch_size, block_size, embeds_size).●    output.mean(dim=1): The purpose of using mean is to aggregate the information from the sequence into a compact representation before feeding it into self.classifier_head. It helps in reducing the spatial dimensionality and extracting the most important features from the sequence. Given the input shape (batch_size, block_size, embeds_size), the output shape is (batch_size, embeds_size). So, one fixed-size vector for each batch.●    self.classifier_head(output): And here we classify.The final code can be found here. The remaining code consists of downstream tasks such as the training loop, loading the dataset, setting the hyperparameters, and optimizer. I used RMSprop instead of Adam and AdamW. I also used BCEWithLogitsLoss instead of cross-entropy loss. BCE(Binary Cross Entropy) is for binary classification models and it combines sigmoid with cross entropy and it is numerically more stable. I also empirically got better accuracy. After 30 epochs, the final accuracy is ~84%.ConclusionThis exploration of text classification using transformers reveals their revolutionary potential. Beyond text generation, transformers excel in sentiment analysis. The encoder-decoder model, analogous to a painter interpreting tree feature, propels efficient text classification. A streamlined practical approach and the meticulously crafted transformer block enhance the architecture's robustness. Through optimization methods and loss functions, the model is honed, yielding an empirically validated 84% accuracy after 30 epochs. This journey highlights transformers' disruptive impact on reshaping AI-driven language comprehension, fundamentally altering the landscape of Natural Language Processing.Author BioSaeed Dehqan trains language models from scratch. Currently, his work is centered around Language Models for text generation, and he possesses a strong understanding of the underlying concepts of neural networks. He is proficient in using optimizers such as genetic algorithms to fine-tune network hyperparameters and has experience with neural architecture search (NAS) by using reinforcement learning (RL). He implements models starting from data gathering to monitoring, and deployment on mobile, web, cloud, etc. 
Transformer Building Blocks

Saeed Dehqan
29 Aug 2023
22 min read
IntroductionTransformers employ potent techniques to preprocess tokens before sequentially inputting them into a neural network, aiding in the selection of the next token. At the transformer's apex is a basic neural network, the transformer head. The text generator model processes input tokens and generates a probability distribution for subsequent tokens. Context length, termed context length or block size, is recognized as a hyperparameter denoting input token count. The model's primary aim is to predict the next token based on input tokens (referred to as context tokens or context windows). Our goal with n tokens is to predict the subsequent fitting token following previous ones. Thus, we rely on these n tokens to anticipate the next. As humans, we attempt to grasp the conversation's context - our location and a loose foresight of the path's culmination. Upon gathering pertinent insights, relevant words emerge, while irrelevant ones fade, enabling us to choose the next word with precision. We occasionally err but backtrack, a luxury transformers lack. If they incorrectly predict (an irrelevant token), they persist, though exceptions exist, like beam search. Unlike us, transformers can't forecast. Revisiting n prior tokens, our human assessment involves inspecting them individually, and discerning relationships from diverse angles. By prioritizing pivotal tokens and disregarding superfluous ones, we evaluate tokens within various contexts. We scrutinize all n prior tokens individually, ready to prognosticate. This embodies the essence of the multihead attention mechanism in transformers. Consider a context window with 5 tokens. Each wears a distinct mask, predicting its respective next token:"To discern the void amidst, we must first grasp the fullness within." To understand what token is lacking, we must first identify what we are and possess. We need communication between tokens since tokens don’t know each other yet and in order to predict their own next token, they first need to know each other well and pair together in such a way that tokens with similar characteristics stay near each other (technically having similar vectors). Each token has three vectors that represent:●    What tokens they are looking for (known as query)●    What they really have (known as key)●    What they are (known as value)Each token with its query starts looking for similar keys, finds each other, and starts to know one another by adding up their values:Similar tokens find each other and if a token is somehow dissimilar, here Token 4, other tokens don’t consider it much. But please note that every token (much or less) has its own effect on other tokens. Also, in self-attention, all tokens ask all other tokens with their query and keys to find familiar tokens, but not the future tokens, named masked self-attention. We prohibit tokens from communicating to future tokens. After exchanging information between tokens and mixing up their values, similar tokens become more similar:As you can see, the color of similar tokens becomes more similar(in action, their vectors become more similar). Since tokens in the group wear a mask, we cannot access the true tokens’ values. We just know and distinguish them from their mask(value). This is because every token has different characteristics in different contexts, and they don’t show their true essence.So far so good; we have finished the self-attention process and now, the group is ready to predict their next tokens. 
This is because individuals are aware of each other very well, and as a result, they can guess the next token better. Now, each token separately needs to go to a nonlinear network and then to the transformer head, to predict its own next token. We ask each one of the tokens separately to tell their opinion about the probability of what token comes next. Finally, we collect the probability distributions of all tokens in the context window. A probability distribution sums up to 100, or actually in action to 1. We give probability to every token the model has in its vocabulary. The simplest method to extract the next token from probability distributions is to select the one with the highest probability:As you can see, each token goes to the neural network and the network returns a probability distribution. The result is the following sentence: “It looks like a bug”.Voila! We managed to go through a simple Transformer model.Let’s recap everything we’ve said. A transformer receives n tokens as input, does some stuff (like self-attention, layer normalization, etc.) and feed-forward them into a neural network to get probability distributions of the next token. Each token goes to the neural network separately; if the number of tokens is 10, there are 10 probability distributions.At this point, you know intuitively how the main building blocks of a transformer work. But let us better understand them by implementing a transformer model.Clone the repository tiny-transformer:git clone https://github.com/saeeddhqan/tiny-transformerExecute simple_model.py in the repository If you simply want to run the model for training.Create a new file, and import the necessary modules:import math import torch import torch.nn as nn import torch.nn.functional as F Load the dataset and write the tokenizer: with open('shakespeare.txt') as fp: text = fp.read() chars = sorted(list(set(text))) vocab_size = len(chars) stoi = {c:i for i,c in enumerate(chars)} itos = {i:c for c,i in stoi.items()} encode = lambda s: [stoi[x] for x in s] decode = lambda e: ''.join([itos[x] for x in e])●    Open the dataset, and define a variable that is a list of all unique characters in the text.●    The set function splits the text character by character and then removes duplicates, just like sets in set theory. list(set(myvar)) is a way of removing duplicates in a list or string.●    vocab_size is the number of unique characters (here 65). ●    stoi is a dictionary where its keys are characters and values are their indices.●    itos is used to convert indices to characters. ●    encode function receives a string and returns indices of characters. ●    decode receives a list of indices and returns a string. 
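As a quick sanity check of the tokenizer, encoding a string and then decoding it back should reproduce the original text. A small sketch (the concrete indices depend on the character set of shakespeare.txt) could look like this:

# Round-trip check for the character-level tokenizer defined above.
sample = "To be, or not to be"
ids = encode(sample)          # one integer index per character
restored = decode(ids)        # back to the original string

print("vocab size:", vocab_size)              # 65 for the Shakespeare dataset
print("first few ids:", ids[:8])              # values depend on the corpus
print("round trip ok:", restored == sample)   # expected: True

Because the mapping is purely character level, decode(encode(s)) returns s unchanged for any string whose characters all appear in the corpus; a character outside the corpus would raise a KeyError inside encode.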
Split the dataset into test and train and write a function that returns data for training:device = 'cuda' if torch.cuda.is_available() else 'cpu' torch.manual_seed(1234) data = torch.tensor(encode(text), dtype=torch.long).to(device) train_split = int(0.9 * len(data)) train_data = data[:train_split] test_data = data[train_split:] def get_batch(split='train', block_size=16, batch_size=1) -> 'Create a random batch and returns batch along with targets': data = train_data if split == 'train' else test_data ix = torch.randint(len(data) - block_size, (batch_size,)) x = torch.stack([data[i:i + block_size] for i in ix]) y = torch.stack([data[i+1:i + block_size + 1] for i in ix]) return x, y●    Choose a suitable device.●     Set a seed to make the training reproducible.●    Convert the text into a large list of indices with the encode function.●    Since the character indices are integer, we use torch.long data type to make the data suitable for the model. ●    90% for training and 10% for testing.●    If the batch_size is 10, we select 10 chunks or sequences from the dataset and stack them up to process them simultaneously. ●    If the batch_size is 1, get_batch function selects 1 random chunk (n consequence characters) from the dataset and returns x and y, where x is 16 characters’ indices and y is the target characters for x.The shape, value, and decoded version of the selected chunk are as follows:shape x: torch.Size([1, 16]) shape y: torch.Size([1, 16]) value x: tensor([[41, 43, 6, 1, 60, 47, 50, 50, 39, 47, 52, 2, 1, 52, 43, 60]]) value y: tensor([[43, 6, 1, 60, 47, 50, 50, 39, 47, 52, 2, 1, 52, 43, 60, 43]]) decoded x: ce, villain! nev decoded y: e, villain! neveWe usually process multiple chunks or sequences at once with batching in order to speed up the training. For each character, we have an equivalent target, which is its next token. The target for ‘c’ is ‘e’, for ‘e’ is ‘,’, for ‘v’ is ‘i’, and so on. Let us talk a bit about the input shape and output shape of tensors in a transformer model. The model receives a list of token indices like the above(named a sequence, or chunk) and maps them into their corresponding vectors.●    The input shape is (batch_size, block_size).●    After mapping indices into vectors, the data shape becomes (batch_size, block_size, embed_size).●    Then, through the multihead attention and feed-forward layers, the data shape does not change.●    Finally, the data with shape (batch_size, block_size, embed_size) goes to the transformer head (a simple neural network) and the output shape becomes (batch_size, block_size, vocab_size). vocab_size is the number of unique characters that can come next (for the Shakespeare dataset, the number of unique characters is 65).Self-attentionThe communication between tokens happens in the head class; we define the scores variable to save the similarity between vectors. The higher the score is, the more two vectors have in common. We then utilize these scores to do a weighted sum of all the vectors: class head(nn.Module): def __init__(self, embeds_size=32, block_size=16, head_size=8):     super().__init__()     self.key = nn.Linear(embeds_size, head_size, bias=False)     self.query = nn.Linear(embeds_size, head_size, bias=False)     self.value = nn.Linear(embeds_size, head_size, bias=False)     self.register_buffer('tril', torch.tril(torch.ones(block_size, block_size)))     self.dropout = nn.Dropout(0.1) def forward(self, x):     B,T,C = x.shape     # What am I looking for?     q = self.query(x)     # What do I have?     
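    # (q above and k, v below are linear projections of x: with the defaults
    #  embeds_size=32 and head_size=8, x is (B, T, 32) while q, k, v are each (B, T, 8))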
k = self.key(x)     # What is the representation value of me?     # Or: what's my personality in the group?     # Or: what mask do I have when I'm in a group?     v = self.value(x)     scores = q @ k.transpose(-2,-1) * (1 / math.sqrt(C)) # (B,T,head_size) @ (B,head_size,T) --> (B,T,T)     scores = scores.masked_fill(self.tril[:T, :T] == 0, float('-inf'))     scores = F.softmax(scores, dim=-1)     scores = self.dropout(scores)     out = scores @ v     return outUse three linear layers to transform the vector into key, query, and value, but with a smaller dimension (here same as head_size).Q, K, V: Q and K are for when we want to find similar tokens. We calculate the similarity between vectors with a dot product: q @ k.transpose(-2, -1). The shape of scores is (batch_size, block_size, block_size), which means we have the similarity scores between all the vectors in the block. V is used when we want to do the weighted sum. Scores: Pure dot product scores tend to have very high numbers that are not suitable for softmax since it makes the scores dense. Therefore, we rescale the results with a ratio of (1 / math.sqrt(C)). C is the embedding size. We call this a scaled dot product.Register_buffer: We used register_buffer to register a lower triangular tensor. In this way, when you save and load the model, this tensor also becomes part of the model.Masking: After calculating the scores, we need to replace future scores with -inf to shut them off so that the vectors do not have access to the future tokens. By doing so, these scores effectively become zero after applying the softmax function, resulting in a probability of zero for the future tokens. This process is referred to as masking. Here’s an example of masked scores with a block size 4:[[-0.1710, -inf, -inf, -inf], [ 0.2007, -0.0878, -inf, -inf], [-0.0405, 0.2913, 0.0445, -inf], [ 0.1328, -0.2244, 0.0796, 0.1719]]Softmax: It converts a vector into a probability distribution that sums up to 1. Here’s the scores after softmax:      [[1.0000, 0.0000, 0.0000, 0.0000],      [0.5716, 0.4284, 0.0000, 0.0000],      [0.2872, 0.4002, 0.3127, 0.0000],      [0.2712, 0.1897, 0.2571, 0.2820]]The scores of future tokens are zero; after doing a weighted sum, the future vectors become zero and the vectors receive none data from future vectors(n*0=0)Dropout: Dropout is a regularization technique. It drops some of the numbers in vectors randomly. Dropout helps the model to generalize, not memorize the dataset. We don’t want the model to memorize the Shakespeare model, right? We want it to create new texts like the dataset.Weighted sum: Weighted sum is used to combine different representations or embeddings based on their importance. The scores are calculated by measuring the relevance or similarity between each pair of vectors. The relevance scores are obtained by applying a scaled dot product between the query and key vectors, which are learned during the training process. The resulting weighted sum emphasizes the more important elements and reduces the influence of less relevant ones, allowing the model to focus on the most salient information. We dot product scores with values and the result is the outcome of self-attention.Output: since the embedding size and head size are 32 and 8 respectively, if the input shape is (batch_size, block_size, 32), the output has the shape of (batch_size, block_size, 8).Multihead self-attention“I have multiple personalities(v), tendencies and needs (q), and valuable things (k) in different spaces”. 
Vectors said.We transform the vectors into small dimensions, and then run self-attention on them; we did this in the previous class. In multihead self-attention, we call the head class four times, and then, concatenate the smaller vectors to have the same input shape. We call this multihead self-attention. For instance, if the shape of input data is (1, 16, 32), we transform it into four (1, 16, 8) tensors and run self-attention on these tensors. Why four times? 4 * 8 = initial shape. By using multihead self-attention and running self-attention multiple times, what we do is consider the different aspects of vectors in different spaces. That’s all!Here is the code:class multihead(nn.Module): def __init__(self, num_heads=4, head_size=8):     super().__init__()     self.multihead = nn.ModuleList([head(head_size) for _ in range(num_heads)])     self.output_linear = nn.Linear(embeds_size, embeds_size)     self.dropout = nn.Dropout(0.1) def forward(self, hidden_state):     hidden_state = torch.cat([head(hidden_state) for head in self.multihead], dim=-1)     hidden_state = self.output_linear(hidden_state)     hidden_state = self.dropout(hidden_state)     return hidden_state●    self.multihead: The variable creates four heads and we do this with nn.ModuleList.●    self.output_linear: Another transformer linear layer we apply at the end of the multihead self-attention process.●    self.dropout: Using dropout on the final results.●    hidden_state 1: Concatenating the output of heads so that we have the same shape as input. Heads transform data into different spaces with smaller dimensions, and then do the self-attention.●    hidden_state 2: After doing communication between tokens with self-attention, we use the self.output_linear projector to let the model adjust vectors further based on the gradients that flow through the layer.●    dropout: Run dropout on the output of the projection with a 10% probability of turning off values (make them zero) in the vectors.Transformer blockThere are two new techniques, including layer normalization and residual connection, that need to be explained:class transformer_block(nn.Module): def __init__(self, embeds_size=32, num_heads=8):     super().__init__()     self.head_count = embeds_size // num_heads     self.n_heads = multihead(num_heads, self.head_count)     self.ffn = nn.Sequential(         nn.Linear(embeds_size, 4 * embeds_size),         nn.ReLU(),         nn.Linear(4 * embeds_size, embeds_size),         nn.Dropout(drop_prob),     )     self.ln1 = nn.LayerNorm(embeds_size)     self.ln2 = nn.LayerNorm(embeds_size) def forward(self, hidden_state):     hidden_state = hidden_state + self.n_heads(self.ln1(hidden_state))     hidden_state = hidden_state + self.ffn(self.ln2(hidden_state))     return hidden_state self.head_count: Calculates the head size. The number of heads should be divisible by the embedding size so that we can concatenate the output of heads.self.n_heads: The multihead self-attention layer. self.ffn: This is the first time that we have non-linearity in our model. Non-linearity helps the model to capture complex relationships and patterns in the data. By introducing non-linearity through ReLU activation functions, or GLUE, the model can make a correlation for the data. As a result, it better models the intricacies of the input data. Non-linearity is like “you go to the next layer”, “You don’t go to the next layer”, or “Create y from x for the next layer”. The recommended hidden layer size is a number four times bigger than the embedding size. 
That’s why “4 * embeds_size”. You can also try SwiGLU as the activation function instead of ReLU.self.ln1 and self.ln2: Layer normalizers make the model more robust and they also help the model to converge faster. Layer normalization rescales the data in such a way that the mean is zero and the standard deviation is one. hidden_state 1: Normalize the vectors with self.ln1 and forward the vectors to the multihead attention. Next, we add the input to the output of multihead attention. It helps the model in two ways:○    First, the model has some information from the original vectors. ○    Second, when the model becomes deep, during backpropagation, the gradients will be weak for earlier layers and the model will converge too slowly. We recognize this effect as gradient vanishing. Adding the input helps to enrich the gradients and mitigate the gradient vanishing. We recognize it as a residual connection.hidden_state 2: Hidden_state 1 goes to a layer normalization and then to a nonlinear network. The output will be added to the hidden state with the aim of keeping gradients for all layers.The modelAll the necessary parts are ready, let us stack them up to make the full model:class transformer(nn.Module): def __init__(self):     super().__init__()     self.stack = nn.ModuleDict(dict(         tok_embs=nn.Embedding(vocab_size, embeds_size),         pos_embs=nn.Embedding(block_size, embeds_size),         dropout=nn.Dropout(drop_prob),         blocks=nn.Sequential(             transformer_block(),             transformer_block(),             transformer_block(),             transformer_block(),             transformer_block(),         ),         ln=nn.LayerNorm(embeds_size),         lm_head=nn.Linear(embeds_size, vocab_size),     ))●    self.stack: A list of all necessary layers.●    tok_embs: This is a learnable lookup table that receives a list of indices and returns their vectors.●    pos_embs: Just like tok_embs, it is also a learnable look-up table, but for positional embedding. It receives a list of positions and returns their vectors.●    dropout: Dropout layer.●    blocks: We create multiple transformer blocks sequentially.●    ln: A layer normalization.●    lm_heas: Transformer head receives a token and returns probabilities of the next token. To change the model to be a classifier, or a sentimental analysis model, we just need to change this layer and remove masking from the self-attention layer.The forward method of the transformer class:    def forward(self, seq, targets=None):     B, T = seq.shape     tok_emb = self.stack.tok_embs(seq) # (batch, block_size, embed_dim) (B,T,C)     pos_emb = self.stack.pos_embs(torch.arange(T, device=device))     x = tok_emb + pos_emb     x = self.stack.dropout(x)     x = self.stack.blocks(x)     x = self.stack.ln(x)     logits = self.stack.lm_head(x) # (B, block_size, vocab_size)     if targets is None:         loss = None     else:         B, T, C = logits.shape         logits = logits.view(B * T, C)         targets = targets.view(B * T)         loss = F.cross_entropy(logits, targets)     return logits, loss●  tok_emb: Convert token indices into vectors. Given the input (B, T), the output is (B, T, C), where C is the embeds_size.●  pos_emb: Given the number of tokens in the context window or block_size, it returns the positional embedding of each position.●  x 1: Add up token embeddings and position embeddings. 
A little bit lossy but it works just fine.●  x 2: Run dropout on embeddings.●  x 3: The embeddings go through all the transformer blocks, and multihead self-attention. The input is (B, T, C) and the output is (B, T, C).●  x 4: The outcome of transformer blocks goes to the layer normalization.●  logits: We usually recognize the unnormalized values extracted from the language model head as logits :)●  if-else block: Were the targets specified, we calculated cross-entropy loss. Otherwise, the loss will be None. Before calculating loss in the else block, we change the shape as the cross entropy function expects.●  Output: The method returns logits with shape (batch_size, block_size, vocab_size) and loss if any.For generating a text, add this to the transformer class:    def autocomplete(self, seq, _len=10):        for _ in range(_len):            seq_crop = seq[:, -block_size:] # crop it            logits, _ = self(seq_crop)            logits = logits[:, -1, :] # we care about the last token            probs = F.softmax(logits, dim=-1)            next_char = torch.multinomial(probs, num_samples=1)            seq = torch.cat((seq, next_char), dim=1)        return seq●  autocomplete: Given a tokenized sequence, and the number of tokens that need to be created, this method returns _len tokens.●  seq_crop: Select the last n tokens in the sequence to give it to the model. The sequence length might be larger than the block_size and it causes an error if we don’t crop it.●  logits 1: Forward the sequence into the model to receive the logits.●  logits 2: Select the last logit that will be used to select the next token.●  probs: Run the softmax on logits to get a probability distribution.●  next_char: Multinomial selects one sample from the probs. The higher the probability of a token, the higher the chance of being selected.●  seq: Add the selected character to the sequence.TrainingThe rest of the code is downstream tasks such as training loops, etc. The codes that are provided here are slightly different from the tiny-transformer repository. I trained the model with the following hyperparameters:block_size = 256 learning_rate = 9e-4 eval_interval = 300 # Every n step, we do an evaluation. iterations = 5000 # Like epochs batch_size = 64 embeds_size = 195 num_heads = 5 num_layers = 5 drop_prob = 0.15And here’s the generated text:If you need to improve the quality, increase embeds_size, num_layers, and heads.ConclusionThe article explores transformers' text generation role, detailing token preprocessing through self-attention and neural network heads. Transformers predict tokens using context length as a hyperparameter. Human context comprehension is paralleled, highlighting relevant word emergence and fading of irrelevant words for precise selection. Transformers lack human foresight and backtracking. Key components—self-attention, multihead self-attention, and transformer blocks—are explained, and supported by code snippets. Token and positional embeddings, layer normalization, and residual connections are detailed. The model's text generation is exemplified via the autocomplete method. Training parameters and text quality enhancement are addressed, showcasing transformers' potential.Author BioSaeed Dehqan trains language models from scratch. Currently, his work is centered around Language Models for text generation, and he possesses a strong understanding of the underlying concepts of neural networks. 
He is proficient in using optimizers such as genetic algorithms to fine-tune network hyperparameters and has experience with neural architecture search (NAS) by using reinforcement learning (RL). He implements models starting from data gathering to monitoring, and deployment on mobile, web, cloud, etc. 
LLM-powered Chatbots for Financial Queries

Alan Bernardo Palacio
12 Sep 2023
27 min read
IntroductionIn the ever-evolving realm of digital finance, the convergence of user-centric design and cutting-edge AI technologies is pushing the boundaries of innovation. But with the influx of data and queries, how can we better serve users and provide instantaneous, accurate insights? Within this context, Large Language Models (LLMs) have emerged as a revolutionary tool, providing businesses and developers with powerful capabilities. This hands-on article will walk you through the process of leveraging LLMs to create a chatbot that can query real-time financial data extracted from the NYSE to address users' queries in real time about the current market state. We will dive into the world of LLMs, explore their potential, and understand how they seamlessly integrate with databases using LangChain. Furthermore, we'll fetch real-time data using the finance package offering the chatbot the ability to answer questions using current data.In this comprehensive tutorial, you'll gain proficiency in diverse aspects of modern software development. You'll first delve into the realm of database interactions, mastering the setup and manipulation of a MySQL database to store essential financial ticker data. Unveil the intricate synergy between Large Language Models (LLMs) and SQL through the innovative LangChain tool, which empowers you to bridge natural language understanding and database operations seamlessly. Moving forward, you'll explore the dynamic fusion of Streamlit and LLMs as you demystify the mechanics behind crafting a user-friendly front. Witness the transformation of your interface using OpenAI's Davinci model, enhancing user engagement with its profound knowledge. As your journey progresses, you'll embrace the realm of containerization, ensuring your application's agility and scalability by harnessing the power of Docker. Grasp the nuances of constructing a potent Dockerfile and orchestrating dependencies, solidifying your grasp on holistic software development practices.By the end of this guide, readers will be equipped with the knowledge to design, implement, and deploy an intelligent, finance-focused chatbot. This isn't just about blending frontend and backend technologies; it's about crafting a digital assistant ready to revolutionize the way users interact with financial data. Let's dive in!The Power of Containerization with Docker ComposeNavigating the intricacies of modern software deployment is simplified through the strategic implementation of Docker Compose. This orchestration tool plays a pivotal role in harmonizing multiple components within a local environment, ensuring they collaborate seamlessly.Docker allows to deploy multiple components seamlessly in a local environment. In our journey, we will use docker-compose to harmonize various components, including MySQL for data storage, a Python script as a data fetcher for financial insights, and a Streamlit-based web application that bridges the gap between the user and the chatbot.Our deployment landscape consists of several interconnected components, each contributing to the finesse of our intelligent chatbot. The cornerstone of this orchestration is the docker-compose.yml file, a blueprint that encapsulates the deployment architecture, coordinating the services to deliver a holistic user experience.With Docker Compose, we can efficiently and consistently deploy multiple interconnected services. 
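Throughout the walkthrough, the pieces are assumed to live in a small project tree along the lines of the following. The file and folder names come from the compose file and Dockerfiles shown below; your own layout may differ slightly.

.
├── docker-compose.yml
├── db/
│   └── setup.sql
├── ticker_fetcher/
│   ├── Dockerfile
│   ├── fetcher.py
│   └── requirements.txt
└── app/
    ├── Dockerfile
    ├── streamlit_app.py
    ├── utils.py
    └── requirements.txt

With these files in place, running docker compose up --build from the project root builds the two custom images and starts all three services.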
Let's dive into the structure of our docker-compose.yml:version: '3' services: db:    image: mysql:8.0    environment:      - MYSQL_ROOT_PASSWORD=root_password      - MYSQL_DATABASE=tickers_db      - MYSQL_USER=my_user   # Replace with your desired username      - MYSQL_PASSWORD=my_pass  # Replace with your desired password    volumes:      - ./db/setup.sql:/docker-entrypoint-initdb.d/setup.sql    ports:      - "3306:3306"  # Maps port 3306 in the container to port 3306 on the host ticker-fetcher:    image: ticker/python    build:      context: ./ticker_fetcher    depends_on:      - db    environment:      - DB_USER=my_user   # Must match the MYSQL_USER from above      - DB_PASSWORD=my_pass   # Must match the MYSQL_PASSWORD from above      - DB_NAME=tickers_db app:    build:      context: ./app    ports:      - 8501:8501    environment:      - OPENAI_API_KEY=${OPENAI_API_KEY}    depends_on:      - ticker-fetcherContained within the composition are three distinctive services:db: A MySQL database container, configured with environmental variables for establishing a secure and efficient connection. This container is the bedrock upon which our financial data repository, named tickers_db, is built. A volume is attached to import a setup SQL script, enabling rapid setup.ticker-fetcher: This service houses the heart of our real-time data acquisition system. Crafted around a custom Python image, it plays the crucial role of fetching the latest stock information from Yahoo Finance. It relies on the db service to persistently store the fetched data, ensuring that our chatbot's insights are real-time data.app: The crown jewel of our user interface is the Streamlit application, which bridges the gap between users and the chatbot. This container grants users access to OpenAI's LLM model. It harmonizes with the ticker-fetcher service to ensure that the data presented to users is not only insightful but also dynamic.Docker Compose's brilliance lies in its capacity to encapsulate these services within isolated, reproducible containers. While Docker inherently fosters isolation, Docker Compose takes it a step further by ensuring that each service plays its designated role in perfect sync.The docker-compose.yml configuration file serves as the conductor's baton, ensuring each service plays its part with precision and finesse. As you journey deeper into the deployment architecture, you'll uncover the intricate mechanisms powering the ticker-fetcher container, ensuring a continuous flow of fresh financial data. Through the lens of Docker Compose, the union of user-centric design, cutting-edge AI, and streamlined deployment becomes not just a vision, but a tangible reality poised to transform the way we interact with financial data.Enabling Real-Time Financial Data AcquisitionAt the core of our innovative architecture lies the pivotal component dedicated to real-time financial data acquisition. This essential module operates as the engine that drives our chatbot's ability to deliver up-to-the-minute insights from the ever-fluctuating financial landscape.Crafted as a dedicated Docker container, this module is powered by a Python script that through the yfinance package, retrieves the latest stock information directly from Yahoo Finance. 
The result is a continuous stream of the freshest financial intelligence, ensuring that our chatbot remains armed with the most current and accurate market data.Our Python script, fetcher.py, looks as follows:import os import time import yfinance as yf import mysql.connector import pandas_market_calendars as mcal import pandas as pd import traceback DB_USER = os.environ.get('DB_USER') DB_PASSWORD = os.environ.get('DB_PASSWORD') DB_NAME = 'tickers_db' DB_HOST = 'db' DB_PORT = 3306 def connect_to_db():    return mysql.connector.connect(        host=os.getenv("DB_HOST", "db"),        port=os.getenv("DB_PORT", 3306),        user=os.getenv("DB_USER"),        password=os.getenv("DB_PASSWORD"),        database=os.getenv("DB_NAME"),    ) def wait_for_db():    while True:        try:            conn = connect_to_db()            conn.close()            return        except mysql.connector.Error:            print("Unable to connect to the database. Retrying in 5 seconds...")            time.sleep(5) def is_market_open():    # Get the NYSE calendar    nyse = mcal.get_calendar('NYSE')    # Get the current timestamp and make it timezone-naive    now = pd.Timestamp.now(tz='UTC').tz_localize(None)    print("Now its:",now)    # Get the market open and close times for today    market_schedule = nyse.schedule(start_date=now, end_date=now)    # If the market isn't open at all today (e.g., a weekend or holiday)    if market_schedule.empty:        print('market is empty')        return False    # Today's schedule    print("Today's schedule")    # Check if the current time is within the trading hours    market_open = market_schedule.iloc[0]['market_open'].tz_localize(None)    market_close = market_schedule.iloc[0]['market_close'].tz_localize(None)    print("market_open",market_open)    print("market_close",market_close)    market_open_now = market_open <= now <= market_close    print("Is market open now:",market_open_now)    return market_open_now def chunks(lst, n):    """Yield successive n-sized chunks from lst."""    for i in range(0, len(lst), n):        yield lst[i:i + n] if __name__ == "__main__":    wait_for_db()    print("-"*50)    tickers = ["AAPL", "GOOGL"]  # Add or modify the tickers you want      print("Perform backfill once")    # historical_backfill(tickers)    data = yf.download(tickers, period="5d", interval="1m", group_by="ticker", timeout=10) # added timeout    print("Data fetched from yfinance.")    print("Head")    print(data.head().to_string())    print("Tail")    print(data.head().to_string())    print("-"*50)    print("Inserting data")    ticker_data = []    for ticker in tickers:        for idx, row in data[ticker].iterrows():            ticker_data.append({                'ticker': ticker,                'open': row['Open'],                'high': row['High'],                'low': row['Low'],                'close': row['Close'],                'volume': row['Volume'],                'datetime': idx.strftime('%Y-%m-%d %H:%M:%S')            })    # Insert data in bulk    batch_size=200    conn = connect_to_db()    cursor = conn.cursor()    # Create a placeholder SQL query    query = """INSERT INTO ticker_history (ticker, open, high, low, close, volume, datetime)               VALUES (%s, %s, %s, %s, %s, %s, %s)"""    # Convert the data into a list of tuples    data_tuples = []    for record in ticker_data:        for key, value in record.items():            if pd.isna(value):                record[key] = None        data_tuples.append((record['ticker'], record['open'], record['high'], 
record['low'],                            record['close'], record['volume'], record['datetime']))    # Insert records in chunks/batches    for chunk in chunks(data_tuples, batch_size):        cursor.executemany(query, chunk)        print(f"Inserted batch of {len(chunk)} records")    conn.commit()    cursor.close()    conn.close()    print("-"*50)    # Wait until starting to insert live values    time.sleep(60)    while True:        if is_market_open():            print("Market is open. Fetching data.")            print("Fetching data from yfinance...")            data = yf.download(tickers, period="1d", interval="1m", group_by="ticker", timeout=10) # added timeout            print("Data fetched from yfinance.")            print(data.head().to_string())                      ticker_data = []            for ticker in tickers:                latest_data = data[ticker].iloc[-1]                ticker_data.append({                    'ticker': ticker,                    'open': latest_data['Open'],                    'high': latest_data['High'],                    'low': latest_data['Low'],                    'close': latest_data['Close'],                    'volume': latest_data['Volume'],                    'datetime': latest_data.name.strftime('%Y-%m-%d %H:%M:%S')                })                # Insert the data                conn = connect_to_db()                cursor = conn.cursor()                print("Inserting data")                total_tickers = len(ticker_data)                for record in ticker_data:                    for key, value in record.items():                        if pd.isna(value):                            record[key] = "NULL"                    query = f"""INSERT INTO ticker_history (ticker, open, high, low, close, volume, datetime)                                VALUES (                                    '{record['ticker']}',{record['open']},{record['high']},{record['low']},{record['close']},{record['volume']},'{record['datetime']}')"""                    print(query)                    cursor.execute(query)                print("Data inserted")                conn.commit()                cursor.close()                conn.close()            print("Inserted data, waiting for the next batch in one minute.")            print("-"*50)            time.sleep(60)        else:            print("Market is closed. Waiting...")            print("-"*50)            time.sleep(60)  # Wait for 60 seconds before checking againWithin its code, the script seamlessly navigates through a series of well-defined stages:Database Connectivity: The script initiates by establishing a secure connection to our MySQL database. With the aid of the connect_to_db() function, a connection is created while the wait_for_db() mechanism guarantees the script's execution occurs only once the database service is fully primed.Market Schedule Evaluation: Vital to the script's operation is the is_market_open() function, which determines the market's operational status. By leveraging the pandas_market_calendars package, this function ascertains whether the New York Stock Exchange (NYSE) is currently active.Data Retrieval and Integration: During its maiden voyage, fetcher.py fetches historical stock data from the past five days for a specified list of tickers—typically major entities such as AAPL and GOOGL. This data is meticulously processed and subsequently integrated into the tickers_db database. 
During subsequent cycles, while the market is live, the script periodically procures real-time data at one-minute intervals.Batched Data Injection: Handling substantial volumes of stock data necessitates an efficient approach. To address this, the script ingeniously partitions the data into manageable chunks and employs batched SQL INSERT statements to populate our database. This technique ensures optimal performance and streamlined data insertion.Now let’s discuss the Dockerfile that defines this container. The Dockerfile is the blueprint for building the ticker-fetcher container. It dictates how the Python environment will be set up inside the container.FROM python:3 WORKDIR /app COPY requirements.txt . RUN pip install --no-cache-dir -r requirements.txt COPY . . CMD ["python", "-u", "fetcher.py"] Base Image: We start with a basic Python 3 image.Working Directory: The working directory inside the container is set to /app.Dependencies Installation: After copying the requirements.txt file into our container, the RUN command installs all necessary Python packages.Starting Point: The script's entry point, fetcher.py, is set as the command to run when the container starts.A list of Python packages needed to run our fetcher script:mysql-connector-python yfinance pandas-market-calendarsmysql-connector-python: Enables the script to connect to our MySQL database.yfinance: Fetches stock data from Yahoo Finance.pandas-market-calendars: Determines the NYSE's operational schedule.As a symphony of technology, the ticker-fetcher container epitomizes precision and reliability, acting as the conduit that channels real-time financial data into our comprehensive architecture.Through this foundational component, the chatbot's prowess in delivering instantaneous, accurate insights comes to life, representing a significant stride toward revolutionizing the interaction between users and financial data.With a continuously updating financial database at our disposal, the next logical step is to harness the potential of Large Language Models. The subsequent section will explore how we integrate LLMs using LangChain, allowing our chatbot to transform raw stock data into insightful conversations.Leveraging Large Language Models with SQL using LangChainThe beauty of modern chatbot systems lies in the synergy between the vast knowledge reservoirs of Large Language Models (LLMs) and real-time, structured data from databases. LangChain is a bridge that efficiently connects these two worlds, enabling seamless interactions between LLMs and databases such as SQL.The marriage between LLMs and SQL databases opens a world of possibilities. With LangChain as the bridge, LLMs can effectively query databases, offering dynamic responses based on stored data. This section delves into the core of our setup, the utils.py file. Here, we knit together the MySQL database with our Streamlit application, defining the agent that stands at the forefront of database interactions.LangChain is a library designed to facilitate the union of LLMs with structured databases. It provides utilities and agents that can direct LLMs to perform database operations via natural language prompts. 
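Before wiring the agent up, it helps to keep in mind the table it will be querying: ticker_history, created by the db/setup.sql script mounted in the compose file. That script is not reproduced in the article, but a minimal equivalent, inferred from the columns fetcher.py inserts (so the repository's actual schema may differ), could be created from Python like this:

# Hypothetical stand-in for db/setup.sql: creates the ticker_history table
# with the columns fetcher.py writes. The real setup script may differ.
# Run it where the hostname "db" resolves, i.e. inside a container on the
# compose network.
import mysql.connector

ddl = """
CREATE TABLE IF NOT EXISTS ticker_history (
    id INT AUTO_INCREMENT PRIMARY KEY,
    ticker VARCHAR(16) NOT NULL,
    `open` DOUBLE,
    high DOUBLE,
    low DOUBLE,
    `close` DOUBLE,
    volume BIGINT,
    `datetime` DATETIME
)
"""

conn = mysql.connector.connect(
    host="db", port=3306, user="my_user", password="my_pass", database="tickers_db"
)
cursor = conn.cursor()
cursor.execute(ddl)
conn.commit()
cursor.close()
conn.close()

In practice the same statements simply live in db/setup.sql, which the MySQL image runs automatically on first start because the compose file mounts it into /docker-entrypoint-initdb.d/.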
Instead of a user having to craft SQL queries manually, they can simply ask a question in plain English, which the LLM interprets and translates into the appropriate SQL query.Below, we present the code if utils.py, that brings our LLM-database interaction to life:from langchain import PromptTemplate, FewShotPromptTemplate from langchain.prompts.example_selector import LengthBasedExampleSelector from langchain.llms import OpenAI from langchain.sql_database import SQLDatabase from langchain.agents import create_sql_agent from langchain.agents.agent_toolkits import SQLDatabaseToolkit from langchain.agents.agent_types import AgentType # Database credentials DB_USER = 'my_user' DB_PASSWORD = 'my_pass' DB_NAME = 'tickers_db' DB_HOST = 'db' DB_PORT = 3306 mysql_uri = f"mysql+mysqlconnector://{DB_USER}:{DB_PASSWORD}@{DB_HOST}:{DB_PORT}/{DB_NAME}" # Initialize the SQLDatabase and SQLDatabaseToolkit db = SQLDatabase.from_uri(mysql_uri) toolkit = SQLDatabaseToolkit(db=db, llm=OpenAI(temperature=0)) # Create SQL agent agent_executor = create_sql_agent(    llm=OpenAI(temperature=0),    toolkit=toolkit,    verbose=True,    agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION, ) # Modified the generate_response function to now use the SQL agent def query_db(prompt):    return agent_executor.run(prompt)Here's a breakdown of the key components:Database Credentials: These credentials are required to connect our application to the tickers_db database. The MySQL URI string represents this connection.SQLDatabase Initialization: Using the SQLDatabase.from_uri method, LangChain initializes a connection to our database. The SQLDatabaseToolkit provides a set of tools that help our LLM interact with the SQL database.Creating the SQL Agent: The SQL agent, in our case a ZERO_SHOT_REACT_DESCRIPTION type, is the main executor that takes a natural language prompt and translates it into SQL. It then fetches the data and returns it in a comprehensible manner. The agent uses the OpenAI model (an instance of LLM) and the aforementioned toolkit to accomplish this.The query_db function: This is the interface to our SQL agent. Upon receiving a prompt, it triggers the SQL agent to run and then returns the response.The architecture is now in place: on one end, we have a constant stream of financial data being fetched and stored in tickers_db. On the other, we have an LLM ready to interpret and answer user queries. The user might ask, "What was the closing price of AAPL yesterday?" and our system will seamlessly fetch this from the database and provide a well-crafted response, all thanks to LangChain.In the forthcoming section, we'll discuss how we present this powerful capability through an intuitive interface using Streamlit. This will enable end-users, irrespective of their technical proficiency, to harness the combined might of LLMs and structured databases, all with a simple chat interface.Building an Interactive Chatbot with Streamlit, OpenAI, and LangChainIn the age of user-centric design, chatbots represent an ideal solution for user engagement. But while typical chatbots can handle queries, imagine a chatbot powered by both Streamlit for its front end and a Large Language Model (LLM) for its backend intelligence. This powerful union allows us to provide dynamic, intelligent responses, leveraging our stored financial data to answer user queries.The grand finale of our setup is the Streamlit application. 
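Before the web interface enters the picture, the agent defined in utils.py can be exercised on its own. A standalone check of query_db might look like the following; the question is only an example, and the answer depends on whatever rows the fetcher has stored so far:

# Hypothetical smoke test for the SQL agent; run it inside the app container,
# with OPENAI_API_KEY set and the db service reachable.
from utils import query_db

question = "What was the most recent closing price of AAPL?"
answer = query_db(question)   # the agent writes, runs, and summarizes the SQL for us
print(answer)

Because the agent is created with verbose=True, the intermediate SQL it generates is printed as well, which is handy for checking that the natural-language question was translated into a sensible query.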
This intuitive web interface allows users to converse with our chatbot in natural language, making database queries feel like casual chats. Behind the scenes, it leverages the power of the SQL agent, tapping into the real-time financial data stored in our database, and presenting users with instant, accurate insights.Let's break down our chatbot's core functionality, designed using Streamlit:import streamlit as st from streamlit_chat import message from streamlit_extras.colored_header import colored_header from streamlit_extras.add_vertical_space import add_vertical_space from utils import * # Now the Streamlit app # Sidebar contents with st.sidebar:    st.title('Financial QnA Engine')    st.markdown('''    ## About    This app is an LLM-powered chatbot built using:    - Streamlit    - Open AI Davinci LLM Model    - LangChain    - Finance    ''')    add_vertical_space(5)    st.write('Running in Docker!') # Generate empty lists for generated and past. ## generated stores AI generated responses if 'generated' not in st.session_state:    st.session_state['generated'] = ["Hi, how can I help today?"] ## past stores User's questions if 'past' not in st.session_state:    st.session_state['past'] = ['Hi!'] # Layout of input/response containers input_container = st.container() colored_header(label='', description='', color_name='blue-30') response_container = st.container() # User input ## Function for taking user provided prompt as input def get_text():    input_text = st.text_input("You: ", "", key="input")    return input_text ## Applying the user input box with input_container:    user_input = get_text() # Response output ## Function for taking user prompt as input followed by producing AI generated responses def generate_response(prompt):    response = query_db(prompt)    return response ## Conditional display of AI generated responses as a function of user provided prompts with response_container:    if user_input:        response = generate_response(user_input)        st.session_state.past.append(user_input)        st.session_state.generated.append(response)          if st.session_state['generated']:        for i in range(len(st.session_state['generated'])):            message(st.session_state['past'][i], is_user=True, key=str(i) + '_user',avatar_style='identicon',seed=123)            message(st.session_state["generated"][i], key=str(i),avatar_style='icons',seed=123)The key features of the application are in general:Sidebar Contents: The sidebar provides information about the chatbot, highlighting the technologies used to power it. With the aid of streamlit-extras, we've added vertical spacing for visual appeal.User Interaction Management: Our chatbot uses Streamlit's session_state to remember previous user interactions. The 'past' list stores user queries, and 'generated' stores the LLM-generated responses.Layout: With Streamlit's container feature, the layout is neatly divided into areas for user input and the AI response.User Input: Users interact with our chatbot using a simple text input box, where they type in their query.AI Response: Using the generate_response function, the chatbot processes user input, fetching data from the database using LangChain and LLM to generate an appropriate response. These responses, along with past interactions, are then dynamically displayed in the chat interface using the message function from streamlit_chat.Now in order to ensure portability and ease of deployment, our application is containerized using Docker. 
Below is the Dockerfile that aids in this process:FROM python:3 WORKDIR /app COPY requirements.txt . RUN pip install --no-cache-dir -r requirements.txt COPY . . CMD ["streamlit", "run", "streamlit_app.py"]This Dockerfile:Starts with a base Python 3 image.Sets the working directory in the container to /app.Copies the requirements.txt into the container and installs the necessary dependencies.Finally, it copies the rest of the application and sets the default command to run the Streamlit application.Our application depends on several Python libraries which are specified in requirements.txt:streamlit streamlit-chat streamlit-extras mysql-connector-python openai==0.27.8 langchain==0.0.225These include:streamlit: For the main frontend application.streamlit-chat & streamlit-extras: For enhancing the chat interface and adding utility functions like colored headers and vertical spacing.mysql-connector-python: To interact with our MySQL database.openai: For accessing OpenAI's Davinci model.langchain: To bridge the gap between LLMs and our SQL database.Our chatbot application combines now a user-friendly frontend interface with the immense knowledge and adaptability of LLMs.We can verify that the values provided by the chatbot actually reflect the data in the database:In the next section, we'll dive deeper into deployment strategies, ensuring users worldwide can benefit from our financial Q&A engine.ConclusionAs we wrap up this comprehensive guide, we have used from raw financial data to a fully-fledged, AI-powered chatbot, we've traversed a myriad of technologies, frameworks, and paradigms to create something truly exceptional.We initiated our expedition by setting up a MySQL database brimming with financial data. But the real magic began when we introduced LangChain to the mix, establishing a bridge between human-like language understanding and the structured world of databases. This amalgamation ensured that our application could pull relevant financial insights on the fly, using natural language queries. Streamlit stood as the centerpiece of our user interaction. With its dynamic and intuitive design capabilities, we crafted an interface where users could communicate effortlessly. Marrying this with the vast knowledge of OpenAI's Davinci LLM, our chatbot could comprehend, reason, and respond, making financial inquiries a breeze. To ensure that our application was both robust and portable, we harnessed the power of Docker. This ensured that irrespective of the environment, our chatbot was always ready to assist, without the hassles of dependency management.We've showcased just the tip of the iceberg. The bigger picture is captivating, revealing countless potential applications. Deploying such tools in firms could help clients get real-time insights into their portfolios. Integrating such systems with personal finance apps can help individuals make informed decisions about investments, savings, or even daily expenses. The true potential unfolds when you, the reader, take these foundational blocks and experiment, innovate, and iterate. The amalgamation of LLMs, databases, and interactive interfaces like Streamlit opens a realm of possibilities limited only by imagination.As we conclude, remember that in the world of technology, learning is a continuous journey. What we've built today is a stepping stone. Challenge yourself, experiment with new data sets, refine the user experience, or even integrate more advanced features. The horizon is vast, and the opportunities, endless. 
Embrace this newfound knowledge and craft the next big thing in financial technology. After all, every revolution starts with a single step, and today, you've taken many. Happy coding!Author BioAlan Bernardo Palacio is a data scientist and an engineer with vast experience in different engineering fields. His focus has been the development and application of state-of-the-art data products and algorithms in several industries. He has worked for companies such as Ernst and Young, and Globant, and now holds a data engineer position at Ebiquity Media helping the company to create a scalable data pipeline. Alan graduated with a Mechanical Engineering degree from the National University of Tucuman in 2015, participated as the founder of startups, and later on earned a Master's degree from the faculty of Mathematics at the Autonomous University of Barcelona in 2017. Originally from Argentina, he now works and resides in the Netherlands.LinkedIn
Large Language Models (LLMs) and Knowledge Graphs

Mostafa Ibrahim
15 Nov 2023
7 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!IntroductionHarnessing the power of AI, this article explores how Large Language Models (LLMs) like OpenAI's GPT can analyze data from Knowledge Graphs to revolutionize data interpretation, particularly in healthcare. We'll illustrate a use case where an LLM assesses patient symptoms from a Knowledge Graph to suggest diagnoses, showcasing LLM’s potential to support medical diagnostics with precision.Brief Introduction Into Large Language Models (LLMs)Large Language Models (LLMs), such as OpenAI's GPT series, represent a significant advancement in the field of artificial intelligence. These models are trained on vast datasets of text, enabling them to understand and generate human-like language.LLMs are adept at understanding complex questions and providing appropriate responses, akin to human analysis. This capability stems from their extensive training on diverse datasets, allowing them to interpret context and generate relevant text-based answers.While LLMs possess advanced data processing capabilities, their effectiveness is often limited by the static nature of their training data. Knowledge Graphs step in to fill this gap, offering a dynamic and continuously updated source of information. This integration not only equips LLMs with the latest data, enhancing the accuracy and relevance of their output but also empowers them to solve more complex problems with a greater level of sophistication. As we harness this powerful combination, we pave the way for innovative solutions across various sectors that demand real-time intelligence, such as the ever-fluctuating stock market.Exploring Knowledge Graphs and How LLMs Can Benefit From ThemKnowledge Graphs represent a pivotal advancement in organizing and utilizing data, especially in enhancing the capabilities of Large Language Models (LLMs).Knowledge Graphs organize data in a graph format, where entities (like people, places, and things) are nodes, and the relationships between them are edges. This structure allows for a more nuanced representation of data and its interconnected nature. Take the above Knowledge Graph as an example.Doctor Node: This node represents the doctor. It is connected to the patient node with an edge labeled "Patient," indicating the doctor-patient relationship.Patient Node (Patient123): This is the central node representing a specific patient, known as "Patient123." It serves as a junction point connecting to various symptoms that the patient is experiencing.Symptom Nodes: There are three separate nodes representing individual symptoms that the patient has: "Fever," "Cough," and "Shortness of breath." Each of these symptoms is connected to the patient node by edges labeled "Symptom," indicating that these are the symptoms experienced by "Patient123.          To simplify, the Knowledge Graph shows that "Patient123" is a patient of the "Doctor" and is experiencing three symptoms: fever, cough, and shortness of breath. This type of graph is useful in medical contexts where it's essential to model the relationships between patients, their healthcare providers, and their medical conditions or symptoms. 
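To make the structure concrete, the same doctor-patient-symptom relationships can be written down as plain triples. A small sketch using rdflib (the library used in the walkthrough below) might look like this; the doctor identifier and the namespace are illustrative rather than part of the original example:

import rdflib

# Illustrative namespace and identifiers; only Patient123 and the three
# symptoms come from the example above, the doctor node is hypothetical.
EX = rdflib.Namespace("http://example.org/health/")
g = rdflib.Graph()

doctor = EX["DoctorSmith"]
patient = EX["Patient123"]

g.add((doctor, EX.hasPatient, patient))  # Doctor -> Patient edge
for symptom in ("Fever", "Cough", "Shortness of breath"):
    g.add((patient, EX.hasSymptom, rdflib.Literal(symptom)))  # Patient -> Symptom edges

print(len(g))  # 4 triples: one doctor-patient edge plus three symptom edges

Querying the graph then becomes a matter of pattern matching over these triples, which is exactly what the SPARQL query used later in the walkthrough does.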
It allows for easy querying of related data—for example, finding all symptoms associated with a particular patient or identifying all patients experiencing a certain symptom.Practical Integration of LLMs and Knowledge GraphsStep 1: Installing and Importing the Necessary LibrariesIn this step, we're going to bring in two essential libraries: rdflib for constructing our Knowledge Graph and openai for tapping into the capabilities of GPT, the Large Language Model.!pip install rdflib !pip install openai==0.28 import rdflib import openaiStep 2: Import your Personal OPENAI API KEYopenai.api_key = "Insert Your Personal OpenAI API Key Here"Step 3: Creating a Knowledge Graph# Create a new and empty Knowledge graph g = rdflib.Graph() # Define a Namespace for health-related data namespace = rdflib.Namespace("http://example.org/health/")Step 4: Adding data to Our GraphIn this part of the code, we will introduce a single entry to the Knowledge Graph pertaining to patient124. This entry will consist of three distinct nodes, each representing a different symptom exhibited by the patient.def add_patient_data(patient_id, symptoms):    patient_uri = rdflib.URIRef(patient_id)      for symptom in symptoms:        symptom_predicate = namespace.hasSymptom        g.add((patient_uri, symptom_predicate, rdflib.Literal(symptom))) # Example of adding patient data add_patient_data("Patient123", ["fever", "cough", "shortness of breath"])Step 5: Identifying the get_stock_price functionWe will utilize a simple query in order to extract the required data from the knowledge graph.def get_patient_symptoms(patient_id):    # Correctly reference the patient's URI in the SPARQL query    patient_uri = rdflib.URIRef(patient_id)    sparql_query = f"""        PREFIX ex: <http://example.org/health/>        SELECT ?symptom        WHERE {{            <{patient_uri}> ex:hasSymptom ?symptom.        }}    """    query_result = g.query(sparql_query)    symptoms = [str(row.symptom) for row in query_result]    return symptomsStep 6: Identifying the generate_llm_response functionThe generate_daignosis_response function takes as input the user’s name along with the list of symptoms extracted from the graph. Moving on, the LLM uses such data in order to give the patient the most appropriate diagnosis.def generate_diagnosis_response(patient_id, symptoms):    symptoms_list = ", ".join(symptoms)    prompt = f"A patient with the following symptoms - {symptoms_list} - has been observed. Based on these symptoms, what could be a potential diagnosis?"      # Placeholder for LLM response (use the actual OpenAI API)    llm_response = openai.Completion.create(        model="text-davinci-003",        prompt=prompt,        max_tokens=100    )    return llm_response.choices[0].text.strip() # Example usage patient_id = "Patient123" symptoms = get_patient_symptoms(patient_id) if symptoms:    diagnosis = generate_diagnosis_response(patient_id, symptoms)    print(diagnosis) else:    print(f"No symptoms found for {patient_id}.")Output: The potential diagnosis could be pneumonia. Pneumonia is a type of respiratory infection that causes symptoms including fever, cough, and shortness of breath. 
Other potential diagnoses should be considered as well and should be discussed with a medical professional.

As demonstrated, the LLM connected the three symptoms—fever, cough, and shortness of breath—to suggest that Patient123 may be diagnosed with pneumonia.

Conclusion
In summary, the collaboration of Large Language Models and Knowledge Graphs represents a substantial advancement in the realm of data analysis. This article has provided a straightforward illustration of their potential when working in tandem, with LLMs efficiently extracting and interpreting data from Knowledge Graphs. As we further develop and refine these technologies, they hold the promise of significantly improving analytical capabilities and informing more sophisticated decision-making in an increasingly data-driven world.

Author Bio
Mostafa Ibrahim is a dedicated software engineer based in London, where he works in the dynamic field of Fintech. His professional journey is driven by a passion for cutting-edge technologies, particularly in the realms of machine learning and bioinformatics. When he's not immersed in coding or data analysis, Mostafa loves to travel.

Medium
Fine-Tuning LLaMA 2

Prakhar Mishra
06 Nov 2023
9 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!

Introduction
Large Language Models have recently become the talk of the town. I am very sure you must have heard of ChatGPT. Yes, that's an LLM, and that's what I am talking about. Every few weeks, we have been witnessing newer, better, but not necessarily larger LLMs coming out, either as open-source or closed-source. This is probably the best time to learn about them and make these powerful models work for your specific use case.

In today's blog, we will look into one of the recent open-source models called Llama 2 and try to fine-tune it on the standard NLP task of recognizing entities from text. We will first look into what large language models are, what open-source and closed-source models are, and some examples of them. We will then move on to learning about Llama 2 and why it is so special. We then describe our NLP task and dataset. Finally, we get into coding.

About Large Language Models (LLMs)
Language models are artificial intelligence systems that have been trained to understand and generate human language. Large Language Models (LLMs) like GPT-3, ChatGPT, GPT-4, Bard, and similar models can perform diverse sets of tasks out of the box. Often, the quality of output from these large language models is highly dependent on the finesse of the prompt given by the user.

These language models are trained on vast amounts of text data from the Internet. Most language models are trained in an auto-regressive way, i.e., they try to maximize the probability of the next word based on the words they have produced or seen in the past. This data includes a wide range of written text, from books and articles to websites and social media posts. Language models have a wide range of applications, including chatbots, virtual assistants, content generation, and more. They can be used in industries like customer service, healthcare, finance, and marketing.

Since these models are trained on enormous amounts of data, they are already good at zero-shot inference and can be steered to perform better with few-shot examples. Zero-shot is a setup in which a model can recognize things it hasn't explicitly seen before in training. In a few-shot setting, the goal is to make predictions for new classes based on the few examples of labeled data provided to the model at inference time.

Despite their amazing text-generation capabilities, these humongous models come with a few limitations that must be kept in mind when building an LLM-based production pipeline. Some of these limitations are hallucinations, biases, and more.

Closed and Open-source Language Models
Closed-source large language models are those employed by some companies and not readily accessible to the public. Training data for these models is typically kept private. While they can be highly sophisticated, this limits transparency, potentially leading to concerns about bias and data privacy.

In contrast, open-source models, such as GPT-NeoX or Llama 2 itself, are designed to be freely available to researchers and developers. These models are trained on extensive, publicly available datasets, allowing for a degree of transparency and collaboration.

The decision between closed- and open-source language models is influenced by several variables, such as the project's goals, the need for openness, and others.

About Llama 2
Meta's open-source LLM is called Llama 2.
It was trained with 2 trillion "tokens" from publicly available sources like Wikipedia, Common Crawl, and books from the Gutenberg project. Three model sizes are available: 7 billion, 13 billion, and 70 billion parameters. There are two types of completion models available: chat-tuned and general. The chat-tuned models that have been fine-tuned for chatbot-like dialogue are denoted by the suffix '-chat'. We will use Meta's general 7B Llama 2 Hugging Face model as the base model that we fine-tune. Feel free to use any other version of llama2-7b. Also, if you are interested, there are several threads that you can go through to understand how good the Llama family is relative to the GPT family - source, source, source.

About Named Entity Recognition
As a component of information extraction, named-entity recognition locates and categorizes specific entities inside unstructured text by allocating them to pre-defined groups, such as individuals, organizations, locations, measures, and more. NER offers a quick way to understand the core idea or content of a lengthy text.

There are many ways of extracting entities from a given text; in this blog, we will specifically delve into fine-tuning Llama2-7b using PEFT techniques in a Colab notebook. We will transform the SMSSpamCollection classification dataset for NER. Pretty interesting 😀

We search through all 10-letter words and tag them as 10_WORDS_LONG. And this is the entity that we want our Llama to extract. But why this bizarre formulation? I did it intentionally to show that this is something the pre-trained model would not have seen during the pre-training stage. So it becomes essential to fine-tune it and make it work for our use case 👍. But surely we can add logic to our formulation - think of these words as probable outliers/noisy words. The longer the word, the higher the possibility of it being noise/OOV. However, you will have to come up with the exact letter count after looking at the word-length distribution. Please note that the code is generic enough for fine-tuning any number of entities; it's just a change in the data preparation step to slice out the relevant entities.
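Before diving into the full pipeline, here is a tiny sketch of the labeling rule itself; the sample sentence is made up purely for illustration and is not taken from the SMSSpamCollection dataset.

# Illustrative only: the sample sentence below is invented, not from SMSSpamCollection.
sample = "please complete the assessment and update the dictionary entries today"

# Tag every 10-letter word as 10_WORDS_LONG, mirroring the rule described above.
labels = {word: "10_WORDS_LONG" for word in sample.split() if len(word) == 10}

print(labels)
# {'assessment': '10_WORDS_LONG', 'dictionary': '10_WORDS_LONG'}

The full data preparation step applies exactly this dictionary comprehension to every message in the dataset.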
Code for Fine-tuning Llama2-7b

# Importing Libraries
from transformers import AutoTokenizer, AutoModelForCausalLM, LlamaTokenizer, LlamaForCausalLM
import torch
from datasets import Dataset
import transformers
import pandas as pd
from peft import get_peft_model, LoraConfig, TaskType, prepare_model_for_int8_training, get_peft_model_state_dict, PeftModel
from sklearn.utils import shuffle

Data Preparation Phase

df = pd.read_csv('SMSSpamCollection', sep='\t', header=None)
all_text = df[1].str.lower().tolist()
input, output = [], []
for text in all_text:
    input.append(text)
    output.append({word: '10_WORDS_LONG' for word in text.split() if len(word)==10})
df = pd.DataFrame([input, output]).T
df.rename({0:'input_text', 1: 'output_text'}, axis=1, inplace=True)
print(df.head(5))
total_ds = shuffle(df, random_state=42)
total_train_ds = total_ds.head(4000)
total_test_ds = total_ds.tail(1500)
total_train_ds_hf = Dataset.from_pandas(total_train_ds)
total_test_ds_hf = Dataset.from_pandas(total_test_ds)
# Note: generate_and_tokenize_prompt is defined in the fine-tuning phase below;
# make sure it is defined before running these two map calls.
tokenized_tr_ds = total_train_ds_hf.map(generate_and_tokenize_prompt)
tokenized_te_ds = total_test_ds_hf.map(generate_and_tokenize_prompt)

Fine-tuning Phase

# Loading Model
model_name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def create_peft_config(m):
    peft_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        inference_mode=False,
        r=8,
        lora_alpha=16,
        lora_dropout=0.05,
        target_modules=['q_proj', 'v_proj'],
    )
    m = prepare_model_for_int8_training(m)
    m.enable_input_require_grads()
    m = get_peft_model(m, peft_config)
    m.print_trainable_parameters()
    return m, peft_config

model, lora_config = create_peft_config(model)

def generate_prompt(data_point):
    return f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
Extract entity from the given input:

### Input:
{data_point["input_text"]}

### Response:
{data_point["output_text"]}"""

tokenizer.pad_token_id = 0

def tokenize(prompt, add_eos_token=True):
    result = tokenizer(
        prompt,
        truncation=True,
        max_length=128,
        padding=False,
        return_tensors=None,
    )
    if (
        result["input_ids"][-1] != tokenizer.eos_token_id
        and len(result["input_ids"]) < 128
        and add_eos_token
    ):
        result["input_ids"].append(tokenizer.eos_token_id)
        result["attention_mask"].append(1)
    result["labels"] = result["input_ids"].copy()
    return result

def generate_and_tokenize_prompt(data_point):
    full_prompt = generate_prompt(data_point)
    tokenized_full_prompt = tokenize(full_prompt)
    return tokenized_full_prompt

training_arguments = transformers.TrainingArguments(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=4e-05,
    logging_steps=100,
    optim="adamw_torch",
    evaluation_strategy="steps",
    save_strategy="steps",
    eval_steps=100,
    save_steps=100,
    output_dir="saved_models/"
)

data_collator = transformers.DataCollatorForSeq2Seq(tokenizer)
trainer = transformers.Trainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=tokenized_tr_ds,
    eval_dataset=tokenized_te_ds,
    args=training_arguments,
    data_collator=data_collator
)
with torch.autocast("cuda"):
    trainer.train()

Inference

loaded_tokenizer = LlamaTokenizer.from_pretrained(model_name)
loaded_model = LlamaForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto"
)
peft_model = PeftModel.from_pretrained(loaded_model, "saved_model_path", torch_dtype=torch.float16)
peft_model.config.pad_token_id = loaded_tokenizer.pad_token_id = 0
peft_model.eval()

def extract_entity(text):
    inp = loaded_tokenizer(text, return_tensors="pt").to("cuda")
    with torch.no_grad():
        p_ent = loaded_tokenizer.decode(
            peft_model.generate(**inp, max_new_tokens=128)[0],
            skip_special_tokens=True
        )
        int_idx = p_ent.find("Response:")
        p_ent = p_ent[int_idx + len("Response:"):]
    return p_ent.strip()

# 'text' should hold the full prompt (in the same format used during training) for the example you want to run.
extracted_entity = extract_entity(text)
print(extracted_entity)

Conclusion
In this blog post, we covered the process of optimizing the Llama2-7b model for a Named Entity Recognition task. For that matter, it could be any task that you are interested in. The core concept to take away from this blog is PEFT-based training of large language models. Additionally, as pre-trained LLMs might not always perform well on your task, it is often best to fine-tune these models.

Author Bio
Prakhar Mishra has a Master's in Data Science with over 4 years of experience in industry across various sectors like Retail, Healthcare, Consumer Analytics, etc. His research interests include Natural Language Understanding and generation, and he has published multiple research papers in reputed international publications in the relevant domain. Feel free to reach out to him on LinkedIn.
Making the best out of Hugging Face Hub using LangChain

Ms. Valentina Alto
17 Jun 2023
6 min read
Since the launch of ChatGPT in November 2022, everyone is talking about GPT models and OpenAI. There is no doubt that the Generative Pre-trained Transformers (GPT) architecture developed by OpenAI has demonstrated incredible results, also given the scale of training (a corpus of almost 500 billion tokens) and the complexity of the model (175 billion parameters for GPT-3).

Nevertheless, an incredible number of open-source Large Language Models (LLMs) have spread widely in recent months. Below are some examples:

Dolly: A 12-billion-parameter LLM developed by Databricks and trained on their ML platform. Source code: https://github.com/databrickslabs/dolly

StableLM: A series of LLMs developed by StabilityAI, the company behind the popular image generation model Stable Diffusion. The series encompasses a variety of LLMs, some of which are fine-tuned for specific use cases. Source code: https://github.com/Stability-AI/StableLM

Falcon LLM: A 40-billion-parameter LLM developed by the Technology Innovation Institute and trained on a particularly high-quality dataset called RefinedWeb. Plus, as of now (June 2023), it ranks first globally in the latest Hugging Face independent verification of open-source AI models. Source code: https://huggingface.co/tiiuae

GPT-NeoX and GPT-J: Open-source reproductions of OpenAI's GPT series developed by EleutherAI, with 20 and 6 billion parameters respectively. Source code: https://huggingface.co/EleutherAI/gpt-neox-20b and https://huggingface.co/EleutherAI/gpt-j-6b

OpenLLaMA: Like the previous models, this is an open-source reproduction of Meta AI's LLaMA, available in multiple sizes, including 3 and 7 billion parameters. Source code: https://github.com/openlm-research/open_llama

If you are interested in digging deeper into those models and their performance, you can reference the Hugging Face leaderboard here.

Image1: Hugging Face Leaderboard

Now, LLMs are great, yet to unlock their real power we need them to be positioned within application logic. In other words, we want our LLMs to infuse intelligence into our applications.

For this purpose, we will be using LangChain, a powerful, lightweight SDK which makes it easier to integrate and orchestrate LLMs within applications. LangChain is one of the most popular LLM orchestrators, yet if you want to explore further packages I encourage you to read about Semantic Kernel and Jarvis.

One of the nice things about LangChain is its integration with external tools: those might be OpenAI (and other LLM vendors), data sources, search APIs, and so on. In this article, we are going to explore how LangChain makes it easier to leverage open-source LLMs through its integration with the Hugging Face Hub.

Welcome to the realm of open source LLMs
The Hugging Face Hub serves as a comprehensive platform comprising more than 120k models, 20k datasets, and 50k demo apps (Spaces), all of which are openly accessible and shared as open-source projects. It provides an online environment where developers can effortlessly collaborate and collectively develop machine learning solutions. Thanks to LangChain, it is way easier to start interacting with open-source LLMs. Plus, you can also surround those models with all the components provided by LangChain in terms of prompt design, memory retention, chain management, and so on.

Let's see an implementation with Python.
To reproduce the code, make sure to have:

Python 3.7.1 or higher: you can check your Python version by running python --version in your terminal.
LangChain installed: you can install it via pip install langchain.
The huggingface_hub Python package installed: you can install it via pip install huggingface_hub.
A Hugging Face Hub API key: to get the API key, you can register on the portal here and then generate your secret key.

For this example, I'm going to use the lightest version of Dolly, developed by Databricks and available in three sizes: 3, 7, and 12 billion parameters.

from langchain import HuggingFaceHub
from getpass import getpass
import os

HUGGINGFACEHUB_API_TOKEN = "your-api-key"
os.environ["HUGGINGFACEHUB_API_TOKEN"] = HUGGINGFACEHUB_API_TOKEN

repo_id = "databricks/dolly-v2-3b"
llm = HuggingFaceHub(repo_id=repo_id, model_kwargs={"temperature":0, "max_length":64})

As you can see from the above code, the only information we need is our Hugging Face Hub API key and the model's repo ID; then, LangChain will take care of initializing our model thanks to its direct integration with the Hugging Face Hub.

Now that we have initialized our model, it is time to define the structure of the prompt:

from langchain import PromptTemplate, LLMChain

template = """Question: {question}

Answer: Let's think step by step."""

prompt = PromptTemplate(template=template, input_variables=["question"])
llm_chain = LLMChain(prompt=prompt, llm=llm)

Finally, we can feed our model a first question:

question = "In the first movie of Harry Potter, what is the name of the three-headed dog?"
print(llm_chain.run(question))

Output: The name of the three-headed dog in Harry Potter and the Philosopher Stone is Fuffy.

Even though I tested the light version of Dolly with "only" 3 billion parameters, it came back with pretty accurate results. Of course, for more complex tasks or real-world projects, heavier models might be taken into consideration, like the ones emerging as top performers on the Hugging Face leaderboard mentioned at the beginning of this article.

Conclusion
The realm of open-source LLMs is growing exponentially, and this creates a vibrant environment of experimentation and tuning from which anyone can benefit. Plus, some interesting trends are emerging, like the reduction of the number of model parameters in favour of an increase in the quality of the training dataset. In fact, we saw that the current top performer among open-source models is Falcon LLM, with "only" 40 billion parameters, which gained its strength from a high-quality training dataset. Finally, with the development of orchestration frameworks like LangChain and similar, it's getting easier and easier to leverage open-source LLMs and integrate them into our applications.

References
https://huggingface.co/docs/hub/index
Open LLM Leaderboard — a Hugging Face Space by HuggingFaceH4
Hugging Face Hub — 🦜🔗 LangChain 0.0.189
Overview (huggingface.co)
stabilityai (Stability AI) (huggingface.co)
Stability-AI/StableLM: StableLM: Stability AI Language Models (github.com)

Author Bio
Valentina Alto graduated in 2021 in data science. Since 2020, she has been working at Microsoft as an Azure solution specialist, and since 2022, she has been focusing on data and AI workloads within the manufacturing and pharmaceutical industries.
She has been working closely with system integrators on customer projects to deploy cloud architecture with a focus on modern data platforms, data mesh frameworks, IoT and real-time analytics, Azure Machine Learning, Azure Cognitive Services (including Azure OpenAI Service), and Power BI for dashboarding. Since commencing her academic journey, she has been writing tech articles on statistics, machine learning, deep learning, and AI in various publications and has authored a book on the fundamentals of machine learning with Python.Author of the book: Modern Generative AI with ChatGPT and OpenAI ModelsLink - Medium  LinkedIn  
Detecting and Mitigating Hallucinations in LLMs

Ryan Goodman
25 Oct 2023
10 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!

Introduction
In large language models, the term "hallucination" describes a behavior where an AI model produces results that are not entirely accurate or might sound nonsensical. It's important to understand that large language models are not search engines or databases. They do not search for information from external sources or perform complex computations. Instead, large language models (LLMs) belong to a category of generative artificial intelligence.

Recap How Generative AI Works
Generative AI is a technology trained on large volumes of data and, as a result, can "generate" text, images, and even audio. This makes it fundamentally different from search engines and other software tools you might be familiar with. This foundational difference presents challenges, most notably that generative AI can't cite sources for its responses. Large language models are also not designed to solve computational problems like math. However, generative AI can quickly generate code that might solve complex mathematical challenges. A large language model responds to inputs, most notably the text instruction called a "prompt." As the large language model generates text, it uses its training data as a foundation to extrapolate information.

Understanding Hallucinations
The simplest way to understand a hallucination is the old game of telephone. In the same way that a message gets distorted in the game of telephone, information can get "distorted" or "hallucinated" as a language model tries to generate outputs based on patterns it observed in its training data. The model might "misremember" or "misinterpret" certain information, leading to inaccuracies.

Let's use another example, a simple Markov model, to understand the concept of generating unique combinations of words in the context of food recipes. Imagine you want to create new recipes by observing existing ones. If you were to build a Markov model for food ingredients, you would:

1. Compile a comprehensive dataset of recipes and extract individual ingredients.
2. Create pairs of neighboring ingredients, like "tomato-basil" and "chicken-rice," and record how often each pair occurs.

For example, if you start with the ingredient "chicken," you might notice it's frequently paired with "broccoli" and "garlic" but less so with "pineapple." If you then choose "broccoli" as the next ingredient, it might be equally likely to be paired with "cheese" or "lemon." By following these ingredient pairings, at some point, the model might suggest creative combinations like "chicken-pineapple-lemon," offering new culinary ideas based on observed patterns. This approach allows the Markov model to generate novel recipe ideas based on the statistical likelihood of ingredient pairings.

Hallucinations as a Feature
When researching or computing factual information, a hallucination is a bad thing. However, the same behavior that gets a bad rap in factual research is what lets large language models demonstrate another human trait: creativity. As a developer, if you want to make your language model creative, OpenAI, for example, exposes a "temperature" input, a hyperparameter that makes the model's outputs more random. A high temperature of 1 or above will result in more hallucinations and randomness, while a lower temperature of 0.2 will make the model's outputs more deterministic, sticking more closely to the patterns it was trained on.
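As a rough sketch of how that temperature knob is set in code (this uses the pre-1.0 openai Python package with a placeholder API key; the model name and prompt are illustrative assumptions, not taken from the article), you might compare two settings like this:

import openai

openai.api_key = "your-api-key"  # placeholder

prompt = "Invent a brand-new dessert and describe it in one sentence."

# Low temperature: output stays close to patterns seen in training (more deterministic).
conservative = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    temperature=0.2,
)

# High temperature: output is more random and "creative", and more prone to hallucination.
creative = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    temperature=1.2,
)

print(conservative.choices[0].message["content"])
print(creative.choices[0].message["content"])

Note that a low temperature does not eliminate hallucinations; it only trades creativity for predictability.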
As an experiment, try prompting any large language model chatbot, including ChatGPT, to provide the plot of a romantic story without copying existing accounts on the internet, with a new storyline and new characters. The LLM will offer a fictitious story with characters, a plot, multiple acts, an arc, and an ending.

In specific scenarios, end users or developers might intentionally coax their large language models into a state of "hallucination." When seeking out-of-the-box ideas that go beyond the model's training, you can get useful abstract ideas. In this scenario, the model's ability to "hallucinate" isn't a bug but rather a feature. To continue the experiment, you can return to ChatGPT and ask it to pretend you have changed the temperature hyperparameter to 1.1 and re-write the story. Your results will be very "creative."

In creative pursuits, like crafting tales or penning poems, these so-called "hallucinations" aren't just tolerated; they're celebrated. They can add layers of depth, surprise, and innovation to the generated content.

Types of hallucination
One can categorize hallucinations into different forms.

Intrinsic hallucination directly contradicts the source material, introducing logical inconsistencies and factual inaccuracies.

Extrinsic hallucination does not contradict the source material, but it cannot be verified against any source either. It adds elements that are unconfirmable and speculative.

Detecting hallucinations
Detecting hallucinations in large language models is a tricky task. LLMs deliver information with the same tone and certainty even when the answer is unknown. This puts the responsibility on users and developers to be careful about how information from LLMs is used. The following techniques can be utilized to uncover or measure hallucinations in large language models.

Identify the grounding data
Grounding data is the standard against which the Large Language Model (LLM) output is measured. The selection of grounding data depends on the specific application. For example, real job resumes could be grounding data for generating resume-related content. Conversely, search engine results could be employed for web-based inquiries. Especially in language translation, the choice of grounding data is pivotal for accurate translation. For example, official legal documents could serve as grounding data for legal translations, ensuring precision in the translated content.

Create a measurement test set
A measurement test data set comprises input/output pairs incorporating human interactions and the Large Language Model (LLM). These datasets often include various input conditions and their corresponding program outputs. These sets may involve simulated interactions between users and software systems, depending on the scenario. Ideally, there should be a minimum of two kinds of test sets:

1. A standard or randomly generated test set that is conventional but caters to diverse scenarios.
2. An adversarial test set built around edge cases, high-risk situations, and deliberately misleading or tricky inputs, including security threats.

Extract any claims
Following the preparation of test data sets, the subsequent stage involves extracting assertions from the Large Language Model (LLM). This extraction can occur manually, through rule-based methodologies, or even by employing machine learning models. Similarly, in data analysis, the next step is to extract specific patterns from the data after gathering the datasets.
This extraction can be done manually or through predefined rules, basic descriptive analytics, or, for large-scale projects, machine learning algorithms. Each method has its merits and drawbacks, which we will thoroughly investigate.Use validations against any grounding dataValidation guarantees that the content generated by the Large Language Model (LLM) corresponds to the grounding data. Frequently, this stage replicates the techniques employed for data extraction.To support the above, here is the code snippet of the same.# Define grounding data (acceptable sentences) grounding_data = [    "The sky is blue.",    "Python is a popular programming language.",    "ChatGPT provides intelligent responses." ] # List of generated sentences to be validated generated_sentences = [    "The sky is blue.",    "ChatGPT is a popular programming language.",    "Python provides intelligent responses." ] # Validate generated sentences against grounding data valid_sentences = [sentence for sentence in generated_sentences if sentence in grounding_data] # Output valid sentences print("Valid Sentences:") for sentence in valid_sentences:    print("- " + sentence) # Output invalid sentences invalid_sentences = list(set(generated_sentences) - set(valid_sentences)) print("\nInvalid Sentences:") for sentence in invalid_sentences:    print("- " + sentence)Output: Valid Sentences: - The sky is blue. Invalid Sentences: - ChatGPT is a popular programming language. - Python provides intelligent responses. Furthermore, in verifying research findings, validation ensures that the conclusions drawn from the research align with the collected data. This process often mirrors the research methods employed earlier.Metrics reportingThe "Grounding Defect Rate" is a crucial metric that measures the proportion of responses lacking context to the total generated outputs. Further metrics will be explored later for a more detailed assessment. For instance, the "Error Rate" is a vital metric indicating the percentage of mistranslated phrases from the translated text. Additional metrics will be introduced later for a comprehensive evaluation.A Multifaceted Approach to Mitigate Hallucination in the Large Language Model Leveraging product designThe developer needs to employ large language models so that it does not create material issues, even when it hallucinates. For example, you would not design an application that writes your annual report or news articles. Instead, opinion pieces or content summarization within a prompt can immediately lower the risk of problematic hallucination.If an app allows AI-generated outputs to be distributed, end users should be able to review and revise the content. It adds a protective layer of scrutiny and puts the responsibility into the hands of the user.Continuous improvement and loggingPersisting prompts and LLM output are essential for auditing purposes. As models evolve, you cannot count on prompting an LLM and getting the same result. However, regression testing and reviewing user input are critical as long as it adheres to data, security, and privacy practice.Prompt engineeringIt is essential to get the best possible output to use the concept of meta prompts effectively. A meta prompt is a high-level instruction given to a language model to guide its output in a specific direction. 
Rather than asking a direct question, provide context, structure, and guidance to refine the output.For example, instead of asking, "What is photosynthesis?", you can ask, "Explain photosynthesis in simple terms suitable for a 5th-grade student." This will adjust the complexity and style of the answer you get.Multi-Shot PromptsMulti-shot prompts refer to a series of prompts given to a language model, often in succession. The goal is to guide the model step-by-step toward a desired output instead of asking for a large chunk of information in a single prompt.  This approach is extremely useful when the required information is complex or extensive. Typically, these prompts are best delivered as a chat user experience, allowing the user and model to break down the requests into multiple, manageable parts.ConclusionThe issue of hallucination in Large Language Models (LLMs) presents a significant hurdle for consumers, users, and developers. While overhauling the foundational architecture of these models isn't a feasible solution for most, the good news is that there are strategies to navigate these challenges. But beyond these technical solutions, there's an ethical dimension to consider. As developers and innovators harness the power of LLMs, it's imperative to prioritize disclosure and transparency. Only through openness can we ensure that LLMs integrate seamlessly into our daily lives and gain the trust and acceptance they require to revolutionize our digital interactions truly.Author BioRyan Goodman has dedicated 20 years to the business of data and analytics, working as a practitioner, executive, and entrepreneur. He recently founded DataTools Pro after 4 years at Reliant Funding, where he served as the VP of Analytics and BI. There, he implemented a modern data stack, utilized data sciences, integrated cloud analytics, and established a governance structure. Drawing from his experiences as a customer, Ryan is now collaborating with his team to develop rapid deployment industry solutions. These solutions utilize machine learning, LLMs, and modern data platforms to significantly reduce the time to value for data and analytics teams.