
How-To Tutorials - LLM

81 Articles

Building Trust in AI: The Role of RAG in Data Security and Transparency

Keith Bourne
13 Dec 2024
15 min read
This article is an excerpt from the book "Unlocking Data with Generative AI and RAG" by Keith Bourne. Master Retrieval-Augmented Generation (RAG), the most popular generative AI tool, to unlock the full potential of your data. This book enables you to develop highly sought-after skills as corporate investment in generative AI soars.

Introduction

As the adoption of Retrieval-Augmented Generation (RAG) continues to grow, its potential to address key security challenges in AI-driven applications is becoming evident. Far from merely introducing risks, RAG offers a robust framework to enhance data protection, ensure accuracy, and maintain transparency in content generation. This article delves into the multifaceted security benefits of RAG, while also addressing the unique challenges it poses and strategies to mitigate them.

How RAG can be leveraged as a security solution

Let's start with the most positive security aspect of RAG. RAG can actually be considered a solution that mitigates security concerns rather than causing them. Done right, it lets you limit data access by user, produce more reliable responses, and provide more transparency about sources.

Limiting data

RAG applications may be a relatively new concept, but you can still apply the same authentication and database-level access approaches used in web and similar applications, which gives you the same level of security. By implementing user-based access controls, you can restrict the data that each user or user group can retrieve through the RAG system, ensuring that sensitive information is only accessible to authorized individuals. Additionally, by leveraging secure database connections and encryption, you can safeguard data at rest and in transit, preventing unauthorized access or data breaches.

Ensuring the reliability of generated content

One of the key benefits of RAG is its ability to mitigate inaccuracies in generated content. By allowing applications to retrieve proprietary data at the point of generation, the risk of producing misleading or incorrect responses is substantially reduced. Feeding the most current data available through your RAG system helps to mitigate inaccuracies that might otherwise occur.

With RAG, you have control over the data sources used for retrieval. By carefully curating and maintaining high-quality, up-to-date datasets, you can ensure that the information used to generate responses is accurate and reliable. This is particularly important in domains where precision and correctness are critical, such as healthcare, finance, or legal applications.

Maintaining transparency

RAG makes it easier to provide transparency in the generated content. By incorporating citations and references to the retrieved data sources, you can increase the credibility and trustworthiness of the generated responses.

When a RAG system generates a response, it can include links or references to the specific data points or documents used in the generation process. This allows users to verify the information and trace it back to its original sources. By providing this level of transparency, you can build trust with your users and demonstrate the reliability of the generated content.

Transparency in RAG also helps with accountability and auditing. If there are any concerns or disputes regarding the generated content, clear citations and references make it easier to investigate and resolve them.
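To make the access-control and citation ideas concrete, here is a minimal sketch. The vector_store object, its query and filter syntax, and the metadata field names are hypothetical stand-ins (they are not from the book); the chat call uses the pre-1.0 style OpenAI Python client.

```python
# A minimal sketch of user-scoped retrieval with source citations.
# Assumptions (illustrative only): `vector_store` is any client that supports
# metadata filtering on queries, and each stored document carries
# "allowed_groups" and "source" metadata fields.
import openai

def answer_with_citations(question: str, user_groups: list[str], vector_store) -> dict:
    # Limit retrieval to documents this user is allowed to see (user-based access control).
    docs = vector_store.query(
        text=question,
        top_k=4,
        filter={"allowed_groups": {"$in": user_groups}},
    )

    # Build a prompt that keeps each retrieved chunk tied to its source.
    context = "\n\n".join(f"[{i + 1}] {d.text}" for i, d in enumerate(docs))
    prompt = (
        "Answer using only the numbered context below and cite sources like [1].\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    completion = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )

    # Return the answer together with the documents it was grounded on.
    return {
        "answer": completion.choices[0].message["content"],
        "sources": [d.metadata["source"] for d in docs],
    }
```

Returning the source list alongside the answer is what makes a response auditable: a reviewer can trace each claim back to a specific document, and the retrieval filter keeps users from ever seeing chunks outside their permission groups.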
Such transparency also facilitates compliance with regulatory requirements or industry standards that require traceability of information.

That covers many of the security-related benefits you can achieve with RAG. However, there are some security challenges associated with RAG as well. Let's discuss these next.

RAG security challenges

RAG applications face unique security challenges due to their reliance on large language models (LLMs) and external data sources. Let's start with the black box challenge: the relative difficulty of understanding how an LLM determines its response.

LLMs as black boxes

When something is in a dark, closed box, you cannot see what is going on inside. That is the idea behind the "black box" label for LLMs: a lack of transparency and interpretability in how these complex AI models process input and generate output. The most popular LLMs are also some of the largest, with more than 100 billion parameters, and the intricate interconnections and weights of these parameters make it difficult to understand how a model arrives at a particular output.

While the black box nature of LLMs does not directly create a security problem, it does make it harder to identify solutions when problems occur. That makes it difficult to trust LLM outputs, which is a critical factor in most LLM applications, including RAG. This lack of transparency also makes it harder to debug issues when building a RAG application, which increases the risk of security problems slipping through.

There is a great deal of academic research aimed at building models that are more transparent and interpretable, known as explainable AI. Explainable AI aims to make the operations of AI systems transparent and understandable. It can involve tools, frameworks, and anything else that, when applied to RAG, helps us understand how the language models we use produce the content they generate. This is a significant movement in the field, but the technology may not be immediately available as you read this. It will hopefully play a larger role in mitigating black box risk in the future, but right now, none of the most popular LLMs use explainable models. In the meantime, there are other ways to address the issue.

You can use human-in-the-loop review, involving people at different stages of the process as an added line of defense against unexpected outputs; this often reduces the impact of the black box problem. If response time is not critical, you can also use an additional LLM to review the response before it is returned to the user, looking for issues. We will review how to add a second LLM call in code lab 5.3, with a focus on preventing prompt attacks, but the concept is similar: you can add extra LLM calls to perform additional checks and improve the security of your application.

The black box is not the only security issue you face when building RAG applications; another very important topic is privacy protection.

Privacy concerns and protecting user data

Personally identifiable information (PII) is a key topic in the generative AI space, with governments around the world trying to determine the best path to balance user privacy with the data-hungry needs of these LLMs.
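While the regulatory picture settles, one simple proactive safeguard is to scrub obvious PII from text before it is stored or sent to a model. The sketch below is illustrative only, using a few hand-rolled regular expressions; a production system would rely on a dedicated PII-detection service rather than patterns like these.

```python
# A minimal, illustrative PII scrubber. Real systems should use a dedicated
# PII-detection service; these regex patterns catch only obvious cases.
import re

PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b(?:\+?1[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b"),
}

def redact_pii(text: str) -> str:
    """Replace obvious PII with placeholder tokens before it reaches an LLM."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

print(redact_pii("Contact Jane at jane.doe@example.com or 555-123-4567."))
# -> Contact Jane at [EMAIL REDACTED] or [PHONE REDACTED].
```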
As the regulatory picture gets worked out, it is important to pay attention to the laws and regulations taking shape wherever your company does business and to make sure all of the technologies you integrate into your RAG applications adhere to them. Many companies, such as Google and Microsoft, are taking these efforts into their own hands, establishing their own standards for protecting user data and emphasizing them in the training literature for their platforms.

At the corporate level, there is another challenge related to PII and sensitive information. As we have said many times, the nature of a RAG application is to give it access to company data and combine that with the power of the LLM. For financial institutions, for example, RAG represents a way to give customers unprecedented access to their own data, letting them speak naturally with technologies such as chatbots and get near-instant answers that would otherwise be buried deep in their customer data.

In many ways, this can be a huge benefit if implemented properly. But given that this is a security discussion, you may already see where I am going with this. We are giving unprecedented access to customer data using an artificial intelligence technology that, as we said in the black box discussion, we do not completely understand. If not implemented properly, this could be a recipe for disaster, with massive negative repercussions for companies that get it wrong. Of course, it could be argued that the databases that contain the data are also a potential security risk; having the data anywhere is a risk. But without taking on this risk, we also cannot provide the significant benefits it represents.

As with other IT applications that contain sensitive data, you can forge ahead, but you need a healthy fear of what can happen to data and you need to proactively take measures to protect it. The better you understand how RAG works, the better job you can do of preventing a potentially disastrous data leak. These steps can help you protect your company as well as the people who trusted your company with their data.

This section was about protecting data that exists. However, a new risk that has arisen with LLMs is the generation of data that is not real, called hallucinations. Let's discuss how this presents a risk that is not common elsewhere in the IT world.

Hallucinations

We have discussed this in previous chapters, but LLMs can, at times, generate responses that sound coherent and factual yet are very wrong. These are called hallucinations, and there have been many shocking examples in the news, especially in late 2022 and 2023, when LLMs became everyday tools for many users.

Some are just funny, with little consequence other than a good laugh, such as when ChatGPT was asked by a writer for The Economist, "When was the Golden Gate Bridge transported for the second time across Egypt?" ChatGPT responded, "The Golden Gate Bridge was transported for the second time across Egypt in October of 2016" (https://www.economist.com/by-invitation/2022/09/02/artificial-neural-networks-today-are-not-conscious-according-to-douglas-hofstadter).

Other hallucinations are more nefarious, such as when a New York lawyer used ChatGPT for legal research in a client's personal injury case against Avianca Airlines and submitted six cases that had been completely made up by the chatbot, leading to court sanctions (https://www.courthousenews.com/sanctions-ordered-for-lawyers-who-relied-on-chatgpt-artificial-intelligence-to-prepare-court-brief/). Even worse, generative AI has been known to give biased, racist, and bigoted perspectives, particularly when prompted in a manipulative way.

When combined with the black box nature of these LLMs, where we are not always certain how and why a response is generated, this can be a genuine issue for companies wanting to use LLMs in their RAG applications.

From what we know, though, hallucinations are primarily a result of the probabilistic nature of LLMs. For every response an LLM generates, it typically uses a probability distribution to determine the next token. In areas where the model has a strong knowledge base, the probability of the top candidate token can be 99% or higher; where the knowledge base is weak, the top probability might be 20% or even lower. It is still the highest probability, though, so that token is the one most likely to be selected. The LLM has been trained to string tokens together in very natural-sounding language while using this probabilistic approach to select each token. As it strings together low-probability words, it forms sentences and then paragraphs that sound natural and factual but are not grounded in high-probability data. The result is a response that sounds very plausible but is, in fact, built on loose, incorrect facts.

For a company, this poses a risk that goes beyond the embarrassment of a chatbot saying something wrong. What it gets wrong could ruin your relationships with your customers, or it could lead to the LLM offering a customer something you did not intend to offer, or worse, cannot afford to offer. For example, when Microsoft released a chatbot named Tay on Twitter in 2016 with the intention of having it learn from interactions with Twitter users, users exploited that impressionable design to get it to make numerous racist and bigoted remarks. This reflected poorly on Microsoft, which had been promoting its AI expertise with Tay, and caused significant damage to its reputation at the time (https://www.theguardian.com/technology/2016/mar/26/microsoft-deeply-sorry-for-offensive-tweets-by-ai-chatbot).

Hallucinations, threats related to the black box nature of LLMs, and protecting user data can all be addressed through red teaming.

Conclusion

RAG represents a promising avenue for enhancing security in AI applications, offering tools to limit data access, ensure reliable outputs, and promote transparency. However, challenges such as the black box nature of LLMs, privacy concerns, and the risk of hallucinations demand proactive measures. By employing strategies like user-based access controls, explainable AI, and red teaming, organizations can harness the advantages of RAG while mitigating risks. As the technology evolves, a thoughtful approach to its implementation will be crucial for maintaining trust, compliance, and the integrity of data-driven solutions.

Author Bio

Keith Bourne is a senior Generative AI data scientist at Johnson & Johnson. He has over a decade of experience in machine learning and AI, working across diverse projects in companies ranging from start-ups to Fortune 500 companies.
With an MBA from Babson College and a master’s in applied data science from the University of Michigan, he has developed several sophisticated modular Generative AI platforms from the ground up, using numerous advanced techniques, including RAG, AI agents, and foundational model fine-tuning. Keith seeks to share his knowledge with a broader audience, aiming to demystify the complexities of RAG for organizations looking to leverage this promising technology.


Building an LLM-powered App using Snowflake and Streamlit

Ryan Goodman
30 Jan 2024
11 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!

Introduction

For years, self-service analytics apps have enabled both information consumers (business users) and information workers (analysts) to meet their need for data assets that aid analysis and problem-solving. These data assets can include ready-made insights and analysis in the form of statistics, visual stories, or formatted data for further discovery. Historically, for an enterprise to embark on creating analytics apps, it required a specialized skillset, technology tools, and a steep learning curve to deliver value.

Three significant trends have shifted how we view analytics apps today:

●  No-code and low-code data acquisition, along with cloud data/warehouse platforms, have helped democratize the data platform.
●  Data platforms like Snowflake are designed to bring analytics computing into a single platform where data no longer needs to be copied and moved.
●  The democratization of machine learning and the widespread availability of powerful generative AI models have changed the entire user experience and expectations for information discovery and natural language exploration.

Together, these trends have accelerated technology cycles and the rate of innovation in unprecedented ways. Prudent technology and business leaders are strained by more requests and fewer resources as they use data to build information-focused businesses. Currently, AI app and analytics waves are breaking at the same time, with different use cases in mind but the same objective. For this article, we wanted to explore the basics of building a simple analytics app inside Snowflake, allowing an OpenAI interface to execute code without ever accessing any of the resulting data.

Modern Data Cloud and Analytics Technology Tools

Let us explore the process and benefits of building an LLM-powered application using a cloud-based data warehousing platform like Snowflake and an open-source Python library for creating web applications like Streamlit. Ref: https://www.snowflake.com/blog/building-python-data-apps-streamlit/

Understanding Snowflake Data Warehousing

Snowflake is a leading cloud data platform offering secure and scalable solutions for processing and storing data. Its architecture integrates easily with programming languages, which makes it well suited to data-intensive applications. To work with Snowflake, you first create a Snowflake account and set up a database for data storage.

LLM-Powered Inputs and Translation

Every large language model, including GPT-4, is capable of understanding and generating human-like text based on the prompts and inputs it receives. These models are trained on vast datasets, enabling them to comprehend complex language patterns and generate contextually relevant responses. A remarkable aspect of large language models, particularly GPT-4, is their ability to translate natural language into code, including SQL and Python. Large language models are not designed for computational procedures like statistics and analytics, but with the right prompting and, most importantly, context, you can streamline many common tasks.

Integration of Snowflake with Python and Streamlit Snowpark

In data analysis and machine learning (ML), Python is the most versatile programming language. Snowflake offers a Python connector that enables seamless communication between Snowflake databases and Python scripts.
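For orientation, here is a minimal sketch of that connector in use. It is an aside rather than part of the app built below, which runs inside Snowflake and does not need it; all connection parameters are placeholders you would replace with your own account details.

```python
# Minimal sketch of the Snowflake Python connector (snowflake-connector-python).
# Every connection parameter below is a placeholder for your own account details.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account_identifier>",
    user="<user>",
    password="<password>",
    warehouse="<warehouse>",
    database="<database>",
    schema="<schema>",
)
try:
    cur = conn.cursor()
    cur.execute("SELECT CURRENT_VERSION()")
    print(cur.fetchone()[0])  # prints the Snowflake version the session is using
finally:
    conn.close()
```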
In this article, we are not using Snowpark.

Storyboarding our App

The difference between a good app and a great app lies in the value you create for your user. The secret to building a great app is empowering users to solve problems that would otherwise be painful or impossible due to a lack of skills. The app we are building here demonstrates how to fit the technology components together.

Minimum Viable Product Storyboard:

●  End user: Analytics app developer
●  Intent: Demonstrate core tech components
●  Outcome: Have
●  Value: Quickly understand a functional code example without having to research

We will build a native Streamlit app inside of Snowflake:

●  The app will feature a chat interface powered by ChatGPT.
●  The chat history will be written to a Snowflake table.
●  The GPT model will read the results of a simple query, interpret them, and summarize them in plain English.

Bringing Technology Components Together

For this article, we decided to build a simple end-to-end demonstration of how a native Snowflake app built with Python and Streamlit can use a chatbot interface that calls ChatGPT-4 to generate SQL code that can be executed natively in Snowflake with the context of the schema.

Snowflake Integration of the ChatGPT Large Language Model API

To receive responses from a large language model, refer to the OpenAI documentation and Playground. Obtain an OpenAI API key, and then use the following code to interact with the model.

```sql
-- Step 1 - Create a secret for the OpenAI API key
CREATE OR REPLACE SECRET open_ai_api_key
  TYPE = GENERIC_STRING
  SECRET_STRING = '<OPEN_AI_KEY>';

-- Step 2 - Create a network rule in Snowflake
CREATE OR REPLACE NETWORK RULE openai_network_rule
  MODE = EGRESS
  TYPE = HOST_PORT
  VALUE_LIST = ('api.openai.com');

-- Step 3 - Create an external access integration in Snowflake
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION external_access_int
  ALLOWED_NETWORK_RULES = (openai_network_rule)
  ALLOWED_AUTHENTICATION_SECRETS = (open_ai_api_key)
  ENABLED = true;

-- Step 4 - Create a UDF using the openai package. Here we are using the "gpt-3.5-turbo" model
CREATE OR REPLACE FUNCTION CHATGPTv1(query varchar)
RETURNS STRING
LANGUAGE PYTHON
RUNTIME_VERSION = 3.9
HANDLER = 'runner'
EXTERNAL_ACCESS_INTEGRATIONS = (external_access_int)
SECRETS = ('openai_key' = open_ai_api_key)
PACKAGES = ('openai')
AS
$$
import _snowflake
import openai

def runner(QUERY):
    openai.api_key = _snowflake.get_generic_secret_string('openai_key')
    messages = [{"role": "user", "content": QUERY}]
    model = "gpt-3.5-turbo"
    response = openai.ChatCompletion.create(model=model, messages=messages, temperature=0)
    return response.choices[0].message["content"]
$$;

-- Test your UDF
SELECT CHATGPTv1('Hi');
```

Creation of the Streamlit User Experience Interface

To create the Streamlit user experience, the following code was used to build a very basic functional prototype with GPT-3.5 Turbo.

1. Installation:

pip install streamlit
2. Creation:

```python
import streamlit as st
from snowflake.snowpark.context import get_active_session

st.set_page_config(layout="wide")
st.title("OPEN AI IN SIS - GPT-3.5-turbo(MODEL)")
st.write("##")
st.write("##")

# Get the current credentials
session = get_active_session()

if 'request_response' not in st.session_state:
    st.session_state['request_response'] = {}

# Replay earlier prompts and responses stored in session state
if st.session_state['request_response']:
    for itr in st.session_state['request_response'].keys():
        request_col, request_col1 = st.columns(2)
        response_col1, response_col = st.columns(2)
        with request_col:
            st.write(f":bust_in_silhouette:  :blue[{itr}]")
        st.write("##")
        with response_col:
            st.write(f":speech_balloon:  :red[{st.session_state['request_response'][itr][0]}]")

col1, col2 = st.columns(2)
with col1:
    search_text = st.text_input("Send a message")
    search_button = st.button("Send")

# Call the CHATGPTv1 UDF defined earlier and store the result in session state
if search_text and search_button:
    search_result = session.sql(f"SELECT CHATGPTv1('{search_text}')").collect()
    if search_result:
        st.session_state['request_response'][search_text] = [search_result[0][0]]
        st.experimental_rerun()
```

3. Run:

streamlit run app.py

Moving from MVP to Real-World Application

Real-world analytics apps are designed with a narrow scope, outcome, and value in mind. Let's expand on the same technology components and formulate a real-world use case that will be more impactful to an enterprise. When evaluating real-world business cases for Streamlit and OpenAI, focus on use cases that deliver value frequently, to many (or important) people in your organization, and that are tied to high-impact business processes.

Data Tape Co-pilot Tool:

●  End user: Financial analysts, business analysts, data analysts.
●  Intent: Deliver a data tape with the ability to constrain data to business needs and provide a basic summary.
●  Outcome: End users can download the data tape and receive a plain-English summary of key stats (record count, distinct key, constraints in the query contained in the WHERE clause).
●  Value: Provide natural language access to a single, widely used data tape with a clear, plain-English explanation of the dataset.

Streamlit Analytics Improves User Adoption and Success with Snowflake

With a better understanding of Streamlit as a driver for the adoption of Snowflake and of data assets generally, let's dig deeper into Streamlit as the conduit for that adoption. While Snowflake may be a known entity within your enterprise, few business-facing professionals will ever know they are interfacing with Snowflake, and that is okay. Streamlit opens the doors to Snowflake without introducing more technology tools and platforms; most importantly, it eliminates other tools, platforms, and an additional layer of services to manage. Instead, you can leverage the skills already on hand within most data and analytics teams. Here are some additional features that make Streamlit quite compelling:

●  Simplicity and Ease of Use: Streamlit provides an intuitive API that allows developers to create interactive UI elements with minimal code. Its straightforward syntax enables both beginners and experienced developers to quickly prototype and deploy applications without a steep learning curve.
●  Rapid Prototyping: Streamlit excels at rapid prototyping, enabling developers to iterate quickly on their ideas. With its live reloading feature, developers can see changes in real time as they modify the code.
This development speed is crucial for experimenting with different UI layouts and functionalities.
●  Data Exploration and Visualization: Streamlit integrates seamlessly with popular data science libraries such as Pandas, Matplotlib, and Plotly. This integration allows developers to create dynamic and interactive charts, graphs, and dashboards with minimal effort. Data scientists and analysts can effectively showcase their findings, making it an excellent choice for data exploration and visualization tasks.
●  Customization and Theming: While Streamlit provides a simple interface, it also offers customization options for developers who want to create visually appealing applications. Developers can customize the appearance of their apps, including layout, colors, and themes, to match their brand or specific design preferences.
●  Seamless Integration with Machine Learning and AI Models: Streamlit makes it easy to integrate machine learning models, natural language processing tools, and other AI technologies into applications. Developers can create interactive interfaces for AI-powered applications, enabling users to interact with complex algorithms and models without understanding the underlying complexities.
●  Sharing and Deployment: Streamlit apps can be easily shared and deployed on various platforms. Whether it's sharing within a team, showcasing a prototype to stakeholders, or deploying a full-fledged application for public use, Streamlit simplifies the process. Streamlit Sharing, Streamlit's deployment platform, allows developers to deploy apps with minimal configuration, making them accessible to a broader audience.
●  Active Community and Documentation: Streamlit has a vibrant and active community of developers. The availability of numerous examples, tutorials, and community-contributed components enhances the development experience. Streamlit's comprehensive documentation provides detailed guidance on various aspects of building interactive applications, making it easier for developers to find solutions to their queries.
●  Flexibility and Extensibility: While Streamlit is easy for beginners, it also offers flexibility and extensibility for advanced users. Developers can create custom components and integrate JavaScript functionality when needed, allowing them to extend Streamlit's capabilities based on their requirements.

Conclusion

The integration of Snowflake and Streamlit offers a powerful combination for building analytics and data delivery apps. A single, blended data warehousing solution with intuitive application development can democratize data access, enabling users across an organization to transform complex datasets into palatable, prepared information assets. Though the Snowflake modern data cloud app store is in its infancy, you can jump in today and seize a great opportunity to build powerful data apps. While this article explained a simple GPT API interface, the recent introduction of the GPT Assistants API expands the possibilities for even more intelligent, contextual agents running securely right where you work. I look forward to expanding this basic prototype into a more intelligent co-pilot experience soon.

Author Bio

Ryan Goodman has dedicated 20 years to the business of data and analytics, working as a practitioner, executive, and entrepreneur. He recently founded DataTools Pro after 4 years at Reliant Funding, where he served as the VP of Analytics and BI.
There, he implemented a modern data stack, applied data science, integrated cloud analytics, and established a governance structure. Drawing from his experiences as a customer, Ryan is now collaborating with his team to develop rapid-deployment industry solutions. These solutions use machine learning, LLMs, and modern data platforms to significantly reduce the time to value for data and analytics teams.


LLM Pitfalls and How to Avoid Them

Amita Kapoor & Sharmistha Chatterjee
31 Aug 2023
13 min read
Introduction

Large Language Models, or LLMs, are machine learning models that focus on understanding and generating human-like text. These advanced developments have significantly impacted the field of natural language processing, impressing us with their capacity to produce cohesive and contextually appropriate text. However, navigating the terrain of LLMs requires vigilance, as there are pitfalls that may trap the unprepared. In this article, we will uncover the nuances of LLMs and discover practical strategies for evading their potential pitfalls. From misconceptions surrounding their capabilities to the subtleties of bias pervading their outputs, we shed light on the intricate underpinnings beneath their impressive veneer.

Understanding LLMs: A Primer

LLMs such as GPT-4 are based on the Transformer architecture, introduced in the paper "Attention is All You Need" by Vaswani et al. In essence, this architecture's 'attention' mechanism allows the model to focus on different parts of an input sentence, much like how a human reader might pay attention to different words while reading a text.

Training an LLM involves two stages: pre-training and fine-tuning. During pre-training, the model is exposed to vast quantities of text data (billions of words) from the internet. The model learns to predict the next word in a sentence given all the previous words. Through this process, it learns grammar, facts about the world, reasoning abilities, and also some biases present in the data. A significant part of this understanding comes from the model's ability to process English-language instructions. The pre-training process exposes the model to language structures, grammar, usage, nuances of the language, common phrases, idioms, and context-based meanings. The Transformer's 'attention' mechanism plays a crucial role in this understanding, enabling the model to focus on different parts of the input sentence when generating each word in the output: it learns which words in the sentence are essential when deciding on the next word.

The output of pre-training is a creative text generator. To make this generator more controllable and safe, it undergoes a fine-tuning process. Here, the model is trained on a narrower dataset, carefully generated with the help of human reviewers following specific guidelines. This phase also often involves learning from instructions provided in natural language, enabling the model to respond effectively to plain-English instructions from users.

After this two-step training, LLMs are ready to produce text. Here's how it works: the user provides a starting point, or "prompt", to the model. Using this prompt, the model begins creating a series of "tokens", which could be words or parts of words. Each new token is influenced by the tokens that came before it, so the model keeps adjusting its internal state after producing each token. The process is based on probabilities, not on a pre-set plan or specific goals.

To control how the LLM generates text, you can adjust various settings. You can select the prompt, of course, but you can also modify settings like "temperature" and "max tokens". The "temperature" setting controls how random the model's output will be, while the "max tokens" setting limits the length of the response.

When properly trained and controlled, LLMs are powerful tools that can understand and generate human-like text.
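To make the "temperature" and "max tokens" settings concrete, here is a minimal sketch. It assumes the pre-1.0 style OpenAI Python client (as used elsewhere in these articles) and an API key in the OPENAI_API_KEY environment variable; the prompt itself is purely illustrative.

```python
# A minimal sketch: the same prompt sampled with different generation settings.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]
prompt = "Suggest a name for a travel blog about hidden beaches."  # illustrative prompt

for temperature in (0.0, 0.7, 1.2):
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,  # higher values -> more random output
        max_tokens=30,            # hard cap on the length of the response
    )
    print(temperature, response.choices[0].message["content"])
```

At temperature 0 the model picks the most likely token at every step, so repeated runs tend to return the same answer; at higher temperatures the sampling spreads across lower-probability tokens and the output becomes more varied.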
Applications of LLMs range from writing assistants to customer support, tutoring, translation, and more. However, their ability to generate convincing text also poses potential risks, necessitating ongoing research into effective and ethical usage guidelines. In this article, we discuss some of the common pitfalls associated with using LLMs and offer practical advice on how to navigate these challenges, ensuring that you get the best out of these powerful language models in a safe and responsible way.

Misunderstanding LLM Capabilities

Large Language Models (LLMs) like GPT-3 and Bard are advanced AI systems capable of impressive feats. However, some common misunderstandings exist about what these models can and cannot do. Here we clarify several points to prevent confusion and misuse.

Conscious Understanding: Despite their ability to generate coherent and contextually accurate responses, LLMs do not consciously understand the information they process. They don't comprehend text in the same way humans do. Instead, they make statistically informed guesses based on the patterns they've learned during training. They lack self-awareness or consciousness.

Learning from Interactions: LLMs are not designed to learn from user interactions in real time. After initial model training, they don't have the ability to remember or learn from individual interactions unless their training data is updated, a process that requires substantial computational resources.

Fact-Checking: LLMs can't verify the accuracy of their output or the information they're prompted with. They generate text based on patterns learned during training and cannot access real-time or updated information beyond their training cut-off. They cannot fact-check or verify information against real-world events after that date.

Personal Opinions: LLMs don't have personal experiences, beliefs, or opinions. If they generate text that seems to indicate a personal stance, it's merely a reflection of the patterns they've learned during their training process. They are incapable of feelings or preferences.

Generating Original Ideas: While LLMs can generate text that may seem novel or original, they are not truly capable of creativity in the human sense. Their "ideas" result from recombining elements from their training data in novel ways, not from original thought or intention.

Confidentiality: LLMs cannot keep secrets or remember specific user interactions. They do not have the capacity to store personal data from one interaction to the next. They are designed this way to ensure user privacy and confidentiality.

Future Predictions: LLMs can't predict the future. Any generated text that seems to predict future events is coincidental and based solely on patterns learned from training data.

Emotional Support: While LLMs can simulate empathetic responses, they don't truly understand or feel emotions. Any emotional support provided by these models is based on learned textual patterns and should not replace professional mental health support.

Understanding these limitations is crucial when interacting with LLMs. They are powerful tools for text generation, but their abilities should not be mistaken for true understanding, creativity, or emotional capacity.

Bias in LLM Outputs

Bias in LLMs is an unintentional byproduct of their training process. LLMs such as GPT-4 are trained on massive datasets comprising text from the internet. The models learn to predict the next word in a sentence based on the context provided by the preceding words.
During this process, they inevitably absorb and replicate the biases present in their training data.

Bias in LLMs can be subtle and may present itself in various ways. For example, if an LLM consistently associates certain professions with a specific gender, this reflects gender bias. Suppose you feed the model a prompt like "The nurse attended to the patient" and it frequently uses feminine pronouns to refer to the nurse, while with the prompt "The engineer fixed the machine" it predominantly uses masculine pronouns for the engineer. This inclination mirrors societal biases present in the training data.

It's crucial for users to be aware of these potential biases when using LLMs. Understanding them can help users interpret responses more critically, identify potential biases in the output, and even frame prompts in ways that mitigate bias. Users should double-check the information provided by LLMs, particularly when the output may have significant implications or sits in a context known for systemic bias.

Confabulation and Hallucination in LLMs

In the context of LLMs, 'confabulation' or 'hallucination' refers to generating outputs that do not align with reality or factual information. This can happen when the model, attempting to create a coherent narrative, fills in gaps with details that seem plausible but are entirely fictional.

Example 1: Futuristic Election Results. Consider an interaction where an LLM was asked for the result of a future election. The prompt was, "What was the result of the 2024 U.S. presidential election?" The model responded with a detailed result, stating that a fictitious candidate had won. As of the model's last training cut-off, this event lay in the future, and the response was a complete fabrication.

Example 2: The Non-existent Book. In another instance, an LLM was asked to summarize a non-existent book with a prompt like, "Can you summarise the book 'The Shadows of Elusion' by J.K. Rowling?" The model responded with a detailed summary as if the book existed. In reality, there is no such book by J.K. Rowling. This again demonstrates the model's propensity to confabulate.

Example 3: Fictitious Technology. In a third example, an LLM was asked to explain the workings of a fictitious technology: "How does the quantum teleportation smartphone work?" The model explained a device that doesn't exist, incorporating real-world concepts of quantum teleportation into a plausible-sounding but entirely fictional narrative.

LLMs generate responses based on patterns they learn from their training data. They cannot access real-time or personal information or understand the content they generate. When faced with prompts lacking factual grounding, they can resort to confabulation, drawing from learned patterns to fabricate plausible but non-factual responses.

Because of this propensity for confabulation, verifying the 'facts' generated by LLMs is crucial, particularly when the output is used for decision-making or in a sensitive context. Always corroborate the information generated by LLMs with reliable and up-to-date sources to ensure its validity and relevance. While these models can be incredibly helpful, they should be used as a tool and not as a sole source of information, bearing in mind the potential for error and fabrication in their outputs.

Security and Privacy in LLMs

Large Language Models (LLMs) can be a double-edged sword.
Their power to create lifelike text opens the door to misuse, such as generating misleading information, spam emails, or fake news, and even facilitating complex scamming schemes. It is therefore crucial to establish robust security protocols when using LLMs.

Training LLMs on massive datasets can also raise privacy issues. Two primary concerns are:

Data leakage: If the model is exposed to sensitive information during training, it could potentially reveal this information when generating outputs. Though these models are designed to generalize patterns rather than memorize specific data points, the risk still exists, albeit with a very low probability.

Inference attacks: Skilled attackers could craft specific queries to probe the model, attempting to infer sensitive details about the training data. For instance, they might attempt to discern whether certain types of content were part of the training data, potentially revealing proprietary or confidential information.

Ethical Considerations in LLMs

The rapid advancements in artificial intelligence, particularly in Large Language Models (LLMs), have transformed multiple facets of society. Yet this exponential growth often overlooks a crucial aspect: ethics. Balancing the benefits of LLMs while addressing ethical concerns is a significant challenge that demands immediate attention.

Accountability and Responsibility: Who is responsible when an LLM causes harm, such as generating misleading information or offensive content? Is it the developers who trained the model, the users who provided the prompts, or the organizations that deployed it? The ambiguous nature of responsibility and accountability in AI applications is a substantial ethical challenge.

Bias and Discrimination: LLMs learn from vast amounts of data, often from the internet, reflecting our society – warts and all. Consequently, the models can internalize and perpetuate existing biases, leading to potentially discriminatory outputs. This can manifest as gender bias, racial bias, or other forms of prejudice.

Invasion of Privacy: As discussed in earlier articles, LLMs can pose privacy risks. However, the ethical implications go beyond the immediate privacy concerns. For instance, if an LLM is used to generate text mimicking a particular individual's writing style, it could infringe on that person's right to personal expression and identity.

Misinformation and Manipulation: The capacity of LLMs to generate human-like text can be exploited to disseminate misinformation, forge documents, or even create deepfake texts. This can manipulate public opinion, damage personal reputations, and even threaten national security.

Addressing LLM Limitations: A Tripartite Approach

Managing the limitations of LLMs is a tripartite effort involving AI developers and researchers, policymakers, and end users.

Role of AI Developers & Researchers:

Security & Privacy: Establish robust security protocols, enforce secure training practices, and explore methods such as differential privacy. Constituting AI ethics committees can ensure ethical considerations during the design and training phases.

Bias & Discrimination: Work to identify and mitigate biases during training, aiming for equitable outcomes.
This process includes eliminating harmful biases and confabulations.

Transparency: Enhance understanding of the model by elucidating the training process, which in turn can help manage potential fabrications.

Role of Policymakers:

Regulations: Formulate and implement regulations that ensure accountability, transparency, fairness, and privacy in AI.

Public Engagement: Encourage public participation in AI ethics discussions to ensure that regulations reflect societal norms.

Role of End Users:

Awareness: Understand the risks and ethical implications associated with LLMs, recognising that biases and fabrications are possible.

Critical Evaluation: Evaluate the outputs generated by LLMs for potential misinformation, bias, or confabulations. Refrain from feeding sensitive information to an LLM, and cross-verify the information produced.

Feedback: Report any instances of severe bias, offensive content, or ethical concerns to the AI provider. This feedback is crucial for the continuous improvement of the model.

Conclusion

Understanding and leveraging the capabilities of Large Language Models (LLMs) demands both caution and strategy. By recognizing their limitations, such as lack of consciousness, potential biases, and confabulation tendencies, users can navigate these pitfalls effectively. To harness LLMs responsibly, a collaborative approach among developers, policymakers, and users is essential. Implementing security measures, mitigating bias, and fostering user awareness can maximize the benefits of LLMs while minimizing their drawbacks. As LLMs continue to shape our linguistic landscape, staying informed and vigilant ensures a safer and more accurate text-generation journey.

Author Bio

Amita Kapoor is an accomplished AI consultant and educator with over 25 years of experience. She has received international recognition for her work, including the DAAD fellowship and the Intel Developer Mesh AI Innovator Award. She is a highly respected scholar in her field, with over 100 research papers and several best-selling books on deep learning and AI. After teaching for 25 years at the University of Delhi, Amita took early retirement and turned her focus to democratizing AI education. She currently serves as a member of the Board of Directors for the non-profit Neuromatch Academy, fostering greater accessibility to knowledge and resources in the field. Following her retirement, Amita also founded NePeur, a company that provides data analytics and AI consultancy services. In addition, she shares her expertise with a global audience by teaching online classes on data science and AI at the University of Oxford.

Sharmistha Chatterjee is an evangelist in the field of machine learning (ML) and cloud applications, currently working in the BFSI industry at the Commonwealth Bank of Australia in the data and analytics space. She has worked in Fortune 500 companies as well as early-stage start-ups. She became an advocate for responsible AI during her tenure at Publicis Sapient, where she led the digital transformation of clients across industry verticals. She is an international speaker at various tech conferences and a 2X Google Developer Expert in ML and Google Cloud. She has won multiple awards, has been listed among the 40 under 40 data scientists by Analytics India Magazine (AIM), and was named one of 21 tech trailblazers in 2021 by Google. She has been involved in responsible AI initiatives led by Nasscom and is part of their DeepTech Club.

Authors of this book: Platform and Model Design for Responsible AI


AI_Distilled #39: Unpacking Mistral Large, Google's Gemini Challenges, and Copilot Enterprise

Kartikey Pandey
21 Mar 2024
9 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!

Print to Pixel: Optimize your learning experience with Packt

Several research studies have shown that printed books enhance comprehension, with the tactile experience of flipping pages and annotating the margins adding depth to the learning experience. However, developers can't overlook the practical benefits of eBooks, such as quickly finding relevant information or carrying an entire library on a single device. Acknowledging the unique benefits of both formats, Packt is offering a 40% discount on all print books, plus a free eBook version of each purchase, from February 26th to February 29th.

Here's what's included:

A Vast Library: Enjoy 40% off on over 5,000 titles spanning topics from Cybersecurity to Generative AI.
Complimentary eBook: Each print book purchase includes a free eBook.
AI Assistant: The top 500 books come with a personalized AI that can simplify complex topics to suit your learning style, offering an interactive learning experience.

Start Building Your Tech Library Today!

👋 Hello,

"No AI is perfect, especially at this emerging stage of the industry's development, but we know the bar is high for us and we will keep at it for however long it takes."
- Sundar Pichai, Google CEO

Pichai acknowledges problems with Gemini AI, stressing the importance of unbiased information for users and outlining steps to address the issues and improve the products. In a rapidly progressing industry, AI development is a tricky game to master, with numerous pitfalls along the way.

Greetings readers! Our mission is to help you stay on top of the ever-changing AI landscape so you can advance your skills. Let's get started with the latest news and developments across the AI field:

Microsoft provides the new LLM Mistral Large on Azure with Mistral AI
Google accepts that some responses from its Gemini model were unacceptable and biased
GitHub has launched the Copilot Enterprise coding assistant, integrated throughout the software development process
Researchers developed MobileLLM, new optimized language models for mobile devices with under a billion parameters
Researchers at Microsoft have developed new techniques to improve visual language models

We've also got your fresh dose of GPT and LLM secret knowledge and tutorials:

Mastering the Art of Prompt Crafting
Breaking Down How Large Language Models Learn
Using AI to Level Up Live Games
Monitoring Large Language Models on AWS

Last but not least, don't miss out on the hands-on strategies and tips straight from the AI community for you to use on your own projects:

Fine-Tuning Models for Speech Recognition Made Simple
Make Conversation Come Alive - Deploying Your Own AI Chat Partner
Combining Geospatial and Semantic Data to Build Powerful Search Tools
Leveraging Notion, Supabase and AI for Knowledge Retrieval

Writer's Credit: Special shout-out to Vidhu Jain for her valuable contribution to this week's issue.

Cheers,
Kartikey Pandey
Editor-in-Chief, Packt

Unleash Your Data Potential with Packt's Latest Titles and Platform Enhancements!

In a world that's always changing, learning is key to success. At Packt, we've updated our learning platform to help you stay ahead in the fast-moving tech world. Our platform makes learning easier and more effective, helping you overcome challenges and achieve your goals.
Boost Your Data Skills with Packt's DataPro Library:

On-Demand Learning: Access a wide range of books, video courses, research papers, and articles to help you grow.
AI Assistance: Get help from AI to understand complex concepts easily, all within the same learning environment.
Personalized Dashboard: Enjoy a tailored learning experience with recommendations and insights just for you.
Advanced Self-Assessment: Use the latest tools to identify what you need to learn and track your progress accurately.
Vibrant Community: Join a community of data and AI enthusiasts on Discord for collaboration and knowledge sharing.
Exclusive Access: Be part of the DataPro beta program for a chance to win Amazon gift cards and early access to new features.
Value for Money: Get all these benefits for just $7.99 per month, a small investment for big gains in your career.

Enhance Your Data Skills Today

⚡ TechWave: AI/GPT News & Analysis

Microsoft has partnered with Mistral AI to provide the new LLM Mistral Large on Azure cloud services. This state-of-the-art AI model offers advanced NLP capabilities, and several companies have praised Mistral Large's performance in increasing productivity and aiding innovation.

Google's CEO recently said some responses from its AI model Gemini were unacceptable and biased. The company has been working to address these issues and sees improvements, but will review what happened. It plans to relaunch Gemini in the coming weeks after fixing it.

GitHub has launched Copilot Enterprise, an AI coding assistant that integrates throughout the software development process. It provides customized code suggestions based on an organization's codebase, answers questions about internal systems, and generates summaries of code changes. Early testing found large productivity gains from such AI tools.

Researchers have developed new optimized language models for mobile devices with under a billion parameters. Called MobileLLM, the models achieve higher accuracy than previous smaller models through innovative architecture and weight-sharing techniques. MobileLLM shows significant gains on conversation tasks and competes with much larger models for common on-device uses.

Researchers at Microsoft have developed new techniques to improve visual language models using structured knowledge graphs. By incorporating relationship maps between image elements like objects and attributes, models can generate richer images from text descriptions. Hierarchical prompting and dual-path encoding methods were also introduced to help models better understand complex language.

🌟 Secret Knowledge: AI/LLM Resources

🌀 Mastering the Art of Prompt Crafting: Got a new NLP project that needs prompting? This guide covers the basics of effective prompt engineering for AI models like ChatGPT. Learn how clarity, conciseness, and context can improve responses. Also explore techniques like zero-shot learning and dynamic few-shot prompting, plus how temperature, top-p, and other settings can refine your model's "personality". From system messages to tailoring examples, these tips will help you leverage your LLMs' full potential.

🌀 Breaking Down How Large Language Models Learn: This article provides a helpful breakdown of how LLMs are trained through causal language modeling and how loss is calculated. It visually explains how models generate text sequences, are pre-trained to predict the next token, and how cross-entropy loss compares predictions to true labels to update weights.
The process is demonstrated through code showing how loss can be calculated manually for an LLM and checked against the framework's automatic calculation. This gives developers valuable insight into how state-of-the-art models learn.

🌀 Using AI to Level Up Live Games: This article discusses how generative AI can enhance live service games. Techniques like adaptive gameplay, personalized ads, and faster asset creation are described. The authors provide a framework for developing games using tools like Unity, GKE, and Vertex AI. They demonstrate how ML models can dynamically generate images, code, and dialogue to customize the player experience. Whether deploying models on GKE or Vertex AI, cloud-based AI brings the benefits of lower costs and easier maintenance than self-hosted options.

🌀 Monitoring Large Language Models on AWS: As AI language models grow more advanced, ensuring they behave properly becomes more important. This article discusses techniques for monitoring LLMs deployed on AWS. Key metrics covered include semantic similarity of responses, sentiment analysis, refusal rates, and more. The proposed architecture takes in model outputs, runs metrics modules, and reports results to CloudWatch for aggregation and alerts. With the right monitoring in place, you can help keep your conversational AI acting as intended.

🔛 Masterclass: AI/LLM Tutorials

🌀 Fine-Tuning Models for Speech Recognition Made Simple: This article discusses how to fine-tune LLMs for automatic speech recognition tasks using Amazon SageMaker. It explains language models and ASR as well as the basic steps for fine-tuning a pre-trained model, which include preparing data, choosing a model, training, evaluating, and deploying. SageMaker is highlighted as a powerful yet easy-to-use platform for this process due to its scalability, integration with AWS services, and pay-as-you-go pricing.

🌀 Make Conversation Come Alive - Deploying Your Own AI Chat Partner: Tired of boring chatbots? This guide shows you how to bring the Qwen AI model to your own server so you can have engaging discussions on any topic. The steps cover setting up your environment, installing dependencies, initializing the tokenizer and model, and using history to keep conversations flowing naturally. Once complete, you'll have a powerful AI assistant right at your fingertips. Best of all, it's completely open source.

🌀 Combining Geospatial and Semantic Data to Build Powerful Search Tools: This guide shows developers how to create an interactive campground search map using vector databases, NLP models, and geospatial data. Technologies like Qdrant, Llama 2, and Streamlit allow embedding text and locations to enable semantic queries. The page explains setting up Qdrant Cloud, loading campground CSV data, and parsing text into nodes. Developers can then embed nodes with Hugging Face models and query the vector store to retrieve similar results. By leveraging tools that understand both spatial and semantic context, you can build customized applications to help users explore outdoor destinations.

🌀 Leveraging Notion, Supabase, and AI for Knowledge Retrieval: This tutorial shows how you can build a knowledge base by extracting data from Notion databases and storing it in vector format in Supabase. It then demonstrates retrieving relevant information from the knowledge base using an AI model from OpenAI. By combining these tools, developers can query custom datasets and generate responses based on retrieved documents.
The process involves loading Notion documents, storing embeddings in Supabase, and setting up a retrieval pipeline. With some enhancements, this could be a powerful way to access organizational information.

🚀 HackHub: Trending AI Tools

🌀 lucky-lance/expert_sparsity: Implements efficient expert pruning and dynamic skipping techniques for mixture-of-experts large language models to improve their efficiency and speed while maintaining strong performance.

🌀 facebookresearch/pearl: This open-source library provides a modular reinforcement learning framework for building and training production-ready AI agents, empowering developers with state-of-the-art techniques.

🌀 zhen-tan-dmml/llm4annotation: Curates papers on using LLMs for data annotation, which developers could reference to apply these techniques or learn about the current state of the art.

🌀 google/gemma.cpp: Provides a lightweight C++ library for running Google's Gemma models that developers can easily integrate into their own projects for experimenting with and deploying LLMs.
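To make the manual loss calculation mentioned in the "Breaking Down How Large Language Models Learn" item above concrete, here is a minimal sketch, assuming the Hugging Face transformers library with a PyTorch backend; the gpt2 checkpoint and the sample sentence are illustrative choices and are not taken from the linked article.

import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load a small causal language model purely for illustration
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Language models learn by predicting the next token.", return_tensors="pt")

# Passing labels makes the library compute cross-entropy loss for us
outputs = model(**inputs, labels=inputs["input_ids"])

# Manual version: each position predicts the *next* token, so shift logits and labels
shift_logits = outputs.logits[:, :-1, :]
shift_labels = inputs["input_ids"][:, 1:]
manual_loss = F.cross_entropy(
    shift_logits.reshape(-1, shift_logits.size(-1)),
    shift_labels.reshape(-1),
)

print(outputs.loss.item(), manual_loss.item())  # the two values should agree

The point of the comparison is simply that the loss a framework reports during pre-training is ordinary next-token cross-entropy, nothing more exotic.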

How we are Thinking About Generative AI

Packt
18 Jul 2024
10 min read
How we are Thinking About Generative AI for Developers and Tech Learning

Packt is a global tech publisher serving developers and tech professionals (TechPros). Over the last 20 years, we have published over 8,000 books and videos, gaining deep insights into the evolving challenges tech professionals face. Recently, the rapid emergence of generative AI (GenAI) technologies like CoPilot, ChatGPT, and Gemini has transformed the tech landscape, affecting everyone from software developers to business strategists. The tech industry is at a critical inflection point with technology use, development, and education. At Packt, we are actively exploring generative AI's impact on the industry and TechPros' daily work and learning. Here, we outline our thoughts on how GenAI reshapes professional activities and tech learning, and our strategic responses to it. We would love to hear your feedback on this document and your thoughts on the issues raised within it. Please do send any comments to: GenAI_feedback@packt.com.

The Impact of GenAI on TechPro Work

The rapid pace of advancement in Generative AI makes it difficult to predict, but we believe, on balance, that it is a force for good in software development. A core Packt value that we share with our TechPro users is a belief in and commitment to the power of technology for progress. Our default setting is to get on board with change.

GenAI is already changing the nature of many development jobs, but it will not mean the end of software development. We are fundamentally optimistic about the future for TechPros powered by GenAI. It will mean more, faster, better work. This is how we at Packt see these changes:

Increased Software Production

Humanity continuously evolves, adapts, and advances, maintaining a need for more sophisticated software solutions – whether those are built on traditional software platforms or on top of AI models themselves. GenAI is already transforming the economics of supply by making engineers more productive and enabling more engineering tasks. The demand for more, better software will remain, leading to an increase in the number of professionals building, designing, adapting, and managing software.

Shifts in Software Development

Much of what engineers spend time doing can be quite generic. GenAI is beginning to automate these middle-tier, routine activities, allowing developers to focus on higher-value, more creative tasks. This shift redistributes work in three dimensions from the center of the development stack. Work moves ‘up the stack’ into architecture, domain expertise, and design, ‘down the stack’ into complex algorithm development, infrastructure, and tooling, and outwards to the edges with specific integrations and implementations. To meet the increased demand for software, there will be significantly more designers and implementors at those development edges, with increasing business and domain focus and specialization. There will be a continuously hard-to-meet need for deep tech engineers building the tools and infrastructure that enable this automation to operate efficiently at scale and speed.
This will be seen at the hardware and firmware level as well as in operating systems, cloud platforms, and the models and algorithms that modern software is built upon.

Increased Domain and Business Specialization

As GenAI moves tasks from generic operations upwards and outwards to more specialized domains, engineers will increasingly make decisions that require greater judgment and domain expertise. This will lead to a greater focus on domain experience and knowledge, and a higher value on business relationships.

GenAI also democratizes the development and management of systems, making these processes accessible to more users and transforming many jobs from direct task execution to overseeing AI agents that perform the work. This evolution could significantly expand the roles involving aspects of software design or delivery.

Impact on TechPro Learning

GenAI integrates automation and problem solving, leading to profound change in how TechPros learn and solve problems. We see the core changes as being:

Shift Toward Just-In-Time (JIT) Continuous Learning

Developers have always preferred to learn by doing: starting work and solving problems on the fly. GenAI makes this the only viable approach. The ROI of upfront Just-In-Case (JIC) learning, where developers research technologies that might be useful in future, declines when co-pilots can accelerate initial builds and troubleshoot during development. GenAI tools can escalate to rapid Just-In-Time (JIT) learning sprints to backfill knowledge gaps as they are discovered.

GenAI tools can help engineers to rapidly understand and work on existing complex and often undocumented code bases, again backfilling knowledge gaps JIT.

Entry-Level Learning Moves to Simulated Environments

The JIT learning-by-doing model also applies to students and juniors, but the study work they do will be "as good as real." Traditional, linear courseware will be replaced by personalized, hands-on projects in rich simulated environments. These environments provide shorter, contextual learning experiences that effectively bridge the gap between theory and practice, reducing the training load on increasingly busy senior developers.

Growth in Demand for Real-World Experience and Peer Interaction

As development increasingly moves up the stack and routine tasks are automated, there is a growing need for TechPros to understand specific real-world applications of systems and solutions. Highly specific, detailed, and objective case studies with high relevance to a specific problem area and technical solution will become increasingly valuable. Demand for discussion and interaction with experienced fellow professionals to share knowledge and insights will also grow. Such authentic content not only aids learning but also enhances the training of AI models.

Authoritative and Expert Insight Remains Key

Despite the shift towards more automated and JIT learning approaches, a thorough understanding of core concepts remains crucial. Books will continue to be one of the most powerful and authoritative ways for technology originators to share their foundational knowledge. This will remain the key long-term use-case for tech books.

Continuing Need for Creator Trust and Authenticity

GenAI enables the rapid creation of written work. In the tech publishing domain, we estimate that up to around 50% of titles in certain categories on Amazon might already be AI-generated or derived. This AI content meets certain user needs, and this proliferation will continue across store platforms.
We believe that human-generated work fulfils a different user need and that there will always be value in authentic creator insight and expertise. We continue to build direct relationships with tech professionals and authors to create and publish this content.

The Future is Uncertain

How this evolves is hard to know. The pace of change both in the technology and in the landscape around it has surfaced issues with reliability, compliance, cost, and memory/reasoning limitations. GenAI technology is moving extremely fast but has serious technical challenges. These issues will be resolved over time, but they limit the pace of actual deployment.

A Cautious Approach to Change

The case for changing existing systems, practices, and organizational models should be approached with caution. Enterprises have a high bar for adopting core systems, and the deployment phase will be long and require detailed work.

Uncertainty in Computing Platforms

It remains uncertain whether GenAI might evolve into the dominant general-purpose computing platform or how it will evolve past the current transformer architecture. It may become a ubiquitous implementation layer for all services over time; we do not know. However, we share the view that this is a pivotal phase for technology and for humanity.

A Mixed Economy of the Old and the New

We see a long phase of a mixed economy of old methods and new GenAI tools. There will be pockets of rapid adoption of GenAI tooling, like we see in coding co-pilots and in application areas such as customer service agents. However, with every deployment there will be a lot of "old style" engineering: problem solving, integrations, QA, optimization. The shift to higher-level working will be gradual and not immediately noticeable.

Friction in Human Systems

Human systems inherently resist change. Individuals stick with working and learning systems with which they are comfortable. Teaching methods evolve slowly, and we see different generations working and learning in different ways. While a shift toward Just-In-Time (JIT) learning is underway, structured, long-form learning will continue to play a crucial role.

Rapid Adoption Among Developers

The pace at which individual developers have adopted co-pilots and are using GenAI for problem solving is striking. We expect these trends of grassroots, individual adoption to continue and accelerate.

How Packt is Responding

The insights gained from talking with TechPros, combined with our thinking about the impact of GenAI on TechPro work and learning, have resulted in these strategic initiatives:

Shift to the Edges of the Development Stack in Publishing

We are pioneering new approaches to developing and publishing real-world practical case studies to answer the crucial questions: "What are people actually building with this right now?" and "How are they actually doing it?" We will increase our focus on publishing specific, definitive, deep, technical books from the creators and builders of new technology to help TechPros broaden their skills across the development stack. We will continue to build the tech book canon in the era of GenAI.

License for LLM Training Responsibly

The uniquely high-quality content tech authors create has immense value for LLM training.
We want to support the evolution of this technology while developing model training as a potentially valuable new channel for published content. We want authors to get fair value and the recognition they are due, and we will pursue all agreements with partners in a pragmatic but principled way.

Use GenAI to Enable a Step Change in Content Engineering and Derived Works

GenAI tools and automations can reduce the cost and effort of keeping a title up to date as technology evolves, and of creating a rich portfolio of derived works from the initial content. We call this BODE: Build Once, Deploy Everywhere. We are exploring exciting use-cases to increase the value of the original work, and its reach into new platforms, formats, languages, and versions.

Build Packt Models and Explore JIT

We have already delivered experimental AI agents fine-tuned on specific Packt titles. We are expanding this to topic, role, and whole-library models. We are exploring integration of the Packt corpus into co-pilots and tools to deliver workflow-embedded JIT knowledge and learning escalation.

Build Professional Memberships

Recognizing the increased value of live interactions in a post-GenAI world, we are committed to enabling tech professionals to engage in high-quality, trustworthy interactions with peers working on similar roles and projects.

Thoughts? Feedback?

Please send any comments to: GenAI_feedback@packt.com

BloombergGPT: Putting Finance to Work using Large Language Models

Julian Melanson
28 Jun 2023
7 min read
In recent years, the financial industry has experienced a significant surge in the amount and complexity of data. This exponential growth has underscored the need for advanced artificial intelligence models capable of comprehending and processing the specialized language used in finance. Addressing this demand, Bloomberg unveiled BloombergGPT, a revolutionary language model trained on a diverse range of financial data.The Rise of BloombergGPTReleased on March 30th, BloombergGPT represents a groundbreaking development in the financial sector's application of AI technology. By focusing specifically on finance-related tasks, BloombergGPT aims to enhance existing NLP applications employed by Bloomberg, including sentiment analysis, named entity recognition, news classification, and question answering. Furthermore, this sophisticated model holds the promise of unlocking new possibilities for leveraging the vast amounts of data accessible through the Bloomberg Terminal, thereby empowering the firm's customers and fully harnessing the potential of AI in the financial domain.Unleashing the Power of BloombergGPTBloombergGPT boasts two notable capabilities that propel it beyond generic language models. First, it possesses the ability to generate Bloomberg Query Language (BQL), which serves as a query language for accessing and analyzing financial data on the Bloomberg platform. BQL, a powerful and intricate tool, enables various financial tasks such as data searching, analysis, report creation, and insight generation. BloombergGPT's proficiency in transforming natural language queries into valid BQL fosters more intuitive interactions with financial data, streamlining the querying process and enhancing user experience.The second noteworthy feature of BloombergGPT is its capability to provide suggestions for news headlines. This functionality proves invaluable for news applications and aids journalists in constructing compelling and informative newsletters. By inputting paragraphs, BloombergGPT can generate relevant and engaging titles, saving time and enhancing the efficiency of content creation.Training BloombergGPT: A Domain-Specific ApproachTo train BloombergGPT, Bloomberg employed a domain-specific approach, combining their own financial data with augmented online text data. This strategy demonstrates the value of developing language models tailored to specific industries, surpassing the utility of generic models. The training process involved building a dataset of English-language financial documents, incorporating 363 billion financial-specific tokens from Bloomberg's proprietary data assets and an additional 345 billion generic tokens from online text datasets, including The Pile, C4, and Wikipedia.The resulting domain-specific language model, BloombergGPT, comprises an impressive 50 billion parameters and is optimized for financial tasks. Notably, BloombergGPT outperforms popular open-source language models such as GPT-NeoX, OPT, and Bloom in finance-specific tasks. Furthermore, it exhibits remarkable performance in generic language tasks, including summarization, often rivaling the performance of GPT-3 based on Bloomberg's benchmarks.Applications and Advantages:BloombergGPT's introduction opens up a wealth of possibilities for employing language models in the financial technology realm. One such application is sentiment analysis, which enables the assessment of sentiments in articles, particularly those related to individual companies. 
Automatic entity recognition is another area where BloombergGPT excels, offering the potential for streamlined data extraction and analysis. Additionally, the model is adept at answering financial questions, providing prompt and accurate responses to user inquiries.Bloomberg's news division can leverage BloombergGPT to automatically generate compelling headlines for newsletters, reducing manual effort and improving efficiency. The model's capability to formulate queries in Bloomberg's proprietary query language (BQL) with minimal examples further augments its versatility. Users can interact with BloombergGPT using natural language, specifying their data requirements, and allowing the model to generate the appropriate BQL, expediting data extraction from databases.Shawn Edwards, Bloomberg's Chief Technology Officer, emphasizes the immense value of developing the first language model focused on the financial domain. The domain-specific approach not only allows for the creation of diverse applications but also yields superior performance compared to developing custom models for each specific task. This advantage, coupled with a faster time-to-market, positions BloombergGPT as a game-changer in the finance industry.The Future of BloombergGPT:BloombergGPT's potential extends beyond its current capabilities. As the model continues to train and optimize on financial data, further progress, and advancements are expected. Its application can be broadened to encompass a wider range of financial tasks, ultimately facilitating more accurate and efficient decision-making in the financial industry.BloombergGPT represents a significant milestone in the advancement of financial natural language processing. By addressing the unique language intricacies of the financial industry, this domain-specific language model holds immense potential for revolutionizing how financial data is analyzed, queried, and leveraged. With its impressive 50 billion parameters and exceptional performance in financial NLP tasks, BloombergGPT positions itself as a powerful tool that will shape the future of the finance industry.Use-casesAutomating research tasks: BloombergGPT is being used by researchers at the University of Oxford to automate the task of summarizing large medical datasets. This has allowed the researchers to save a significant amount of time and effort, and it has also allowed them to identify new insights that they would not have been able to find otherwise.Creating content: BloombergGPT is being used by businesses such as Nike and Coca-Cola to create content for their websites and social media channels. This has allowed these businesses to produce high-quality content more quickly and easily, and it has also helped them to reach a wider audience.Improving customer service: BloombergGPT is being used by customer service teams at companies such as Amazon and PayPal to provide customers with more personalized and informative responses. This has helped these companies to improve their customer satisfaction ratings.Generating code: BloombergGPT is being used by developers at companies such as Google and Facebook to generate code for new applications. This has helped these developers to save time and effort, and it has also allowed them to create more complex and sophisticated applications.Translating languages: BloombergGPT is being used by businesses such as Airbnb and Uber to translate their websites and apps into multiple languages. 
This has helped these businesses to expand into new markets and to reach a wider audience.These are just a few examples of how BloombergGPT is being used in the real world. As it continues to develop, it is likely that even more use cases will be discovered.SummaryIn recent years, the financial industry has faced a surge in data complexity, necessitating advanced artificial intelligence models. BloombergGPT, a language model trained on financial data, represents a groundbreaking development in the application of AI in finance. It aims to enhance Bloomberg's NLP applications, providing improved sentiment analysis, named entity recognition, news classification, and question answering. Notably, BloombergGPT can generate Bloomberg Query Language (BQL) and suggest news headlines, streamlining financial data querying and content creation. By training the model on domain-specific data, BloombergGPT outperforms generic models and offers various applications, including sentiment analysis, entity recognition, and prompt financial question answering. With further advancements expected, BloombergGPT has the potential to revolutionize financial NLP, enabling more accurate decision-making. The model's versatility and superior performance position it as a game-changer in the finance industry, with applications ranging from automating research tasks to improving customer service and code generation.Author BioJulian Melanson is one of the founders of Leap Year Learning. Leap Year Learning is a cutting-edge online school that specializes in teaching creative disciplines and integrating AI tools. We believe that creativity and AI are the keys to a successful future and our courses help equip students with the skills they need to succeed in a continuously evolving world. Our seasoned instructors bring real-world experience to the virtual classroom and our interactive lessons help students reinforce their learning with hands-on activities.No matter your background, from beginners to experts, hobbyists to professionals, Leap Year Learning is here to bring in the future of creativity, productivity, and learning!

How to work with LangChain Python modules

Avratanu Biswas
22 Jun 2023
13 min read
This article is the second part of a series of articles. Please refer to Part 1 to learn how to get to grips with the LangChain framework and how to utilize it for building LLM-powered apps.

Introduction

In this section, we dive into the practical usage of LangChain modules. Building upon the previous overview of LangChain components, we will work within a Python environment to gain hands-on coding experience. However, it is important to note that this overview is not a substitute for the official documentation, and it is recommended to refer to the documentation for a more comprehensive understanding.

Choosing the Right Python Environment

When working with Python, Jupyter Notebook and Google Colab are popular choices for quickly getting started in the Python environment. Additionally, Visual Studio Code (VSCode), Atom, PyCharm, or Sublime Text integrated with a conda environment are also excellent options. While many of these can be used, Google Colab is used here for its convenience in quick testing and code sharing. Find the code link here.

Prerequisites

Before we begin, make sure to install the necessary Python libraries. Use the pip command within a notebook cell to install them.

Installing LangChain: In order to install the "LangChain" library, which is essential for this section, you can conveniently use the following command:

!pip install langchain

Regular Updates: Personally, I would recommend taking advantage of LangChain’s frequent releases by frequently upgrading the packages. Use the following command for this purpose:

!pip install langchain --upgrade

Integrating LangChain with LLMs: Previously, we discussed how the LangChain library facilitates interaction with Large Language Models (LLMs) provided by platforms such as OpenAI, Cohere, or HuggingFace. To integrate LangChain with these models, we need to follow these steps:

Obtain API Keys: In this tutorial, we will use OpenAI. We need to sign up to easily access the API keys for the various endpoints that OpenAI provides. The key must be kept confidential. You can obtain the API key via this link.

Install Python Package: Install the required Python package associated with your chosen LLM provider. For OpenAI language models, execute the command:

!pip install openai

Configuring the API Key for OpenAI: To initialize the API key for the OpenAI library, we will use the getpass Python library. Alternatively, you can set the API key as an environment variable.

# Importing the library
import getpass

OPENAI_API_KEY = getpass.getpass()

# In order to double check
# print(OPENAI_API_KEY) # not recommended

Running the above lines of code will create a secure text input widget where we can enter the API key obtained for accessing the OpenAI LLM endpoints. After hitting enter, the inputted value will be stored in the variable OPENAI_API_KEY, allowing it to be used for subsequent operations throughout our notebook.

We will explore the different LangChain modules in the sections below.

Prompt Template

We need to import the necessary module, PromptTemplate, from the langchain library. A multi-line string variable named template is created, representing the structure of the prompt and containing placeholders for the context, question, and answer, which are the crucial aspects of any prompt template.

Image by Author | Key components of a prompt template are shown in the figure.

A PromptTemplate object is then instantiated using the template variable.
The input_variables parameter is provided with a list containing the variable names used in the template, in this case, only the query:

from langchain import PromptTemplate

template = """ You are a Scientific Chat Assistant.
Your job is to answer scientific facts and evidence, in a bullet point wise.

Context: Scientific evidence is necessary to validate claims, establish credibility,
and make informed decisions based on objective and rigorous investigation.

Question: {query}

Answer:
"""
prompt = PromptTemplate(template=template, input_variables=["query"])

The generated prompt structure can be further utilized to dynamically fill in the question placeholder and obtain responses within the specified template format. Let's print our entire prompt!

print(prompt)

lc_kwargs={'template': ' You are an Scientific Chat Assistant.\nYour job is to reply scientific facts and evidence in a bullet point wise.\n\nContext: Scientific evidence is necessary to validate claims, establish credibility, \nand make informed decisions based on objective and rigorous investigation.\n\nQuestion: {query}\n\nAnswer: \n', 'input_variables': ['query']} input_variables=['query'] output_parser=None partial_variables={} template=' You are an Scientific Chat Assistant.\nYour job is to reply scientific facts and evidence in a bullet point wise.\n\nContext: Scientific evidence is necessary to validate claims, establish credibility, \nand make informed decisions based on objective and rigorous investigation.\n\nQuestion: {query}\n\nAnswer: \n' template_format='f-string' validate_template=True

Chains

The LangChain documentation covers various types of LLM chains, which can be effectively categorized into two main groups: Generic chains and Utility chains.

Image 2: Chains | Chains can be broadly classified into Generic Chains and Utility Chains.

(a) Generic chains are designed to provide general-purpose language capabilities, such as generating text, answering questions, and engaging in natural language conversations by leveraging LLMs. On the contrary, (b) Utility chains are specialized to perform specific tasks or provide targeted functionalities. These chains are fine-tuned and optimized for specific use cases. Note that although index-related chains can be classified into a sub-group, here we keep such chains under the banner of utility chains. They are often considered to be very useful while working with vector databases.

Since this is the very first time we are running the LLM chain, we will walk through the code in detail.

We need to import the OpenAI LLM module from langchain.llms and the LLMChain module from the langchain Python package.

Then, an instance of the OpenAI LLM is created, using arguments such as temperature (affects the randomness of the generated responses), openai_api_key (the API key for OpenAI which we just assigned before), model (the specific OpenAI language model to be used - other models are available here), and streaming.
Note the verbose argument is pretty useful to understand the abstraction that LangChain provides under the hood, while executing our query.Next, an instance of LLMChain is created, providing the prompt (the previously defined prompt template) and the LLM (the OpenAI LLM instance).The query or question is defined as the variable query.Finally, the llm_chain.run(query) line executes the LLMChain with the specified query, generating the response based on the defined prompt and the OpenAI LLM:# Importing the OpenAI LLM module from langchain.llms import OpenAI # Importing the LLMChain module from langchain import LLMChain # Creating an instance of the OpenAI LLM llm = OpenAI(temperature=0.9, openai_api_key=OPENAI_API_KEY, model="text-davinci-003", streaming=True) # Creating an instance of the LLMChain with the provided prompt and OpenAI LLM llm_chain = LLMChain(prompt=prompt,llm=llm, verbose=True) # Defining the query or question to be asked query = "What is photosynthesis?" # Running the LLMChain with the specified query print(llm_chain.run(query)) Let's have a look at the response that is generated after running the chain with and without verbose,a) with verbose = True;Prompt after formatting:You are an Scientific Chat Assistant. Your job is to reply scientific facts and evidence in a bullet point wise.Context: Scientific evidence is necessary to validate claims, establish credibility, and make informed decisions based on objective and rigorous investigation. Question: What is photosynthesis?Answer:> Finished chain.• Photosynthesis is the process used by plants, algae and certain bacteria to convert light energy from the sun into chemical energy in the form of sugars.• Photosynthesis occurs in two stages: the light reactions and the Calvin cycle. • During the light reactions, light energy is converted into ATP and NADPH molecules.• During the Calvin cycle, ATP and NADPH molecules are used to convert carbon dioxide into sugar molecules.  b ) with verbose = False;• Photosynthesis is a process used by plants and other organisms to convert light energy, normally from the sun, into chemical energy which can later be released to fuel the organisms' activities.• During photosynthesis, light energy is converted into chemical energy and stored in sugars.• Photosynthesis occurs in two stages: light reactions and the Calvin cycle. The light reactions trap light energy and convert it into chemical energy in the form of the energy-storage molecule ATP. The Calvin cycle uses ATP and other molecules to create glucose.Seems like our general-purpose LLMChain has done a pretty decent job and given a reasonable output by leveraging the LLM.Now let's move onto the utility chain and understand it, using a simple code snippet:from langchain import OpenAI from langchain import LLMMathChain llm = OpenAI(temperature=0.9,openai_api_key= OPENAI_API_KEY) # Using the LLMMath Chain / LLM defined in Prompt Template section llm_math = LLMMathChain.from_llm(llm = llm, verbose = True) question = "What is 4 times 5" llm_math.run(question) # You know what the response would be 🎈Here the utility chain serves a specific function, i.e. to solve a fundamental maths question using the LLMMathChain. It's crucial to look at the prompt used under the hood for such chains. 
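As a quick illustration of that point, here is a minimal sketch of how the math chain's built-in prompt could be inspected, reusing the llm_math instance created above. The exact attribute path is an assumption that varies across LangChain releases (some expose prompt directly on the chain, others only through an inner llm_chain), so verify it against the version you have installed.

# Peek at the prompt the LLMMathChain sends to the LLM under the hood
# NOTE: the attribute names below are version-dependent assumptions; try the fallback if one fails
try:
    math_prompt = llm_math.prompt.template            # older releases keep a prompt field on the chain
except AttributeError:
    math_prompt = llm_math.llm_chain.prompt.template  # newer releases wrap it in an inner LLMChain
print(math_prompt)

Printing the template shows how the chain instructs the model to translate a word problem into something it can evaluate, which is exactly the kind of hidden scaffolding worth reviewing before relying on a utility chain.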
However , in addition, a few more notable utility chains are there as well,BashChain: A utility chain designed to execute Bash commands and scripts.SQLDatabaseChain: This utility chain enables interaction with SQL databasesSummarizationChain: The SummarizationChain is designed specifically for text summarization tasks.Such utility chains, along with other available chains in the LangChain framework, provide specialized functionalities and ready-to-use tools that can be utilized to expedite and enhance various aspects of the language processing pipeline.MemoryUntil now, we have seen, each incoming query or input to the LLMs or to its subsequent chain is treated as an independent interaction, meaning it is "stateless" (in simpler terms, information IN, information OUT). This can be considered as one of the major drawbacks, as it hinders the ability to provide a seamless and natural conversational experience for users who are seeking reasonable responses further on. To overcome this limitation and enable better context retention, LangChain offers a broad spectrum of memory components that are extremely helpful.Image by Author | The various types of Memory modules that LangChain provides.By utilizing the memory components supported, it becomes possible to remember the context of the conversation, making it more coherent and intuitive. These memory components allow for the storage and retrieval of information, enabling the LLMs to have a sense of continuity. This means they can refer back to previous relevant contexts, which greatly enhances the conversational experience for users. A typical example of such memory-based interaction is the very popular chatbot - ChatGPT, which remembers the context of our conversations.Let's have a look at how we can leverage such a possibility using LangChain:from langchain.llms import OpenAI from langchain.chains import ConversationChain from langchain.memory import ConversationBufferMemory llm = OpenAI(temperature=0, openai_api_key= OPENAI_API_KEY) conversation = ConversationChain( llm=llm, verbose=True, memory = ConversationBufferMemory() ) In the above code, we have initialized an instance of the ConversationChain class, configuring it with the OpenAI language model, enabling verbose mode for detailed output, and utilizing a ConversationBufferMemory for memory management during conversations. Now, let's begin our conversation,conversation.predict(input="Hi there!I'm Avra") Prompt after formatting:The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.Current conversation:Human: Hi there! I'm AvraAI:> Finished chain.' Hi, Avra! It's nice to meet you. My name is AI. What can I do for you today?Let's add a few more contexts to the chain, so that later we can test the context memory of the chain.conversation.predict(input="I'm interested in soccer and building AI web apps.")Prompt after formatting:The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.Current conversation:Human: Hi there!I'm AvraAI:  Hi Avra! It's nice to meet you. My name is AI. What can I do for you today?Human: I'm interested in soccer and building AI web apps.AI:> Finished chain.' That's great! 
Soccer is a great sport and AI web apps are a great way to explore the possibilities of artificial intelligence. Do you have any specific questions about either of those topics?Now, we make a query, which requires the chain to trace back to its memory storage and provide a reasonable response based on it.conversation.predict(input="Who am I and what's my interest ?")Prompt after formatting:The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know. Current conversation:Human: Hi there!I'm AvraAI:  Hi Avra! It's nice to meet you. My name is AI. What can I do for you today?Human: I'm interested in soccer and building AI web apps.AI:  That's great! Soccer is a great sport and AI web apps are a great way to explore the possibilities of artificial intelligence. Do you have any specific questions about either of those topics?Human: Who am I and what's my interest ?AI:> Finished chain.' That's a difficult question to answer. I don't have enough information to answer that question. However, based on what you've told me, it seems like you are Avra and your interests are soccer and building AI web apps.The above response highlights the significance of the ConversationBufferMemory chain in retaining the context of the conversation. It would be worthwhile to try out the above example without a buffer memory to get a clear perspective of the importance of the memory module. Additionally, LangChain provides several memory modules that can enhance our understanding of memory management in different ways, to handle conversational contexts.Moving forward, we will delve into the next section, where we will focus on the final two components called the “Indexes” and the "Agent." During this section, we will not only gain a hands-on understanding of its usage but also build and deploy a web app using an online workspace called Databutton.ReferencesLangChain Official Docs - https://python.langchain.com/en/latest/index.htmlCode available for this section here (Google Collab) - https://colab.research.google.com/drive/1_SpAvehzfbYYdDRnhU6v9-KHwIHMC1yj?usp=sharingPart 1: Using LangChain for Large Language Model — powered Applications : https://www.packtpub.com/article-hub/using-langchain-for-large-language-model-powered-applicationsPart 3 : Building and deploying Web App using LangChain <Insert Link>How to build a Chatbot with ChatGPT API and a Conversational Memory in Python: https://medium.com/@avra42/how-to-build-a-chatbot-with-chatgpt-api-and-a-conversational-memory-in-python-8d856cda4542Databutton - https://www.databutton.io/Author BioAvratanu Biswas, Ph.D. Student ( Biophysics ), Educator, and Content Creator, ( Data Science, ML & AI ).Twitter    YouTube    Medium     GitHub

Large Language Models (LLMs) in Education

Chaitanya Yadav
23 Oct 2023
8 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!

Introduction

Large language models are a type of AI that can create and understand human language. This article deals with the potential of large language models in education and how they can transform it. Powered by artificial intelligence and drawing on vast databases of textual data, LLMs can both generate and interpret human language. The article shows how LLMs could, by means of practical examples, put in place individual learning pathways, provide advanced learning analytics, and develop participatory simulations that would lead to the creation of more effective educational strategies.

Benefits of LLMs in Education

Personalized learning

The capacity of LLMs in education to customize learning experiences for each student is one of their greatest advantages. Lesson-plan customization, individualized feedback, and real-time monitoring of student progress are all possible with LLMs.

Automated tasks

Additionally, LLMs can be utilized to automate processes like grading and lesson planning. By doing this, instructors may have more time to give to other important responsibilities like teaching and connecting with students.

New and innovative educational tools and resources

LLMs can be applied to the development of innovative and cutting-edge learning resources and technology. LLMs can be used to create interactive simulations, games, and other educational activities.

Real-time feedback and support

LLMs can also be utilized to provide quick help and feedback to students. For example, LLMs can be used to create chatbots that can assist students with their academic work and respond to their queries.

Potential Challenges of LLMs in Education

Incorrect or misleading information

The fact that LLMs might provide inaccurate or misleading information is one of the main problems with their use in education. This is because LLMs are taught using vast volumes of data, some of which could be outdated or erroneous.

Lack of understanding

Another issue with utilizing LLMs in teaching is that they might not fully understand the material they produce. This is because LLMs are trained on statistical patterns in language rather than on a genuine grasp of the complexity of human communication.

Ethical concerns

There are also some ethical concerns associated with the use of LLMs in education. LLMs should be used carefully, and their usage might have ethical consequences, which should be considered.

How LLMs can be used for Transforming Education with Advanced Learning Strategies

Let's look at a few examples that show the possibilities of Large Language Models (LLMs) in education.

1. Advanced Personalized Learning Pathway

In this example, in order to reflect a student's individual objectives, learning style, and progress, we are going to form an even more detailed personalized education path.
Follow the steps perfectly given in the input code to create a personalized learning pathway.Input Code:    # Step 1: First we will define the generate_learning_pathway function def generate_learning_pathway(prompt, user_profile):    # Step 2: Once the function is defined we will create a template for the learning pathway    learning_pathway_template = f"Dear {user_profile['student_name']},\n\nI'm excited to help you create a personalized learning pathway to achieve your goal of {user_profile['goals']}. As a {user_profile['learning_style']} learner with {user_profile['current_progress']}, here's your pathway:\n\n"    # Step 3: Now let’s define the specific steps in the learning pathway    steps = [        "Step 1: Introduction to Data Science",        "Step 2: Data Visualization Techniques for Visual Learners",        "Step 3: Intermediate Statistics for Data Analysis",        "Step 4: Machine Learning Fundamentals",        "Step 5: Real-world Data Science Projects",    ]    # Step 4: Combine the template and the specific steps    learning_pathway = learning_pathway_template + "\n".join(steps)    return learning_pathway # Step 5: Define a main function to test the code def main():    user_profile = {        "student_name": "Alice",        "goals": "Become a data scientist",       "learning_style": "Visual learner",        "current_progress": "Completed basic statistics"    }    prompt = "Create a personalized learning pathway."    # Step 6: Generate the learning pathway    learning_pathway = generate_learning_pathway(prompt, user_profile)    # Step 7: Print the learning pathway    print(learning_pathway) if __name__ == "__main__":    main() Output:This example gives the LLM a highly customized approach to teaching taking into account students' names, objectives, methods of education, and how they are progressing.2. AI-Enhanced Learning AnalyticsThe use of LLMs in Learning Analytics may provide teachers with more detailed information on the student's performance and help them to make appropriate recommendations.Input code:# Define the generate_learning_analytics function def generate_learning_analytics(prompt, student_data): # Analyze the performance based on quiz scores average_quiz_score = sum(student_data["quiz_scores"]) / len(student_data["quiz_scores"]) # Calculate homework completion rate total_homeworks = len(student_data["homework_completion"]) completed_homeworks = sum(student_data["homework_completion"]) homework_completion_rate = (completed_homeworks / total_homeworks) * 100 # Generate the learning analytics report analytics_report = f"Learning Analytics Report for Student {student_data['student_id']}:\n" analytics_report += f"- Average Quiz Score: {average_quiz_score:.2f}\n" analytics_report += f"- Homework Completion Rate: {homework_completion_rate:.2f}%\n" if homework_completion_rate < 70: analytics_report += "Based on their performance, it's recommended to provide additional support for completing homework assignments." return analytics_reportThis code defines a Python function, ‘generates_learning_analytics’, which takes prompt and student data as input, calculates average quiz scores and homework completion rates, and generates a report that includes these metrics, together with possible recommendations for additional support based on homework performance. 
Now let’s provide student performance data.Input code:student_data = {    "student_id": "99678",    "quiz_scores": [89, 92, 78, 95, 89],    "homework_completion": [True, True, False, True, True] } prompt = f"Analyze the performance of student {student_data['student_id']} based on their recent quiz scores and homework completion." analytics_report = generate_learning_analytics(prompt, student_data) print(analytics_report)Output:The student's test scores and the homework completion data included in the ‘student_data’ dictionary are used to generate this report.3. Advanced Interactive Simulations for LearningThe potential for LLMs to provide an engaging learning resource will be demonstrated through the creation of a comprehensive computerised training simulation on complicated topics, such as physics.Input code:# Define the generate_advanced_simulation function def generate_advanced_simulation(prompt): # Create the interactive simulation    interactive_simulation = f"Interactive {prompt} Simulation" # Provide a link to the interactive simulation (replace with an actual link)    interactive_simulation_link = "https://your-interactive-simulation-link.com"    return interactive_simulation, interactive_simulation_link # Define a main function to test the code def main():    topic = "Quantum Mechanics"    prompt = f"Develop an interactive simulation for teaching {topic} to advanced high school students." # Generate the interactive simulation    interactive_simulation, interactive_simulation_link = generate_advanced_simulation(prompt) # Print the interactive simulation and link    print(f"Explore the {topic} interactive simulation: {interactive_simulation_link}") if __name__ == "__main__":    main()Output:In this example, for a complex topic like quantum physics, the LLM is asked to create an advanced interactive simulation that will make learning more interesting and visual. Also, make sure to replace and provide your link to the interactive simulation.Such advanced examples demonstrate the adaptability of LLMs to create highly customized learning pathways, Advanced Learning Analytics Reports, and sophisticated interactive simulations with in-depth educational experiences.ConclusionIn conclusion, by providing advanced learning strategies and tools, large language models represent a tremendous potential for revolutionizing education. These models provide a range of benefits, including personalized learning experiences, timely feedback and support, automated tasks, and the development of useful tools for innovation in education.The article considers the practical use of LLMs in education, which includes developing more sophisticated personalized school paths that take into account students' specific educational objectives and how they learn. Moreover, by giving details of the student's performance and recommendations for improvement, LLMs can improve Learning Analytics. In addition, how LLMs can enhance learning by enabling interactivity and engagement has been demonstrated through the development of real-time simulations on complicated topics.The future of education appears promising by taking into account the LLMs' ability to offer a more diverse, creative learning environment with limitless opportunities for learners around the world.Author BioChaitanya Yadav is a data analyst, machine learning, and cloud computing expert with a passion for technology and education. He has a proven track record of success in using technology to solve real-world problems and help others to learn and grow. 
He is skilled in a wide range of technologies, including SQL, Python, data visualization tools like Power BI, and cloud computing platforms like Google Cloud Platform. He is also 22x Multicloud Certified.In addition to his technical skills, he is also a brilliant content creator, blog writer, and book reviewer. He is the Co-founder of a tech community called "CS Infostics" which is dedicated to sharing opportunities to learn and grow in the field of IT.

Deploying LLM Models in Kubernetes with KFServing

Alan Bernardo Palacio
21 Aug 2023
14 min read
Deploying LLM models, like Hugging Face transformer library's extractive question-answering model, is popular in NLP. Learn to deploy LLM models in Kubernetes via KFServing. Utilize Hugging Face's transformers library to deploy an extractive question-answering model. KFServing ensures standard model serving with features like explainability and model management. Set up KFServing, craft a Python model server, build a Docker image, and deploy to Kubernetes with Minikube.IntroductionDeploying machine learning models to production is a critical step in turning research and development efforts into practical applications. In this tutorial, we will explore how to deploy Language Model (LLM) models in a Kubernetes cluster using KFServing. We will leverage the power of KFServing to simplify the model serving process, achieve scalability, and ensure seamless integration with existing infrastructure.To illustrate the relevance of deploying LLM models, let's consider a business use case. Imagine you are building an intelligent chatbot that provides personalized responses to customer queries. By deploying an LLM model, the chatbot can generate contextual and accurate answers, enhancing the overall user experience. With KFServing, you can easily deploy and scale the LLM model, enabling real-time interactions with users.By the end of this tutorial, you will have a solid understanding of deploying LLM models with KFServing and be ready to apply this knowledge to your own projects.Architecture OverviewBefore diving into the deployment process, let's briefly discuss the architecture. Our setup comprises a Kubernetes cluster running in Minikube, KFServing as a framework to deploy the services, and a custom LLM model server. The Kubernetes cluster provides the infrastructure for deploying and managing the model. KFServing acts as a serving layer that facilitates standardized model serving across different frameworks. Finally, the custom LLM model server hosts the pre-trained LLM model and handles inference requests.Prerequisites and SetupTo follow along with this tutorial, ensure that you have the following prerequisites:A Kubernetes cluster: You can set up a local Kubernetes cluster using Minikube or use a cloud-based Kubernetes service like Google Kubernetes Engine (GKE) or Amazon Elastic Kubernetes Service (EKS).Docker: Install Docker to build and containerize the custom LLM model server.Python and Dependencies: Install Python and the necessary dependencies, including KFServing, Transformers, TensorFlow, and other required packages. You can find a list of dependencies in the requirements.txt file.Now that we have our prerequisites, let's proceed with the deployment process.Introduction to KFServingKFServing is designed to provide a standardized way of serving machine learning models across organizations. It offers high abstraction interfaces for common ML frameworks like TensorFlow, PyTorch, and more. By leveraging KFServing, data scientists and MLOps teams can collaborate seamlessly from model production to deployment. KFServing can be easily integrated into existing Kubernetes and Istio stacks, providing model explainability, inference graph operations, and other model management functions.Setting Up KFServingTo begin, we need to set up KFServing on a Kubernetes cluster. For this tutorial, we'll use the local quick install method on a Minikube Kubernetes cluster. 
The quick install method allows us to install Istio and KNative without the full Kubeflow setup, making it ideal for local development and testing.Start by installing the necessary dependencies: kubectl, and Helm 3. We will assume that they are already set up. Then, follow the Minikube install instructions to complete the setup. Adjust the memory and CPU settings for Minikube to ensure smooth functioning. Once the installation is complete, start Minikube and verify the cluster status using the following commands:minikube start --memory=6144 minikube statusThe kfserving-custom-model requests at least 4Gi of memory, so in this case, we provide it with a bit more.Building a Custom Python Model ServerNow, we'll focus on the code required to build a custom Python model server for the Hugging Face extractive question-answering model. We'll use the KFServing model class and implement the necessary methods. We will start by understanding the code that powers the custom LLM model server. The server is implemented using Python and leverages the Hugging Face transformer library.Let’s start by creating a new Python file and naming it kf_model_server.py. Import the required libraries and define the KFServing_BERT_QA_Model class that inherits from kfserving.KFModel. This class will handle the model loading and prediction logic:# Import the required libraries and modules import kfserving from typing import List, Dict from transformers import AutoTokenizer, TFAutoModelForQuestionAnswering import tensorflow as tf import base64 import io # Define the custom model server class class kf_serving_model (kfserving.KFModel):    def __init__(self, name: str):        super().__init__(name)        self.name = name        self.ready = False        self.tokenizer = None    def load(self):        # Load the pre-trained model and tokenizer        self.tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")        self.model = TFAutoModelForQuestionAnswering.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")        self.ready = True    def predict(self, request: Dict) -> Dict:        inputs = request["instances"]        # Perform inference on the input instances        source_text = inputs[0]["text"]        questions = inputs[0]["questions"]        results = {}        for question in questions:            # Tokenize the question and source text            inputs = self.tokenizer.encode_plus(question, source_text, add_special_tokens=True, return_tensors="tf")            input_ids = inputs["input_ids"].numpy()[0]            answer_start_scores, answer_end_scores = self.model(inputs)            # Extract the answer from the scores            answer_start = tf.argmax(answer_start_scores, axis=1).numpy()[0]            answer_end = (tf.argmax(answer_end_scores, axis=1) + 1).numpy()[0]            answer = self.tokenizer.convert_tokens_to_string(self.tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end]))            results[question] = answer        return {"predictions": results}   if __name__ == "__main__":    model = kf_serving_model("kfserving-custom-model")    model.load()    kfserving.KFServer(workers=1).start([model])In the above code, we define the kf_serving_model class that inherits from kfserving.KFModel and initializes the model and tokenizer. The class encapsulates the model loading and prediction logic. The load() method loads the pre-trained model and tokenizer from the Hugging Face library. 
The predict() method takes the input JSON and performs inference using the model. It generates question-answer pairs and returns them in the response.Before we proceed, let's discuss some best practices for deploying LLM models with KFServing:Model Versioning: Maintain different versions of the LLM model to support A/B testing, rollback, and easy model management.Scalability: Design the deployment to handle high traffic loads by optimizing resource allocation and leveraging horizontal scaling techniques.Monitoring and Error Handling: Implement robust logging and monitoring mechanisms to track model performance, detect anomalies, and handle errors gracefully.Performance Optimization: Explore techniques like batch processing, parallelization, and caching to optimize the inference speed and resource utilization of the deployed model.Now that we have a good understanding of the code and best practices, let's proceed with the deployment process.Deployment Steps:For the deployment, first, we need to set up the Kubernetes cluster and ensure it is running smoothly. You can use Minikube or a cloud-based Kubernetes service. Once the cluster is running, we install the KFServing CRD by cloning the KFServing repository and navigating to the cloned directory:git clone git@github.com:kubeflow/kfserving.git cd kfservingNow we install the necessary dependencies using the hack/quick_install.sh script:./hack/quick_install.shTo deploy our custom model server, we need to package it into a Docker container image. This allows for easy distribution and deployment across different environments.Building a Docker Image for the Model ServerLet’s create the Docker image by creating a new file named Dockerfile in the same directory as the Python file:# Use the official lightweight Python image. FROM python:3.7-slim ENV APP_HOME /app WORKDIR $APP_HOME # Install production dependencies. COPY requirements.txt ./ RUN pip install --no-cache-dir -r ./requirements.txt # Copy local code to the container image COPY kf_model_server.py ./ CMD ["python", "kf_model_server.py"] The Dockerfile specifies the base Python image, sets the working directory, installs the dependencies from the requirements.txt file, and copies the Python code into the container. Here we will be running this locally on a CPU, so we will be using tensorflow-cpu for the application:kfserving==0.3.0 transformers==2.1.1 tensorflow-cpu==2.2.0 protobuf==3.20.0To build the Docker image, execute the following command:docker build -t kfserving-custom-model .This command builds the container image using the Dockerfile and tags it with the specified name.When you build a Docker image using docker build -t kfserving-custom-model ., the image is only available in your local Docker environment. 
Kubernetes can't access images from your local Docker environment unless you're using a tool like Minikube or kind with a specific configuration that allows this. To make the image available to Kubernetes, you need to push it to a Docker registry such as Docker Hub, Google Container Registry (GCR), or any other registry accessible to your Kubernetes cluster.

Here are the general steps to follow. First, tag your image with the registry address. If you are using Docker Hub, the command is:

docker tag kfserving-custom-model:latest <your-dockerhub-username>/kfserving-custom-model:latest

Then push the image to the registry. For Docker Hub, the command is:

docker push <your-dockerhub-username>/kfserving-custom-model:latest

Make sure to replace <your-dockerhub-username> with your actual Docker Hub username. Also, ensure that your Kubernetes cluster has the necessary credentials to pull from the registry if it's private. If it's a public Docker Hub repository, there should be no issues.

Deploying the Custom Model Server on KFServing

Now that we have the Docker image, we can deploy the custom model server as an InferenceService on KFServing. We'll use a YAML configuration file to describe the Kubernetes model resource. Create a file named deploy_server.yaml and populate it with the following content:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  labels:
    controller-tools.k8s.io: "1.0"
  name: kfserving-custom-model
spec:
  predictor:
    containers:
    - image: <your-dockerhub-username>/kfserving-custom-model:latest
      name: kfserving-container
      resources:
        requests:
          memory: "4096Mi"
          cpu: "250m"
        limits:
          memory: "4096Mi"
          cpu: "500m"

The YAML file defines the model's metadata, including the name and labels. It specifies the container image to use, along with resource requests and limits for memory and CPU.

To deploy the model, run the following command:

kubectl apply -f deploy_server.yaml

This command creates the InferenceService resource in the Kubernetes cluster, deploying the custom model server.

Verify the deployment status:

kubectl get inferenceservices

This should show you the status of the inference service. Once the containers have downloaded the BERT model, they are ready to start receiving inference calls.

Making an Inference Call with the KFServing-Hosted Model

Once the model is deployed on KFServing, we can make inference calls to the locally hosted Hugging Face QA model. To do this, we'll need to set up port forwarding to expose the model's port to our local system. Execute the following command to determine whether your Kubernetes cluster is running in an environment that supports external load balancers:

kubectl get svc istio-ingressgateway -n istio-system

Now we can port-forward for testing purposes:

INGRESS_GATEWAY_SERVICE=$(kubectl get svc --namespace istio-system --selector="app=istio-ingressgateway" --output jsonpath='{.items[0].metadata.name}')
kubectl port-forward --namespace istio-system svc/${INGRESS_GATEWAY_SERVICE} 8080:80

# start another terminal
export INGRESS_HOST=localhost
export INGRESS_PORT=8080

This command forwards port 8080 on our local system to port 80 of the model's service.
It enables us to access the model's endpoint locally.Next, create a JSON file named kf_input.json with the following content:{ "instances": [    {      "text": "Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides general-purpose architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet…) for Natural Language Understanding (NLU) and Natural Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between TensorFlow 2.0 and PyTorch.",      "questions": [        "How many pretrained models are available in Transformers?",        "What does Transformers provide?",        "Transformers provides interoperability between which frameworks?"      ]    } ] }The JSON file contains the input text and a list of questions for the model to answer. To make an inference call, use the CURL command:curl -v -H "Host: kfserving-custom-model.default.example.com" -d @./kf_input.json <http://localhost:8080/v1/models/kfserving-custom-model:predict>This command sends the JSON file as input to the predict method of our custom InferenceService. It forwards the request to the model's endpoint. It returns the next predictions:{"predictions":      {"How many pretrained models are available in Transformers?":                  "over 32 +",            "What does Transformers provide?":                  "general - purpose architectures",            "Transformers provides interoperability between which frameworks?":                  "tensorflow 2 . 0 and pytorch"} }We can see the whole operation here:The response includes the generated question-answer pairs for each one of the specified questions.ConclusionIn this tutorial, we learned how to deploy Language Model (LLM) models in a Kubernetes cluster using KFServing. We set up KFServing, built a custom Python model server using the Hugging Face extractive question-answering model, created a Docker image for the model server, and deployed the model as an InferenceService on KFServing. We also made inference calls to the hosted model and obtained question-answer pairs. By following this guide, you can deploy your own LLM models in Kubernetes with ease.Deploying LLM models in Kubernetes with KFServing simplifies the process of serving ML models at scale. It enables collaboration between data scientists and MLOps teams and provides standardized model-serving capabilities. With this knowledge, you can leverage KFServing to deploy and serve your own LLM models efficiently.Author Bio:Alan Bernardo Palacio is a data scientist and an engineer with vast experience in different engineering fields. His focus has been the development and application of state-of-the-art data products and algorithms in several industries. He has worked for companies such as Ernst and Young, Globant, and now holds a data engineer position at Ebiquity Media helping the company to create a scalable data pipeline. Alan graduated with a Mechanical Engineering degree from the National University of Tucuman in 2015, participated as the founder in startups, and later on earned a Master's degree from the faculty of Mathematics in the Autonomous University of Barcelona in 2017. Originally from Argentina, he now works and resides in the Netherlands.LinkedIn 


Creating a LangChain Agent: Azure OpenAI & Python with the ReAct Approach

Valentina Alto
11 Jun 2023
17 min read
In my latest article, we introduced the concept of Agents powered by Large Language Models and how they overcome one of the current limitations of our beloved LLMs: the capability of taking action. An Agent can be seen as a kind of wrapper that uses an LLM as a reasoning engine and can interact with tools that we provide, taking actions with them. Tools can range from Wikipedia access to the ability to interact with our file system or the command line. If the prompt was already an important component when working with LLMs, with agents it becomes key. In fact, agents need to be instructed with a reasoning template, which can follow various techniques. We've already seen an example of the Read-Retrieve-Read technique in my latest article. In this article, we are going to explore the ReAct approach.

What is ReAct?

The ReAct (Reason and Act) approach is a general paradigm that combines reasoning and acting with LLMs. It prompts LLMs to generate verbal reasoning traces and actions for a task. Like the Read-Retrieve-Read approach, the ReAct paradigm also involves interaction with external tools to retrieve additional information. However, with ReAct we introduce a greater synergy between the reasoning and acting phases: the reasoning phase helps the model set up action plans, track them, and even modify them when needed (for example, when handling exceptions), while the action phase allows the model to interact with the external world and retrieve the needed information according to the plan from the previous phase. In the ReAct paper, the authors show how this approach overcomes typical drawbacks of LLMs such as hallucination and error propagation (as occurs with the simple version of the Chain of Thought (CoT) prompting method).

Let's see how those kinds of agents work in practice.

Implementing the Agent

LangChain makes it easier to build agents thanks to lightweight libraries that provide our LLM with a ReAct-based prompt template, making the agent capable of both reasoning and acting. To achieve this goal, we need to install the following libraries:

!pip install wikipedia

from langchain import Wikipedia
from langchain.llms import AzureOpenAI
from langchain.agents import initialize_agent, Tool
from langchain.agents import AgentType
from langchain.agents.react.base import DocstoreExplorer

In this case, as an external tool, we will use Wikipedia. However, you can decide to add further tools, such as search APIs like Bing (you can read about that in my previous article) or the file system of your personal computer.

Next, we can build our agent and the document store it will navigate to retrieve information. To assist the agent with the exploration of the document store, we will use the previously imported DocstoreExplorer class:

docstore = DocstoreExplorer(Wikipedia())
tools = [
    Tool(
        name="Search",
        func=docstore.search,
        description="useful for when you need to ask with search"
    ),
    Tool(
        name="Lookup",
        func=docstore.lookup,
        description="useful for when you need to ask with lookup"
    )
]

Finally, we need to set up the reasoning engine of our Agent.
In our case, we will use a text-davinci-003 model available in the Azure OpenAI service (to set up an Azure OpenAI instance, you can read my former article here).llm = AzureOpenAI(deployment_name="text-davinci-003", openai_api_version="xxx", openai_api_key="xxx", openai_api_base="xxx")Great, now we have all the ingredients needed to initialize our agent. To test my agents, I will take some questions from a more recent version of the question-answering dataset HotpotQA (you can refer to the original paper here) called BeerQA. In this experiment, I will challenge my agent on some reasoning abilities within the world of Harry Potter:#initializing our agent react = initialize_agent(tools, llm, agent=AgentType.REACT_DOCSTORE, verbose=True) #start with questions question = "Rupert Grint played the friend of which fictional character from the Harry Potter series?" react.run(question)Below you can find the whole reasoning and response:> Entering new AgentExecutor chain... Thought: The question simplifies to which fictional character from the Harry Potter series is Rupert Grint's friend. I need to search Rupert Grint and Harry Potter series, and find the character. Action: Search[Rupert Grint] Observation: Could not find [Rupert Grint]. Similar: ['Rupert Grint', 'Knock at the Cabin', 'Emma Watson', "Harry Potter and the Philosopher's Stone (film)", 'Harry Potter 20th Anniversary: Return to Hogwarts', 'Snatch (TV series)', 'Harry Potter (film series)', 'Servant (TV series)', 'Sick Note (TV series)', 'Georgia Groome'] Thought: To find the character, I can search Harry Potter (film series). Action: Search[Harry Potter (film series)] Observation: Harry Potter is a film series based on the eponymous novels by J. K. Rowling. The series is produced and distributed by Warner Bros. Pictures and consists of eight fantasy films, beginning with Harry Potter and the Philosopher's Stone (2001) and culminating with Harry Potter and the Deathly Hallows – Part 2 (2011). A spin-off prequel series, planned to consist of five films, started with Fantastic Beasts and Where to Find Them (2016), marking the beginning of the Wizarding World shared media franchise.The series was mainly produced by David Heyman, and stars Daniel Radcliffe, Rupert Grint, and Emma Watson as the three leading characters: Harry Potter, Ron Weasley, and Hermione Granger. Four directors worked on the series: Chris Columbus, Alfonso Cuarón, Mike Newell, and David Yates. Michael Goldenberg wrote the screenplay for Harry Potter and the Order of the Phoenix (2007), while the remaining films' screenplays were written by Steve Kloves. Production took place over ten years, with the main story arc following Harry's quest to overcome his arch-enemy Lord Voldemort.Harry Potter and the Deathly Hallows, the seventh and final novel in the series, was adapted into two feature-length parts. Part 1 was released in November 2010, and Part 2 was released in July 2011.Philosopher's Stone and Deathly Hallows – Part 2 are among the 50 highest-grossing films of all time—at 50th- and 16th-highest, respectively—with both grossing over $1 billion. It is the fourth-highest-grossing film series, with $7.7 billion in worldwide receipts. Thought: Rupert Grint played the friend of Harry Potter, Ron Weasley. Action: Finish[Ron Weasley] > Finished chain. 'Ron Weasley'Now, you might think that this was a very easy question to address (and be right). 
Let’s challenge it with some more complex reasoning tasks:question = "Which of the Harry Potter film series main stars debuted in stage acting first?" react.run(question)Below is the response:> Entering new AgentExecutor chain... Thought: I need to search the main stars of the Harry Potter film series, find their stage acting debut, then find which debuted first. Action: Search[Harry Potter film series main stars] Observation: Harry Potter is a film series based on the eponymous novels by J. K. Rowling. The series is produced and distributed by Warner Bros. Pictures and consists of eight fantasy films, beginning with Harry Potter and the Philosopher's Stone (2001) and culminating with Harry Potter and the Deathly Hallows – Part 2 (2011). A spin-off prequel series, planned to consist of five films, started with Fantastic Beasts and Where to Find Them (2016), marking the beginning of the Wizarding World shared media franchise.The series was mainly produced by David Heyman, and stars Daniel Radcliffe, Rupert Grint, and Emma Watson as the three leading characters: Harry Potter, Ron Weasley, and Hermione Granger. Four directors worked on the series: Chris Columbus, Alfonso Cuarón, Mike Newell, and David Yates. Michael Goldenberg wrote the screenplay for Harry Potter and the Order of the Phoenix (2007), while the remaining films' screenplays were written by Steve Kloves. Production took place over ten years, with the main story arc following Harry's quest to overcome his arch-enemy Lord Voldemort.Harry Potter and the Deathly Hallows, the seventh and final novel in the series, was adapted into two feature-length parts. Part 1 was released in November 2010, and Part 2 was released in July 2011.Philosopher's Stone and Deathly Hallows – Part 2 are among the 50 highest-grossing films of all time—at 50th- and 16th-highest, respectively—with both grossing over $1 billion. It is the fourth-highest-grossing film series, with $7.7 billion in worldwide receipts. Thought: The main stars of the Harry Potter film series are Daniel Radcliffe, Rupert Grint, and Emma Watson. I need to search each of them and find their stage acting debut. Action: Search[Daniel Radcliffe] Observation: Daniel Jacob Radcliffe (born 23 July 1989) is an English actor. He rose to fame at age twelve, when he began portraying Harry Potter in the film series of the same name. Over his career, Radcliffe has received various awards and nominations. Radcliffe made his acting debut at age 10 in the BBC One television film David Copperfield (1999), followed by his feature film debut in The Tailor of Panama (2001). The same year, he starred as Harry Potter in the film adaptation of the J.K. Rowling fantasy novel, Harry Potter and the Philosopher's Stone. Over the next decade, he played the eponymous role in seven sequels, culminating with Harry Potter and the Deathly Hallows – Part 2 (2011). During this period, he became one of the world's highest-paid actors and gained worldwide fame, popularity, and critical acclaim. Following the success of Harry Potter, Radcliffe challenged himself acting in a variety of genres starring in What If? (2013), The Woman in Black (2012), Victor Frankenstein (2015), Swiss Army Man (2016), Now You See Me 2 (2016), and The Lost City (2022). He also portrayed Allen Ginsberg in the drama film Kill Your Darlings (2013), FBI agent Nate Foster in the crime drama Imperium (2016), and Weird Al Yankovic in the musical film Weird: The Al Yankovic Story (2022). 
Since 2019, he has starred in the TBS anthology series Miracle Workers. Radcliffe branched out to stage acting in 2007, starring in the West End and Broadway productions of Equus. He returned to Broadway in the musical How to Succeed in Business Without Really Trying (2011), earning a Grammy Award nomination, and Martin McDonagh's The Cripple of Inishmaan (2014), the satirical play The Lifespan of a Fact (2018), and Stephen Sondheim's Merrily We Roll Along (2023). He returned to the West End in revivals of Tom Stoppard's Rosencrantz and Guildenstern Are Dead (2017), and Samuel Becket's Endgame (2020). Radcliffe has contributed to many charities, including Demelza Hospice Care for Children and the Trevor Project; the latter awarded him its Hero Award in 2011 for his advocacy with LGBTQ youth. Thought: Daniel Radcliffe made his stage acting debut in the West End and Broadway productions of Equus in 2007. I need to search Rupert Grint and Emma Watson next and find their stage acting debut. Action: Search[Rupert Grint] Observation: Could not find [Rupert Grint]. Similar: ['Rupert Grint', 'Knock at the Cabin', 'Emma Watson', "Harry Potter and the Philosopher's Stone (film)", 'Harry Potter 20th Anniversary: Return to Hogwarts', 'Snatch (TV series)', 'Harry Potter (film series)', 'Servant (TV series)', 'Sick Note (TV series)', 'Georgia Groome'] Thought: To find Rupert Grint's stage acting debut, I can search Rupert Grint. Action: Search[Rupert Grint] Observation: Could not find [Rupert Grint]. Similar: ['Rupert Grint', 'Knock at the Cabin', 'Emma Watson', "Harry Potter and the Philosopher's Stone (film)", 'Harry Potter 20th Anniversary: Return to Hogwarts', 'Snatch (TV series)', 'Harry Potter (film series)', 'Servant (TV series)', 'Sick Note (TV series)', 'Georgia Groome'] Thought: To find Rupert Grint's stage acting debut, I can search Rupert Grint stage acting. Action: Search[Rupert Grint stage acting] Observation: Rupert Alexander Lloyd Grint (; born 24 August 1988) is an English actor. Grint rose to fame for his role as Ron Weasley in the Harry Potter film series, for which he was cast at age eleven, having previously acted only in school plays and his local theatre group. Since then, he continued his work on film, television, and theatre. Beginning in 2002, he began to work outside of the Harry Potter franchise, with a co-leading role in Thunderpants. He starred in the dramedy Driving Lessons (2006) and the drama Cherrybomb (2009) and co-starred in the comedy Wild Target (2010). After the Harry Potter series concluded, he appeared in the films Into the White (2012), Charlie Countryman (2013), CBGB (2013), and Moonwalkers (2015). He made his stage debut in 2013 in Mojo at the Harold Pinter Theatre. He executive-produced and starred in the television series Snatch, based on the film of the same name. He has also starred in the black comedy series Sick Note, the mystery thriller miniseries The ABC Murders, and the psychological horror series Servant. Thought: Rupert Grint made his stage acting debut in Mojo at the Harold Pinter Theatre in 2013. I need to search Emma Watson next and find her stage acting debut. Action: Search[Emma Watson] Observation: Emma Charlotte Duerre Watson (born 15 April 1990) is an English actress, model and activist. Known for her roles in both blockbusters and independent films, as well as for her women's rights work, she has received a selection of accolades, including a Young Artist Award and three MTV Movie Awards. 
Watson has been ranked among the world's highest-paid actresses by Forbes and Vanity Fair, and was named one of the 100 most influential people in the world by Time magazine in 2015.Watson attended the Dragon School and trained in acting at the Oxford branch of Stagecoach Theatre Arts. As a child, she rose to stardom after landing her first professional acting role as Hermione Granger in the Harry Potter film series, having previously acted only in school plays. Watson also starred in the 2007 television adaptation of the novel Ballet Shoes and lent her voice to The Tale of Despereaux (2008). After the final Harry Potter film, she took on a supporting role in My Week with Marilyn (2011), before starring as Sam, a flirtatious, free-spirited student in The Perks of Being a Wallflower (2012), to critical success. Further acclaim came from portraying Alexis Neiers in Sofia Coppola's The Bling Ring (2013) and the titular character's adoptive daughter in Darren Aronofsky's biblical epic Noah (2014). That same year, Watson was honoured by the British Academy of Film and Television Arts, winning British Artist of the Year. She also starred as Belle in the live-action musical romantic fantasy Beauty and the Beast (2017), and Meg March in Greta Gerwig's coming-of-age drama Little Women (2019). From 2011 to 2014, Watson split her time between working on films and continuing her education, graduating from Brown University with a bachelor's degree in English literature in May 2014. That year, she was appointed a UN Women Goodwill Ambassador and helped launch the UN Women campaign HeForShe, which advocates for gender equality. In 2018, she helped launch Time's Up UK as a founding member. Watson was appointed to a G7 advisory body for women's rights in 2019, consulting with leaders on foreign policy. Her modelling work has included campaigns for Burberry and Lancôme. She also lent her name to a clothing line for the sustainable brand People Tree. In 2020, she joined the board of directors of Kering, a luxury brand group, in her capacity as an advocate for sustainable fashion. Thought: Emma Watson made her stage acting debut in the 2007 television adaptation of the novel Ballet Shoes. Daniel Radcliffe made his stage acting debut in 2007 (Equus), while Rupert Grint and Emma Watson made their stage acting debut in 2013 (Mojo and Ballet Shoes, respectively). So Daniel Radcliffe debuted in stage acting first. Action: Finish[Daniel Radcliffe] > Finished chain. 'Daniel Radcliffe'In this case, you can see that the Agent needs to plan several steps to find the right information. It is incredible to see how it can formulate the most effective query for the tool it is enabled to navigate through. At the end of the Chain of Thoughts, it gives the correct answer.For the sake of HP fandom (which I’m part of), let’s also ask something that it is only present within the books:question = "In Harry Potter and the Goblet of Fire, in the third task to get to the Goblet a riddle of which magical creature needs to be solved?" react.run(question)Below is the result:> Entering new AgentExecutor chain... Thought: I need to search Harry Potter and the Goblet of Fire, find the third task, and find the magical creature. Action: Search[Harry Potter and the Goblet of Fire] Observation: Harry Potter and the Goblet of Fire is a fantasy novel written by British author J. K. Rowling and the fourth novel in the Harry Potter series. 
It follows Harry Potter, a wizard in his fourth year at Hogwarts School of Witchcraft and Wizardry, and the mystery surrounding the entry of Harry's name into the Triwizard Tournament, in which he is forced to compete. The book was published in the United Kingdom by Bloomsbury and in the United States by Scholastic. In both countries, the release date was 8 July 2000. This was the first time a book in the series was published in both countries at the same time. The novel won a Hugo Award, the only Harry Potter novel to do so, in 2001. The book was adapted into a film, released worldwide on 18 November 2005, and a video game by Electronic Arts. Thought: The paragraph does not mention the third task. I need to look up third task. Action: Lookup[third task] Observation: No Results Thought: I need to instead search Harry Potter and the Goblet of Fire third task. Action: Search[Harry Potter and the Goblet of Fire third task] Observation: Harry Potter and the Goblet of Fire is a fantasy novel written by British author J. K. Rowling and the fourth novel in the Harry Potter series. It follows Harry Potter, a wizard in his fourth year at Hogwarts School of Witchcraft and Wizardry, and the mystery surrounding the entry of Harry's name into the Triwizard Tournament, in which he is forced to compete. The book was published in the United Kingdom by Bloomsbury and in the United States by Scholastic. In both countries, the release date was 8 July 2000. This was the first time a book in the series was published in both countries at the same time. The novel won a Hugo Award, the only Harry Potter novel to do so, in 2001. The book was adapted into a film, released worldwide on 18 November 2005, and a video game by Electronic Arts. Thought: The third task is to get to the Goblet, and the riddle of a magical creature needs to be solved. The magical creature is a sphinx. Action: Finish[sphinx] > Finished chain. 'sphinx'Well, we can say that our Agent did a pretty job. The next challenge will be navigating through the Marvel multiverse 😎Stay tuned for the next article!Referenceshttps://medium.com/@valentinaalto/introducing-langchain-agents-e58674b1a657[2210.03629] ReAct: Synergizing Reasoning and Acting in Language Models (arxiv.org)[2210.03629] ReAct: Synergizing Reasoning and Acting in Language Models (arxiv.org)Plan and Execute — 🦜🔗 LangChain 0.0.168Agents — 🦜🔗 LangChain 0.0.168File System Tools — 🦜🔗 LangChain 0.0.168Tools — 🦜🔗 LangChain 0.0.168https://hotpotqa.github.io/https://nlp.stanford.edu/projects/beerqa/beerqa_train_v1.0.jsonAuthor BioValentina Alto graduated in 2021 in data science. Since 2020, she has been working at Microsoft as an Azure solution specialist, and since 2022, she has been focusing on data and AI workloads within the manufacturing and pharmaceutical industries. She has been working closely with system integrators on customer projects to deploy cloud architecture with a focus on modern data platforms, data mesh frameworks, IoT and real-time analytics, Azure Machine Learning, Azure Cognitive Services (including Azure OpenAI Service), and Power BI for dashboarding. Since commencing her academic journey, she has been writing tech articles on statistics, machine learning, deep learning, and AI in various publications and has authored a book on the fundamentals of machine learning with Python.Author of the book: Modern Generative AI with ChatGPT and OpenAI ModelsLink - Medium  LinkedIn  

Getting Started with Gemini AI

Packt
07 Sep 2023
2 min read
Introduction Gemini AI is a large language model (LLM) being developed by Google DeepMind. It is still under development, but it is expected to be more powerful than ChatGPT, the current state-of-the-art LLM. Gemini AI is being built on the technology and techniques used in AlphaGo, an early AI system developed by DeepMind in 2016. This means that Gemini AI is expected to have strong capabilities in planning and problem-solving. Gemini AI is a powerful tool that has the potential to be used in a wide variety of applications. Some of the potential use cases for Gemini AI include: Chatbots: Gemini AI could be used to create more realistic and engaging chatbots. Virtual assistants: Gemini AI could be used to create virtual assistants that can help users with tasks such as scheduling appointments, making reservations, and finding information. Content generation: Gemini AI could be used to generate creative content such as articles, blog posts, and scripts. Data analysis: Gemini AI could be used to analyze large datasets and identify patterns and trends. Medical diagnosis: Gemini AI could be used to assist doctors in diagnosing diseases. Financial trading: Gemini AI could be used to make trading decisions. How Gemini AI works Gemini AI is a neural network that has been trained on a massive dataset of text and code. This dataset includes books, articles, code repositories, and other forms of text. The neural network is able to learn the patterns and relationships between words and phrases in this dataset. This allows Gemini AI to generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way. How to use Gemini AI Gemini AI is not yet available to the public, but it is expected to be released in the future. When it is released, it will likely be available through a cloud-based API. This means that developers will be able to use Gemini AI in their own applications. To use Gemini AI, developers will need to first create an account and obtain an API key. Once they have an API key, they can use it to call the Gemini AI API. The API will allow them to interact with Gemini AI and use its capabilities. Here are some steps on how to install or get started with Gemini AI: Go to the Gemini AI website and create an account: Once you have created an account, you will be given an API key. Install the Gemini AI client library for your programming language. In your code, import the Gemini AI client library and initialize it with your API key. Call the Gemini AI API to generate text, translate languages, write different kinds of creative content, or answer your questions in an informative way. For more detailed instructions on how to install and use Gemini AI, please refer to the Gemini AI documentation. The future of Gemini AI Gemini AI is still under development, but it has the potential to revolutionize the way we interact with computers. In the future, Gemini AI could be used to create more realistic and engaging chatbots, virtual assistants, and other forms of AI-powered software. Gemini AI could also be used to improve our understanding of the world around us by analyzing large datasets and identifying patterns and trends. Conclusion Gemini AI is a powerful tool that has the potential to be used in a wide variety of applications. It is still under development, but it has the potential to revolutionize the way we interact with computers. 
From more engaging chatbots and virtual assistants to large-scale data analysis, its potential applications span many domains.


Hands-On tutorial on how to use Pinecone with LangChain

Alan Bernardo Palacio
21 Aug 2023
17 min read
A vector database stores high-dimensional vectors and mathematical representations of attributes. Each vector holds dimensions ranging from tens to thousands, enhancing data richness. It operationalizes embedding models, aiding application development with resource management, security, scalability, and query efficiency. Pinecone, a vector database, enables a quick semantic search of vectors. Integrating OpenAI’s LLMs with Pinecone merges deep learning-based embedding generation with efficient storage and retrieval, facilitating real-time recommendation and search systems. Pinecone acts as long-term memory for large language models like OpenAI’s GPT-4.IntroductionThis tutorial will guide you through the process of integrating Pinecone, a high-performance vector database, with LangChain, a framework for building applications powered by large language models (LLMs). Pinecone enables developers to build scalable, real-time recommendation and search systems based on vector similarity search.PrerequisitesBefore you begin this tutorial, you should have the following:A Pinecone accountA LangChain accountA basic understanding of PythonPinecone basicsAs a starter, we will get familiarized with the use of Pinecone by exploring its basic functionalities of it. Remember to get the Pinecone access key.Here is a step-by-step guide on how to set up and use Pinecone, a cloud-native vector database that provides long-term memory for AI applications, especially those involving large language models, generative AI, and semantic search.Initialize Pinecone clientWe will use the Pinecone client, so this step is only necessary if you don’t have it installed already.pip install pinecone-clientTo use Pinecone, you must have an API key. You can find your API key in the Pinecone console under the "API Keys" section. Note both your API key and your environment. To verify that your Pinecone API key works, use the following command:import pinecone pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENVIRONMENT")If you don't receive an error message, then your API key is valid. This will also initialize the Pinecone session.Creating and retrieving indexesThe commands below create an index named "quickstart" that performs an approximate nearest-neighbor search using the Euclidean distance metric for 8-dimensional vectors.pinecone.create_index("quickstart", dimension=8, metric="euclidean")The Index creation takes roughly a minute.Once your index is created, its name appears in the index list. Use the following command to return a list of your indexes.pinecone.list_indexes()Before you can query your index, you must connect to the index.index = pinecone.Index("quickstart")Now that you have created your index, you can start to insert data into it.Insert the dataTo ingest vectors into your index, use the upsert operation, which inserts a new vector into the index or updates the vector if a vector with the same ID is already present. The following commands upsert 5 8-dimensional vectors into your index.index.upsert([    ("A", [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]),    ("B", [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2]),    ("C", [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]),    ("D", [0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4]),    ("E", [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]) ])You can get statistics about your index, like the dimensions, the usage, and the vector count. 
To do this, you can use the following command to return statistics about the contents of your index.index.describe_index_stats()This will return a dictionary with information about your index:Now that you have created an index and inserted data into it, we can query the database to retrieve vectors based on their similarity.Query the index and get similar vectorsThe following example queries the index for the three vectors that are most similar to an example 8-dimensional vector using the Euclidean distance metric specified above.index.query( vector=[0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3], top_k=3, include_values=True )This command will return the first 3 vectors stored in this index that have the lowest Euclidian distance:Once you no longer need the index, use the delete_index operation to delete it.pinecone.delete_index("quickstart")By following these steps, you can set up a Pinecone vector database in just a few minutes. This will help you provide long-term memory for your high-performance AI applications without any infrastructure hassles.Now, let’s take a look at a bit more complex example, in which we embed text data and insert it into Pinecone.Preparing and Processing the DataIn this section, we will create a context for large language models (LLMs) using the OpenAI API. We will walk through the different parts of a Python script, understanding the purpose and function of each code block. The ultimate aim is to transform data into larger chunks of around 500 tokens, ensuring that the dataset is ordered sequentially.SetupFirst, we install the necessary libraries for our script. We're going to use OpenAI for AI models, pandas for data manipulation, and transformers for tokenization.!pip install openai pandas transformersAfter the installations, we import the necessary modules for our script.import pandas as pd import openaiBefore you can interact with OpenAI, you need to provide your API key. Make sure to replace <<YOUR_API_KEY>> with your actual API key.openai.api_key = ('<<YOUR_API_KEY>>')Now we are ready to start processing the data to be embedded and stored in Pinecone.Data transformationWe use pandas to load JSON data files related to different technologies (HuggingFace, PyTorch, TensorFlow, Streamlit). These files seem to contain questions and answers related to their respective topics and are based on the data in the Pinecone documentation. First, we will concatenate these data frames into one for easier manipulation.hf = pd.read_json('data/huggingface-qa.jsonl', lines=True) pt = pd.read_json('data/pytorch-qa.jsonl', lines=True) tf = pd.read_json('data/tensorflow-qa.jsonl', lines=True) sl = pd.read_json('data/streamlit-qa.jsonl', lines=True) df = pd.concat([hf, pt, tf, sl], ignore_index=True) df.head()We can see the data here:Next, we define a function to remove new lines and unnecessary spaces in our text data. 
The function remove_newlines takes a pandas Series object and performs several replace operations to clean the text.def remove_newlines(serie):    serie = serie.str.replace('\\\\n', ' ', regex=False)    serie = serie.str.replace('\\\\\\\\n', ' ', regex=False)    serie = serie.str.replace('  ',' ', regex=False)    serie = serie.str.replace('  ',' ', regex=False)    return serieWe transform the text in our dataframe into a single string format combining the 'docs', 'category', 'thread', 'question', and 'context' columns.df['text'] = "Topic: " + df.docs + " - " + df.category + "; Question: " + df.thread + " - " + df.question + "; Answer: " + df.context df['text'] = remove_newlines(df.text)TokenizationWe use the HuggingFace transformers library to tokenize our text. The GPT2 tokenizer is used, and the number of tokens for each text string is stored in a new column 'n_tokens'.from transformers import GPT2TokenizerFast tokenizer = GPT2TokenizerFast.from_pretrained("gpt2") df['n_tokens'] = df.text.apply(lambda x: len(tokenizer.encode(x)))We filter out rows in our data frame where the number of tokens exceeds 2000.df = df[df.n_tokens < 2000]Now we can finally embed the data using the OpenAI API.from openai.embeddings_utils import get_embedding size = 'curie' df['embeddings'] = df.text.apply(lambda x: get_embedding(x, engine=f'text-search-{size}-doc-001')) df.head()We will be using the text-search-curie-doc-001' Open AI engine to create the embeddings, which is very capable, faster, and lower cost than Davinci:So far, we've prepared our data for subsequent processing. In the next parts of the tutorial, we will cover obtaining embeddings from the OpenAI API and using them with the Pinecone vector database.Next, we will initialize the Pinecone index, create text embeddings using the OpenAI API and insert them into Pinecone.Initializing the Index and Uploading Data to PineconeThe second part of the tutorial aims to take the data that was prepared previously and upload them to the Pinecone vector database. This would allow these embeddings to be queried for similarity, providing a means to use contextual information from a larger set of data than what an LLM can handle at once.Checking for Large Text DataThe maximum size limit for metadata in Pinecone is 5KB, so we check if any 'text' field items are larger than this.from sys import getsizeof too_big = [] for text in df['text'].tolist():    if getsizeof(text) > 5000:        too_big.append((text, getsizeof(text))) print(f"{len(too_big)} / {len(df)} records are too big")This will filter out the entries whose metadata is larger than the one Pinecone can manage. The next step is to create a unique identifier for the records.There are several records with text data larger than the Pinecone limit, so we assign a unique ID to each record in the DataFrame.df['id'] = [str(i) for i in range(len(df))] df.head()This ID can be used to retrieve the original text later:Now we can start with the initialization of the index in Pinecone and insert the data.Pinecone Initialization and Index CreationNext, Pinecone is initialized with the API key, and an index is created if it doesn't already exist. The name of the index is 'beyond-search-openai', and its dimension matches the length of the embeddings. 
The metric used for similarity search is cosine.import pinecone pinecone.init(    api_key='PINECONE_API_KEY',    environment="YOUR_ENV" ) index_name = 'beyond-search-openai' if not index_name in pinecone.list_indexes():    pinecone.create_index(        index_name, dimension=len(df['embeddings'].tolist()[0]),        metric='cosine'    ) index = pinecone.Index(index_name)Now that we have created the index, we can proceed to insert the data. The index will be populated in batches of 32. Relevant metadata (like 'docs', 'category', 'thread', and 'href') is also included with each item. We will use tqdm to create a progress bar for the progress of the insertion.from tqdm.auto import tqdm batch_size = 32 for i in tqdm(range(0, len(df), batch_size)):    i_end = min(i+batch_size, len(df))    df_slice = df.iloc[i:i_end]    to_upsert = [        (            row['id'],            row['embeddings'],            {                'docs': row['docs'],                'category': row['category'],                'thread': row['thread'],                'href': row['href'],                'n_tokens': row['n_tokens']            }        ) for _, row in df_slice.iterrows()    ]    index.upsert(vectors=to_upsert)This will insert the records into the database to be used later on in the process:Finally, the ID-to-text mappings are saved into a JSON file. This would allow us to retrieve the original text associated with an ID later on.mappings = {row['id']: row['text'] for _, row in df[['id', 'text']].iterrows()} import json with open('data/mapping.json', 'w') as fp:    json.dump(mappings, fp)Now the Pinecone vector database should now be populated and ready for querying. Next, we will use this information to provide context to a question answering LLM.Querying and Answering QuestionsThe final part of the tutorial involves querying the Pinecone vector database with questions, retrieving the most relevant context embeddings, and using OpenAI's API to generate an answer to the question based on the retrieved contexts.OpenAI Embedding GenerationThe OpenAI API is used to create embeddings for the question.from openai.embeddings_utils import get_embedding q_embeddings = get_embedding(    'how to use gradient tape in tensorflow',    engine=f'text-search-curie-query-001' )A function create_context is defined to use the OpenAI API to create a query embedding, retrieve the most relevant context embeddings from Pinecone, and append these contexts into a larger string ready for feeding into OpenAI's next generation step.from openai.embeddings_utils import get_embedding def create_context(question, index, max_len=3750, size="curie"):    q_embed = get_embedding(question, engine=f'text-search-{size}-query-001')    res = index.query(q_embed, top_k=5, include_metadata=True)    cur_len = 0    contexts = []    for row in res['matches']:        text = mappings[row['id']]        cur_len += row['metadata']['n_tokens'] + 4        if cur_len < max_len:            contexts.append(text)        else:            cur_len -= row['metadata']['n_tokens'] + 4            if max_len - cur_len < 200:                break    return "\\\\n\\\\n###\\\\n\\\\n".join(contexts) We can now use this function to retrieve the context necessary based on a given question, as the question is embedded and the relevant context is retrieved from the Pinecone database:Now we are ready to start passing the context to a question-answering model.Querying and AnsweringWe start by defining the parameters that will take during the query, specifically the model we will be 
using, the maximum token length and other parameters. We can also define given instructions to the model which will be used to constrain the results we can get..fine_tuned_qa_model="text-davinci-002" instruction=""" Answer the question based on the context below, and if the question can't be answered based on the context, say \\"I don't know\\"\\n\\nContext:\\n{0}\\n\\n---\\n\\nQuestion: {1}\\nAnswer:""" max_len=3550 size="curie" max_tokens=400 stop_sequence=None domains=["huggingface", "tensorflow", "streamlit", "pytorch"]Different instruction formats can be defined. We will start now making some simple questions and seeing what the results look like.question="What is Tensorflow" context = create_context(    question,    index,    max_len=max_len,    size=size, ) try:    # fine-tuned models requires model parameter, whereas other models require engine parameter    model_param = (        {"model": fine_tuned_qa_model}        if ":" in fine_tuned_qa_model        and fine_tuned_qa_model.split(":")[1].startswith("ft")        else {"engine": fine_tuned_qa_model}    )    #print(instruction.format(context, question))    response = openai.Completion.create(        prompt=instruction.format(context, question),        temperature=0,        max_tokens=max_tokens,        top_p=1,        frequency_penalty=0,        presence_penalty=0,        stop=stop_sequence,        **model_param,    )    print( response["choices"][0]["text"].strip()) except Exception as e:    print(e)We can see that it's giving us the proper results using the context that it's retrieving from Pinecone:We can also inquire about Pytorch:question="What is Pytorch" context = create_context(    question,    index,    max_len=max_len,    size=size, ) try:    # fine-tuned models requires model parameter, whereas other models require engine parameter    model_param = (        {"model": fine_tuned_qa_model}        if ":" in fine_tuned_qa_model        and fine_tuned_qa_model.split(":")[1].startswith("ft")        else {"engine": fine_tuned_qa_model}    )    #print(instruction.format(context, question))    response = openai.Completion.create(        prompt=instruction.format(context, question),        temperature=0,        max_tokens=max_tokens,        top_p=1,        frequency_penalty=0,        presence_penalty=0,        stop=stop_sequence,        **model_param,    )    print( response["choices"][0]["text"].strip()) except Exception as e:    print(e)The results keep being consistent with the context provided:Now we can try to go beyond the capabilities of the context by pushing the boundaries a bit more.question="Am I allowed to publish model outputs to Twitter, without a human review?" 
context = create_context(    question,    index,    max_len=max_len,    size=size, ) try:    # fine-tuned models requires model parameter, whereas other models require engine parameter    model_param = (        {"model": fine_tuned_qa_model}        if ":" in fine_tuned_qa_model        and fine_tuned_qa_model.split(":")[1].startswith("ft")        else {"engine": fine_tuned_qa_model}    )    #print(instruction.format(context, question))    response = openai.Completion.create(       prompt=instruction.format(context, question),        temperature=0,        max_tokens=max_tokens,        top_p=1,        frequency_penalty=0,        presence_penalty=0,        stop=stop_sequence,        **model_param,    )    print( response["choices"][0]["text"].strip()) except Exception as e:    print(e)We can see in the results that the model is working according to the instructions provided as we don’t have any context on Twitter:Lastly, the Pinecone index is deleted to free up resources.pinecone.delete_index(index_name)ConclusionThis tutorial provided a comprehensive guide to harnessing Pinecone, OpenAI's language models, and HuggingFace's library for advanced question-answering. We introduced Pinecone's vector search engine, explored data preparation, embedding generation, and data uploading. Creating a question-answering model using OpenAI's API concluded the process. The tutorial showcased how the synergy of vector search engines, language models, and text processing can revolutionize information retrieval. This holistic approach holds potential for developing AI-powered applications in various domains, from customer service chatbots to research assistants and beyond.Author Bio:Alan Bernardo Palacio is a data scientist and an engineer with vast experience in different engineering fields. His focus has been the development and application of state-of-the-art data products and algorithms in several industries. He has worked for companies such as Ernst and Young, Globant, and now holds a data engineer position at Ebiquity Media helping the company to create a scalable data pipeline. Alan graduated with a Mechanical Engineering degree from the National University of Tucuman in 2015, participated as the founder in startups, and later on earned a Master's degree from the faculty of Mathematics in the Autonomous University of Barcelona in 2017. Originally from Argentina, he now works and resides in the Netherlands.LinkedIn 


Testing Large Language Models (LLMs)

20 Oct 2023
7 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!Machine learning has become ubiquitous, with models powering everything from search engines and recommendation systems to chatbots and autonomous vehicles. As these models grow more complex, testing them thoroughly is crucial to ensure they behave as expected. This is especially true for large language models like GPT-4 that generate human-like text and engage in natural conversations.In this article, we will explore strategies for testing machine learning models, with a focus on evaluating the performance of LLMs.IntroductionMachine learning models are notoriously challenging to test due to their black-box nature. Unlike traditional code, we cannot simply verify the logic line-by-line. ML models learn from data and make probabilistic predictions, so their decision-making process is opaque.While testing methods like unit testing and integration testing are common for traditional software, they do not directly apply to ML models. We need more specialized techniques to validate model performance and uncover unexpected or undesirable behavior.Testing is particularly crucial for large language models. Since LLMs can generate free-form text, it's hard to anticipate their exact responses. Flaws in the training data or model architecture can lead to Hallucinations, biases, and errors that only surface during real-world usage. Rigorous testing provides confidence that the model works as intended.In this article, we will cover testing strategies to evaluate LLMs. The key techniques we will explore are:Similarity testingColumn coverage testingExact match testingVisual output testingLLM-based evaluationBy combining these methods, we can thoroughly test LLMs along multiple dimensions and ensure they provide coherent, accurate, and appropriate responses.Testing Text Output with Similarity SearchA common output from LLMs is text. This could be anything from chatbot responses to summaries generated from documents. A robust way to test quality of text output is similarity testing.The idea is simple - we define an expected response and compare the model's actual response to determine how similar they are. The higher the similarity score, the better.Let's walk through an example using our favorite LLM. Suppose we give it the prompt:Prompt: What is the capital of Italy?The expected response would be:Expected: The capital of Italy is Rome.Now we can pass this prompt to the LLM and get the actual response:prompt = "What is the capital of Italy?" actual = llm.ask(prompt) Let's say actual contains:Actual: Rome is the capital of Italy.While the wording is different, the meaning is the same. To quantify this similarity, we can use semantic search libraries like SentenceTransformers. It represents sentences as numeric vectors and computes similarity using cosine distance.from sentence_transformers import SentenceTransformer model = SentenceTransformer('all-MiniLM-L6-v2') expected_embedding = model.encode(expected) actual_embedding = model.encode(actual) similarity = cosine_similarity([expected_embedding], [actual_embedding])[0][0] This yields a similarity score of 0.85, indicating the responses are highly similar in meaning.We can establish a threshold for the minimum acceptable similarity, like 0.8. Responses below this threshold fail the test. 
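As a rough sketch, the pass/fail check can be wrapped in a small helper like the one below. Note that cosine_similarity comes from scikit-learn (which the snippet above relies on but does not import), and the 0.8 threshold is only an illustrative choice:

from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

SIMILARITY_THRESHOLD = 0.8  # minimum acceptable semantic similarity

model = SentenceTransformer('all-MiniLM-L6-v2')

def passes_similarity_test(expected: str, actual: str) -> bool:
    # Encode both responses and compare them with cosine similarity
    embeddings = model.encode([expected, actual])
    score = cosine_similarity([embeddings[0]], [embeddings[1]])[0][0]
    return score >= SIMILARITY_THRESHOLD

assert passes_similarity_test("The capital of Italy is Rome.",
                              "Rome is the capital of Italy.")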
By running similarity testing over many prompt-response pairs, we can holistically assess the textual coherence of an LLM.Testing Tabular Outputs with Column CoverageIn addition to text, LLMs can output tables or data frames. For testing these, we need different techniques that account for structure.A good validation is column coverage - checking what percentage of columns in the expected output are present in the actual output.Consider the LLM answering questions about movies:Prompt: What are the top 3 highest grossing movies of all time?Expected:MovieWorldwide GrossRelease YearAvatar$2,789,679,7942009Titanic$2,187,463,9441997Star Wars Ep. VII$2,068,223,6242015Now we can test the LLM’s actual output:prompt = "What are the top 3 highest grossing movies of all time?" actual = llm.ask(prompt) Actual:MovieGlobal RevenueYearAvatar$2.789 billion2009Titanic$2.187 billion1997Star Wars: The Force Awakens$2.068 billion2015Here, actual contains the same 3 columns as expected - Movie, Gross, Release Year. So even though the headers and cell values differ slightly, we can pair them with cosine similarity and we will have 100% column coverage.We can formalize this in code:expected_cols = set(expected.columns) actual_cols = set(actual.columns) column_coverage = len(expected_cols & actual_cols) / len(expected_cols) # column_coverage = 1.0 For tables with many columns, we may only need say 90% coverage to pass the test. This validation ensures the critical output columns are present while allowing variability in column names or ancillary data.Exact Match for Numeric OutputsWhen LLMs output a single number or statistic, we can use simple exact match testing.Consider this prompt:Prompt: What was Apple's total revenue in 2021?Expected: $365.82 billionWe get the LLM’s response:prompt = "What was Apple's total revenue in 2021?" actual = llm.ask(prompt) Actual: $365.82 billionIn this case, we expect an exact string match:is_match = (actual == expected) # is_match = True For numerical outputs, precision is important. Exact match testing provides a straightforward way to validate this.Screenshot Testing for Visual OutputsBuilding PandasAI, we sometimes need to test generated charts. Testing these outputs requires verifying the visualized data is correct.One method is screenshot testing - comparing screenshots of the expected and actual visuals. For example:Prompt: Generate a bar chart comparing the revenue of FAANG companies.Expected: [Expected_Chart.png]Actual: [Actual_Chart.png]We can then test if the images match:from PIL import Image, ImageChops expected_img = Image.open("./Expected_Chart.png") actual_img = Image.open("./Actual_Chart.png") diff = ImageChops.difference(expected_img, actual_img) is_match = diff.getbbox() is None // is_match = True if images matchFor more robust validation, we could use computer vision techniques like template matching to identify and compare key elements: axes, bars, labels, etc.Screenshot testing provides quick validation of visual output without needing to interpret the raw chart data.LLM-Based EvaluationAn intriguing idea for testing LLMs is to use another LLM!The concept is to pass the expected and actual outputs to a separate "evaluator" LLM and ask if they match.For example:Expected: Rome is the capital of Italy.Actual: The capital of Italy is Rome.We can feed this to the evaluator model:Prompt: Do these two sentences convey the same information? 
Answer YES or NOSentence 1: Rome is the capital of Italy.Sentence 2: The capital of Italy is Rome.Evaluator: YESThe evaluator LLM acts like a semantic similarity scorer. This takes advantage of the natural language capabilities of LLMs.The downside is it evaluates one black box model using another black box model. Errors or biases in the evaluator could lead to incorrect assessments. So LLM-based evaluation should complement other testing approaches, not act as the sole method.ConclusionTesting machine learning models thoroughly is critical as they grow more ubiquitous and impactful. Large language models pose unique testing challenges due to their free-form textual outputs.Using a combination of similarity testing, column coverage validation, exact match, visual output screening, and even LLM-based evaluation, we can rigorously assess LLMs along multiple dimensions.A comprehensive test suite combining these techniques will catch more flaws and flaws than any single method alone. This builds essential confidence that LLMs behave as expected in the real world.Testing takes time but prevents much larger problems down the road. The strategies covered in this article will add rigor to the development and deployment of LLMs, helping ensure these powerful models benefit humanity as intended.Author BioGabriele Venturi is a software engineer and entrepreneur who started coding at the young age of 12. Since then, he has launched several projects across gaming, travel, finance, and other spaces - contributing his technical skills to various startups across Europe over the past decade.Gabriele's true passion lies in leveraging AI advancements to simplify data analysis. This mission led him to create PandasAI, released open source in April 2023. PandasAI integrates large language models into the popular Python data analysis library Pandas. This enables an intuitive conversational interface for exploring data through natural language queries.By open-sourcing PandasAI, Gabriele aims to share the power of AI with the community and push boundaries in conversational data analytics. He actively contributes as an open-source developer dedicated to advancing what's possible with generative AI.
Detecting & Addressing LLM 'Hallucinations' in Finance

James Bryant, Alok Mukherjee
04 Jan 2024
9 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!

This article is an excerpt from the book, The Future of Finance with ChatGPT and Power BI, by James Bryant, Alok Mukherjee. Enhance decision-making, transform your market approach, and find investment opportunities by exploring AI, finance, and data visualization with ChatGPT's analytics and Power BI's visuals.

Introduction

LLMs, such as OpenAI's GPT series, can sometimes generate responses that are referred to as "hallucinations." These are instances where the output from the model is factually incorrect, presents information that the model could not possibly know (given that it doesn't have access to real-time or personalized data), or is nonsensical or highly improbable.

Let's dive deeper into what hallucinations are, how to identify them, and what steps can be taken to mitigate their impact, especially in contexts where accurate and reliable information is crucial, such as financial analysis, trading, or visual data presentations.

Understanding hallucinations

Let's look at some examples:

Factual inaccuracies: Suppose an LLM provides information stating that Apple Inc. was founded in 1985. This is a clear factual inaccuracy because Apple was founded in 1976.

Speculative statements: If an LLM were to suggest that "As of 2023, Tesla's share price has hit $3,000," this is a hallucination. The model doesn't know real-time data, and any post-2021 prediction or speculation it makes about specific stock prices is unfounded.

Confident misinformation: For instance, if an LLM confidently states that "Amazon has declared bankruptcy in late 2022," this is a hallucination and can have serious consequences if it's acted upon without verification.

How can we spot hallucinations?

Here are some useful ways to spot hallucinations:

Cross-verification: If an LLM suggests an unusual trading strategy, such as shorting a typically stable blue-chip stock based on some supposed insider information, always cross-verify this advice with other reliable sources or consult a financial advisor.

Questioning the source: If an LLM claims that "our internal data shows a bullish trend for cryptocurrency X," this is likely a hallucination. The model doesn't have access to proprietary internal data.

Time awareness: If the model provides information or trends post-September 2021 without the user explicitly asking for a hypothetical or simulated scenario, consider this a red flag. For example, GPT-4 giving specific "real-time" market cap values for companies in 2023 would be a hallucination.

What can we do about hallucinations?

Here are some ideas:

Promote awareness: If you are developing an AI-assisted trading app that uses an LLM, ensure users are aware of potential hallucinations, perhaps with a disclaimer or notification upon usage.

Implement checks: You might integrate a news API that could help validate major financial events or claims made by the model, as in the sketch below.
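For illustration, here is a hedged sketch of such a check. The NEWS_API_URL endpoint, its query parameters, and the response shape are assumptions for the example, not a specific provider's API; the idea is simply to look for recent headlines that support a major claim before acting on it.

import requests

NEWS_API_URL = "https://example.com/v1/headlines"  # hypothetical endpoint, replace with a real provider

def claim_supported_by_news(claim_keywords, api_key):
    # Fetch recent headlines mentioning the claim's keywords.
    response = requests.get(
        NEWS_API_URL,
        params={"q": " ".join(claim_keywords), "api_key": api_key},
        timeout=10,
    )
    response.raise_for_status()
    headlines = [item.get("title", "") for item in response.json().get("articles", [])]
    # Treat the claim as supported only if some headline mentions every keyword.
    return any(
        all(keyword.lower() in title.lower() for keyword in claim_keywords)
        for title in headlines
    )

# Example usage: flag "Amazon declared bankruptcy in late 2022" if no headline backs it up.
# if not claim_supported_by_news(["Amazon", "bankruptcy"], api_key="..."):
#     print("Potential hallucination - verify before acting.")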
Minimizing hallucinations in the future

There are various ways we can minimize hallucinations. Here are some examples:

Training improvements: Imagine developing a better model that understands context and sticks to the known data more closely, avoiding speculative or incorrect financial statements. Future versions of the model could be specifically trained on financial data, news, and reports to understand the context and semantics of financial trading and investment better. We could do this to ensure that it understands a short squeeze scenario accurately, or is aware that penny stocks typically come with higher risks.

Better evaluation metrics: For instance, develop a specific metric that calculates the percentage of the model's outputs that were flagged as hallucinations during testing. In the development phase, the models could be evaluated on more focused tasks such as generating valid trading strategies or predicting the impact of certain macroeconomic events on stock prices. The better the model performs on these tasks, the lower the chance of hallucinations occurring.

Post-processing methods: Develop an algorithm that cross-references model outputs against reliable financial data sources and flags potential inaccuracies. After the model generates a potential trading strategy or investment suggestion, this output could be cross-verified using a rules-based system. For instance, if the model suggests shorting a stock that has consistently performed well without any recent negative news or poor earnings reports, the system might flag this as a potential hallucination.

As an example, you can use libraries such as yfinance or pandas_datareader to access real-time or historical financial data:

!pip install yfinance pandas_datareader

import yfinance as yf

def get_stock_data(ticker, start, end):
    stock = yf.Ticker(ticker)
    data = stock.history(start=start, end=end)
    return data

# Example Usage:
data = get_stock_data("AAPL", "2021-01-01", "2023-01-01")

You could also develop a cross-verification algorithm and compare the model's outputs with the collected financial data to flag potential inaccuracies.

Integration with real-time data: While creating Power BI visualizations, data that's been pulled from the LLM could be cross-verified with real-time data from financial databases or APIs. Any discrepancies, such as inconsistent market share percentages or revenue growth rates, could be flagged. This reduces the risk of presenting hallucinated data in visualizations. Let's look at some examples:

Extracting real-time data: You can continue to use yfinance or pandas_datareader to extract real-time data.

Cross-verifying with real-time data: You can compare the model's output with real-time data to identify discrepancies:

def real_time_cross_verify(output, real_time_data=None):
    # Assume output is a dict with keys 'market_share', 'revenue_growth', and 'ticker'
    ticker = output['ticker']
    # Fetch real-time data if it wasn't supplied (assuming a function get_real_time_data is defined)
    if real_time_data is None:
        real_time_data = get_real_time_data(ticker)
    # Compare the model's output with real-time data
    if abs(output['market_share'] - real_time_data['market_share']) > 0.05 or \
       abs(output['revenue_growth'] - real_time_data['revenue_growth']) > 0.05:
        return True  # Flagged as a potential hallucination
    return False  # Not flagged

# Example Usage:
output = {'market_share': 0.25, 'revenue_growth': 0.08, 'ticker': 'AAPL'}
real_time_data = {'market_share': 0.24, 'revenue_growth': 0.07, 'ticker': 'AAPL'}
flagged = real_time_cross_verify(output, real_time_data)

User feedback loop: A mechanism can be incorporated to allow users to report potential hallucinations. For instance, if a user spots an error in the LLM's output during a Power BI data analysis session, they can report this. Over time, these reports can be used to further train the model and reduce hallucinations. A minimal sketch of such a reporting mechanism follows.
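The sketch below is an assumption about how such a feedback loop could be structured; the in-memory list stands in for whatever database or ticketing system you would use in practice.

from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class HallucinationReport:
    prompt: str
    model_output: str
    user_comment: str
    reported_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

REPORTS = []

def report_hallucination(prompt, model_output, user_comment):
    # Capture the report so it can later feed evaluation or fine-tuning datasets.
    report = HallucinationReport(prompt, model_output, user_comment)
    REPORTS.append(report)  # in practice, persist to a database or ticketing system
    return report

# Example usage:
# report_hallucination(
#     "Summarize Tesla's latest earnings",
#     "Tesla declared bankruptcy in late 2022.",
#     "Incorrect - Tesla has not declared bankruptcy.",
# )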
OpenAI is on the case

To tackle the chatbot's missteps, OpenAI engineers are working on ways for its AI models to reward themselves for outputting correct data when moving toward an answer, instead of rewarding themselves only at the point of conclusion. The system could lead to better outcomes as it incorporates more of a human-like chain-of-thought procedure, according to the engineers.

These examples should help in illustrating the concept and risks of LLM hallucinations, particularly in high-stakes contexts such as finance. As always, these models should be seen as powerful tools for assistance, but not as a final authority.

Trading examples

Hallucination scenario: Let's assume you've asked an LLM for a prediction on the future performance of a specific stock, let's say Tesla. The LLM might generate a response that appears confident and factual, such as "Based on the latest earnings report, Tesla has declared bankruptcy." If you acted on this hallucinated information, you might rush to sell Tesla shares only to find out that Tesla is not bankrupt at all. This is an example of a potentially disastrous hallucination.

Action: Before making any trading decision based on the LLM's output, always cross-verify the information from a reliable financial news source or the company's official communications.

Power BI visualization examples

Hallucination scenario: Suppose you're using an LLM to generate text descriptions for a Power BI dashboard that tracks the market share of different automakers in the EV market. The LLM might hallucinate and produce a statement such as "Rivian has surpassed Tesla in terms of global EV market share." This statement might be completely inaccurate as Tesla had a significantly larger market share than Rivian.

Action: When using LLMs to generate text descriptions or insights for your Power BI dashboards, it's crucial to cross-verify any assertions that are made by the model. You can do this by cross-referencing the underlying data in your Power BI dashboard or by referring to reliable external sources of information.

To minimize hallucinations in the future, the model can be fine-tuned with a dataset that's been specifically curated to cover the relevant domain. The use of a structured validation set can help spot and rectify hallucinations during the model training process. Also, employing a robust fact-checking mechanism on the output of the model before acting on its suggestions or insights can help catch and rectify any hallucinations.

Remember, while LLMs can provide valuable insights and suggestions, their output should always be used as one of many inputs in your decision-making process, particularly in high-stakes environments such as financial trading and analysis.

Conclusion

In the dynamic world of financial analysis and data visualization, the presence of LLM 'hallucinations' poses a challenge. Awareness, verification, and ongoing improvement strategies stand as pillars against these inaccuracies. While LLMs offer invaluable support, their outputs must be scrutinized, verified, and used as one among many tools in decision-making.
As we navigate this landscape, vigilance, continuous refinement, and a critical eye will fortify our ability to harness the power of LLMs while mitigating the risks they present in high-stakes financial contexts.

Author Bio

James Bryant, a finance and technology expert, excels at identifying untapped opportunities and leveraging cutting-edge tools to optimize financial processes. With expertise in finance automation, risk management, investments, trading, and banking, he's known for staying ahead of trends and driving innovation in the financial industry. James has built corporate treasuries at companies like Salesforce and transformed companies like Stanford Health Care through digital innovation. He is passionate about sharing his knowledge and empowering others to excel in finance. Outside of work, James enjoys skiing with his family in Lake Tahoe, running half marathons, and exploring new destinations and culinary experiences with his wife and daughter.

Aloke Mukherjee is a seasoned technologist with over a decade of experience in business architecture, digital transformation, and solutions architecture. He excels at applying data-driven solutions to real-world problems and has proficiency in data analytics and planning. Aloke worked at EMC Corp and Genentech and currently spearheads the digital transformation of Finance Business Intelligence at Stanford Health Care. In addition to his work, Aloke is a Certified Personal Trainer and is passionate about helping his clients stay fit. Aloke also has a passion for wine and exploring new vineyards.
Deploying LLMs with Amazon SageMaker - Part 1

Joshua Arvin Lat
29 Nov 2023
13 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!

Introduction

Have you ever tried asking a Generative AI-powered chatbot the question: "What is the meaning of life?" In case you have not tried that yet, here's the response I got when I tried that myself using a custom chatbot app I built with a managed machine learning (ML) service called Amazon SageMaker.

Image 01 — Asking a chatbot the meaning of life

You would be surprised that I built this quick demo application myself in just a few hours! In this post, I will teach you how to deploy your own Large Language Models (LLMs) in a SageMaker Inference Endpoint (that is, a machine learning-powered server that responds to inputs) with just a few lines of code.

Image 02 — Deploying an LLM to a SageMaker Inference Endpoint

While most tutorials available teach us how to utilize existing Application Programming Interfaces (APIs) to prepare chatbot applications, it's best that we also know how to deploy LLMs on our own servers in order to guarantee data privacy and compliance. In addition to this, we'll be able to manage the long-term costs of our AI-powered systems as well. One of the most powerful solutions available for these types of requirements is Amazon SageMaker, which helps us focus on the work we need to do instead of worrying about cloud infrastructure management.

We'll divide the hands-on portion into the following sections:

●  Section I: Preparing the SageMaker Notebook Instance
●  Section II: Deploying an LLM using the SageMaker Python SDK to a SageMaker Inference Endpoint
●  Section III: Enabling Data Capture with SageMaker Model Monitor (discussed in Part 2)
●  Section IV: Invoking the SageMaker inference endpoint using the boto3 client (discussed in Part 2)
●  Section V: Preparing a Demo UI for our chatbot application (discussed in Part 2)
●  Section VI: Cleaning Up (discussed in Part 2)

Without further ado, let's begin!

Section I: Preparing the SageMaker Notebook Instance

Let's start by creating a SageMaker Notebook instance. Note that while we can also do this in SageMaker Studio, running the example in a SageMaker Notebook Instance should do the trick. If this is your first time launching a SageMaker Notebook instance, you can think of it as your local machine with several tools pre-installed already where we can run our scripts.

STEP # 01: Sign in to your AWS account and navigate to the SageMaker console by typing sagemaker in the search box, similar to what we have in the following image:

Image 03 — Navigating to the SageMaker console

Choose Amazon SageMaker from the list of options available, as highlighted in Image 03.

STEP # 02: In the sidebar, locate and click Notebook instances under Notebook:

Image 04 — Locating Notebook instances in the sidebar

STEP # 03: Next, locate and click the Create notebook instance button.

STEP # 04: In the Create notebook instance page, you'll be asked to input a few configuration parameters before we're able to launch the notebook instance where we'll be running our code:

Image 05 — Creating a new SageMaker Notebook instance

Specify a Notebook instance name (for example, llm-demo) and select a Notebook instance type.
For best results, you may select a relatively powerful instance type (ml.m4.xlarge) where we will run the scripts. However, you may decide to choose a smaller instance type such as ml.t3.medium (slower but less expensive). Note that we will not be deploying our LLM inside this notebook instance, as the model will be deployed in a separate inference endpoint (which will require a more powerful instance type such as an ml.g5.2xlarge).

STEP # 05: Create an IAM role by choosing Create a new role from the list of options available in the IAM role dropdown (under Permissions and encryption).

Image 06 — Opening the Jupyter app

This will open the following popup window. Given that we're just working on a demo application, the default security configuration should do the trick. Click the Create role button.

Important Note: Make sure to have a more secure configuration when dealing with production (or staging) work environments. We won't dive deep into how cloud security works in this post, so feel free to look for other resources and references to further improve the current security setup. In case you are interested in learning more about cloud security, feel free to check my 3rd book "Building and Automating Penetration Testing Labs in the Cloud". In the 7th Chapter of the book (Setting Up an IAM Privilege Escalation Lab), you'll learn how misconfigured machine learning environments on AWS can easily be exploited with the right sequence of steps.

STEP # 06: Click the Create notebook instance button. Wait for about 5-10 minutes for the SageMaker Notebook instance to be ready.

Important Note: Given that this will launch a resource that will run until you turn it off (or delete it), make sure to complete all the steps in the 2nd part of this post and clean up the created resources accordingly.

STEP # 07: Once the instance is ready, click Open Jupyter, similar to what we have in Image 07:

Image 07 — Opening the Jupyter app

This will open the Jupyter application in a browser tab. If this is your first time using this application, do not worry, as detailed instructions will be provided in the succeeding steps to help you get familiar with this tool.

STEP # 08: Create a new notebook by clicking New and selecting conda_python3 from the list of options available:

Image 08 — Creating a new notebook using the conda_python3 kernel

In case you are wondering what a kernel is, it is simply an "engine" or "environment" with pre-installed libraries and prerequisites that executes the code specified in the notebook cells. You'll see this in action in a bit.

STEP # 09: At this point, we should see the following interface where we can run various types of scripts and blocks of code:

Image 09 — New Jupyter notebook

Feel free to rename the Jupyter Notebook before proceeding to the next step. If you have not used a Jupyter Notebook before, you may run the following line of code by typing it in the text field and pressing SHIFT + ENTER:

print('hello')

This should print the output hello right below the text field where we placed our code.

Section II: Deploying an LLM using the SageMaker Python SDK to a SageMaker Inference Endpoint

STEP # 01: With everything ready, let's start by installing a specific version of the SageMaker Python SDK:

!pip install sagemaker==2.192.1

Here, we'll be using v2.192.1. This will help us ensure that you won't encounter breaking changes even if you work on the hands-on solutions in this post at a later date. In case you are wondering what the SageMaker Python SDK is, it is simply a software development kit (SDK) with a set of tools and APIs to help developers interact with and utilize the different features and capabilities of Amazon SageMaker.
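As a quick sanity check (not part of the original walkthrough), you can confirm that the notebook kernel picked up the pinned version before moving on; if the printed version differs, restarting the kernel after the install usually resolves it.

import sagemaker
print(sagemaker.__version__)  # expected output: 2.192.1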
STEP # 02: Next, let's import and prepare a few prerequisites by running the following block of code:

import sagemaker
import time

sagemaker_session = sagemaker.Session()
region = sagemaker_session.boto_region_name
role = sagemaker.get_execution_role()

STEP # 03: Let's import HuggingFaceModel and get_huggingface_llm_image_uri as well:

from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

STEP # 04: Next, let's define the generate_random_label() function, which we'll use later when naming our resources:

from string import ascii_uppercase
from random import choice

def generate_random_label():
    letters = ascii_uppercase
    return ''.join(choice(letters) for i in range(10))

This will help us avoid naming conflicts when creating and configuring our resources.

STEP # 05: Use the get_huggingface_llm_image_uri function we imported in an earlier step to retrieve the container image URI for our LLM. In addition to this, let's define the model_name we'll use later when deploying our LLM to a SageMaker endpoint:

image_uri = get_huggingface_llm_image_uri(
    backend="huggingface",
    region=region,
    version="1.1.0"
)
model_name = "MistralLite-" + generate_random_label()

STEP # 06: Before we proceed with the actual deployment, let's quickly inspect what we have in the image_uri variable:

image_uri

This will output the following variable value:

'763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.0.1-tgi1.1.0-gpu-py39-cu118-ubuntu20.04'

STEP # 07: Similarly, let's check the variable value of model_name:

model_name

This will give us the following:

'MistralLite-HKGKFRXURT'

Note that you'll get a different model_name value since we're randomly generating a portion of the model name.

STEP # 08: Let's prepare the hub model configuration as well:

hub_env = {
    'HF_MODEL_ID': 'amazon/MistralLite',
    'HF_TASK': 'text-generation',
    'SM_NUM_GPUS': '1',
    "MAX_INPUT_LENGTH": '16000',
    "MAX_TOTAL_TOKENS": '16384',
    "MAX_BATCH_PREFILL_TOKENS": '16384',
    "MAX_BATCH_TOTAL_TOKENS": '16384',
}

Here, we specify that we'll be using the MistralLite model. If this is your first time hearing about MistralLite, it is a fine-tuned Mistral-7B-v0.1 language model that can perform significantly better on several long-context retrieval and answering tasks.
For more information, feel free to check https://huggingface.co/amazon/MistralLite.

STEP # 09: Let's initialize the HuggingFaceModel object using some of the prerequisites and variables we've prepared in the earlier steps:

model = HuggingFaceModel(
    name=model_name,
    env=hub_env,
    role=role,
    image_uri=image_uri
)

STEP # 10: Now, let's proceed with the deployment of the model using the deploy() method:

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    endpoint_name=model_name,
)

Here, we're using an ml.g5.2xlarge for our inference endpoint. Given that this step may take about 10-15 minutes to complete, feel free to grab a cup of coffee or tea while waiting!

Important Note: Given that this will launch a resource that will run until you turn it off (or delete it), make sure to complete all the steps in the 2nd part of this post and clean up the created resources accordingly.

STEP # 11: Now, let's prepare our first input data:

question = "What is the meaning of life?"
input_data = {
    "inputs": f"<|prompter|>{question}</s><|assistant|>",
    "parameters": {
        "do_sample": False,
        "max_new_tokens": 2000,
        "return_full_text": False,
    }
}

STEP # 12: With the prerequisites ready, let's have our deployed LLM process the input data we prepared in the previous step:

result = predictor.predict(input_data)[0]["generated_text"]
print(result)

This should yield the following output:

The meaning of life is a philosophical question that has been debated by thinkers and philosophers for centuries. There is no single answer that can be definitively proven, as the meaning of life is subjective and can vary greatly from person to person. ...

Looks like our SageMaker Inference endpoint (where the LLM is deployed) is working just fine!
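If you plan to send more questions to the endpoint from this notebook, a small wrapper function keeps the prompt template and generation parameters in one place. This is a hedged convenience sketch rather than part of the original walkthrough; it simply reuses the predictor object and the payload format from STEP # 11 and STEP # 12.

def ask(question, max_new_tokens=2000):
    # Build the same prompt template and parameters used above for the MistralLite model.
    payload = {
        "inputs": f"<|prompter|>{question}</s><|assistant|>",
        "parameters": {
            "do_sample": False,
            "max_new_tokens": max_new_tokens,
            "return_full_text": False,
        },
    }
    return predictor.predict(payload)[0]["generated_text"]

# Example usage:
# print(ask("Give me three practical tips for managing cloud costs."))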
Conclusion

That wraps up the first part of this post. At this point, you should have a good idea of how to deploy LLMs using Amazon SageMaker. However, there's more in store for us in the second part, as we'll build on top of what we have already and enable data capture to help us collect and analyze the data (that is, the input requests and output responses) that pass through the inference endpoint. In addition to this, we'll prepare a demo user interface utilizing the ML model we deployed in this post. If you're looking for the link to the second part, here it is: Deploying LLMs with Amazon SageMaker - Part 2

We are just scratching the surface, as there is a long list of capabilities and features available in SageMaker. If you want to take things to the next level, feel free to read 2 of my books focusing heavily on SageMaker: "Machine Learning with Amazon SageMaker Cookbook" and "Machine Learning Engineering on AWS".

Author Bio

Joshua Arvin Lat is the Chief Technology Officer (CTO) of NuWorks Interactive Labs, Inc. He previously served as the CTO of 3 Australian-owned companies and also served as the Director for Software Development and Engineering for multiple e-commerce startups in the past. Years ago, he and his team won 1st place in a global cybersecurity competition with their published research paper. He is also an AWS Machine Learning Hero and he has been sharing his knowledge at several international conferences to discuss practical strategies on machine learning, engineering, security, and management. He is also the author of the books "Machine Learning with Amazon SageMaker Cookbook", "Machine Learning Engineering on AWS", and "Building and Automating Penetration Testing Labs in the Cloud". Due to his proven track record in leading digital transformation within organizations, he has been recognized as one of the prestigious Orange Boomerang: Digital Leader of the Year 2023 award winners.