
How-To Tutorials - LLM

81 Articles
Packt
07 Sep 2023
2 min read

Getting Started with Gemini AI

Introduction
Gemini AI is a large language model (LLM) being developed by Google DeepMind. It is still under development, but it is expected to be more powerful than ChatGPT, the current state-of-the-art LLM. Gemini AI is being built on the technology and techniques used in AlphaGo, an early AI system developed by DeepMind in 2016. This means that Gemini AI is expected to have strong capabilities in planning and problem-solving.
Gemini AI is a powerful tool that has the potential to be used in a wide variety of applications. Some of the potential use cases for Gemini AI include:
●  Chatbots: Gemini AI could be used to create more realistic and engaging chatbots.
●  Virtual assistants: Gemini AI could be used to create virtual assistants that can help users with tasks such as scheduling appointments, making reservations, and finding information.
●  Content generation: Gemini AI could be used to generate creative content such as articles, blog posts, and scripts.
●  Data analysis: Gemini AI could be used to analyze large datasets and identify patterns and trends.
●  Medical diagnosis: Gemini AI could be used to assist doctors in diagnosing diseases.
●  Financial trading: Gemini AI could be used to make trading decisions.

How Gemini AI works
Gemini AI is a neural network that has been trained on a massive dataset of text and code. This dataset includes books, articles, code repositories, and other forms of text. The neural network is able to learn the patterns and relationships between words and phrases in this dataset. This allows Gemini AI to generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way.

How to use Gemini AI
Gemini AI is not yet available to the public, but it is expected to be released in the future. When it is released, it will likely be available through a cloud-based API. This means that developers will be able to use Gemini AI in their own applications.
To use Gemini AI, developers will first need to create an account and obtain an API key. Once they have an API key, they can use it to call the Gemini AI API, which will let them interact with Gemini AI and use its capabilities. Once the service is available, getting started will likely involve the following steps:
1. Go to the Gemini AI website and create an account. Once you have created an account, you will be given an API key.
2. Install the Gemini AI client library for your programming language.
3. In your code, import the Gemini AI client library and initialize it with your API key.
4. Call the Gemini AI API to generate text, translate languages, write different kinds of creative content, or answer questions.
For more detailed instructions on how to install and use Gemini AI, please refer to the Gemini AI documentation.

The future of Gemini AI
Gemini AI is still under development, but it has the potential to revolutionize the way we interact with computers. In the future, Gemini AI could be used to create more realistic and engaging chatbots, virtual assistants, and other forms of AI-powered software. Gemini AI could also be used to improve our understanding of the world around us by analyzing large datasets and identifying patterns and trends.

Conclusion
Gemini AI is a powerful tool with potential uses across a wide variety of applications, from chatbots and virtual assistants to large-scale data analysis. It is still under development, but it has the potential to revolutionize the way we interact with computers.
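The getting-started steps described in this article (obtain an API key, initialize a client with it, call the API) can be sketched with the Python standard library alone. This is only an illustration of the general shape of an authenticated API call: the endpoint URL, the GEMINI_API_KEY environment variable, the JSON payload, and the bearer-token header are all assumptions made for the example, not details of the real Gemini API. The request is built but never sent.

```python
import json
import os
import urllib.request

# Hypothetical endpoint: the article does not name one, so this is a placeholder.
API_URL = "https://example.invalid/v1/generate"


def build_request(prompt, api_key):
    """Build (but do not send) an authenticated JSON POST request."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            # Bearer-token authentication is an assumption for this sketch.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Read the key from the environment rather than hard-coding it in source.
api_key = os.environ.get("GEMINI_API_KEY", "demo-key")
request = build_request("Summarize the benefits of unit testing.", api_key)
print(request.get_method(), request.full_url)
```

Keeping the key in an environment variable instead of the source file is the one design choice here that carries over to any real client library.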

Ryan Goodman
30 Jan 2024
11 min read

Building an LLM-powered App using Snowflake and Streamlit

Introduction
For years, self-service analytics apps have enabled both information consumers (business users) and information workers (analysts) to meet their need for data assets that aid analysis and problem-solving. These data assets can include ready-made insights and analysis in the form of statistics, visual stories, or formatted data for further discovery. Historically, an enterprise that embarked on creating analytics apps needed a specialized skillset and technology tools, and faced a steep learning curve before delivering value.
Three significant trends have shifted how we view analytics apps today:
●  No-code and low-code data acquisition, along with cloud data/warehouse platforms, have helped democratize the data platform.
●  Data platforms like Snowflake are designed to bring analytics computing into a single platform where data no longer needs to be copied and moved.
●  The democratization of machine learning and the widespread availability of powerful generative AI models have changed the entire user experience and expectations for information discovery and natural language exploration.
These trends have accelerated technology cycles and the rate of innovation in unprecedented ways. Prudent technology and business leaders are strained with more requests and fewer resources to use data to build information-focused businesses.
Currently, we have AI app and analytics waves breaking at the same time with different use cases in mind but the same objective.
For this article, we wanted to explore the basics of building a simple analytics app inside of Snowflake, allowing an OpenAI interface to execute code without ever accessing any of the resulting data.

Modern Data Cloud and Analytics Technology Tools
Let us explore the process and benefits of building an LLM-powered application using a cloud-based data warehousing platform like Snowflake and an open-source Python library for creating web applications like Streamlit.
Ref: https://www.snowflake.com/blog/building-python-data-apps-streamlit/

Understanding Snowflake Data Warehousing
Snowflake is a leading cloud data platform offering secure and scalable solutions for processing and storing data. Its architecture allows easy integration with programming languages, which makes it suitable for data-intensive applications. To work with Snowflake, you must create a Snowflake account and set up a database for data storage.

LLM Powered Inputs and Translation
Every large language model, including GPT-4, is capable of understanding and generating human-like text based on the prompts and inputs it receives. These models are trained on vast datasets, enabling them to comprehend large and complex language patterns and generate contextually relevant responses. An incredible aspect of large language models, particularly GPT-4, is their ability to effectively translate natural language into code, including SQL and Python.
Large language models are not designed for computational procedures like statistics and analytics, but with the right prompting and, most importantly, context, you can streamline many common tasks.

Integration of Snowflake with Python and Streamlit Snowpark
In data analysis and machine learning (ML), Python is one of the most versatile programming languages. Snowflake offers a Python connector that enables seamless communication between Snowflake databases and Python scripts.
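To make the earlier point about prompting and context concrete, here is a minimal sketch of a prompt builder that hands the model a table schema along with the user's question, so that it has what it needs to write SQL. The function name, the table, and its columns are hypothetical and exist only for this illustration; the resulting string is what would be passed to a model as the user message.

```python
def build_sql_prompt(question, table, columns):
    """Return a prompt that gives the model the schema it needs to write SQL."""
    schema = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        "You translate questions into Snowflake SQL.\n"
        f"Table: {table} ({schema})\n"
        "Return a single SELECT statement and nothing else.\n"
        f"Question: {question}"
    )


# Hypothetical table used only for this example.
prompt = build_sql_prompt(
    "How many orders were placed in 2023?",
    "ORDERS",
    [("ORDER_ID", "NUMBER"), ("ORDER_DATE", "DATE"), ("AMOUNT", "NUMBER")],
)
print(prompt)
```

Only the schema is shared with the model in this sketch, never the rows themselves, which matches the article's goal of generating code without exposing the resulting data.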
In this article, we are not using Snowpark.

Storyboarding our App
The difference between a good app and a great app lies in the value you create for your user. The secret to building a great app is empowering users to solve problems that would otherwise be painful or impossible due to a lack of skills. The app we are building here demonstrates how to fit technology components together.
Minimum Viable Product Storyboard:
●  End user: Analytics app developer
●  Intent: Demonstrate core tech components
●  Outcome: Have
●  Value: Quickly understand a functional code example without having to research
We will build a native Streamlit app inside of Snowflake:
●  The app will feature a chat interface powered by ChatGPT.
●  The chat history will be written to a Snowflake table.
●  The GPT model will read the results of a simple query, interpret the results, and summarize them in plain English.

Bringing Technology Components Together
For this article, we decided to build a simple end-to-end demonstration of how a native Snowflake app built with Python and Streamlit can use a chatbot interface backed by an OpenAI GPT model to generate SQL code that can be executed natively in Snowflake with the context of the schema.

Snowflake Integration of ChatGPT Large Language Model API
To receive responses from a large language model, start with the OpenAI documentation and Playground. Obtain an OpenAI API key, and then use the following code to interact with the model.
-- Step 1: Create a secret for the OpenAI API key
CREATE OR REPLACE SECRET open_ai_api_key
  TYPE = GENERIC_STRING
  SECRET_STRING = '<OPEN_AI_KEY>';

-- Step 2: Create a network rule that allows egress to the OpenAI API
CREATE OR REPLACE NETWORK RULE openai_network_rule
  MODE = EGRESS
  TYPE = HOST_PORT
  VALUE_LIST = ('api.openai.com');

-- Step 3: Create an external access integration in Snowflake
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION external_access_int
  ALLOWED_NETWORK_RULES = (openai_network_rule)
  ALLOWED_AUTHENTICATION_SECRETS = (open_ai_api_key)
  ENABLED = true;

-- Step 4: Create a UDF that uses the openai package (model: "gpt-3.5-turbo")
CREATE OR REPLACE FUNCTION CHATGPTv1(query varchar)
RETURNS STRING
LANGUAGE PYTHON
RUNTIME_VERSION = 3.9
HANDLER = 'runner'
EXTERNAL_ACCESS_INTEGRATIONS = (external_access_int)
SECRETS = ('openai_key' = open_ai_api_key)
PACKAGES = ('openai')
AS
$$
import _snowflake
import openai

def runner(QUERY):
    # Read the API key from the Snowflake secret bound in the SECRETS clause
    openai.api_key = _snowflake.get_generic_secret_string('openai_key')
    messages = [{"role": "user", "content": QUERY}]
    model = "gpt-3.5-turbo"
    # openai.ChatCompletion is the interface of openai package versions before 1.0
    response = openai.ChatCompletion.create(model=model, messages=messages, temperature=0)
    return response.choices[0].message["content"]
$$;

-- Test your UDF
SELECT CHATGPTv1('Hi');

Creation of Streamlit User Experience Interface
The following code was used to build a very basic functional prototype of the Streamlit user experience with GPT-3.5 Turbo.
1. Installation:
pip install streamlit
2. Creation:
import streamlit as st
from snowflake.snowpark.context import get_active_session

st.set_page_config(layout="wide")
st.title("OPEN AI IN SIS - GPT-3.5-turbo(MODEL)")
st.write("##")
st.write("##")

# Get the current credentials
session = get_active_session()

# Chat history for this browser session: maps each request to its response
if 'request_response' not in st.session_state:
    st.session_state['request_response'] = {}

# Render the history, requests on the left and responses on the right
if st.session_state['request_response']:
    for itr in st.session_state['request_response'].keys():
        request_col, request_col1 = st.columns(2)
        response_col1, response_col = st.columns(2)
        with request_col:
            st.write(f":bust_in_silhouette:  :blue[{itr}]")
        st.write("##")
        with response_col:
            st.write(f":speech_balloon:  :red[{st.session_state['request_response'][itr][0]}]")

col1, col2 = st.columns(2)
with col1:
    search_text = st.text_input("Send a message")
    search_button = st.button("Send")

if search_text and search_button:
    # Escape backslashes and single quotes so user input cannot break out of the SQL string literal
    safe_text = search_text.replace("\\", "\\\\").replace("'", "''")
    search_result = session.sql(f"SELECT CHATGPTv1('{safe_text}')").collect()
    if search_result:
        st.session_state['request_response'][search_text] = [search_result[0][0]]
        st.experimental_rerun()
3. Run:
streamlit run app.py

Moving from MVP to Real-World Application
Real-world analytics apps are designed with a narrow scope, outcome, and value in mind. Let's expand on the same technology components and formulate a real-world use case that will be more impactful to an enterprise.
When evaluating real-world business cases to apply Streamlit and OpenAI, focus on use cases that deliver value frequently, to many (or important) people in your organization, and are tied to high-impact business processes.
Data Tape Co-pilot Tool:
●  End user: Financial Analysts, Business Analysts, Data Analysts.
●  Intent: Deliver a data tape with the ability to constrain data to business needs and provide a basic summary.
●  Outcome: End users can download the data tape and receive a plain English summary of key stats (record count, distinct key, constraints in the query contained in the WHERE clause).
●  Value: Provide natural language access to a single, widely used data tape with a clear, plain English explanation of the dataset.

Streamlit Analytics Improves User Adoption and Success with Snowflake
With a better understanding of Streamlit as a driver for the adoption of Snowflake and the increasing adoption of data assets, let's dig deeper into Streamlit as the conduit for adoption. While Snowflake may be a known entity within your enterprise, few business-facing professionals will ever know they are interfacing with Snowflake, and that is okay. Streamlit opens the doors to Snowflake without adding more technology tools and platforms and, most importantly, eliminates the need for other tools, platforms, and an additional layer of services to manage. Instead, you can leverage the skills already on hand within most data and analytics teams. Here are some additional features that make Streamlit quite compelling:
●  Simplicity and Ease of Use: Streamlit provides an intuitive API that allows developers to create interactive UI elements with minimal code. Its straightforward syntax enables both beginners and experienced developers to quickly prototype and deploy applications without a steep learning curve.
●  Rapid Prototyping: Streamlit excels at rapid prototyping, enabling developers to iterate quickly on their ideas. With its live reloading feature, developers can see changes in real time as they modify the code. This development speed is crucial for experimenting with different UI layouts and functionalities.
●  Data Exploration and Visualization: Streamlit integrates seamlessly with popular data science libraries such as Pandas, Matplotlib, and Plotly. This integration allows developers to create dynamic and interactive charts, graphs, and dashboards with minimal effort. Data scientists and analysts can effectively showcase their findings, making it an excellent choice for data exploration and visualization tasks.
●  Customization and Theming: While Streamlit provides a simple interface, it also offers customization options for developers who want to create visually appealing applications. Developers can customize the appearance of their apps, including layout, colors, and themes, to match their brand or specific design preferences.
●  Seamless Integration with Machine Learning and AI Models: Streamlit makes it easy to integrate machine learning models, natural language processing tools, and other AI technologies into applications. Developers can create interactive interfaces for AI-powered applications, enabling users to interact with complex algorithms and models without understanding the underlying complexities.
●  Sharing and Deployment: Streamlit apps can be easily shared and deployed on various platforms. Whether it's sharing within a team, showcasing a prototype to stakeholders, or deploying a full-fledged application for public use, Streamlit simplifies the process. Streamlit Community Cloud (formerly Streamlit Sharing), Streamlit's deployment platform, allows developers to deploy apps with minimal configuration, making them accessible to a broader audience.
●  Active Community and Documentation: Streamlit has a vibrant and active community of developers. The availability of numerous examples, tutorials, and community-contributed components enhances the development experience. Streamlit's comprehensive documentation provides detailed guidance on various aspects of building interactive applications, making it easier for developers to find solutions to their queries.
●  Flexibility and Extensibility: While Streamlit is easy for beginners, it also offers flexibility and extensibility for advanced users. Developers can create custom components and integrate JavaScript functionality when needed, allowing them to extend Streamlit's capabilities based on their requirements.

Conclusion
The integration of Snowflake and Streamlit offers a powerful combination for building analytics and data delivery apps. A single, blended data warehousing solution with intuitive application development can democratize data access, enabling users across an organization to transform complex datasets into palatable, prepared information assets. Though the Snowflake modern data cloud app store is in its infancy, you can jump in today and seize a great opportunity to build powerful data apps. While this article explained a simple GPT API interface, the recent introduction of OpenAI's Assistants API expands the possibilities for even more intelligent, contextual agents running securely right where you work. I look forward to expanding on this basic prototype to a more intelligent co-pilot experience soon.

Author Bio
Ryan Goodman has dedicated 20 years to the business of data and analytics, working as a practitioner, executive, and entrepreneur. He recently founded DataTools Pro after 4 years at Reliant Funding, where he served as the VP of Analytics and BI. There, he implemented a modern data stack, utilized data sciences, integrated cloud analytics, and established a governance structure. Drawing from his experiences as a customer, Ryan is now collaborating with his team to develop rapid deployment industry solutions. These solutions utilize machine learning, LLMs, and modern data platforms to significantly reduce the time to value for data and analytics teams.

Alan Bernardo Palacio
12 Sep 2023
14 min read

Efficient LLM Querying with LMQL

Introduction
In the world of natural language processing, Large Language Models (LLMs) have proven to be highly successful at a variety of language-based tasks, such as machine translation, text summarization, question answering, reasoning, and code generation. LLMs like ChatGPT, GPT-4, and others have demonstrated outstanding performance by predicting the next token in a sequence based on input prompts. Users interact with these models by providing language instructions or examples to perform various downstream tasks. However, to achieve optimal results or adapt LLMs for specific tasks, complex and task-specific programs must be implemented, often requiring ad-hoc interactions and deep knowledge of the model's internals.
In this article, we discuss LMQL, a framework for Language Model Programming (LMP) that allows users to specify complex interactions, control flow, and constraints without needing deep knowledge of the LLM's internals, using a declarative programming language similar to SQL. LMQL supports high-level, logical constraints, and users can express a wide range of prompting techniques concisely, reducing the need for ad-hoc interactions and manual work to steer model generation, avoiding costly re-querying, and guiding the text generation process according to their specific criteria. Let’s start.

Overview of Large Language Models
Language models (LMs) operate on sequences of tokens, where tokens are discrete elements that represent words or sub-words in a text. The process involves using a tokenizer to map input words to tokens, and then a language model predicts the probabilities of possible next tokens based on the input sequence.
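As a toy illustration of the two steps just described, the sketch below maps words to token ids and then turns a set of scores into next-token probabilities with a softmax. Everything in it is an assumption made for the example: the three-word vocabulary, the whitespace tokenizer (real tokenizers work on sub-words), and the hand-written scores standing in for a model's output.

```python
import math


def tokenize(text, vocab):
    """Map whitespace-separated words to integer ids (real tokenizers use sub-words)."""
    return [vocab[word] for word in text.lower().split()]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits.values())  # subtract the maximum for numerical stability
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}


vocab = {"the": 0, "cat": 1, "sat": 2}
token_ids = tokenize("The cat sat", vocab)

# Made-up scores a model might assign to candidate next tokens.
next_token_logits = {"on": 2.0, "down": 1.0, "quietly": 0.0}
next_token_probs = softmax(next_token_logits)

print(token_ids)
print(max(next_token_probs, key=next_token_probs.get))
```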
Various decoding methods are used to turn the language model's predictions into an output sequence of tokens. Among them are:
Decoding Methods:
●  Greedy decoding: Select the token with the highest probability at each step.
●  Sampling: Randomly sample tokens based on the predicted probabilities.
●  Full decoding: Enumerate all possible sequences and select the one with the highest probability (computationally expensive).
●  Beam search: Maintain a set of candidate sequences and refine them by predicting the next token.
Masked Decoding: In some cases, certain tokens can be ruled out based on a mask that indicates which tokens are viable. Decoding is then performed on the remaining set of tokens.
Few-Shot Prompting: LMs can be trained on broad text-sequence prediction datasets and then provided with context in the form of examples for specific tasks. This approach allows LMs to perform downstream tasks without task-specific training.
Multi-Part Prompting: LMs are used not only for simple prompt completion but also as reasoning engines integrated into larger programs. Various LM programming schemes explore compositional reasoning, such as iterated decompositions, meta prompting, tool use, and composition of multiple prompts.
It is also worth noting that beam search and sampling take a parameter named temperature, which controls the diversity of the output.
These techniques enable LMs to be versatile and perform a wide range of tasks without requiring task-specific training, making them powerful multi-task reasoners.

Asking the Right Questions
While LLMs can be prompted with examples or instructions, using them effectively and adapting to new models often demands a deep understanding of their internal workings, along with the use of vendor-specific libraries and implementations. Constrained decoding to limit text generation to legal words or phrases can be challenging.
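The decoding ideas described earlier (greedy selection, temperature-controlled sampling, and masking out tokens that are not allowed) can be sketched for a single decoding step as follows. This is a simplified illustration, not how any particular library implements decoding: the probability table is invented, and temperature is applied by raising each probability to the power 1/T and renormalizing, which is equivalent to dividing the underlying scores by T before the softmax.

```python
import random


def greedy(probs):
    """Greedy decoding: pick the most probable token."""
    return max(probs, key=probs.get)


def apply_temperature(probs, temperature):
    """Rescale probabilities as p^(1/T), renormalized. T < 1 sharpens, T > 1 flattens."""
    scaled = {tok: p ** (1.0 / temperature) for tok, p in probs.items()}
    total = sum(scaled.values())
    return {tok: value / total for tok, value in scaled.items()}


def sample(probs, temperature=1.0, rng=random):
    """Sampling: draw one token at random according to the adjusted probabilities."""
    adjusted = apply_temperature(probs, temperature)
    tokens = list(adjusted)
    return rng.choices(tokens, weights=[adjusted[t] for t in tokens], k=1)[0]


def masked_greedy(probs, allowed):
    """Masked decoding: drop tokens outside the allowed set, then decode greedily."""
    viable = {tok: p for tok, p in probs.items() if tok in allowed}
    return greedy(viable)


# Invented next-token distribution for a single decoding step.
probs = {"yes": 0.5, "no": 0.3, "maybe": 0.2}

print(greedy(probs))
print(masked_greedy(probs, {"no", "maybe"}))
print(sample(probs, temperature=0.7, rng=random.Random(0)))
```

The masking step is the part that matters most for constrained generation: once a constraint has been turned into a set of viable tokens, enforcing it costs one filter per step rather than a new model call.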
Many advanced prompting methods require complex interactions and control flows between the LLM and the user, leading to manual work and restricting the generality of implementations. Additionally, generating complete sequences from LLMs may require multiple calls and become computationally expensive, resulting in high usage costs per query in pay-to-use APIs. Generally, the challenges associated with creating proper prompts for LLMs are:
●  Interaction Challenge: One challenge in LM interaction is the need for multiple manual interactions during the decoding process. For example, in meta prompting, where the language model is asked to expand the prompt and then provide an answer, the current approach requires inputting the prompt partially, invoking the LM, extracting information, and manually completing the sequence. This manual process may involve human intervention or several API calls, making joint optimization of template parameters difficult and limiting automated optimization possibilities.
●  Constraints & Token Representation: Another issue arises when considering completions generated by LMs. Sometimes, LMs may produce long, ongoing sequences of text that do not adhere to desired constraints or output formats. Users often have specific constraints for the generated text, which may be violated by the LM. Expressing these constraints in terms of human-understandable concepts and logic is challenging, and existing methods require considerable manual implementation effort and model-level understanding of decoding procedures, tokenization, and vocabulary.
●  Efficiency and Cost Challenge: Efficiency and performance remain significant challenges in LM usage. While efforts have been made to improve the inference step in modern LMs, they still demand high-end GPUs for reasonable performance. This makes practical usage costly, particularly when relying on hosted models running in the cloud with paid APIs. The computational and financial expenses associated with frequent LM querying can become prohibitive.
To address these challenges, Language Model Programming and constraints offer new optimization opportunities. By defining behavior and limiting the search space, the number of LM invocations can be reduced. In this context, the cost of validation, parsing, and mask generation becomes negligible compared to the significant cost of a single LM call.
So the question arises: how can we overcome the challenges of implementing complex interactions and constraints with LLMs while reducing computational costs and retaining or improving accuracy on downstream tasks?

Introducing LMQL
To address these challenges and enhance language model programming, a team of researchers has introduced LMQL (Language Model Query Language). LMQL is an open-source programming language and platform for LLM interaction that combines prompts, constraints, and scripting. It is designed to elevate the capabilities of LLMs like ChatGPT, GPT-4, and any future models, offering a declarative, SQL-like approach based on Python.
LMQL enables Language Model Programming (LMP), a novel paradigm that extends traditional natural language prompting by allowing lightweight scripting and output constraining. This separation of front-end and back-end interaction allows users to specify complex interactions, control flow, and constraints without needing deep knowledge of the LLM's internals. This approach abstracts away tokenization, implementation, and architecture details, making it more portable and easier to use across different LLMs.
With LMQL, users can express a wide range of prompting techniques concisely, reducing the need for ad-hoc interactions and manual work. The language supports high-level, logical constraints, enabling users to steer model generation and avoid costly re-querying and validation.
By guiding the text generation process according to specific criteria, users can achieve the desired output with fewer iterations and improved efficiency.
Moreover, LMQL leverages evaluation semantics to automatically generate token masks for LM decoding based on user-specified constraints. This optimization reduces inference cost by up to 80%, resulting in significant latency reduction and lower computational expenses, particularly beneficial for pay-to-use APIs.
LMQL addresses the following challenges in LM interaction and usage:
●  Overcoming Manual Interaction: LMQL simplifies the prompt and eliminates the need for manual interaction during the decoding process. It achieves this by allowing the use of variables, represented within square brackets, which store the answers obtained from the language model. These variables can be referenced later in the query, avoiding the need for manual extraction and input. By employing LMQL syntax, the interaction process becomes more automated and efficient.
●  Constraints on Variable Parts: To address issues related to long and irrelevant outputs, LMQL introduces constraints on the variable parts of LM interaction. These constraints allow users to specify word and phrase limitations for the generated text. LMQL ensures that the decoded tokens for variables meet these constraints during the decoding process. This provides more control over the generated output and ensures that it adheres to user-defined restrictions.
●  Generalization of Multi-Part Prompting: Language Model Programming through LMQL generalizes various multi-part prompting approaches discussed earlier. It streamlines the process of trying different values for variables by automating the selection process. Users can set constraints on variables, which are then applied to multiple inputs without any human intervention. Once developed and tested, an LMQL query can be easily applied to different inputs in an unsupervised manner, eliminating the need for manual trial and error.
●  Efficient Execution: LMQL offers efficiency benefits over manual interaction. The constraints and scripting capabilities in LMQL are applied eagerly during decoding, reducing the number of times the LM needs to be invoked. This optimized approach results in notable time and cost savings, especially when using hosted models in cloud environments.
The LMQL syntax involves components such as the decoder, the actual query, the model to query, and the constraints. The decoder specifies the decoding procedure, which can include argmax, sample, or beam search. LMQL allows for constraints on the generated text using Python syntax, making it more user-friendly and easily understandable. Additionally, the distribution instruction allows users to augment the returned result with probability distributions, which is useful for tasks like sentiment analysis.

Using LMQL with Python
LMQL can be used in various ways: as a standalone language, in the Playground, or as a Python library. The last of these is what we will demonstrate here. Integrating LMQL into Python projects allows users to streamline their code and incorporate LMQL queries seamlessly. Let's explore how to use LMQL as a Python library and walk through some examples.
To begin, make sure you have LMQL and LangChain installed by running the following command:
!pip install lmql==0.0.6.6 langchain==0.0.225
You can then define and execute LMQL queries within Python using a simple approach. Decorate a Python function with the lmql.query decorator, providing the query code as a multi-line string. The decorated function will automatically be compiled into an LMQL query.
The return value of the decorated function will be the result of the LMQL query.
Here's an example code snippet demonstrating this:
import lmql
import aiohttp
import os

os.environ['OPENAI_API_KEY'] = '<your-openai-key>'

@lmql.query
async def hello():
    '''lmql
    argmax
        "Hello[WHO]"
    from
        "openai/text-ada-001"
    where
        len(TOKENS(WHO)) < 10
    '''

# Top-level await works in a notebook; in a script, wrap the call in asyncio.run(...)
print(await hello())
LMQL provides a fully asynchronous API that enables running multiple LMQL queries in parallel. By declaring functions as async with @lmql.query, you can use await to execute the queries concurrently.
The code below demonstrates how to look up information from Wikipedia and incorporate it into an LMQL prompt dynamically:
async def look_up(term):
    # Looks up term on Wikipedia
    url = f"https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro&explaintext&redirects=1&titles={term}&origin=*"
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            # Get the first sentence on the first page
            page = (await response.json())["query"]["pages"]
            return list(page.values())[0]["extract"].split(".")[0]

@lmql.query
async def greet(term):
    '''
    argmax
        """Greet {term} ({await look_up(term)}):
        Hello[WHO]
        """
    from
        "openai/text-davinci-003"
    where
        STOPS_AT(WHO, "\\n")
    '''

print((await greet("Earth"))[0].prompt)
As an alternative to @lmql.query, you can use lmql.query(...) as a function that compiles a provided string of LMQL code into a Python function.
q = lmql.query('argmax "Hello[WHO]" from "openai/text-ada-001" where len(TOKENS(WHO)) < 10')
await q()
LMQL queries can also be easily integrated into langchain's Chain components. This allows for sequential prompting using multiple queries.
from langchain import LLMChain, PromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import (ChatPromptTemplate, HumanMessagePromptTemplate)
from langchain.llms import OpenAI

# Setup the LM to be used by langchain
llm = OpenAI(temperature=0.9)

human_message_prompt = HumanMessagePromptTemplate(
    prompt=PromptTemplate(
        template="What is a good name for a company that makes {product}?",
        input_variables=["product"],
    )
)
chat_prompt_template = ChatPromptTemplate.from_messages([human_message_prompt])
chat = ChatOpenAI(temperature=0.9)
chain = LLMChain(llm=chat, prompt=chat_prompt_template)

# Run the chain
# Note: this snippet only builds and runs the LangChain chain; no LMQL query is attached to it here
chain.run("colorful socks")
Lastly, by treating LMQL queries as Python functions, you can easily build pipelines by chaining functions together. Furthermore, the guaranteed output format of LMQL queries ensures ease of processing the returned values using data processing libraries like Pandas.
Here's an example of processing the output of an LMQL query with Pandas:
import pandas as pd

@lmql.query
async def generate_dogs(n: int):
    '''lmql
    sample(n=n)
        """Generate a dog with the following characteristics:
        Name:[NAME]
        Age: [AGE]
        Breed:[BREED]
        Quirky Move:[MOVE]
        """
    from
        "openai/text-davinci-003"
    where
        STOPS_BEFORE(NAME, "\\n") and STOPS_BEFORE(BREED, "\\n") and
        STOPS_BEFORE(MOVE, "\\n") and INT(AGE) and len(AGE) < 3
    '''

result = await generate_dogs(8)
# Each result exposes the filled-in template variables as a dict
df = pd.DataFrame([r.variables for r in result])
df
By employing LMQL as a Python library, users can make their code more efficient and structured, allowing for easier integration with other Python libraries and tools.
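The idea of treating queries as ordinary async Python functions and chaining them into a pipeline can be sketched without calling a model at all. In the example below, fake_generate_dogs is a hypothetical stand-in for an LMQL query: it returns dictionaries shaped like the variables of a query result, with invented values. The rest shows two such calls running concurrently with asyncio.gather and a plain function cleaning up the raw strings.

```python
import asyncio


async def fake_generate_dogs(n):
    """Hypothetical stand-in for an LMQL query; returns dicts shaped like result variables."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return [{"NAME": f" Dog{i}", "AGE": f" {i + 1}", "BREED": " Mixed"} for i in range(n)]


def clean(record):
    """Strip the whitespace a model tends to emit and cast the age to an integer."""
    return {
        "name": record["NAME"].strip(),
        "age": int(record["AGE"]),
        "breed": record["BREED"].strip(),
    }


async def pipeline(n):
    # Run two queries concurrently, then flatten and clean their results.
    batches = await asyncio.gather(fake_generate_dogs(n), fake_generate_dogs(n))
    return [clean(record) for batch in batches for record in batch]


dogs = asyncio.run(pipeline(2))
print(dogs)
```

Because constrained variables arrive in a predictable shape, the cleaning step can stay this small; without constraints it would need to handle free-form text.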
Below, we provide a brief overview of using LMQL as a Python library.ConclusionLMQL introduces an efficient and powerful approach to interact with language models, revolutionizing language model programming. By combining prompts, constraints, and scripting, LMQL offers a user-friendly interface for working with large language models, significantly improving efficiency and accuracy across diverse tasks. Its capabilities allow developers to leverage the full potential of language models without the burden of complex implementations, making language model interaction more accessible and cost-effective.With LMQL, users can overcome challenges in LM interaction, including manual interactions, constraints on variable parts, and generalization of multi-part prompting. By automating the selection process and eager application of constraints during decoding, LMQL reduces the number of LM invocations, resulting in substantial time and cost savings. Moreover, LMQL's declarative, SQL-like approach simplifies the development process and abstracts away tokenization and implementation details, making it more portable and user-friendly.In conclusion, LMQL represents a promising advancement in the realm of large language models and language model programming. Its efficiency, flexibility, and ease of use open up new possibilities for creating complex interactions and steering model generation without deep knowledge of the model's internals. By embracing LMQL, developers can make the most of language models, unleashing their potential across a wide range of language-based tasks with heightened efficiency and reduced computational costs.Author BioAlan Bernardo Palacio is a data scientist and an engineer with vast experience in different engineering fields. His focus has been the development and application of state-of-the-art data products and algorithms in several industries. 
He has worked for companies such as Ernst and Young, and Globant, and now holds a data engineer position at Ebiquity Media helping the company to create a scalable data pipeline. Alan graduated with a Mechanical Engineering degree from the National University of Tucuman in 2015, participated as the founder of startups, and later on earned a Master's degree from the faculty of Mathematics at the Autonomous University of Barcelona in 2017. Originally from Argentina, he now works and resides in the Netherlands.LinkedIn
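As a supplement to the LMQL article's point about running several async queries in parallel, here is a minimal sketch using only Python's standard asyncio module. The function fake_query is a hypothetical stand-in for a function decorated with @lmql.query (it sleeps instead of calling a model), and run_all is an illustrative helper name, not part of LMQL.

```python
import asyncio


async def fake_query(name):
    # Hypothetical stand-in for an @lmql.query function: a real query
    # would await a language-model call here instead of sleeping.
    await asyncio.sleep(0.01)
    return f"Hello {name}"


async def run_all(names):
    # asyncio.gather schedules all coroutines concurrently and returns
    # their results as a list, in the same order as the inputs.
    return await asyncio.gather(*(fake_query(n) for n in names))


if __name__ == "__main__":
    print(asyncio.run(run_all(["Earth", "Mars"])))
```

With real LMQL queries the same pattern should apply, since the decorated functions are coroutines; the total wait is then roughly that of the slowest query rather than the sum of all of them.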

Avratanu Biswas
15 Jun 2023
5 min read

Using LangChain for Large Language Model-Powered Applications

This article is the first part of a series of articles; please refer to Part 2 to learn how to get to grips with the LangChain framework and how to use it for building LLM-powered apps.
Introduction
LangChain is a powerful, open-source Python library designed to enhance the usability, accessibility, and versatility of Large Language Models (LLMs) such as GPT-3 (Generative Pre-trained Transformer 3), BERT (Bidirectional Encoder Representations from Transformers), and BLOOM (BigScience Large Open-science Open-access Multilingual Language Model). It provides developers with a comprehensive set of tools for combining multiple prompts, so that the pieces of an LLM application work together like an orchestra. The project was initiated by Harrison Chase, with the first commit made in late October 2022. In just a few weeks, LangChain gained immense popularity within the open-source community.
Image 1: The popularity of the LangChain Python library
LangChain for LLMs
To fully grasp the fundamentals of LangChain and use it effectively, understanding the fundamentals of LLMs is essential. In simple terms, LLMs are sophisticated language models or AI systems that have been extensively trained on massive amounts of text data to comprehend and generate human-like language. Despite their powerful capabilities, LLMs are generic in nature, i.e., they lack domain-specific knowledge or expertise. For instance, when addressing queries in fields like medicine or law, an LLM can provide general insights but may struggle to offer in-depth or nuanced responses that require specialized expertise. Alongside such limitations, LLMs are susceptible to biases and inaccuracies present in training data, which can yield contextually plausible, yet incorrect outputs. 
This is where LangChain shines — serving as an open-source library that leverages the power of LLMs and mitigates their drawbacks by providing abstractions and a diverse range of modules, akin to Lego blocks, thus facilitating intuitive integration with other tools and knowledge bases.In brief, LangChain presents a useful approach for handling text data, wherein the initial step involves preprocessing of the large corpus by segmenting it into smaller chunks or summaries. These chunks are then transformed into vector representations, enabling efficient comparisons and retrieval of similar chunks when questions are posed. This approach of preprocessing, real-time data collection, and interaction with the LLM is not only applicable to the specific context but can also be effectively utilized in other scenarios like code and semantic search.Image 2 - Typical workflow of Langchain ( Image created by Author)A typical workflow of LangChain involves several steps that enable efficient interaction between the user, the preprocessed text corpus, and the LLM. Notably, the strengths of LangChain lie in its provision of an abstraction layer, streamlining the intricate process of composing and integrating these text components, thereby enhancing overall efficiency and effectiveness.Key Attributes offered by LangChainThe core concept behind LangChain is its ability to connect a “Chain of thoughts” around LLMs, as evident from its name. However, LangChain is not limited to just a few LLMs —  it provides a wide range of components that work together as building blocks for advanced use cases involving LLMs. Now, let’s delve into the various components that the LangChain library offers, making our work with LLMs easier and more efficient.Image 3:  LangChain features at a glance. (Image created by Author)Prompts and Prompt Templates: Prompts refer to the inputs or queries we send to LLMs. As we have experienced with ChatGPT, the quality of the response depends heavily on the prompt. 
LangChain provides several functionalities to simplify the construction and handling of prompts. A prompt template consists of multiple parts, including instructions, content, and queries.
Models: While LangChain itself does not provide LLMs, it leverages various Language Models (such as GPT-3 and BLOOM, discussed earlier), Chat Models (like gpt-3.5-turbo), and Text Embedding Models (offered by CohereAI, HuggingFace, OpenAI).
Chains: Chains are an end-to-end wrapper around multiple individual components, playing a major role in LangChain. The two most common types of chains are LLM chains and vector index chains.
Memory: By default, Chains in LangChain are stateless, treating each incoming query or input independently without retaining context (i.e., lacking memory). To overcome this limitation, LangChain supports both short-term memory (using previous conversational messages or summarised messages) and long-term memory (managing the retrieval and updating of information between conversations).
Indexes: Index modules provide various document loaders to connect with different data resources and utility functions to integrate with external vector databases like Pinecone, ChromaDB, and Weaviate, enabling smooth handling of large arrays of vector embeddings. The index components include Document Loaders, Text Splitters, Retrievers, and Vectorstores.
Agents: While the sequence of chains is often deterministic, in certain applications the sequence of calls may not be, with the next step depending on the user input and previous responses. Agents utilize LLMs to determine the appropriate actions and their order. 
Agents perform these tasks using a suite of tools.
Limitations on LangChain usage
Abstraction challenge for debugging: The comprehensive abstraction provided by LangChain poses challenges for debugging, as it becomes difficult to comprehend the underlying processes.
Higher token consumption due to prompt coupling: Coupling a chain of prompts when executing multiple chains for a specific task often leads to higher token consumption, making it less cost-effective.
Increased latency and slower performance: Latency is higher when using LangChain in applications with agents or tools, resulting in slower performance.
Overall, LangChain provides a broad spectrum of features and modules that greatly enhance our interaction with LLMs. In the subsequent sections, we will explore the practical usage of LangChain and demonstrate how to build simple demo web applications using its capabilities.
References
https://docs.langchain.com/docs/
https://github.com/hwchase17/langchain
https://medium.com/databutton/getting-started-with-langchain-a-powerful-tool-for-working-with-large-language-models-286419ba0842
https://medium.com/@avra42/how-to-build-a-personalized-pdf-chat-bot-with-conversational-memory-965280c160f8
Author
Avratanu Biswas, Ph.D. Student (Biophysics), Educator, and Content Creator (Data Science, ML & AI).
Twitter    YouTube    Medium     GitHub
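The workflow this article describes (split a corpus into chunks, turn chunks into vectors, retrieve the most similar chunk for a question) can be sketched in plain Python. This is an illustration under loud assumptions, not LangChain's implementation: the "embedding" is a toy bag-of-words count rather than a learned embedding model, and chunk_text, embed, cosine, and retrieve are hypothetical helper names.

```python
import math
import re
from collections import Counter


def chunk_text(text, size=8):
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (assumption, not a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=1):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    chunks = [
        "LangChain connects prompts and models",
        "Redis is an in-memory data store",
    ]
    print(retrieve("What does LangChain connect?", chunks))
```

In a real application the retrieved chunk would then be placed into the prompt sent to the LLM; a vector database takes over the role of the sorted() call once the corpus grows.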

Keith Bourne
13 Dec 2024
15 min read

Building Trust in AI: The Role of RAG in Data Security and Transparency

This article is an excerpt from the book, "Unlocking Data with Generative AI and RAG", by Keith Bourne. Master Retrieval-Augmented Generation (RAG), the most popular generative AI tool, to unlock the full potential of your data. This book enables you to develop highly sought-after skills as corporate investment in generative AI soars.
Introduction
As the adoption of Retrieval-Augmented Generation (RAG) continues to grow, its potential to address key security challenges in AI-driven applications is becoming evident. Far from merely introducing risks, RAG offers a robust framework to enhance data protection, ensure accuracy, and maintain transparency in content generation. This article delves into the multifaceted security benefits of RAG, while also addressing the unique challenges it poses and strategies to mitigate them.
How RAG can be leveraged as a security solution
Let’s start with the most positive security aspect of RAG. RAG can actually be considered a solution to mitigate security concerns, rather than cause them. If done right, you can limit data access by user, ensure more reliable responses, and provide more transparency of sources.
Limiting data
RAG applications may be a relatively new concept, but you can still apply the same authentication and database-based access approaches that you use with web and similar types of applications. This provides the same level of security you can apply in these other types of applications. By implementing user-based access controls, you can restrict the data that each user or user group can retrieve through the RAG system. This ensures that sensitive information is only accessible to authorized individuals. Additionally, by leveraging secure database connections and encryption techniques, you can safeguard the data at rest and in transit, preventing unauthorized access or data breaches.
Ensuring the reliability of generated content
One of the key benefits of RAG is its ability to mitigate inaccuracies in generated content. 
By allowing applications to retrieve proprietary data at the point of generation, the risk of producing misleading or incorrect responses is substantially reduced. Feeding the most current data available through your RAG system helps to mitigate inaccuracies that might otherwise occur.With RAG, you have control over the data sources used for retrieval. By carefully curating and maintaining high-quality, up-to-date datasets, you can ensure that the information used to generate responses is accurate and reliable. This is particularly important in domains where precision and correctness are critical, such as healthcare, finance, or legal applications.Maintaining transparencyRAG makes it easier to provide transparency in the generated content. By incorporating data such as citations and references to the retrieved data sources, you can increase the credibility and trustworthiness of the generated responses.When a RAG system generates a response, it can include links or references to the specific data points or documents used in the generation process. This allows users to verify the information and trace it back to its original sources. By providing this level of transparency, you can build trust with your users and demonstrate the reliability of the generated content.Transparency in RAG can also help with accountability and auditing. If there are any concerns or disputes regarding the generated content, having clear citations and references makes it easier to investigate and resolve any issues. This transparency also facilitates compliance with regulatory requirements or industry standards that may require traceability of information.That covers many of the security-related benefits you can achieve with RAG. However, there are some security challenges associated with RAG as well. Let’s discuss these challenges next.RAG security challengesRAG applications face unique security challenges due to their reliance on large language models (LLMs) and external data sources. 
Let’s start with the black box challenge, highlighting the relative difficulty in understanding how an LLM determines its response.
LLMs as black boxes
When something is in a dark, black box with the lid closed, you cannot see what is going on in there! That is the idea behind the black box when discussing LLMs, meaning there is a lack of transparency and interpretability in how these complex AI models process input and generate output. The most popular LLMs are also some of the largest, meaning they can have more than 100 billion parameters. The intricate interconnections and weights of these parameters make it difficult to understand how the model arrives at a particular output.
While the black box aspect of LLMs does not directly create a security problem, it does make it more difficult to identify solutions to problems when they occur. This makes it difficult to trust LLM outputs, which is a critical factor in most of the applications for LLMs, including RAG applications. This lack of transparency also makes it more difficult to debug issues you might have in building a RAG application, which increases the risk of security issues.
There is a lot of research and effort in the academic field to build models that are more transparent and interpretable, called explainable AI. Explainable AI aims to make the operations of AI systems transparent and understandable. It can involve tools, frameworks, and anything else that, when applied to RAG, helps us understand how the language models that we use produce the content they are generating. This is a big movement in the field, but this technology may not be immediately available as you read this. It will hopefully play a larger role in the future to help mitigate black box risk, but right now, none of the most popular LLMs are using explainable models. 
So, in the meantime, we will talk about other ways to address this issue.
You can use human-in-the-loop, where you involve humans at different stages of the process to provide an added line of defense against unexpected outputs. This can often help to reduce the impact of the black box aspect of LLMs. If your response time is not as critical, you can also use an additional LLM to perform a review of the response before it is returned to the user, looking for issues. We will review how to add a second LLM call in code lab 5.3, but with a focus on preventing prompt attacks. The concept is similar, in that you can add additional LLMs to do a number of extra tasks and improve the security of your application.
The black box isn’t the only security issue you face when using RAG applications though; another very important topic is privacy protection.
Privacy concerns and protecting user data
Personally identifiable information (PII) is a key topic in the generative AI space, with governments around the world trying to determine the best path to balance user privacy with the data-hungry needs of these LLMs. As this gets worked out, it is important to pay attention to the laws and regulations that are taking shape where your company is doing business and make sure all of the technologies you are integrating into your RAG applications adhere to them. Many companies, such as Google and Microsoft, are taking these efforts into their own hands, establishing their own standards of protection for their user data and emphasizing them in training literature for their platforms.
At the corporate level, there is another challenge related to PII and sensitive information. As we have said many times, the nature of the RAG application is to give it access to the company data and combine that with the power of the LLM. 
For example, for financial institutions, RAG represents a way to give their customers unprecedented access to their own data in ways that allow them to speak naturally with technologies such as chatbots and get near-instant access to hard-to-find answers buried deep in their customer data.In many ways, this can be a huge benefit if implemented properly. But given that this is a security discussion, you may already see where I am going with this. We are giving unprecedented access to customer data using a technology that has artificial intelligence, and as we said previously in the black box discussion, we don’t completely understand how it works! If not implemented properly, this could be a recipe for disaster with massive negative repercussions for companies that get it wrong. Of course, it could be argued that the databases that contain the data are also a potential security risk. Having the data anywhere is a risk! But without taking on this risk, we also cannot provide the significant benefits they represent.As with other IT applications that contain sensitive data, you can forge forward, but you need to have a healthy fear of what can happen to data and proactively take measures to protect that data. The more you understand how RAG works, the better job you can do in preventing a potentially disastrous data leak. These steps can help you protect your company as well as the people who trusted your company with their data.This section was about protecting data that exists. However, a new risk that has risen with LLMs has been the generation of data that isn’t real, called hallucinations. Let’s discuss how this presents a new risk not common in the IT world.HallucinationsWe have discussed this in previous chapters, but LLMs can, at times, generate responses that sound coherent and factual but can be very wrong. 
These are called hallucinations and there have been many shocking examples provided in the news, especially in late 2022 and 2023, when LLMs became everyday tools for many users.
Some are just funny with little consequence other than a good laugh, such as when ChatGPT was asked by a writer for The Economist, “When was the Golden Gate Bridge transported for the second time across Egypt?” ChatGPT responded, “The Golden Gate Bridge was transported for the second time across Egypt in October of 2016” (https://www.economist.com/by-invitation/2022/09/02/artificial-neural-networks-today-are-not-conscious-according-to-douglas-hofstadter).
Other hallucinations are more nefarious, such as when a New York lawyer used ChatGPT for legal research in a client’s personal injury case against Avianca Airlines, where he submitted six cases that had been completely made up by the chatbot, leading to court sanctions (https://www.courthousenews.com/sanctions-ordered-for-lawyers-who-relied-on-chatgpt-artificial-intelligence-to-prepare-court-brief/). Even worse, generative AI has been known to give biased, racist, and bigoted perspectives, particularly when prompted in a manipulative way.
When combined with the black box nature of these LLMs, where we are not always certain how and why a response is generated, this can be a genuine issue for companies wanting to use these LLMs in their RAG applications.
From what we know though, hallucinations are primarily a result of the probabilistic nature of LLMs. For all responses that an LLM generates, it typically uses a probability distribution to determine what token it is going to provide next. In situations where it has a strong knowledge base of a certain subject, these probabilities for the next word/token can be 99% or higher. But in situations where the knowledge base is not as strong, the highest probability could be low, such as 20% or even lower. 
In these cases, it is still the highest probability and, therefore, that is the token most likely to be selected. The LLM has been trained on stringing tokens together in a very natural language way while using this probabilistic approach to select which tokens to display. As it strings together words with low probability, it forms sentences, and then paragraphs, that sound natural and factual but are not based on high-probability data. Ultimately, this results in a response that sounds very plausible but is, in fact, based on very loose facts that are incorrect.
For a company, this poses a risk that goes beyond the embarrassment of your chatbot saying something wrong. What is said wrong could ruin your relationship(s) with your customer(s), or it could lead to the LLM offering your customer something that you did not intend to offer, or worse, cannot afford to offer. For example, when Microsoft released a chatbot named Tay on Twitter in 2016 with the intention of learning from interactions with Twitter users, users manipulated this spongy personality trait to get it to make numerous racist and bigoted remarks. This reflected poorly on Microsoft, which was promoting its expertise in the AI area with Tay, causing significant damage to its reputation at the time (https://www.theguardian.com/technology/2016/mar/26/microsoft-deeply-sorry-for-offensive-tweets-by-ai-chatbot).
Hallucinations, threats related to black box aspects, and protecting user data can all be addressed through red teaming.
Conclusion
RAG represents a promising avenue for enhancing security in AI applications, offering tools to limit data access, ensure reliable outputs, and promote transparency. However, challenges such as the black box nature of LLMs, privacy concerns, and the risk of hallucinations demand proactive measures. By employing strategies like user-based access controls, explainable AI, and red teaming, organizations can harness the advantages of RAG while mitigating risks. 
As the technology evolves, a thoughtful approach to its implementation will be crucial for maintaining trust, compliance, and the integrity of data-driven solutions.Author BioKeith Bourne is a senior Generative AI data scientist at Johnson & Johnson. He has over a decade of experience in machine learning and AI working across diverse projects in companies that range in size from start-ups to Fortune 500 companies. With an MBA from Babson College and a master’s in applied data science from the University of Michigan, he has developed several sophisticated modular Generative AI platforms from the ground up, using numerous advanced techniques, including RAG, AI agents, and foundational model fine-tuning. Keith seeks to share his knowledge with a broader audience, aiming to demystify the complexities of RAG for organizations looking to leverage this promising technology.
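To make the article's points about user-based access control and source transparency more concrete, here is a minimal Python sketch. It is an illustration under assumptions, not the book's code: Document, retrieve_for_user, and build_prompt are hypothetical names, matching is naive keyword overlap rather than vector search, and group membership is assumed to come from your existing authentication layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_groups: frozenset  # groups permitted to read this document


def retrieve_for_user(query, documents, user_groups):
    """Return documents that match the query AND that the user may see."""
    terms = set(query.lower().split())
    # Apply the access filter first, so restricted text never reaches the prompt.
    visible = [d for d in documents if d.allowed_groups & set(user_groups)]
    return [d for d in visible if terms & set(d.text.lower().split())]


def build_prompt(query, docs):
    """Number each source so the generated answer can cite it as [1], [2], ..."""
    sources = "\n".join(
        f"[{i}] ({d.doc_id}) {d.text}" for i, d in enumerate(docs, 1)
    )
    return (
        "Answer using only the sources below and cite them by number.\n"
        f"{sources}\nQuestion: {query}"
    )


if __name__ == "__main__":
    docs = [
        Document("hr-1", "salary bands for 2024", frozenset({"hr"})),
        Document("pub-1", "salary is paid on the last working day", frozenset({"hr", "staff"})),
    ]
    allowed = retrieve_for_user("salary", docs, ["staff"])
    print(build_prompt("When is salary paid?", allowed))
```

Filtering before prompt construction is the design choice that matters here: a document the user is not allowed to see cannot leak through the model if it is never retrieved, and the numbered sources give users something to verify the answer against.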

Rohan Chikorde
04 Jun 2023
8 min read

Set Up and Run Auto-GPT with Docker

Are you looking to get your hands dirty with Auto-GPT? Look no further! In this article, we'll guide you through the straightforward installation process, enabling you to effortlessly set up Auto-GPT and unlock its powerful capabilities. Say goodbye to complex setups and hello to enhanced language generation in just a few simple steps. To use Auto-GPT, users need to have Python installed on their computer, as well as an OpenAI API key. This key allows Auto-GPT to access the GPT-4 and GPT-3.5 APIs, as well as other resources such as internet search engines and popular websites. Once it is configured, users can interact with Auto-GPT using natural language commands, and the AI agent will automatically perform the requested task. We will show practically how to set up and run Auto-GPT using Docker. We will also be showing steps to other popular methods towards the end. Benefits of using Docker for running Auto-GPT  Docker is a containerization technology that allows developers to create, deploy, and run applications in a consistent and isolated environment. It enables the packaging of an application and all its dependencies into a single container, which can be easily distributed and run on any machine that has Docker installed. Using Docker to run Auto-GPT provides several benefits:It allows you to run Auto-GPT in an isolated and reproducible environment, which ensures that the dependencies and configurations required to run Auto-GPT are consistent across different machines. This can be especially useful when collaborating on a project or when deploying Auto-GPT to a production environment. Docker provides a secure sandboxed environment, which can help prevent any potential harm to your computer from continuous mode malfunctions or accidental damage from commands.  Docker simplifies the installation and configuration process of Auto-GPT by packaging it in a container that includes all the necessary dependencies and libraries. 
This means you don't have to manually install and configure these dependencies, which can be time-consuming and error-prone.
Overall, using Docker to run Auto-GPT provides a convenient and secure solution for developing and deploying Auto-GPT in a consistent and reproducible manner.
Software Requirements
Docker (recommended)
Python 3.10 or later
VSCode + devcontainer
Getting an API key
Get your OpenAI API key from: https://platform.openai.com/account/api-keys
Fig 1. Creating API key
Setting up Auto-GPT with Docker
First, we walk through a step-by-step guide to setting up Auto-GPT using Docker.
1. Make sure Python and Docker are installed on your system and that the Docker daemon is running (see Software Requirements above).
Fig 2. Command Prompt
2. Open CMD and pull the latest image from Docker Hub using the following command:
docker pull significantgravitas/auto-gpt
Fig 3. Pulling image from Docker Hub
Please note that if the Docker daemon is not running, this command will throw an error.
Fig 4. Docker Image
Once pulled using the above command, you can find the significantgravitas/auto-gpt image in your local Docker images.
3. Create a folder for Auto-GPT
4. In the folder, create a file named docker-compose.yml with the following contents:
version: "3.9"
services:
  auto-gpt:
    image: significantgravitas/auto-gpt
    depends_on:
      - redis
    env_file:
      - .env
    environment:
      MEMORY_BACKEND: ${MEMORY_BACKEND:-redis}
      REDIS_HOST: ${REDIS_HOST:-redis}
    profiles: ["exclude-from-up"]
    volumes:
      - ./auto_gpt_workspace:/app/auto_gpt_workspace
      - ./data:/app/data
      ## allow auto-gpt to write logs to disk
      - ./logs:/app/logs
      ## uncomment following lines if you have / want to make use of these files
      #- ./azure.yaml:/app/azure.yaml
      #- ./ai_settings.yaml:/app/ai_settings.yaml
  redis:
    image: "redis/redis-stack-server:latest"
5. Download Source code (zip) from the latest stable release
6. Extract the zip file into a folder.
Fig 5. Source folder
Configuration using Docker
1. After downloading and unzipping the folder, find the file named .env.template in the main Auto-GPT folder. This file may be hidden by default in some operating systems due to the dot prefix. To reveal hidden files, follow the instructions for your specific operating system: Windows, macOS
2. Create a copy of .env.template and call it .env; if you're already in a command prompt/terminal window, use: cp .env.template .env
3. Now you should have only two files in your folder: docker-compose.yml and .env
Fig 6. Docker-compose and .env files
4. Open the .env file in a text editor
5. Find the line that says OPENAI_API_KEY=
6. After the =, enter your unique OpenAI API key without any quotes or spaces.
7. Obtaining the API key is covered in the "Getting an API key" section above.
8. Save and close the .env file
Running Auto-GPT with Docker
The easiest way is to use docker-compose. Run the commands below in your Auto-GPT folder.
1. Build the image. If you have pulled the image from Docker Hub, skip this step:
docker-compose build auto-gpt
2. Run Auto-GPT:
docker-compose run --rm auto-gpt
3. By default, this will also start and attach a Redis memory backend. If you do not want this, comment out or remove the depends_on: - redis and redis: sections from docker-compose.yml
4. You can pass extra arguments, e.g., running with --gpt3only and --continuous:
docker-compose run --rm auto-gpt --gpt3only --continuous
Fig 7. Auto-GPT Installed
Other methods without Docker
Setting up Auto-GPT with Git
1. Make sure you have Git installed for your OS
2. To execute the given commands, open a CMD, Bash, or PowerShell window. On Windows: press Win+X and select Terminal, or Win+R and enter cmd
3. First clone the repository using the following command:
git clone -b stable https://github.com/Significant-Gravitas/Auto-GPT.git
4. Navigate to the directory where you downloaded the repository:
cd Auto-GPT
Manual Setup
1. Download Source code (zip) from the latest stable release
2. Extract the zip file into a folder
Configuration
1. Find the file named .env.template in the main Auto-GPT folder. This file may be hidden by default in some operating systems due to the dot prefix. To reveal hidden files, follow the instructions for your specific operating system: Windows, macOS
2. Create a copy of .env.template and call it .env; if you're already in a command prompt/terminal window: cp .env.template .env
3. Open the .env file in a text editor
4. Find the line that says OPENAI_API_KEY=
5. After the =, enter your unique OpenAI API key without any quotes or spaces
6. Save and close the .env file
Run Auto-GPT without Docker
Simply run the startup script in your terminal. This will install any necessary Python packages and launch Auto-GPT. Please note that if the above configuration is not set up properly, the script will throw an error; this is why Docker is the recommended and easiest way to run Auto-GPT.
On Linux/macOS:
./run.sh
On Windows:
.\run.bat
If this gives errors, make sure you have a compatible Python version installed.
Conclusion
In conclusion, if you're looking for a hassle-free way to install Auto-GPT, Docker is the recommended choice. By following our comprehensive guide, you can effortlessly set up Auto-GPT using Docker, ensuring a streamlined installation process, consistent environment configuration, and seamless deployment on different platforms. With Docker, bid farewell to compatibility concerns and embrace a straightforward and efficient Auto-GPT installation experience. Empower your language generation capabilities today with the power of Docker and Auto-GPT.
Author Bio
Rohan is an accomplished AI Architect professional with a post-graduate in Machine Learning and Artificial Intelligence. With almost a decade of experience, he has successfully developed deep learning and machine learning models for various business applications. 
Rohan's expertise spans multiple domains, and he excels in programming languages such as R and Python, as well as analytics techniques like regression analysis and data mining. In addition to his technical prowess, he is an effective communicator, mentor, and team leader. Rohan's passion lies in machine learning, deep learning, and computer vision.
You can follow Rohan on LinkedIn
Avratanu Biswas
22 Jun 2023
13 min read

How to work with LangChain Python modules

This article is the second part of a series. Please refer to Part 1 to get to grips with the LangChain framework and learn how to use it to build LLM-powered apps.

Introduction
In this section, we dive into the practical usage of LangChain modules. Building upon the previous overview of LangChain components, we will work within a Python environment to gain hands-on coding experience. Note that this overview is not a substitute for the official documentation, which you should consult for a more comprehensive understanding.

Choosing the Right Python Environment
When working with Python, Jupyter Notebook and Google Colab are popular choices for getting started quickly. Visual Studio Code (VSCode), Atom, PyCharm, or Sublime Text integrated with a conda environment are also excellent options. While any of these can be used, Google Colab is used here for its convenience in quick testing and code sharing. Find the code link here.

Prerequisites
Before we begin, make sure to install the necessary Python libraries. Use the pip command within a notebook cell to install them.

Installing LangChain: To install the LangChain library, which is essential for this section, use the following command:
!pip install langchain

Regular Updates: LangChain releases frequently, so I recommend upgrading the package often. Use the following command for this purpose:
!pip install langchain --upgrade

Integrating LangChain with LLMs: Previously, we discussed how the LangChain library facilitates interaction with Large Language Models (LLMs) provided by platforms such as OpenAI, Cohere, or HuggingFace. To integrate LangChain with these models, we need to follow these steps:

Obtain API Keys: In this tutorial, we will use OpenAI.
We need to sign up to access the API keys for the various endpoints that OpenAI provides. The key must be kept confidential. You can obtain the API key via this link.

Install Python Package: Install the required Python package associated with your chosen LLM provider. For OpenAI language models, execute the command:
!pip install openai

Configuring the API Key for OpenAI: To initialize the API key for the OpenAI library, we will use the getpass Python library. Alternatively, you can set the API key as an environment variable.

# Importing the library
import getpass

OPENAI_API_KEY = getpass.getpass()

# In order to double check
# print(OPENAI_API_KEY) # not recommended

Running the above lines of code creates a secure text input widget where we can enter the API key obtained for accessing the OpenAI LLM endpoints. After hitting enter, the entered value is stored in the variable OPENAI_API_KEY, so it can be used for subsequent operations throughout our notebook.

We will explore different LangChain modules in the sections below.

Prompt Template
We need to import the necessary module, PromptTemplate, from the langchain library. A multi-line string variable named template is created. It represents the structure of the prompt and contains placeholders for the context, question, and answer, which are the crucial aspects of any prompt template.

Image by Author | Key components of a prompt template.

A PromptTemplate object is instantiated using the template variable. The input_variables parameter is given a list of the variable names used in the template, in this case only query:

from langchain import PromptTemplate

template = """
You are a Scientific Chat Assistant.
Your job is to answer scientific facts and evidence, in a bullet point wise.

Context: Scientific evidence is necessary to validate claims, establish credibility,
and make informed decisions based on objective and rigorous investigation.
Question: {query}

Answer:
"""

prompt = PromptTemplate(template=template, input_variables=["query"])

The generated prompt structure can then be used to dynamically fill in the question placeholder and obtain responses in the specified template format. Let's print our entire prompt!

print(prompt)

lc_kwargs={'template': ' You are an Scientific Chat Assistant.\nYour job is to reply scientific facts and evidence in a bullet point wise.\n\nContext: Scientific evidence is necessary to validate claims, establish credibility, \nand make informed decisions based on objective and rigorous investigation.\n\nQuestion: {query}\n\nAnswer: \n', 'input_variables': ['query']} input_variables=['query'] output_parser=None partial_variables={} template=' You are an Scientific Chat Assistant.\nYour job is to reply scientific facts and evidence in a bullet point wise.\n\nContext: Scientific evidence is necessary to validate claims, establish credibility, \nand make informed decisions based on objective and rigorous investigation.\n\nQuestion: {query}\n\nAnswer: \n' template_format='f-string' validate_template=True

Chains
The LangChain documentation covers various types of LLM chains, which can be categorized into two main groups: Generic chains and Utility chains.

Image 2: Chains. Chains can be broadly classified into Generic Chains and Utility Chains.

(a) Generic chains are designed to provide general-purpose language capabilities, such as generating text, answering questions, and engaging in natural language conversations by leveraging LLMs. (b) Utility chains, by contrast, are specialized to perform specific tasks or provide targeted functionalities. These chains are fine-tuned and optimized for specific use cases. Note that although index-related chains could be classified as a sub-group of their own, here we keep such chains under the banner of utility chains.
They are often considered to be very useful while working with vector databases.

Since this is the very first time we are running the LLM chain, we will walk through the code in detail.

We need to import the OpenAI LLM module from langchain.llms and the LLMChain module from the langchain Python package.

Then, an instance of the OpenAI LLM is created, using arguments such as temperature (affects the randomness of the generated responses), openai_api_key (the API key for OpenAI which we assigned earlier), model (the specific OpenAI language model to be used - other models are available here), and streaming. Note that the verbose argument is useful for understanding the abstraction that LangChain provides under the hood while executing our query.

Next, an instance of LLMChain is created, providing the prompt (the previously defined prompt template) and the LLM (the OpenAI LLM instance).

The query or question is defined as the variable query.

Finally, the llm_chain.run(query) line executes the LLMChain with the specified query, generating the response based on the defined prompt and the OpenAI LLM:

# Importing the OpenAI LLM module
from langchain.llms import OpenAI
# Importing the LLMChain module
from langchain import LLMChain

# Creating an instance of the OpenAI LLM
llm = OpenAI(temperature=0.9, openai_api_key=OPENAI_API_KEY, model="text-davinci-003", streaming=True)

# Creating an instance of the LLMChain with the provided prompt and OpenAI LLM
llm_chain = LLMChain(prompt=prompt, llm=llm, verbose=True)

# Defining the query or question to be asked
query = "What is photosynthesis?"

# Running the LLMChain with the specified query
print(llm_chain.run(query))

Let's have a look at the response that is generated after running the chain with and without verbose.

a) with verbose = True;
Prompt after formatting:
You are an Scientific Chat Assistant.
Your job is to reply scientific facts and evidence in a bullet point wise.

Context: Scientific evidence is necessary to validate claims, establish credibility, and make informed decisions based on objective and rigorous investigation.

Question: What is photosynthesis?

Answer:

> Finished chain.
• Photosynthesis is the process used by plants, algae and certain bacteria to convert light energy from the sun into chemical energy in the form of sugars.
• Photosynthesis occurs in two stages: the light reactions and the Calvin cycle.
• During the light reactions, light energy is converted into ATP and NADPH molecules.
• During the Calvin cycle, ATP and NADPH molecules are used to convert carbon dioxide into sugar molecules.

b) with verbose = False;
• Photosynthesis is a process used by plants and other organisms to convert light energy, normally from the sun, into chemical energy which can later be released to fuel the organisms' activities.
• During photosynthesis, light energy is converted into chemical energy and stored in sugars.
• Photosynthesis occurs in two stages: light reactions and the Calvin cycle. The light reactions trap light energy and convert it into chemical energy in the form of the energy-storage molecule ATP. The Calvin cycle uses ATP and other molecules to create glucose.

It seems our general-purpose LLMChain has done a pretty decent job and given a reasonable output by leveraging the LLM.

Now let's move on to the utility chain and understand it using a simple code snippet:

from langchain import OpenAI
from langchain import LLMMathChain

llm = OpenAI(temperature=0.9, openai_api_key=OPENAI_API_KEY)

# Using the LLMMath Chain / LLM defined in Prompt Template section
llm_math = LLMMathChain.from_llm(llm=llm, verbose=True)

question = "What is 4 times 5"
llm_math.run(question)
# You know what the response would be 🎈

Here the utility chain serves a specific function, i.e. to solve a fundamental maths question using the LLMMathChain.
It's crucial to look at the prompt used under the hood for such chains. A few more notable utility chains are:
BashChain: A utility chain designed to execute Bash commands and scripts.
SQLDatabaseChain: This utility chain enables interaction with SQL databases.
SummarizationChain: The SummarizationChain is designed specifically for text summarization tasks.

Such utility chains, along with other available chains in the LangChain framework, provide specialized functionalities and ready-to-use tools that can be used to expedite and enhance various aspects of the language processing pipeline.

Memory
So far, each incoming query or input to the LLM or to its subsequent chain has been treated as an independent interaction, meaning it is "stateless" (in simpler terms, information IN, information OUT). This is one of the major drawbacks, as it hinders the ability to provide a seamless and natural conversational experience for users whose later questions depend on earlier ones. To overcome this limitation and enable better context retention, LangChain offers a broad spectrum of memory components.

Image by Author | The various types of Memory modules that LangChain provides.

By using these memory components, it becomes possible to remember the context of the conversation, making it more coherent and intuitive. These memory components allow for the storage and retrieval of information, giving the LLM a sense of continuity. This means it can refer back to previous relevant context, which greatly enhances the conversational experience for users.
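To make the idea of buffer memory concrete, here is a minimal, library-free sketch of what a conversation buffer does conceptually: it stores each exchange and prepends the running transcript to the next prompt. The names SimpleBufferMemory, build_prompt, chat, and fake_llm are hypothetical and are not part of LangChain; fake_llm is only a stand-in for a real model call, so no API key is needed.

```python
class SimpleBufferMemory:
    """Hypothetical stand-in for a conversation buffer (not LangChain's class)."""

    def __init__(self):
        self.turns = []  # (speaker, text) tuples, oldest first

    def save(self, human_text, ai_text):
        # Store one full exchange so later prompts can include it.
        self.turns.append(("Human", human_text))
        self.turns.append(("AI", ai_text))

    def as_text(self):
        return "\n".join(f"{speaker}: {text}" for speaker, text in self.turns)


def build_prompt(memory, user_input):
    # The whole stored transcript is placed in front of the new input.
    parts = ["Current conversation:"]
    history = memory.as_text()
    if history:
        parts.append(history)
    parts.append(f"Human: {user_input}")
    parts.append("AI:")
    return "\n".join(parts)


def fake_llm(prompt):
    # Placeholder for a real LLM call (assumption for illustration only).
    return f"(reply based on {len(prompt.splitlines())} lines of context)"


def chat(memory, user_input):
    prompt = build_prompt(memory, user_input)
    reply = fake_llm(prompt)
    memory.save(user_input, reply)
    return prompt, reply


memory = SimpleBufferMemory()
chat(memory, "Hi there! I'm Avra")
prompt, _ = chat(memory, "Who am I?")
print(prompt)
```

The second prompt contains the first exchange, which is what lets a model answer a question that depends on earlier turns. A stateless call would only ever see the latest input.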
A typical example of such memory-based interaction is the very popular chatbot ChatGPT, which remembers the context of our conversations.

Let's have a look at how we can leverage such a possibility using LangChain:

from langchain.llms import OpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

llm = OpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)
conversation = ConversationChain(
    llm=llm,
    verbose=True,
    memory=ConversationBufferMemory()
)

In the above code, we have initialized an instance of the ConversationChain class, configuring it with the OpenAI language model, enabling verbose mode for detailed output, and using a ConversationBufferMemory for memory management during conversations. Now, let's begin our conversation:

conversation.predict(input="Hi there!I'm Avra")

Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi there! I'm Avra
AI:

> Finished chain.
' Hi, Avra! It's nice to meet you. My name is AI. What can I do for you today?

Let's add a little more context to the chain, so that later we can test the chain's memory.

conversation.predict(input="I'm interested in soccer and building AI web apps.")

Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi there!I'm Avra
AI:  Hi Avra! It's nice to meet you. My name is AI. What can I do for you today?
Human: I'm interested in soccer and building AI web apps.
AI:

> Finished chain.
' That's great!
Soccer is a great sport and AI web apps are a great way to explore the possibilities of artificial intelligence. Do you have any specific questions about either of those topics?

Now, we make a query that requires the chain to look back through its memory and provide a reasonable response based on it.

conversation.predict(input="Who am I and what's my interest ?")

Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi there!I'm Avra
AI:  Hi Avra! It's nice to meet you. My name is AI. What can I do for you today?
Human: I'm interested in soccer and building AI web apps.
AI:  That's great! Soccer is a great sport and AI web apps are a great way to explore the possibilities of artificial intelligence. Do you have any specific questions about either of those topics?
Human: Who am I and what's my interest ?
AI:

> Finished chain.
' That's a difficult question to answer. I don't have enough information to answer that question. However, based on what you've told me, it seems like you are Avra and your interests are soccer and building AI web apps.

The above response highlights the significance of the ConversationBufferMemory component in retaining the context of the conversation. It is worth trying the above example without a buffer memory to see clearly how much the memory module matters. LangChain also provides several other memory modules that handle conversational context in different ways.

Moving forward, we will delve into the next section, where we will focus on the final two components, called the “Indexes” and the "Agent."
During this section, we will not only gain a hands-on understanding of their usage but also build and deploy a web app using an online workspace called Databutton.

References
LangChain Official Docs - https://python.langchain.com/en/latest/index.html
Code available for this section here (Google Colab) - https://colab.research.google.com/drive/1_SpAvehzfbYYdDRnhU6v9-KHwIHMC1yj?usp=sharing
Part 1: Using LangChain for Large Language Model-powered Applications: https://www.packtpub.com/article-hub/using-langchain-for-large-language-model-powered-applications
Part 3: Building and deploying Web App using LangChain <Insert Link>
How to build a Chatbot with ChatGPT API and a Conversational Memory in Python: https://medium.com/@avra42/how-to-build-a-chatbot-with-chatgpt-api-and-a-conversational-memory-in-python-8d856cda4542
Databutton - https://www.databutton.io/

Author Bio
Avratanu Biswas, Ph.D. Student (Biophysics), Educator, and Content Creator (Data Science, ML & AI).
Twitter    YouTube    Medium     GitHub
Amita Kapoor & Sharmistha Chatterjee
31 Aug 2023
13 min read

LLM Pitfalls and How to Avoid Them

Introduction
Large Language Models, or LLMs, are machine learning models that focus on understanding and generating human-like text. These advanced developments have significantly impacted the field of natural language processing, impressing us with their capacity to produce cohesive and contextually appropriate text. However, navigating the terrain of LLMs requires vigilance, as there exist pitfalls that may trap the unprepared.

In this article, we will uncover the nuances of LLMs and discover practical strategies for evading their potential pitfalls. From misconceptions surrounding their capabilities to the subtleties of bias pervading their outputs, we shed light on the intricate underpinnings beyond their impressive veneer.

Understanding LLMs: A Primer
LLMs, such as GPT-4, are based on the Transformer architecture, introduced in the paper "Attention is All You Need" by Vaswani et al. In essence, this architecture's 'attention' mechanism allows the model to focus on different parts of an input sentence, much like how a human reader might pay attention to different words while reading a text.

Training an LLM involves two stages: pre-training and fine-tuning. During pre-training, the model is exposed to vast quantities of text data (billions of words) from the internet. Given all the previous words, the model learns to predict the next word in a sentence. Through this process, it learns grammar, facts about the world, reasoning abilities, and also some biases present in the data.

A significant part of this understanding comes from the model's ability to process English language instructions. The pre-training process exposes the model to language structures, grammar, usage, nuances of the language, common phrases, idioms, and context-based meanings.

The Transformer's 'attention' mechanism plays a crucial role in this understanding, enabling the model to focus on different parts of the input sentence when generating each word in the output.
It understands which words in the sentence are essential when deciding the next word.

The output of pre-training is a creative text generator. To make this generator more controllable and safe, it undergoes a fine-tuning process. Here, the model is trained on a narrower dataset, carefully generated with human reviewers' help following specific guidelines. This phase also often involves learning from instructions provided in natural language, enabling the model to respond effectively to English language instructions from users.

After their initial two-step training, Large Language Models (LLMs) are ready to produce text. Here's how it works:

The user provides a starting point or "prompt" to the model. Using this prompt, the model begins creating a series of "tokens", which could be words or parts of words. Each new token is conditioned on the tokens that came before it, so the context the model works from grows with every token it produces. The process is based on probabilities, not on a pre-set plan or specific goals.

To control how the LLM generates text, you can adjust various settings. You can select the prompt, of course. But you can also modify settings like "temperature" and "max tokens". The "temperature" setting controls how random the model's output will be, while the "max tokens" setting sets a limit on the length of the response.

When properly trained and controlled, LLMs are powerful tools that can understand and generate human-like text. Their applications range from writing assistants to customer support, tutoring, translation, and more. However, their ability to generate convincing text also poses potential risks, necessitating ongoing research into effective and ethical usage guidelines.
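As a rough illustration of the "temperature" and "max tokens" settings described above, the sketch below applies a temperature-scaled softmax to a handful of made-up scores and then draws a fixed number of tokens. The token list, the logit values, and the function names are invented for illustration. The toy also samples every token from the same distribution, whereas a real model recomputes the distribution after each token, and real LLM APIs expose these settings as request parameters rather than asking you to implement sampling yourself.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def generate(tokens, logits, temperature, max_tokens, seed=0):
    """Toy sampler: draws max_tokens tokens from one fixed distribution.

    Simplification: a real LLM would recompute the logits after every token.
    """
    rng = random.Random(seed)
    probs = softmax_with_temperature(logits, temperature)
    return [rng.choices(tokens, weights=probs, k=1)[0] for _ in range(max_tokens)]


# Hypothetical vocabulary and scores, for illustration only.
tokens = ["sun", "plant", "light", "banana"]
logits = [2.0, 1.5, 1.0, -1.0]

cold = softmax_with_temperature(logits, 0.2)  # close to always picking "sun"
hot = softmax_with_temperature(logits, 2.0)   # probability spread more evenly

print([round(p, 3) for p in cold])
print([round(p, 3) for p in hot])
print(generate(tokens, logits, temperature=1.0, max_tokens=5))
```

A low temperature concentrates probability on the highest-scoring token, so output becomes more repeatable. A high temperature flattens the distribution, so less likely tokens are chosen more often. The max_tokens argument simply caps how many tokens are drawn.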
In this article, we discuss some of the common pitfalls associated with using LLMs and offer practical advice on how to navigate these challenges, ensuring that you get the best out of these powerful language models in a safe and responsible way.

Misunderstanding LLM Capabilities
Large Language Models (LLMs), such as GPT-3 and Bard, are advanced AI systems capable of impressive feats. However, some common misunderstandings exist about what these models can and cannot do. Here we clarify several points to prevent confusion and misuse.

Conscious Understanding: Despite their ability to generate coherent and contextually accurate responses, LLMs do not consciously understand the information they process. They don't comprehend text in the same way humans do. Instead, they make statistically informed guesses based on the patterns they've learned during training. They lack self-awareness or consciousness.

Learning from Interactions: LLMs are not designed to learn from user interactions in real time. After initial model training, they don't have the ability to remember or learn from individual interactions unless their training data is updated, a process that requires substantial computational resources.

Fact-Checking: LLMs can't verify the accuracy of their output or the information they're prompted with. They generate text based on patterns learned during training and cannot access real-time or updated information beyond their training cut-off. They cannot fact-check or verify information against real-world events after their training cut-off date.

Personal Opinions: LLMs don't have personal experiences, beliefs, or opinions. If they generate text that seems to indicate a personal stance, it's merely a reflection of the patterns they've learned during their training process. They are incapable of feelings or preferences.

Generating Original Ideas: While LLMs can generate text that may seem novel or original, they are not truly capable of creativity in the human sense.
Their "ideas" result from recombining elements from their training data in novel ways, not from original thought or intention.

Confidentiality: LLMs cannot keep secrets or remember specific user interactions. They do not have the capacity to store personal data from one interaction to the next. They are designed this way to ensure user privacy and confidentiality.

Future Predictions: LLMs can't predict the future. Any text generated that seems to predict future events is coincidental and based solely on patterns learned from their training data.

Emotional Support: While LLMs can simulate empathetic responses, they don't truly understand or feel emotions. Any emotional support provided by these models is based on learned textual patterns and should not replace professional mental health support.

Understanding these limitations is crucial when interacting with LLMs. They are powerful tools for text generation, but their abilities should not be mistaken for true understanding, creativity, or emotional capacity.

Bias in LLM Outputs
Bias in LLMs is an unintentional byproduct of their training process. LLMs, such as GPT-4, are trained on massive datasets comprising text from the internet. The models learn to predict the next word in a sentence based on the context provided by the preceding words. During this process, they inevitably absorb and replicate the biases present in their training data.

Bias in LLMs can be subtle and may present itself in various ways. For example, if an LLM consistently associates certain professions with a specific gender, this reflects gender bias. Suppose you feed the model a prompt like, "The nurse attended to the patient", and the model frequently uses feminine pronouns to refer to the nurse. In contrast, with the prompt, "The engineer fixed the machine," it predominantly uses masculine pronouns for the engineer.
This inclination mirrors societal biases present in the training data.

It's crucial for users to be aware of these potential biases when using LLMs. Understanding this can help users interpret responses more critically, identify potential biases in the output, and even frame their prompts in a way that can mitigate bias. Users should double-check the information provided by LLMs, particularly when the output may have significant implications or is in a context known for systemic bias.

Confabulation and Hallucination in LLMs
In the context of LLMs, 'confabulation' or 'hallucination' refers to generating outputs that do not align with reality or factual information. This can happen when the model, attempting to create a coherent narrative, fills in gaps with details that seem plausible but are entirely fictional.

Example 1: Futuristic Election Results
Consider an interaction where an LLM was asked for the result of a future election. The prompt was, "What was the result of the 2024 U.S. presidential election?" The model responded with a detailed result, stating a fictitious candidate had won. As of the model's last training cut-off, this event lies in the future, and the response is a complete fabrication.

Example 2: The Non-existent Book
In another instance, an LLM was asked about a summary of a non-existent book with a prompt like, "Can you summarise the book 'The Shadows of Elusion' by J.K. Rowling?" The model responded with a detailed summary as if the book existed. In reality, there's no such book by J.K. Rowling. This again demonstrates the model's propensity to confabulate.

Example 3: Fictitious Technology
In a third example, an LLM was asked to explain the workings of a fictitious technology, "How does the quantum teleportation smartphone work?"
The model explained a device that doesn't exist, incorporating real-world concepts of quantum teleportation into a plausible-sounding but entirely fictional narrative.

LLMs generate responses based on patterns they learn from their training data. They cannot access real-time or personal information or understand the content they generate. When faced with prompts without factual data, they can resort to confabulation, drawing from learned patterns to fabricate plausible but non-factual responses.

Because of this propensity for confabulation, verifying the 'facts' generated by LLMs is crucial. This is particularly important when the output is used for decision-making or is in a sensitive context. Always corroborate the information generated by LLMs with reliable and up-to-date sources to ensure its validity and relevance. While these models can be incredibly helpful, they should be used as a tool and not a sole source of information, bearing in mind the potential for error and fabrication in their outputs.

Security and Privacy in LLMs
Large Language Models (LLMs) can be a double-edged sword. Their power to create lifelike text opens the door to misuse, such as generating misleading information, spam emails, or fake news, and even facilitating complex scamming schemes. So, it's crucial to establish robust security protocols when using LLMs.

Training LLMs on massive datasets can trigger privacy issues. Two primary concerns are:

Data leakage: If the model is exposed to sensitive information during training, it could potentially reveal this information when generating outputs. Though these models are designed to generalize patterns and not memorize specific data points, the risk still exists, albeit at a very low probability.

Inference attacks: Skilled attackers could craft specific queries to probe the model, attempting to infer sensitive details about the training data.
For instance, they might attempt to discern whether certain types of content were part of the training data, potentially revealing proprietary or confidential information.

Ethical Considerations in LLMs
The rapid advancements in artificial intelligence, particularly in Large Language Models (LLMs), have transformed multiple facets of society. Yet, this exponential growth often overlooks a crucial aspect: ethics. Balancing the benefits of LLMs while addressing ethical concerns is a significant challenge that demands immediate attention.

Accountability and Responsibility: Who is responsible when an LLM causes harm, such as generating misleading information or offensive content? Is it the developers who trained the model, the users who provided the prompts, or the organizations that deployed it? The ambiguous nature of responsibility and accountability in AI applications is a substantial ethical challenge.

Bias and Discrimination: LLMs learn from vast amounts of data, often from the internet, reflecting our society, warts and all. Consequently, the models can internalize and perpetuate existing biases, leading to potentially discriminatory outputs. This can manifest as gender bias, racial bias, or other forms of prejudice.

Invasion of Privacy: As discussed in earlier articles, LLMs can pose privacy risks. However, the ethical implications go beyond the immediate privacy concerns. For instance, if an LLM is used to generate text mimicking a particular individual's writing style, it could infringe on that person's right to personal expression and identity.

Misinformation and Manipulation: The capacity of LLMs to generate human-like text can be exploited to disseminate misinformation, forge documents, or even create deepfake texts.
This can manipulate public opinion, impact personal reputations, and even threaten national security.

Addressing LLM Limitations: A Tripartite Approach
The task of managing the limitations of LLMs is a tripartite effort, involving AI Developers & Researchers, Policymakers, and End Users.

Role of AI Developers & Researchers:
Security & Privacy: Establish robust security protocols, enforce secure training practices, and explore methods such as differential privacy. Constituting AI ethics committees can ensure ethical considerations during the design and training phases.
Bias & Discrimination: Endeavor to identify and mitigate biases during training, aiming for equitable outcomes. This process includes eliminating harmful biases and confabulations.
Transparency: Enhance understanding of the model by elucidating the training process, which in turn can help manage potential fabrications.

Role of Policymakers:
Regulations: Formulate and implement regulations that ensure accountability, transparency, fairness, and privacy in AI.
Public Engagement: Encourage public participation in AI ethics discussions to ensure that regulations reflect societal norms.

Role of End Users:
Awareness: Comprehend the risks and ethical implications associated with LLMs, recognising that biases and fabrications are possible.
Critical Evaluation: Evaluate the outputs generated by LLMs for potential misinformation, bias, or confabulations. Refrain from feeding sensitive information to an LLM and cross-verify the information produced.
Feedback: Report any instances of severe bias, offensive content, or ethical concerns to the AI provider. This feedback is crucial for the continuous improvement of the model.

Conclusion
In conclusion, understanding and leveraging the capabilities of Large Language Models (LLMs) demand both caution and strategy. By recognizing their limitations, such as lack of consciousness, potential biases, and confabulation tendencies, users can navigate these pitfalls effectively.
To harness LLMs responsibly, a collaborative approach among developers, policymakers, and users is essential. Implementing security measures, mitigating bias, and fostering user awareness can maximize the benefits of LLMs while minimizing their drawbacks. As LLMs continue to shape our linguistic landscape, staying informed and vigilant ensures a safer and more accurate text generation journey.

Author Bio

Amita Kapoor is an accomplished AI consultant and educator, with over 25 years of experience. She has received international recognition for her work, including the DAAD fellowship and the Intel Developer Mesh AI Innovator Award. She is a highly respected scholar in her field, with over 100 research papers and several best-selling books on deep learning and AI. After teaching for 25 years at the University of Delhi, Amita took early retirement and turned her focus to democratizing AI education. She currently serves as a member of the Board of Directors for the non-profit Neuromatch Academy, fostering greater accessibility to knowledge and resources in the field. Following her retirement, Amita also founded NePeur, a company that provides data analytics and AI consultancy services. In addition, she shares her expertise with a global audience by teaching online classes on data science and AI at the University of Oxford.

Sharmistha Chatterjee is an evangelist in the field of machine learning (ML) and cloud applications, currently working in the BFSI industry at the Commonwealth Bank of Australia in the data and analytics space. She has worked in Fortune 500 companies, as well as in early-stage start-ups. She became an advocate for responsible AI during her tenure at Publicis Sapient, where she led the digital transformation of clients across industry verticals. She is an international speaker at various tech conferences and a 2X Google Developer Expert in ML and Google Cloud. She has won multiple awards and has been listed in 40 under 40 data scientists by Analytics India Magazine (AIM) and 21 tech trailblazers in 2021 by Google. She has been involved in responsible AI initiatives led by Nasscom and as part of their DeepTech Club.

Authors of this book: Platform and Model Design for Responsible AI

Alan Bernardo Palacio
21 Aug 2023
14 min read

Deploying LLM Models in Kubernetes with KFServing

Deploying LLMs, such as the extractive question-answering model from Hugging Face's transformers library, is a common task in NLP. In this tutorial you will learn to deploy such a model in Kubernetes via KFServing: set up KFServing, craft a Python model server, build a Docker image, and deploy to Kubernetes with Minikube. KFServing provides standardized model serving with features like explainability and model management.

Introduction

Deploying machine learning models to production is a critical step in turning research and development efforts into practical applications. In this tutorial, we will explore how to deploy large language models (LLMs) in a Kubernetes cluster using KFServing. We will use KFServing to simplify the model serving process, achieve scalability, and integrate with existing infrastructure.

To illustrate the relevance of deploying LLMs, let's consider a business use case. Imagine you are building an intelligent chatbot that provides personalized responses to customer queries. By deploying an LLM, the chatbot can generate contextual and accurate answers, enhancing the overall user experience. With KFServing, you can deploy and scale the model, enabling real-time interactions with users.

By the end of this tutorial, you will have a solid understanding of deploying LLMs with KFServing and be ready to apply this knowledge to your own projects.

Architecture Overview

Before diving into the deployment process, let's briefly discuss the architecture. Our setup comprises a Kubernetes cluster running in Minikube, KFServing as a framework to deploy the services, and a custom LLM model server. The Kubernetes cluster provides the infrastructure for deploying and managing the model. KFServing acts as a serving layer that facilitates standardized model serving across different frameworks.
Finally, the custom LLM model server hosts the pre-trained model and handles inference requests.

Prerequisites and Setup

To follow along with this tutorial, ensure that you have the following prerequisites:

  • A Kubernetes cluster: You can set up a local Kubernetes cluster using Minikube or use a cloud-based Kubernetes service like Google Kubernetes Engine (GKE) or Amazon Elastic Kubernetes Service (EKS).
  • Docker: Install Docker to build and containerize the custom LLM model server.
  • Python and Dependencies: Install Python and the necessary dependencies, including KFServing, Transformers, TensorFlow, and other required packages. You can find a list of dependencies in the requirements.txt file.

Now that we have our prerequisites, let's proceed with the deployment process.

Introduction to KFServing

KFServing is designed to provide a standardized way of serving machine learning models across organizations. It offers high-level abstraction interfaces for common ML frameworks like TensorFlow, PyTorch, and more. By using KFServing, data scientists and MLOps teams can collaborate from model production to deployment. KFServing can be integrated into existing Kubernetes and Istio stacks, providing model explainability, inference graph operations, and other model management functions.

Setting Up KFServing

To begin, we need to set up KFServing on a Kubernetes cluster. For this tutorial, we'll use the local quick install method on a Minikube Kubernetes cluster. The quick install method allows us to install Istio and KNative without the full Kubeflow setup, making it ideal for local development and testing.

Start by installing the necessary dependencies, kubectl and Helm 3; we will assume that they are already set up. Then, follow the Minikube install instructions to complete the setup. Adjust the memory and CPU settings for Minikube to ensure smooth functioning.
Once the installation is complete, start Minikube and verify the cluster status using the following commands:

```bash
minikube start --memory=6144
minikube status
```

The kfserving-custom-model requests at least 4Gi of memory, so in this case, we provide it with a bit more.

Building a Custom Python Model Server

Now, we'll focus on the code required to build a custom Python model server for the Hugging Face extractive question-answering model. The server is implemented in Python, uses the KFServing model class, and relies on the Hugging Face transformers library.

Let's start by creating a new Python file and naming it kf_model_server.py. Import the required libraries and define the kf_serving_model class that inherits from kfserving.KFModel. This class will handle the model loading and prediction logic:

```python
# Import the required libraries and modules
from typing import Dict

import kfserving
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForQuestionAnswering

MODEL_NAME = "bert-large-uncased-whole-word-masking-finetuned-squad"


# Define the custom model server class
class kf_serving_model(kfserving.KFModel):
    def __init__(self, name: str):
        super().__init__(name)
        self.name = name
        self.ready = False
        self.tokenizer = None
        self.model = None

    def load(self):
        # Load the pre-trained model and tokenizer
        self.tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
        self.model = TFAutoModelForQuestionAnswering.from_pretrained(MODEL_NAME)
        self.ready = True

    def predict(self, request: Dict) -> Dict:
        # Only the first instance of the request is processed
        instance = request["instances"][0]
        source_text = instance["text"]
        questions = instance["questions"]
        results = {}
        for question in questions:
            # Tokenize the question and source text
            encoded = self.tokenizer.encode_plus(
                question, source_text, add_special_tokens=True, return_tensors="tf"
            )
            input_ids = encoded["input_ids"].numpy()[0]
            # The model is expected to return a (start_scores, end_scores) pair here
            answer_start_scores, answer_end_scores = self.model(encoded)
            # Extract the answer from the scores
            answer_start = tf.argmax(answer_start_scores, axis=1).numpy()[0]
            answer_end = (tf.argmax(answer_end_scores, axis=1) + 1).numpy()[0]
            answer = self.tokenizer.convert_tokens_to_string(
                self.tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end])
            )
            results[question] = answer
        return {"predictions": results}


if __name__ == "__main__":
    model = kf_serving_model("kfserving-custom-model")
    model.load()
    kfserving.KFServer(workers=1).start([model])
```

In the above code, we define the kf_serving_model class that inherits from kfserving.KFModel and holds the model and tokenizer. The class encapsulates the model loading and prediction logic. The load() method loads the pre-trained model and tokenizer from the Hugging Face library. The predict() method takes the input JSON and performs inference using the model.
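To make the answer-extraction step in predict() more concrete, here is a minimal sketch that uses plain Python lists in place of TensorFlow tensors. The tokens and scores are made up for illustration, and extract_answer is a hypothetical helper rather than part of KFServing or transformers.

```python
def extract_answer(tokens, start_scores, end_scores):
    """Return the tokens between the best-scoring start and end positions."""
    # Index of the most likely first answer token
    answer_start = max(range(len(start_scores)), key=start_scores.__getitem__)
    # One past the most likely last answer token, because slice ends are exclusive
    answer_end = max(range(len(end_scores)), key=end_scores.__getitem__) + 1
    return " ".join(tokens[answer_start:answer_end])


# Toy input: a question, a separator, and a context sentence (illustrative values)
tokens = ["[CLS]", "who", "wrote", "it", "[SEP]", "ada", "lovelace", "wrote", "it", "[SEP]"]
start_scores = [0.1, 0.0, 0.2, 0.1, 0.0, 4.2, 1.3, 0.2, 0.1, 0.0]
end_scores = [0.0, 0.1, 0.1, 0.2, 0.0, 0.9, 3.8, 0.3, 0.2, 0.0]

answer = extract_answer(tokens, start_scores, end_scores)
print(answer)
```

The `+ 1` on the end index mirrors the server code: the end score marks the last answer token, while a Python slice stops before its end index.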
The resulting question-answer pairs are returned in the response.

Before we proceed, let's discuss some best practices for deploying LLMs with KFServing:

  • Model Versioning: Maintain different versions of the LLM to support A/B testing, rollback, and easy model management.
  • Scalability: Design the deployment to handle high traffic loads by optimizing resource allocation and using horizontal scaling techniques.
  • Monitoring and Error Handling: Implement robust logging and monitoring mechanisms to track model performance, detect anomalies, and handle errors gracefully.
  • Performance Optimization: Explore techniques like batch processing, parallelization, and caching to optimize the inference speed and resource utilization of the deployed model.

Now that we have a good understanding of the code and best practices, let's proceed with the deployment process.

Deployment Steps

For the deployment, first, we need to set up the Kubernetes cluster and ensure it is running smoothly. You can use Minikube or a cloud-based Kubernetes service. Once the cluster is running, we install the KFServing CRD by cloning the KFServing repository and navigating to the cloned directory:

```bash
git clone git@github.com:kubeflow/kfserving.git
cd kfserving
```

Now we install the necessary dependencies using the hack/quick_install.sh script:

```bash
./hack/quick_install.sh
```

To deploy our custom model server, we need to package it into a Docker container image. This allows for easy distribution and deployment across different environments.

Building a Docker Image for the Model Server

Let's create the Docker image by creating a new file named Dockerfile in the same directory as the Python file:

```dockerfile
# Use the official lightweight Python image.
FROM python:3.7-slim

ENV APP_HOME /app
WORKDIR $APP_HOME

# Install production dependencies.
COPY requirements.txt ./
RUN pip install --no-cache-dir -r ./requirements.txt

# Copy local code to the container image
COPY kf_model_server.py ./

CMD ["python", "kf_model_server.py"]
```

The Dockerfile specifies the base Python image, sets the working directory, installs the dependencies from the requirements.txt file, and copies the Python code into the container. Here we will be running this locally on a CPU, so the requirements.txt file uses tensorflow-cpu:

```text
kfserving==0.3.0
transformers==2.1.1
tensorflow-cpu==2.2.0
protobuf==3.20.0
```

To build the Docker image, execute the following command:

```bash
docker build -t kfserving-custom-model .
```

This command builds the container image using the Dockerfile and tags it with the specified name. The resulting image is only available in your local Docker environment. Kubernetes can't access images from your local Docker environment unless you're using a tool like Minikube or kind with a specific configuration to allow this.

To make the image available to Kubernetes, you need to push it to a Docker registry like Docker Hub, Google Container Registry (GCR), or any other registry accessible to your Kubernetes cluster. Here are the general steps you need to follow:

1. Tag your image with the registry address. If you are using Docker Hub, the command is:

```bash
docker tag kfserving-custom-model:latest <your-dockerhub-username>/kfserving-custom-model:latest
```

2. Push the image to the registry. For Docker Hub, the command is:

```bash
docker push <your-dockerhub-username>/kfserving-custom-model:latest
```

Make sure to replace <your-dockerhub-username> with your actual Docker Hub username. Also, ensure that your Kubernetes cluster has the necessary credentials to pull from the registry if it's private.
If it's a public Docker Hub repository, there should be no issues.

Deploying the Custom Model Server on KFServing

Now that we have the Docker image, we can deploy the custom model server as an InferenceService on KFServing. We'll use a YAML configuration file to describe the Kubernetes model resource. Create a file named deploy_server.yaml and populate it with the following content:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  labels:
    controller-tools.k8s.io: "1.0"
  name: kfserving-custom-model
spec:
  predictor:
    containers:
    - image: <your-dockerhub-username>/kfserving-custom-model:latest
      name: kfserving-container
      resources:
        requests:
          memory: "4096Mi"
          cpu: "250m"
        limits:
          memory: "4096Mi"
          cpu: "500m"
```

The YAML file defines the model's metadata, including the name and labels. It specifies the container image to use, along with resource requirements for memory and CPU.

To deploy the model, run the following command:

```bash
kubectl apply -f deploy_server.yaml
```

This command creates the InferenceService resource in the Kubernetes cluster, deploying the custom model server.

Verify the deployment status:

```bash
kubectl get inferenceservices
```

This should show you the status of the inference service. Once the containers have downloaded the BERT model, they are ready to start receiving inference calls.

Making an Inference Call with the KFServing-Hosted Model

Once the model is deployed on KFServing, we can make inference calls to the locally hosted Hugging Face QA model.
To do this, we'll need to set up port forwarding to expose the model's port to our local system.

Execute the following command to determine if your Kubernetes cluster is running in an environment that supports external load balancers:

```bash
kubectl get svc istio-ingressgateway -n istio-system
```

Now we can set up port forwarding for testing purposes:

```bash
INGRESS_GATEWAY_SERVICE=$(kubectl get svc --namespace istio-system --selector="app=istio-ingressgateway" --output jsonpath='{.items[0].metadata.name}')
kubectl port-forward --namespace istio-system svc/${INGRESS_GATEWAY_SERVICE} 8080:80

# start another terminal
export INGRESS_HOST=localhost
export INGRESS_PORT=8080
```

The port-forward command forwards port 8080 on our local system to port 80 of the Istio ingress gateway service, which routes requests to the model. It enables us to access the model's endpoint locally.

Next, create a JSON file named kf_input.json with the following content:

```json
{
  "instances": [
    {
      "text": "Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides general-purpose architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet…) for Natural Language Understanding (NLU) and Natural Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between TensorFlow 2.0 and PyTorch.",
      "questions": [
        "How many pretrained models are available in Transformers?",
        "What does Transformers provide?",
        "Transformers provides interoperability between which frameworks?"
      ]
    }
  ]
}
```

The JSON file contains the input text and a list of questions for the model to answer. To make an inference call, use the curl command:

```bash
curl -v -H "Host: kfserving-custom-model.default.example.com" -d @./kf_input.json http://localhost:8080/v1/models/kfserving-custom-model:predict
```

This command sends the JSON file as input to the predict method of our custom InferenceService. It forwards the request to the model's endpoint.
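The same request can also be sent from Python. The sketch below uses only the standard library and assumes the port-forward above is active on localhost:8080; the helper names build_request and predict, and the small sample payload, are illustrative and not part of KFServing.

```python
import json
import urllib.request

# Values taken from the curl command above
URL = "http://localhost:8080/v1/models/kfserving-custom-model:predict"
HOST_HEADER = "kfserving-custom-model.default.example.com"


def build_request(payload: dict) -> urllib.request.Request:
    """Build a POST request carrying the JSON payload and the Host header."""
    body = json.dumps(payload).encode("utf-8")
    headers = {"Host": HOST_HEADER, "Content-Type": "application/json"}
    return urllib.request.Request(URL, data=body, headers=headers, method="POST")


def predict(payload: dict) -> dict:
    """Send the request and decode the JSON reply (needs the running service)."""
    with urllib.request.urlopen(build_request(payload)) as response:
        return json.loads(response.read().decode("utf-8"))


# A small payload in the same shape as kf_input.json (made-up content)
sample_payload = {
    "instances": [
        {
            "text": "KFServing runs on Kubernetes.",
            "questions": ["Where does KFServing run?"],
        }
    ]
}

request = build_request(sample_payload)
print(request.get_method(), request.full_url)
```

Only the request is built at import time, so the block runs without a cluster. With the service up, calling predict() with the contents of kf_input.json should yield a dictionary with a "predictions" key, like the curl call.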
The call returns the following predictions:

```json
{
  "predictions": {
    "How many pretrained models are available in Transformers?": "over 32 +",
    "What does Transformers provide?": "general - purpose architectures",
    "Transformers provides interoperability between which frameworks?": "tensorflow 2 . 0 and pytorch"
  }
}
```

The response includes the generated question-answer pairs for each one of the specified questions.

Conclusion

In this tutorial, we learned how to deploy large language models (LLMs) in a Kubernetes cluster using KFServing. We set up KFServing, built a custom Python model server using the Hugging Face extractive question-answering model, created a Docker image for the model server, and deployed the model as an InferenceService on KFServing. We also made inference calls to the hosted model and obtained question-answer pairs. By following this guide, you can deploy your own LLMs in Kubernetes.

Deploying LLMs in Kubernetes with KFServing simplifies the process of serving ML models at scale. It enables collaboration between data scientists and MLOps teams and provides standardized model-serving capabilities. With this knowledge, you can use KFServing to deploy and serve your own LLMs efficiently.

Author Bio:

Alan Bernardo Palacio is a data scientist and an engineer with vast experience in different engineering fields. His focus has been the development and application of state-of-the-art data products and algorithms in several industries. He has worked for companies such as Ernst and Young, Globant, and now holds a data engineer position at Ebiquity Media helping the company to create a scalable data pipeline.
Alan graduated with a Mechanical Engineering degree from the National University of Tucuman in 2015, participated as the founder in startups, and later on earned a Master's degree from the faculty of Mathematics in the Autonomous University of Barcelona in 2017. Originally from Argentina, he now works and resides in the Netherlands.

Chaitanya Yadav
23 Oct 2023
8 min read

Large Language Models (LLMs) in Education

Introduction

Large language models (LLMs) are a type of AI that can generate and understand human language by drawing on vast amounts of textual data. This article examines the potential of LLMs in education and how they can transform it.

Through practical examples, it shows how LLMs could support individual learning pathways, provide advanced learning analytics, and power interactive simulations, leading to more effective educational strategies.

Benefits of LLMs in Education

Personalized learning

One of the greatest advantages of LLMs in education is their capacity to customize learning experiences for each student. Lesson-plan customization, individualized feedback, and real-time monitoring of student progress are all possible with LLMs.

Automated tasks

LLMs can also be used to automate processes like grading and lesson planning. This can give instructors more time for other important responsibilities, like teaching and connecting with students.

New and innovative educational tools and resources

LLMs can be applied to the development of innovative learning resources and technology, such as interactive simulations, games, and other educational activities.

Real-time feedback and support

LLMs can also provide quick help and feedback to students. For example, LLMs can be used to create chatbots that assist students with their academic work and respond to their queries.
Potential Challenges of LLMs in Education

Incorrect or misleading information

One of the main problems with using LLMs in education is that they might provide inaccurate or misleading information. This is because LLMs are trained on vast volumes of data, some of which may be outdated or erroneous.

Lack of understanding

Another issue with using LLMs in teaching is that they may not fully understand the material they produce. LLMs are trained on statistical patterns in language, which does not give them a true grasp of the complexity of human communication.

Ethical concerns

There are also ethical concerns associated with the use of LLMs in education. LLMs should be used carefully, and the ethical consequences of their use should be considered.

How LLMs Can Transform Education with Advanced Learning Strategies

Let's look at a few examples that show the possibilities of Large Language Models (LLMs) in education.

1. Advanced Personalized Learning Pathway

In this example, we build a more detailed personalized learning pathway that reflects a student's individual objectives, learning style, and progress. Follow the steps given in the input code to create it.

Input code:

```python
# Step 1: Define the generate_learning_pathway function
def generate_learning_pathway(prompt, user_profile):
    # `prompt` is accepted but not used by this template-only example
    # Step 2: Create a template for the learning pathway
    learning_pathway_template = (
        f"Dear {user_profile['student_name']},\n\n"
        "I'm excited to help you create a personalized learning pathway "
        f"to achieve your goal of {user_profile['goals']}. "
        f"As a {user_profile['learning_style']} learner with "
        f"{user_profile['current_progress']}, here's your pathway:\n\n"
    )
    # Step 3: Define the specific steps in the learning pathway
    steps = [
        "Step 1: Introduction to Data Science",
        "Step 2: Data Visualization Techniques for Visual Learners",
        "Step 3: Intermediate Statistics for Data Analysis",
        "Step 4: Machine Learning Fundamentals",
        "Step 5: Real-world Data Science Projects",
    ]
    # Step 4: Combine the template and the specific steps
    learning_pathway = learning_pathway_template + "\n".join(steps)
    return learning_pathway


# Step 5: Define a main function to test the code
def main():
    # Profile values are phrased so that they read naturally inside the template
    user_profile = {
        "student_name": "Alice",
        "goals": "becoming a data scientist",
        "learning_style": "visual",
        "current_progress": "basic statistics completed",
    }
    prompt = "Create a personalized learning pathway."
    # Step 6: Generate the learning pathway
    learning_pathway = generate_learning_pathway(prompt, user_profile)
    # Step 7: Print the learning pathway
    print(learning_pathway)


if __name__ == "__main__":
    main()
```

Running the script prints the greeting followed by the five pathway steps. The example produces a customized message that takes into account the student's name, objectives, learning style, and progress.

2. AI-Enhanced Learning Analytics

The use of LLMs in learning analytics may provide teachers with more detailed information on a student's performance and help them make appropriate recommendations.

Input code:

```python
# Define the generate_learning_analytics function
def generate_learning_analytics(prompt, student_data):
    # `prompt` is accepted but not used by this example
    # Analyze the performance based on quiz scores
    average_quiz_score = sum(student_data["quiz_scores"]) / len(student_data["quiz_scores"])
    # Calculate homework completion rate
    total_homeworks = len(student_data["homework_completion"])
    completed_homeworks = sum(student_data["homework_completion"])
    homework_completion_rate = (completed_homeworks / total_homeworks) * 100
    # Generate the learning analytics report
    analytics_report = f"Learning Analytics Report for Student {student_data['student_id']}:\n"
    analytics_report += f"- Average Quiz Score: {average_quiz_score:.2f}\n"
    analytics_report += f"- Homework Completion Rate: {homework_completion_rate:.2f}%\n"
    # Recommend extra support when fewer than 70% of homeworks are completed
    if homework_completion_rate < 70:
        analytics_report += "Based on their performance, it's recommended to provide additional support for completing homework assignments."
    return analytics_report
```

This code defines a Python function, generate_learning_analytics, which takes a prompt and student data as input, calculates the average quiz score and the homework completion rate, and generates a report that includes these metrics, together with a recommendation for additional support when the homework completion rate is below 70%. Now let's provide student performance data.

Input code:

```python
student_data = {
    "student_id": "99678",
    "quiz_scores": [89, 92, 78, 95, 89],
    "homework_completion": [True, True, False, True, True],
}
prompt = f"Analyze the performance of student {student_data['student_id']} based on their recent quiz scores and homework completion."
analytics_report = generate_learning_analytics(prompt, student_data)
print(analytics_report)
```

The report is generated from the student's quiz scores and the homework completion data in the student_data dictionary. For this data it shows an average quiz score of 88.60 and a homework completion rate of 80.00%, so no recommendation for additional support is added.

3. Advanced Interactive Simulations for Learning

The potential for LLMs to provide an engaging learning resource is demonstrated here through the creation of a computerised training simulation on a complicated topic, such as physics.

Input code:

```python
# Define the generate_advanced_simulation function
def generate_advanced_simulation(prompt):
    # Create the title of the interactive simulation
    interactive_simulation = f"Interactive {prompt} Simulation"
    # Provide a link to the interactive simulation (replace with an actual link)
    interactive_simulation_link = "https://your-interactive-simulation-link.com"
    return interactive_simulation, interactive_simulation_link


# Define a main function to test the code
def main():
    topic = "Quantum Mechanics"
    prompt = f"Develop an interactive simulation for teaching {topic} to advanced high school students."
    # Generate the interactive simulation
    interactive_simulation, interactive_simulation_link = generate_advanced_simulation(prompt)
    # Print the link to the interactive simulation
    print(f"Explore the {topic} interactive simulation: {interactive_simulation_link}")


if __name__ == "__main__":
    main()
```

In this example, for a complex topic like quantum mechanics, the prompt asks an LLM to create an advanced interactive simulation that will make learning more interesting and visual; the function itself only returns a title and a placeholder link.
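The examples above build their prompts but never send them to a model. As a minimal sketch of how that last step might look, the function below accepts any text-in, text-out callable as the LLM. Both generate_simulation_outline and fake_llm are hypothetical names; fake_llm is a stand-in that returns canned text and would be swapped for a real model client.

```python
from typing import Callable


def generate_simulation_outline(prompt: str, llm: Callable[[str], str]) -> str:
    """Send the prompt to an LLM callable and return its trimmed reply."""
    response = llm(prompt)
    return response.strip()


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call; returns canned text
    return (
        f"Outline for: {prompt}\n"
        "1. Concept overview\n"
        "2. Adjustable parameters\n"
        "3. Guided questions\n"
    )


topic = "Quantum Mechanics"
prompt = f"Develop an interactive simulation for teaching {topic} to advanced high school students."
outline = generate_simulation_outline(prompt, fake_llm)
print(outline)
```

Passing the model in as a parameter keeps the example runnable without credentials and makes it straightforward to test the surrounding logic before connecting a real provider.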
Make sure to replace the placeholder link in generate_advanced_simulation with the link to your own interactive simulation.

These examples illustrate how LLMs could be used to create customized learning pathways, learning analytics reports, and interactive simulations that offer in-depth educational experiences.

Conclusion

In conclusion, large language models have tremendous potential to transform education by providing advanced learning strategies and tools. They offer a range of benefits, including personalized learning experiences, timely feedback and support, automated tasks, and the development of innovative educational tools.

This article considered practical uses of LLMs in education, including more sophisticated personalized learning pathways that take into account students' specific educational objectives and how they learn. LLMs can also improve learning analytics by reporting details of a student's performance along with recommendations for improvement. Finally, the simulation example showed how LLMs can make learning on complicated topics more interactive and engaging.

The future of education appears promising given the ability of LLMs to offer a more diverse, creative learning environment with broad opportunities for learners around the world.

Author Bio

Chaitanya Yadav is a data analyst and a machine learning and cloud computing expert with a passion for technology and education. He has a proven track record of success in using technology to solve real-world problems and help others to learn and grow. He is skilled in a wide range of technologies, including SQL, Python, data visualization tools like Power BI, and cloud computing platforms like Google Cloud Platform. He is also 22x Multicloud Certified.

In addition to his technical skills, he is also a brilliant content creator, blog writer, and book reviewer.
He is the Co-founder of a tech community called "CS Infostics" which is dedicated to sharing opportunities to learn and grow in the field of IT.
Alan Bernardo Palacio
12 Sep 2023
9 min read

Using LLM Chains in Rust

Introduction

llm-chain is a Rust library designed to make your experience with large language models (LLMs) smoother and more powerful. In this tutorial, we'll walk you through the steps of installing Rust, setting up a new project, and getting started with the versatile capabilities of LLM-Chain.

This guide will break down the process step by step, using simple language, so you can confidently explore the potential of LLM-Chain in your projects.

Installation

Before we dive into the exciting world of LLM-Chain, let's start with the basics. To begin, you'll need to install Rust on your computer. By using rustup, the official Rust toolchain manager, you can ensure you have the latest version and easily manage your installations. We recommend having Rust version 1.65.0 or higher. If you encounter errors related to unstable features or dependencies requiring a newer Rust version, simply update your Rust version. Just follow the instructions provided on the rustup website to get Rust up and running.

With Rust now installed on your machine, let's set up a new project. This step creates an organized space for your work with LLM-Chain. Open up your terminal and run the following command:

```bash
cargo new --bin my-llm-project
```

This command creates a new directory named "my-llm-project" that contains all the necessary files and folders for a Rust project.

Embracing the Power of LLM-Chain

Now that you have your Rust project folder ready, it's time to integrate the capabilities of LLM-Chain. This library simplifies your interaction with LLMs and empowers you to create remarkable applications. Adding LLM-Chain to your project is a breeze.
Navigate to your project directory in the terminal and run the following commands:

```bash
cd my-llm-project
cargo add llm-chain
```

After running these commands, LLM-Chain becomes a part of your project, and the dependency is recorded in the "Cargo.toml" file.

LLM-Chain offers flexibility by supporting multiple drivers for different LLMs. You have the choice between the LLAMA driver, which runs a LLaMA LLM on your machine, and the OpenAI driver, which connects to the OpenAI API. For simplicity and a quick start, we'll be using the OpenAI driver in this tutorial.

To choose the OpenAI driver, execute this command:

```bash
cargo add llm-chain-openai
```

With the driver in place, let's move on to exploring sequential chains with Rust and the possibilities they hold with LLM-Chain.

Exploring Sequential Chains with Rust

In LLM-Chain, sequential chains let you orchestrate a sequence of steps where the output of each step flows into the next. This hands-on section guides you through crafting a sequential chain, expanding it with additional steps, and applying a few best practices and tips.

Let's kick things off by preparing our project environment.

One prerequisite for creating sequential chains is the installation of tokio in your project. While this tutorial uses the tokio crate with its full feature set, remember that in production scenarios, it's recommended to be more selective about which features you enable. To set the stage, run the following command in your terminal:

```bash
cargo add tokio --features full
```

This step gives your project the async runtime needed to run the chain. Before we continue, ensure that you've set your OpenAI API key in the OPENAI_API_KEY environment variable.
Here's how:

    export OPENAI_API_KEY="YOUR_OPEN_AI_KEY"

With your environment ready, let's look at the full implementation. We will use a chain to generate recommendations of places to visit in a city, format them, and summarize the result over a series of steps:

    use llm_chain::parameters;
    use llm_chain::step::Step;
    use llm_chain::traits::Executor as ExecutorTrait;
    use llm_chain::{chains::sequential::Chain, prompt};
    use llm_chain_openai::chatgpt::Executor;

    #[tokio::main(flavor = "current_thread")]
    async fn main() -> Result<(), Box<dyn std::error::Error>> {
        // Create a new ChatGPT executor with default settings
        let exec = Executor::new()?;

        // Create a chain of three steps
        let chain: Chain = Chain::new(vec![
            // First step: ask for places to visit in the given city and country
            Step::for_prompt_template(
                prompt!("You are a bot for travel assistance research",
                    "Find good places to visit in this city {{city}} in this country {{country}}. Include their name")
            ),
            // Second step: format the recommendations as bullet points.
            // Notably, the text parameter takes the output of the previous prompt.
            Step::for_prompt_template(
                prompt!(
                    "You are an assistant for managing social media accounts for a travel company",
                    "Format the information into 5 bullet points for the most relevant places. \n--\n{{text}}")
            ),
            // Third step: summarize the previous output into a LinkedIn post
            // for the company page, with some emojis for flair.
            Step::for_prompt_template(
                prompt!(
                    "You are an assistant for managing social media accounts for a travel company",
                    "Summarize this email into a LinkedIn post for the company page, and feel free to use emojis! \n--\n{{text}}")
            )
        ]);

        // Execute the chain with the provided parameters
        let result = chain
            .run(
                // Create a Parameters object with key-value pairs for the placeholders
                parameters!("city" => "Rome", "country" => "Italy"),
                &exec,
            )
            .await
            .unwrap();

        // Display the result on the console
        println!("{}", result.to_immediate().await?.as_content());
        Ok(())
    }

The code sets up a multi-step process using the llm_chain and llm_chain_openai crates. First, it creates a ChatGPT executor with default settings. Next, it builds a chain of three sequential steps. The first step asks for places to visit in a particular city and country, using the {{city}} and {{country}} placeholders. The second step receives the text output of the first step through the {{text}} placeholder and formats it into five bullet points. The third step receives the output of the second step in the same way and summarizes it into a LinkedIn post for a travel company's page, adding emojis for extra appeal.

The chain is executed with a Parameters object that supplies values for the placeholders: "city" is set to "Rome" and "country" to "Italy". The generated content is then printed to the console. Together, this forms a structured workflow for generating travel-related content using ChatGPT.

Running the Code

Now it's time to compile and run the code.
Execute the following command in your terminal:

    cargo run

As the code executes, the sequential chain orchestrates the different prompts, generating content that flows through each step.

We can see the results of the model as a bulleted list of travel recommendations.

Conclusion

The llm-chain Rust library serves as your gateway to large language models (LLMs) from within the Rust programming language. This tutorial covered the fundamental steps needed to start using it.

We began with the foundations: installing Rust and adding llm-chain to your project using Cargo. We then configured LLM-Chain with the OpenAI driver and focused on sequential chains, which let you construct sequences of steps where each step's output feeds into the next. As a practical example, we built a travel recommendation chain capable of generating concise posts for a destination, suitable for sharing on LinkedIn.

LLM-Chain offers more to explore. You can run models such as LLaMA locally through the LLAMA driver mentioned earlier, or you can try map-reduce chains. Feel free to continue your exploration and see what LLM-Chain can do in your projects. See you in the next article.

Author Bio

Alan Bernardo Palacio is a data scientist and an engineer with vast experience in different engineering fields. His focus has been the development and application of state-of-the-art data products and algorithms in several industries. He has worked for companies such as Ernst and Young and Globant, and now holds a data engineer position at Ebiquity Media, helping the company to create a scalable data pipeline. Alan graduated with a Mechanical Engineering degree from the National University of Tucuman in 2015, participated as the founder of startups, and later earned a Master's degree from the faculty of Mathematics at the Autonomous University of Barcelona in 2017. Originally from Argentina, he now works and resides in the Netherlands.

20 Oct 2023
7 min read

Testing Large Language Models (LLMs)

Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!

Machine learning has become ubiquitous, with models powering everything from search engines and recommendation systems to chatbots and autonomous vehicles. As these models grow more complex, testing them thoroughly is crucial to ensure they behave as expected. This is especially true for large language models like GPT-4 that generate human-like text and engage in natural conversations.

In this article, we will explore strategies for testing machine learning models, with a focus on evaluating the performance of LLMs.

Introduction

Machine learning models are notoriously challenging to test due to their black-box nature. Unlike traditional code, we cannot simply verify the logic line by line. ML models learn from data and make probabilistic predictions, so their decision-making process is opaque.

While testing methods like unit testing and integration testing are common for traditional software, they do not directly apply to ML models. We need more specialized techniques to validate model performance and uncover unexpected or undesirable behavior.

Testing is particularly crucial for large language models. Since LLMs can generate free-form text, it's hard to anticipate their exact responses. Flaws in the training data or model architecture can lead to hallucinations, biases, and errors that only surface during real-world usage. Rigorous testing provides confidence that the model works as intended.

In this article, we will cover testing strategies to evaluate LLMs. The key techniques we will explore are:

• Similarity testing
• Column coverage testing
• Exact match testing
• Visual output testing
• LLM-based evaluation

By combining these methods, we can thoroughly test LLMs along multiple dimensions and ensure they provide coherent, accurate, and appropriate responses.

Testing Text Output with Similarity Search

A common output from LLMs is text. This could be anything from chatbot responses to summaries generated from documents. A robust way to test the quality of text output is similarity testing.

The idea is simple: we define an expected response and compare the model's actual response to determine how similar they are. The higher the similarity score, the better.

Let's walk through an example using our favorite LLM. Suppose we give it the prompt:

Prompt: What is the capital of Italy?

The expected response would be:

Expected: The capital of Italy is Rome.

Now we can pass this prompt to the LLM and get the actual response:

    prompt = "What is the capital of Italy?"
    actual = llm.ask(prompt)

Let's say actual contains:

Actual: Rome is the capital of Italy.

While the wording is different, the meaning is the same. To quantify this similarity, we can use a sentence-embedding library like SentenceTransformers. It represents sentences as numeric vectors, which we then compare using cosine similarity.

    from sentence_transformers import SentenceTransformer
    from sklearn.metrics.pairwise import cosine_similarity

    model = SentenceTransformer('all-MiniLM-L6-v2')
    expected = "The capital of Italy is Rome."
    expected_embedding = model.encode(expected)
    actual_embedding = model.encode(actual)
    similarity = cosine_similarity([expected_embedding], [actual_embedding])[0][0]

This yields a similarity score of 0.85, indicating the responses are highly similar in meaning.

We can establish a threshold for the minimum acceptable similarity, like 0.8. Responses below this threshold fail the test.
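The threshold check described above can be sketched as a small test helper. This is a minimal, self-contained illustration rather than the article's implementation: `toy_embed` is a hypothetical bag-of-words stand-in for a real sentence encoder such as `model.encode`, and `passes_similarity_test` and the sample pairs are assumptions made up for the example.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for a sentence encoder: a bag-of-words count vector."""
    words = text.lower().replace(".", " ").replace("?", " ").split()
    return Counter(words)


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def passes_similarity_test(expected: str, actual: str, threshold: float = 0.8) -> bool:
    """A response passes when its similarity to the expected text reaches the threshold."""
    return cosine_similarity(toy_embed(expected), toy_embed(actual)) >= threshold


# (expected, actual) pairs; the second pair is deliberately unrelated.
cases = [
    ("The capital of Italy is Rome.", "Rome is the capital of Italy."),
    ("The capital of Italy is Rome.", "Paris is lovely in the spring."),
]
results = [passes_similarity_test(e, a) for e, a in cases]
print(results)
```

Swapping `toy_embed` for real embeddings changes only the vectors being compared; the pass/fail logic around the threshold stays the same.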
By running similarity testing over many prompt-response pairs, we can holistically assess the textual coherence of an LLM.

Testing Tabular Outputs with Column Coverage

In addition to text, LLMs can output tables or data frames. For testing these, we need different techniques that account for structure.

A good validation is column coverage: checking what percentage of columns in the expected output are present in the actual output.

Consider the LLM answering questions about movies:

Prompt: What are the top 3 highest grossing movies of all time?

Expected:

    Movie               Worldwide Gross   Release Year
    Avatar              $2,789,679,794    2009
    Titanic             $2,187,463,944    1997
    Star Wars Ep. VII   $2,068,223,624    2015

Now we can test the LLM's actual output:

    prompt = "What are the top 3 highest grossing movies of all time?"
    actual = llm.ask(prompt)

Actual:

    Movie                          Global Revenue   Year
    Avatar                         $2.789 billion   2009
    Titanic                        $2.187 billion   1997
    Star Wars: The Force Awakens   $2.068 billion   2015

Here, actual contains the same 3 columns as expected: the movie, its gross, and the release year. The headers and cell values differ slightly, so an exact comparison of header names would only match "Movie". We therefore first pair each expected header with its closest actual header (for example, with cosine similarity over the header names) and rename the actual columns accordingly. Once the headers are paired, we have 100% column coverage.

We can formalize this in code:

    # Assumes actual's headers have already been renamed to their paired expected names
    expected_cols = set(expected.columns)
    actual_cols = set(actual.columns)
    column_coverage = len(expected_cols & actual_cols) / len(expected_cols)
    # column_coverage = 1.0

For tables with many columns, we may only need, say, 90% coverage to pass the test. This validation ensures the critical output columns are present while allowing variability in column names or ancillary data.

Exact Match for Numeric Outputs

When LLMs output a single number or statistic, we can use simple exact match testing.

Consider this prompt:

Prompt: What was Apple's total revenue in 2021?

Expected: $365.82 billion

We get the LLM's response:

    prompt = "What was Apple's total revenue in 2021?"
    actual = llm.ask(prompt)

Actual: $365.82 billion

In this case, we expect an exact string match:

    is_match = (actual == expected)
    # is_match = True

For numerical outputs, precision is important. Exact match testing provides a straightforward way to validate this.

Screenshot Testing for Visual Outputs

Building PandasAI, we sometimes need to test generated charts. Testing these outputs requires verifying that the visualized data is correct.

One method is screenshot testing: comparing screenshots of the expected and actual visuals. For example:

Prompt: Generate a bar chart comparing the revenue of FAANG companies.

Expected: [Expected_Chart.png]

Actual: [Actual_Chart.png]

We can then test if the images match:

    from PIL import Image, ImageChops

    expected_img = Image.open("./Expected_Chart.png")
    actual_img = Image.open("./Actual_Chart.png")
    # difference() is all zeros when the pixels are identical,
    # in which case getbbox() returns None
    diff = ImageChops.difference(expected_img, actual_img)
    is_match = diff.getbbox() is None
    # is_match = True if images match

For more robust validation, we could use computer vision techniques like template matching to identify and compare key elements: axes, bars, labels, etc.

Screenshot testing provides quick validation of visual output without needing to interpret the raw chart data.

LLM-Based Evaluation

An intriguing idea for testing LLMs is to use another LLM!

The concept is to pass the expected and actual outputs to a separate "evaluator" LLM and ask if they match.

For example:

Expected: Rome is the capital of Italy.

Actual: The capital of Italy is Rome.

We can feed this to the evaluator model:

Prompt: Do these two sentences convey the same information? Answer YES or NO
Sentence 1: Rome is the capital of Italy.
Sentence 2: The capital of Italy is Rome.

Evaluator: YES

The evaluator LLM acts like a semantic similarity scorer. This takes advantage of the natural language capabilities of LLMs.

The downside is that it evaluates one black-box model using another black-box model. Errors or biases in the evaluator could lead to incorrect assessments.
So LLM-based evaluation should complement other testing approaches, not act as the sole method.

Conclusion

Testing machine learning models thoroughly is critical as they grow more ubiquitous and impactful. Large language models pose unique testing challenges due to their free-form textual outputs.

Using a combination of similarity testing, column coverage validation, exact match, visual output screening, and even LLM-based evaluation, we can rigorously assess LLMs along multiple dimensions.

A comprehensive test suite combining these techniques will catch more flaws than any single method alone. This builds essential confidence that LLMs behave as expected in the real world.

Testing takes time but prevents much larger problems down the road. The strategies covered in this article will add rigor to the development and deployment of LLMs, helping ensure these powerful models benefit humanity as intended.

Author Bio

Gabriele Venturi is a software engineer and entrepreneur who started coding at the young age of 12. Since then, he has launched several projects across gaming, travel, finance, and other spaces, contributing his technical skills to various startups across Europe over the past decade.

Gabriele's true passion lies in leveraging AI advancements to simplify data analysis. This mission led him to create PandasAI, released open source in April 2023. PandasAI integrates large language models into the popular Python data analysis library Pandas. This enables an intuitive conversational interface for exploring data through natural language queries.

By open-sourcing PandasAI, Gabriele aims to share the power of AI with the community and push boundaries in conversational data analytics. He actively contributes as an open-source developer dedicated to advancing what's possible with generative AI.
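Returning to the column coverage check from the tabular section of this article: an exact set intersection only counts headers that are spelled identically, so headers such as "Worldwide Gross" and "Global Revenue" need to be paired before coverage is computed. The sketch below is one hedged way to do that. It assumes a hand-written alias table (`HEADER_ALIASES`, hypothetical) in place of the embedding-based pairing the article mentions, and the helper names `canonical` and `column_coverage` are made up for illustration.

```python
# Hypothetical alias table mapping header variants to one canonical name.
# In practice this pairing could come from embedding similarity instead.
HEADER_ALIASES = {
    "movie": "movie",
    "worldwide gross": "gross",
    "global revenue": "gross",
    "release year": "year",
    "year": "year",
}


def canonical(header: str) -> str:
    """Normalize a header; unknown headers fall back to their lowercase form."""
    key = header.strip().lower()
    return HEADER_ALIASES.get(key, key)


def column_coverage(expected_cols, actual_cols) -> float:
    """Fraction of expected columns that appear in the actual output after pairing."""
    expected = {canonical(c) for c in expected_cols}
    actual = {canonical(c) for c in actual_cols}
    if not expected:
        return 1.0
    return len(expected & actual) / len(expected)


expected_cols = ["Movie", "Worldwide Gross", "Release Year"]
actual_cols = ["Movie", "Global Revenue", "Year"]

# Exact header comparison only matches "Movie".
raw = len(set(expected_cols) & set(actual_cols)) / len(set(expected_cols))
# Comparison after pairing matches all three columns.
paired = column_coverage(expected_cols, actual_cols)
print(raw, paired)
```

The gap between `raw` and `paired` shows why the pairing step matters: without it, the movie example would report one third coverage even though all three columns are present.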

Saeed Dehqan
28 Aug 2023
8 min read

Exploring Token Generation Strategies

Introduction

This article discusses different methods for generating sequences of tokens with language models, focusing on how to select the next token in a sequence from the predicted probability distribution over possible tokens.

Language models predict the next token based on the n previous tokens, extracting as much information from them as they can. Transformer models aggregate information from all n previous tokens: tokens in a sequence communicate with one another and exchange information. At the end of this process, each token is context-aware and is used to predict its own next token. Each token separately passes through some linear/non-linear layers, and the output is unnormalized logits. We then apply softmax to the logits to convert them into a probability distribution. Each token has its own probability distribution over its next token.

[Figure: each token's probability distribution over its next token]

Exploring Methods for Token Selection

Once we have the probability distribution over tokens, it's time to pick one token as the next token. There are four methods for selecting a suitable token from the probability distribution:

●    Greedy or naive method: Simply select the token that has the highest probability. This is a deterministic method.
●    Beam search: It takes a parameter named beam size and, based on it, runs the model multiple times to find a suitable sentence, not just a single token. This is a deterministic method.
●    Top-k sampling: Select the k most probable tokens, shut off the other tokens (set their logits to -inf so their probability becomes 0), and sample from the top k tokens. This is a sampling method.
●    Nucleus sampling: Select the most probable tokens and shut off the others, but with a dynamic number of selected tokens rather than a fixed k.

Greedy method

This is a simple and fast method that needs only one prediction per step: just select the most probable token as the next token. The greedy method can be efficient on arithmetic tasks, but it tends to get stuck in a loop and repeat tokens one after another. It also kills the diversity of the model by favoring tokens that occur frequently in the training dataset.

Here's the code that converts unnormalized logits (simply the output of the network) into a probability distribution and selects the most probable next token:

    import torch.nn.functional as F

    probs = F.softmax(logits, dim=-1)
    next_token = probs.argmax(dim=-1)

Beam search

Beam search produces better results but is slower, because it runs the model multiple times in order to build n sequences, where n is the beam size. The method selects the top n tokens, appends each of them to the current sequence, and runs the model on the resulting sequences to predict the next token. This process continues until the end of the sequence. It is computationally expensive but yields higher quality. Based on this search, the algorithm returns two sequences:

[Figure: the two candidate sequences returned by beam search]

Then, how do we select the final sequence? We sum up the loss for all predictions and select the sequence with the lowest loss.

Simple sampling

We can select tokens randomly according to their probability: the higher the probability, the higher the chance of being selected. We can achieve this with the multinomial method:

    logits = logits[:, -1, :]
    probs = F.softmax(logits, dim=-1)
    next_idx = torch.multinomial(probs, num_samples=1)

This is part of the model we implemented in the "transformer building blocks" blog and the code can be found here. torch.multinomial receives the probability distribution and draws num_samples samples from it.
Here's an example:

    In [1]: import torch
    In [2]: probs = torch.tensor([0.3, 0.6, 0.1])
    In [3]: torch.multinomial(probs, num_samples=1)
    Out[3]: tensor([1])
    In [4]: torch.multinomial(probs, num_samples=1)
    Out[4]: tensor([0])
    In [5]: torch.multinomial(probs, num_samples=1)
    Out[5]: tensor([1])
    In [6]: torch.multinomial(probs, num_samples=1)
    Out[6]: tensor([0])
    In [7]: torch.multinomial(probs, num_samples=1)
    Out[7]: tensor([1])
    In [8]: torch.multinomial(probs, num_samples=1)
    Out[8]: tensor([1])

We ran the method six times on probs. As you can see, it selects index 1 (probability 0.6) four times and index 0 (probability 0.3) twice, because a higher probability means a higher chance of being drawn.

Top-k sampling

To improve on the previous sampling method, we need to limit the sampling space. Top-k sampling does this. K is a parameter that top-k sampling uses to select the k most probable tokens from the probability distribution and sample only from these k tokens. Here is an example of top-k sampling:

    In [1]: import torch
    In [2]: logit = torch.randn(10)
    In [3]: logit
    Out[3]: tensor([-1.1147, 0.5769, 0.3831, -0.5841, 1.7528, -0.7718, -0.4438, 0.6529, 0.1500, 1.2592])
    In [4]: topk_values, topk_indices = torch.topk(logit, 3)
    In [5]: topk_values
    Out[5]: tensor([1.7528, 1.2592, 0.6529])
    In [6]: logit[logit < topk_values[-1]] = float('-inf')
    In [7]: logit
    Out[7]: tensor([ -inf, -inf, -inf, -inf, 1.7528, -inf, -inf, 0.6529, -inf, 1.2592])
    In [8]: probs = logit.softmax(0)
    In [9]: probs
    Out[9]: tensor([0.0000, 0.0000, 0.0000, 0.0000, 0.5146, 0.0000, 0.0000, 0.1713, 0.0000, 0.3141])
    In [10]: torch.multinomial(probs, num_samples=1)
    Out[10]: tensor([9])
    In [11]: torch.multinomial(probs, num_samples=1)
    Out[11]: tensor([4])
    In [12]: torch.multinomial(probs, num_samples=1)
    Out[12]: tensor([9])

●    We first create a fake logit with torch.randn. Suppose logit is the raw output of a network.
●    We use torch.topk to select the top 3 values from logit. torch.topk returns the top 3 values along with their indices. The values are sorted from highest to lowest.
●    We use advanced indexing to select the logit values that are lower than the smallest of the top 3 values. When we say logit < topk_values[-1] we mean all the numbers in logit that are lower than topk_values[-1] (0.6529).
●    After selecting those numbers, we replace their values with float('-inf'), which is negative infinity.
●    After the replacement, we run softmax over the logit to convert it into probabilities. The entries set to -inf get probability 0.
●    Now, we use torch.multinomial to sample from the probs.

Nucleus sampling

Nucleus sampling is like top-k sampling but with a dynamic selection of top tokens instead of a fixed k. The dynamic selection is better when we are unsure which k is suitable for top-k sampling. Nucleus sampling has a hyperparameter named p. Let us say it is 0.9: the method takes tokens in descending order of probability and adds up their probabilities, and when the cumulative sum reaches p, we stop. What is the cumulative sum? Here's an example:

    In [1]: import torch
    In [2]: logit = torch.randn(10)
    In [3]: probs = logit.softmax(0)
    In [4]: probs
    Out[4]: tensor([0.0652, 0.0330, 0.0609, 0.0436, 0.2365, 0.1738, 0.0651, 0.0692, 0.0495, 0.2031])
    In [5]: [probs[:x+1].sum() for x in range(probs.size(0))]
    Out[5]: [tensor(0.0652), tensor(0.0983), tensor(0.1592), tensor(0.2028), tensor(0.4394), tensor(0.6131), tensor(0.6782), tensor(0.7474), tensor(0.7969), tensor(1.)]

Each entry of the cumulative sum is the sum of all probabilities up to and including that position. We can also use torch.cumsum and get the same result:

    In [9]: torch.cumsum(probs, dim=0)
    Out[9]: tensor([0.0652, 0.0983, 0.1592, 0.2028, 0.4394, 0.6131, 0.6782, 0.7474, 0.7969, 1.0000])

Okay.
Here's nucleus sampling from scratch:

    In [1]: import torch
    In [2]: logit = torch.randn(10)
    In [3]: probs = logit.softmax(0)
    In [4]: probs
    Out[4]: tensor([0.7492, 0.0100, 0.0332, 0.0078, 0.0191, 0.0370, 0.0444, 0.0553, 0.0135, 0.0305])
    In [5]: sprobs, indices = torch.sort(probs, dim=0, descending=True)
    In [6]: sprobs
    Out[6]: tensor([0.7492, 0.0553, 0.0444, 0.0370, 0.0332, 0.0305, 0.0191, 0.0135, 0.0100, 0.0078])
    In [7]: cs_probs = torch.cumsum(sprobs, dim=0)
    In [8]: cs_probs
    Out[8]: tensor([0.7492, 0.8045, 0.8489, 0.8860, 0.9192, 0.9497, 0.9687, 0.9822, 0.9922, 1.0000])
    In [9]: selected_tokens = cs_probs < 0.9
    In [10]: selected_tokens
    Out[10]: tensor([ True, True, True, True, False, False, False, False, False, False])
    In [11]: probs[indices[selected_tokens]]
    Out[11]: tensor([0.7492, 0.0553, 0.0444, 0.0370])
    In [12]: probs = probs[indices[selected_tokens]]
    In [13]: torch.multinomial(probs, num_samples=1)
    Out[13]: tensor([0])

●    Convert the logit to probabilities and sort them in descending order so that we can select them from top to bottom.
●    Calculate the cumulative sum.
●    Using advanced indexing, we filter out the values whose cumulative sum is at or beyond p.
●    Then, we sample from a limited and better space.

Two details are worth keeping in mind when reusing this snippet. First, the index returned by torch.multinomial refers to a position in the filtered tensor, so it has to be mapped back through indices[selected_tokens] to get the token's id in the original vocabulary. Second, the mask cs_probs < 0.9 stops just before the cumulative sum reaches p; nucleus sampling is often implemented so that the token crossing the threshold is kept as well.

Please note that you can combine top-k and nucleus sampling. It is like selecting k tokens and doing nucleus sampling on these k tokens. You can also use top-k, nucleus, and beam search together.

Conclusion

Understanding these methods is crucial for anyone working with language models, natural language processing, or text generation tasks. These techniques play a significant role in generating coherent and diverse sequences of text. Depending on the specific use case and desired outcomes, readers can choose the most appropriate method to employ. Overall, this knowledge can contribute to improving the quality of generated text and enhancing the capabilities of language models.

Author Bio

Saeed Dehqan trains language models from scratch. Currently, his work is centered around language models for text generation, and he possesses a strong understanding of the underlying concepts of neural networks. He is proficient in using optimizers such as genetic algorithms to fine-tune network hyperparameters and has experience with neural architecture search (NAS) using reinforcement learning (RL). He implements models from data gathering through to monitoring and deployment on mobile, web, cloud, etc.
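As a companion to the top-k and nucleus sampling sections of this article, the combination of the two can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the author's implementation: the helper names `top_k_top_p_filter` and `sample_token` are hypothetical, the standard library `random.choices` stands in for `torch.multinomial`, and the nucleus step here keeps the token that crosses the threshold p, which differs from the `cs_probs < 0.9` mask used in the article.

```python
import random


def top_k_top_p_filter(probs, k=None, p=None):
    """Return (token_ids, renormalized_probs) kept after top-k, then top-p filtering."""
    # Token ids sorted from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if k is not None:
        order = order[:k]
    if p is not None:
        kept, cumulative = [], 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= p:  # keep the token that crosses the threshold
                break
        order = kept
    total = sum(probs[i] for i in order)
    return order, [probs[i] / total for i in order]


def sample_token(probs, k=None, p=None, rng=random):
    """Sample a token id in the ORIGINAL vocabulary from the filtered distribution."""
    ids, weights = top_k_top_p_filter(probs, k=k, p=p)
    return rng.choices(ids, weights=weights, k=1)[0]


# Same probabilities as the nucleus sampling session in the article.
probs = [0.7492, 0.0100, 0.0332, 0.0078, 0.0191, 0.0370, 0.0444, 0.0553, 0.0135, 0.0305]
ids, weights = top_k_top_p_filter(probs, p=0.9)
print(ids)
```

Because the filter returns original token ids alongside the renormalized weights, the sampled value can be appended to the sequence directly, without a separate step to map positions back to the vocabulary.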
Merlyn Shelley
01 Mar 2024
9 min read

AI_Distilled 38: Latest in AI: Sora, Gemini 1.5, and More

👋 Hello,

"People say AI is overhyped, but I think it's not hyped enough. The next generation who will use this in the next few years will have a much higher bar on what technology can do for them. So how you build it for that generation, how you build it for that future will be really interesting to see."
- Puneet Chandok, Microsoft India and South Asia president

Speaking at a panel discussion on AI at the Mumbai Tech Week, Chandok said he believes AI is not hyped enough considering its potential for disruptive transformation. He encourages more training on AI to realize its full potential.

Welcome back to a new issue of AI_Distilled, your one-stop destination for all things AI, ML, NLP, and Gen AI. Let's get started with the latest news and developments across the AI sector:

• OpenAI unveils Sora, an AI model generating videos from text
• Google's latest conversational AI model Gemini 1.5 has a million-token context window
• New AI news reader app tackles clickbait headlines, provides summaries
• Slack is rolling out new AI features for enterprise users including thread summaries
• LangChain announced raising $25 million to launch new platform for building LLM apps
• AI helps improve medical imaging to benefit patients globally
• Researchers develop AI model that determines a person's sex from brain scans

We've also curated the latest GPT and LLM resources, tutorials, and secret knowledge:

• Giving AI Models a Better Memory: How Google DeepMind Expanded Context Windows
• Advanced Techniques For More Relevant AI Responses
• Reinforcement Learning Explained
• Bridging the Gap Between AI and App Development

Finally, don't forget to check out our hands-on tips and strategies from the AI community for you to use on your own projects:

• Creating Custom Models Without the Hassle of Data Collection
• Code Your Own AI Coding Buddy
• Evaluating Code Quality with AI Assistants
• Easily Deploy Language Models Locally

Looking for some inspiration? Here are some GitHub repositories to get your projects going!

• gptscript-ai/gptscript
• karpathy/minbpe
• AAAI-DISIM-UnivAQ/DALI
• QwenLM/Qwen

Writer's Credit: Special shout-out to Vidhu Jain for her valuable contribution to this week's issue.

Cheers,
Kartikey Pandey
Editor-in-Chief, Packt

⚡ TechWave: AI/GPT News & Analysis

OpenAI unveiled Sora, an AI model generating videos from text at up to a minute in length. Sora demonstrates an understanding of language and the physical world and photorealism across styles, though human subjects appear game-like.

Google's latest conversational AI model Gemini 1.5 analyzes more information than before, thanks to a million-token context window. This allows for summarizing the Apollo 11 mission transcript or analyzing a 44-minute silent film in full. Early results show the system maintains performance as context grows into the millions.

Bulletin, a new AI-powered news reader app, tackles clickbait headlines and provides summaries of news articles with customizable news sources.

Slack is rolling out new AI features for enterprise users including thread summaries, channel recaps, and answering workplace questions. The tools provide highlights from missed messages and help catch up.

LangChain announced raising $25 million to launch their new platform LangSmith for building and monitoring LLM apps. LangSmith allows developers to accelerate workflows across development, testing, deployment, and monitoring. It has already seen significant adoption with over 70,000 signups and 5000 monthly active companies.

Courtesy: Bulletin/Shihab Mehboob

AI is helping improve medical imaging to benefit patients globally. ML can quickly analyze large datasets to find issues doctors may miss and flag urgent cases. Cloud solutions also enable sharing scans and remote expert assistance anywhere.
Companies are applying these methods to speed diagnoses, reduce wait times, and bring ultrasounds directly to homes.

Researchers have also developed an AI model that can determine a person's sex from brain scans with over 90% accuracy. The model analyzed dynamic MRI scans and identified the default mode, striatum, and limbic networks as key in distinguishing male and female brains. This breakthrough furthers our understanding of brain organization and could help address sex-specific health issues.

🔮 Expert Insights from Packt Community

Generative AI with LangChain - By Dr. Ben Auffarth

ChatGPT and the GPT models by OpenAI have brought about a revolution not only in how we write and research but also in how we can process information.

This book discusses the functioning, capabilities, and limitations of LLMs underlying chat systems, including ChatGPT and Bard. It also demonstrates, in a series of practical examples, how to use the LangChain framework to build production-ready and responsive LLM applications for tasks ranging from customer support to software development assistance and data analysis.

Key Takeaways

• Explore the expansive utility of LLMs in real-world applications.
• Guidance on fine-tuning, prompt engineering, and best practices.
• Learn how to use the LangChain framework to build production-ready LLM applications.

By the end of this book, you'll be equipped with the practical knowledge and skills to leverage the transformative power of generative AI with confidence and creativity.

🌟 Secret Knowledge: AI/LLM Resources

🌀 Giving AI Models a Better Memory: How Google DeepMind Expanded Context Windows: Google DeepMind's latest AI model Gemini 1.5 has significantly improved how much information it can process at once, thanks to advances in "long context windows." The team discovered their model could understand over 1 million pieces of information in a single sitting, far surpassing earlier limits. This opens up new possibilities for tasks like summarizing lengthy documents, analyzing large codebases, and even comprehending full movies. Developers are excited to explore creative uses of this expanded recall.

🌀 Advanced Techniques For More Relevant AI Responses: This article discusses how to improve retrieval-augmented generation (RAG) systems by enhancing how information is stored, found, and used. Methods covered include indexing sentences individually while keeping their surrounding context, combining keyword search with semantic search, and re-scoring results based on the question. The author demonstrates implementing these "advanced RAG" techniques in Python using tools like LlamaIndex and Weaviate. With these optimizations, AI systems can provide more helpful responses by accessing knowledge in a targeted manner.

🌀 Reinforcement Learning Explained: This article breaks down the key concepts of reinforcement learning in an easy-to-understand way. It covers states, actions, rewards, and how agents interact with environments to learn policies. RL agents try different strategies to maximize long-term rewards through trial and error. Episodes provide a framework to evaluate policies. Deterministic policies pick set actions while stochastic policies use probabilities. Whether you're new to RL or a veteran, this primer is worth a read to get acquainted with the basics.

🌀 Bridging the Gap Between AI and App Development: As AI becomes more advanced, developers need easier ways to integrate cutting-edge features into their work. However, directly using AI code frameworks can be challenging and limit scalability. The solution? AI gateways. By handling tasks like routing, caching, and monitoring behind the scenes, gateways act as a bridge between complex AI systems and traditional development workflows. They streamline the integration process while ensuring high performance. Are gateways the future of intelligent applications?

Partnering with Notion

Ever tried Notion?
It's a workspace that helps you do things better and faster. You get AI for notes and teamwork, easy drag-and-drop for content, and cool new features to help manage projects and share knowledge. Give it a try!
🔛 Masterclass: AI/LLM Tutorials
🌀 Creating Custom Models Without the Hassle of Data Collection: Tired of spending big bucks to use proprietary AI APIs or going through the tedious process of collecting your own training data? This page shows how you can train customized models more efficiently. By using an open-source LLM to generate synthetic annotations for a small sample of your data, you can then fine-tune a smaller model tailored exactly to your needs. The process takes just a few steps and allows you to analyze large datasets for a fraction of the cost. Best of all, you avoid sending sensitive data to third parties.
🌀 Code Your Own AI Coding Buddy: This guide shows you how to build an AI assistant that lives right on your computer. Using tools like HuggingFace and Streamlit, you can create a chatbot powered by Code Llama. Simply ask it questions and it will respond with examples in languages like Python, Java, and C++. Better yet, the models are free and open-source. This is a neural net sidekick to help automate repetitive tasks and speed up your workflow.
🌀 Evaluating Code Quality with AI Assistants: This article explores using AI to improve code quality by testing Python scripts with SonarQube and getting feedback from LLMs. The author ran tests on ChatGPT and open-source models like Code Llama to see if they could identify issues flagged by SonarQube. While the models struggled to pinpoint errors solely from descriptions, some provided insightful summaries. Continued development of coding-focused LLMs may help automate part of the review process.
🌀 Easily Deploy Language Models Locally: With a simple four-step process, you can get powerful ChatGPT-style language models running on your own hardware. 
First, choose a model from HuggingFace and quantize it for faster performance. Then build an Ollama image to serve the model. For a slick interface, deploy a ChatGPT-style React app talking to Ollama via Docker. The whole setup only takes around 15 minutes. Now you've got a custom language assistant without internet dependence.
🚀 HackHub: Trending AI Tools
🌀 gptscript-ai/gptscript: Open source NLP tool that allows developers to automate tasks by writing scripts in plain English.
🌀 karpathy/minbpe: Minimal and clean Python code for the byte pair encoding algorithm commonly used in NLP and language model tokenization.
🌀 AAAI-DISIM-UnivAQ/DALI: Framework allowing developers to build multi-agent systems in Prolog for applications like robotics, event processing, and more.
🌀 QwenLM/Qwen: Open source code, models, and documentation for the Qwen series of LLMs, including Qwen, Qwen-Chat, and their various sizes.
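The byte pair encoding algorithm mentioned in the karpathy/minbpe entry above can be illustrated with a small sketch. This is not code from that repository: the function names `get_pair_counts`, `merge_pair`, and `train_bpe` are hypothetical, and the sketch assumes training directly on raw UTF-8 bytes, with new token ids starting at 256.

```python
from collections import Counter


def get_pair_counts(ids):
    """Count how often each adjacent pair of token ids occurs."""
    return Counter(zip(ids, ids[1:]))


def merge_pair(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    out = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges over the UTF-8 bytes of `text`."""
    ids = list(text.encode("utf-8"))
    merges = {}  # maps (left_id, right_id) -> new token id
    for step in range(num_merges):
        counts = get_pair_counts(ids)
        if not counts:
            break  # fewer than two tokens left, nothing to merge
        # Most frequent pair; ties go to the pair seen first.
        pair = max(counts, key=counts.get)
        new_id = 256 + step  # ids 0..255 are reserved for raw bytes
        ids = merge_pair(ids, pair, new_id)
        merges[pair] = new_id
    return ids, merges
```

Calling `train_bpe("aaabdaaabac", 3)` returns the compressed id sequence together with the learned merge table. A tokenizer would replay those merges in the same order to encode new text.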

Kartikey Pandey
21 Mar 2024
9 min read

AI_Distilled #39: Unpacking Mistral Large, Google's Gemini Challenges, and Copilot Enterprise

Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out: sign up today!
Print to Pixel: Optimize your learning experience with Packt
Several research studies have proven that printed books enhance comprehension, with the tactile experience of flipping pages and annotating the margins adding depth to the learning experience. However, developers can't overlook the practical benefits of eBooks, such as quickly finding relevant information or carrying an entire library on a single device. Acknowledging the unique benefits of both formats, Packt is offering a 40% discount on all print books, plus a free eBook version of each purchase, from February 26th to February 29th.
Here’s what’s included:
A Vast Library: Enjoy 40% off on over 5,000 titles spanning topics from Cybersecurity to Generative AI.
Complimentary eBook: Each print book purchase includes a free eBook.
AI Assistant: Top 500 books come with a personalized AI that can simplify complex topics to suit your learning style, offering an interactive learning experience.
Start Building Your Tech Library Today!
👋 Hello,
“No AI is perfect, especially at this emerging stage of the industry’s development, but we know the bar is high for us and we will keep at it for however long it takes.”
-Sundar Pichai, Google CEO
Pichai acknowledges problems with Gemini AI, stressing the importance of unbiased information for users, and outlining steps to address issues and improve products. A rapidly progressing industry, AI development is a tricky game to master, with numerous pitfalls along the way.
Greetings readers! Our mission is to help you stay on top of the ever-changing AI landscape so you can advance your skills. 
Let’s get started with the latest news and developments across the AI field:
Microsoft provides new LLM Mistral Large on Azure with Mistral AI
Google acknowledges that some Gemini responses were unacceptable and biased
GitHub has launched Copilot Enterprise coding assistant integrating throughout the software development process
Researchers developed new optimized language models called MobileLLM for mobile devices with under a billion parameters
Researchers at Microsoft have developed new techniques to improve visual language models
We’ve also got your fresh dose of GPT and LLM secret knowledge and tutorials:
Mastering the Art of Prompt Crafting
Breaking Down How Large Language Models Learn
Using AI to Level Up Live Games
Monitoring Large Language Models on AWS
Last but not least, don’t miss out on the hands-on strategies and tips straight from the AI community for you to use on your own projects:
Fine-Tuning Models for Speech Recognition Made Simple
Make Conversation Come Alive - Deploying Your Own AI Chat Partner
Combining Geospatial and Semantic Data to Build Powerful Search Tools
Leveraging Notion, Supabase and AI for Knowledge Retrieval
Writer’s Credit: Special shout-out to Vidhu Jain for her valuable contribution to this week’s issue.
Cheers,
Kartikey Pandey
Editor-in-Chief, Packt
Unleash Your Data Potential with Packt's Latest Titles and Platform Enhancements!
In a world that's always changing, learning is key to success. At Packt, we've updated our learning platform to help you stay ahead in the fast-moving tech world. Our platform makes learning easier and more effective, helping you overcome challenges and achieve your goals.
Boost Your Data Skills with Packt's DataPro Library:
On-Demand Learning: Access a wide range of books, video courses, research papers, and articles to help you grow. 
AI Assistance: Get help from AI to understand complex concepts easily, all within the same learning environment.
Personalized Dashboard: Enjoy a tailored learning experience with recommendations and insights just for you.
Advanced Self-Assessment: Use the latest tools to identify what you need to learn and track your progress accurately.
Vibrant Community: Join a community of data and AI enthusiasts on Discord for collaboration and knowledge sharing.
Exclusive Access: Be part of the DataPro beta program for a chance to win Amazon gift cards and early access to new features.
Value for Money: Get all these benefits for just $7.99 per month, a small investment for big gains in your career.
Enhance Your Data Skills Today
⚡ TechWave: AI/GPT News & Analysis
Microsoft has partnered with Mistral AI to provide their new LLM Mistral Large on Azure cloud services. This state-of-the-art AI model offers advanced NLP capabilities. Several companies have praised Mistral Large's performance in increasing productivity and aiding innovation.
Google's CEO recently said some responses from their AI model Gemini were unacceptable and biased. The company has been working to address these issues and sees improvements but will review what happened. They plan to relaunch Gemini in the coming weeks after fixing it.
GitHub has launched Copilot Enterprise, an AI coding assistant that integrates throughout the software development process. It provides customized code suggestions based on an organization's codebase, answers questions about internal systems, and generates summaries of code changes. Early testing found massive productivity gains from such AI tools.
Researchers have developed new optimized language models for mobile devices with under a billion parameters. Called MobileLLM, the models achieve higher accuracy than previous smaller models through innovative architecture and weight-sharing techniques. 
MobileLLM shows significant gains on conversation tasks and competes with much larger models for common on-device uses.
Researchers at Microsoft have developed new techniques to improve visual language models using structured knowledge graphs. By incorporating relationship maps between image elements like objects and attributes, models can generate richer images from text descriptions. Hierarchical prompting and dual-path encoding methods were also introduced to help models better understand complex language.
🌟 Secret Knowledge: AI/LLM Resources
🌀 Mastering the Art of Prompt Crafting: Got a new NLP project that needs prompting? This guide covers the basics of effective prompt engineering for AI models like ChatGPT. Learn how clarity, conciseness, and context can improve responses. Also explore techniques like zero-shot learning and dynamic few shots, plus how temperature, top-p, and other settings can refine your model's "personality". From system messages to tailoring examples, these tips will help you leverage your LLMs' full potential.
🌀 Breaking Down How Large Language Models Learn: This article provides a helpful breakdown of how LLMs are trained through causal language modeling and how the loss is calculated. It visually explains how models generate text sequences, how they are pre-trained to predict the next token, and how cross-entropy loss compares predictions to true labels to update weights. The process is demonstrated through code that calculates the loss manually for an LLM and shows that it matches the framework's automatic calculation. This gives developers valuable insights into how state-of-the-art models learn.
🌀 Using AI to Level Up Live Games: This article discusses how generative AI can enhance live service games. Techniques like adaptive gameplay, personalized ads, and faster asset creation are described. The authors provide a framework for developing games using tools like Unity, GKE, and Vertex AI. 
They demonstrate how ML models can dynamically generate images, code, and dialogue to customize the player experience. Whether deploying models on GKE or Vertex, cloud-based AI offers lower costs and easier maintenance than self-hosted options.
🌀 Monitoring Large Language Models on AWS: As AI language models grow more advanced, ensuring they behave properly becomes more important. This article discusses techniques for monitoring LLMs deployed on AWS. Key metrics covered include semantic similarity of responses, sentiment analysis, refusal rates, and more. The proposed architecture takes in model outputs, runs metrics modules, and reports results to CloudWatch for aggregation and alerts. With the right monitoring in place, you can help keep your conversational AI acting as intended.
🔛 Masterclass: AI/LLM Tutorials
🌀 Fine-Tuning Models for Speech Recognition Made Simple: This article discusses how to fine-tune LLMs for automatic speech recognition tasks using Amazon SageMaker. It explains language models and ASR as well as the basic steps for fine-tuning a pre-trained model, which include preparing data, choosing a model, training, evaluating, and deploying. SageMaker is highlighted as a powerful yet easy-to-use platform for this process due to its scalability, integration with AWS services, and pay-as-you-go pricing.
🌀 Make Conversation Come Alive - Deploying Your Own AI Chat Partner: Tired of boring chatbots? This guide shows you how to bring the amazing Qwen AI model to your own server so you can have engaging discussions on any topic. The steps cover setting up your environment, installing dependencies, initializing the tokenizer and model, and using history to keep conversations flowing naturally. Once complete, you'll have a powerful AI assistant right at your fingertips. 
Best of all, it's completely open source.
🌀 Combining Geospatial and Semantic Data to Build Powerful Search Tools: This guide shows developers how to create an interactive campground search map using vector databases, NLP models, and geospatial data. Technologies like Qdrant, Llama2, and Streamlit allow embedding text and locations to enable semantic queries. The page explains setting up Qdrant cloud, loading campground CSV data, and parsing text into nodes. Developers can then embed nodes with HuggingFace and query the vector store to retrieve similar results. By leveraging tools that understand both spatial and semantic context, you can build customized applications to help users explore outdoor destinations.
🌀 Leveraging Notion, Supabase, and AI for Knowledge Retrieval: This tutorial shows how you can build a knowledge base by extracting data from Notion databases and storing it in a vector format in Supabase. It then demonstrates retrieving relevant information from the knowledge base using an AI model from OpenAI. By combining these tools, developers can query custom datasets and generate responses based on retrieved documents. The process involves loading Notion documents, storing embeddings in Supabase, and setting up a retrieval pipeline. 
With some enhancements, this could be a powerful way to access organizational information.
🚀 HackHub: Trending AI Tools
🌀 lucky-lance/expert_sparsity: Implements efficient expert pruning and dynamic skipping techniques for mixture-of-experts large language models to improve their efficiency and speed while maintaining strong performance.
🌀 facebookresearch/pearl: This open-source library provides a modular reinforcement learning framework for building and training production-ready AI agents, empowering developers with state-of-the-art techniques.
🌀 zhen-tan-dmml/llm4annotation: Curates papers on using LLMs for data annotation, which developers could reference to apply these techniques or learn about the current state of the art.
🌀 google/gemma.cpp: Provides a lightweight C++ library for running Google's Gemma models that developers can easily integrate into their own projects for experimenting with and deploying LLMs.
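The next-token cross-entropy loss described in the "Breaking Down How Large Language Models Learn" item above can be worked through by hand on a toy example. The following is a minimal sketch in plain Python, not the code from that article: the function names `log_softmax` and `causal_lm_loss` are illustrative, and the logits and token ids are made-up numbers over an assumed four-token vocabulary. Real frameworks perform the same calculation on tensors.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one row of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def causal_lm_loss(logits, token_ids):
    """Mean next-token cross-entropy.

    logits[t] scores the vocabulary after seeing token_ids[: t + 1],
    so it is compared against token_ids[t + 1]. The last row has no
    target and is dropped.
    """
    targets = token_ids[1:]
    rows = logits[:-1]
    nll = [-log_softmax(row)[tgt] for row, tgt in zip(rows, targets)]
    return sum(nll) / len(nll)


# Toy sequence of three tokens over a vocabulary of size 4 (made-up values).
token_ids = [2, 0, 3]
logits = [
    [0.0, 0.0, 0.0, 0.0],  # uniform guess for the token after position 0
    [0.0, 0.0, 0.0, 0.0],  # uniform guess for the token after position 1
    [1.0, 2.0, 3.0, 4.0],  # unused: nothing follows the final token
]
loss = causal_lm_loss(logits, token_ids)
```

With uniform predictions over four tokens, the loss equals ln 4, the value a model earns when it has learned nothing about the next token. Training lowers the loss by shifting probability toward the true next token.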