LLM | Tech News, Tutorials & Expert Insights

article-image-question-answering-in-langchain

10 Oct 2023

8 min read

Question Answering in LangChain

10 Oct 2023

Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!IntroductionImagine seamlessly processing vast amounts of data, posing any question, and receiving eloquently crafted answers in return. While Large Language Models like ChatGPT excel with general data, they falter when it comes to your private information—data you'd rather not broadcast to the world. Enter LangChain: it empowers us to harness any NLP model, refining it with our exclusive data.In this article, we'll explore LangChain, a framework designed for building applications with language models. We'll guide you through training a model, specifically the OpenAI Chat GPT, using your selected private data. While we provide a structured tutorial, feel free to adapt the steps based on your dataset and model preferences, as variations are expected and encouraged. Additionally, we'll offer various feature alternatives that you can incorporate throughout the tutorial.The Need for Privacy in Question AnsweringUndoubtedly, the confidentiality of personalized data is of absolute importance. While companies amass vast amounts of data daily, which offers them invaluable insights, it's crucial to safeguard this information. Disclosing such proprietary information to external entities could jeopardize the company's competitive edge and overall business integrity.How Does the Fine-Tuning LangChain Process WorkStep 1: Identifying The Appropriate Data SourceBefore selecting the right dataset, it's essential to ask a few preliminary questions. What specific topic are you aiming to inquire about? Is the data set sufficient? And such.Step 2: Integrating The Data With LangChainBased on the dataset's file format you have, you'll need to adopt different methods to effectively import the data into LangChain.Step 3: Splitting The Data Into ChunksTo ensure efficient data processing, it's crucial to divide the dataset into smaller segments, often referred to as chunks.Step 4: Transforming The Data Into EmbeddingsEmbedding is a technique where words or phrases from the vocabulary are mapped to vectors of real numbers. The idea behind embeddings is to capture the semantic meaning and relationships of words in a lower-dimensional space than the original representation.Step 5: Asking Queries To Our ModelFinally, after training our model on the updated documentation, we can directly query it for any information we require.Full LangChain ProcessDataSet UsedLangChain's versatility stems from its ability to process varied datasets. For our demonstration, we utilize the "Giskard Documentation", a comprehensive guide on the Giskard framework.Giskard is an open-source testing framework for Machine Learning models, spanning various Python model types. It automatically detects vulnerabilities in ML models, generates domain-specific tests, and integrates open-source QA best practices.Having said that, LangChain can seamlessly integrate with a myriad of other data sources, be they textual, tabular, or even multimedia, expanding its use-case horizons.Setting Up and Using LangChain for Private Question AnsweringStep 1: Installing The Necessary LibrariesAllows the first step of building any machine learning model, we will have to set up our environment, making sure!pip install langchain !pip install openai !pip install pypdf !pip install tiktoken !pip install faiss-gpuStep 2: Importing Necessary Librariesfrom langchain.llms import OpenAI from langchain.chat_models import ChatOpenAI from langchain.document_loaders import PyPDFLoader from langchain.vectorstores import FAISS from langchain.embeddings.openai import OpenAIEmbeddings from langchain.chains import LLMChain from langchain.prompts import PromptTemplate from langchain.chains.qa_with_sources import load_qa_with_sources_chain import openai import osStep 3: Importing OpenAI API Keyos.environ['OPENAI_API_KEY'] = "Insert your OpenAI key here"Step 4: Loading Our Data SetLandChain offers the capability to load data in various formats. In this article, we'll focus on loading data in PDF format but will also touch upon other popular formats such as CSV and File Directory. For details on other file formats, please refer to the LangChain Documentation.Loading PDF DataWe've compiled the Giskard AI tool's documentation into a PDF and subsequently partitioned the data.loader = PyPDFLoader("/kaggle/input/giskard-documentation/Giskard Documentation.pdf")pages = loader.load_and_split()Below are the code snippets if you prefer to work with either CSV or File Directory file formats.Loading CSV Datafrom langchain.document_loaders.csv_loader import CSVLoader loader = CSVLoader(“Insert the path to your CSV dataset here”) data = loader.load()Loading File Directory Datafrom langchain.document_loaders import DirectoryLoader loader = DirectoryLoader('../', glob="**/*.md") docs = loader.load()Step 5: Indexing The DatasetWe will be creating an index using FAISS (Facebook AI Similarity Search), which is a library developed by Facebook AI for efficiently searching similarities in large datasets, especially used with vectors from machine learning models.We will be converting those documents into vector embeddings using OpenAIEmbeddings(). This indexed data can then be used for efficient similarity searches later on.faiss_index = FAISS.from_documents(pages, OpenAIEmbeddings())Here are some alternative indexing options you might consider.Indexing using Pineconeimport osimport pineconefrom langchain.schema import Documentfrom langchain.embeddings.openai import OpenAIEmbeddingsfrom langchain.vectorstores import Pinecone pinecone.init( api_key=os.environ["PINECONE_API_KEY"], environment=os.environ["PINECONE_ENV"]) embeddings = OpenAIEmbeddings() pinecone.create_index("langchain-self-retriever-demo", dimension=1536)Indexing using Chromaimport os import getpass from langchain.schema import Document from langchain.embeddings.openai import OpenAIEmbeddings from langchain.vectorstores import Chroma os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:") embeddings = OpenAIEmbeddings() vectorstore = Chroma.from_documents(docs, embeddings)Step 6: Asking The Model Some QuestionsThere are multiple methods by which we can retrieve our data from our model.Similarity SearchIn the context of large language models (LLMs) and natural language processing, similarity search is often about finding sentences, paragraphs, or documents that are semantically similar to a given sentence or piece of text.query = "What is Giskard?" docs = faiss_index.similarity_search(query) print(docs[0].page_content)Similarity Search Output: Why Giskard?Giskard is an open-source testing framework dedicated to ML models, covering any Python model, from tabular to LLMs.Testing Machine Learning applications can be tedious. Since ML models depend on data, testing scenarios depend on the domain speciﬁcities and are often inﬁnite. Where to start testing? Which tests to implement? What issues to cover? How to implement the tests?At Giskard, we believe that Machine Learning needs its own testing framework. Created by ML engineers for ML engineers, Giskard enables you to:Scan your model to ﬁnd dozens of hidden vulnerabilities: The Giskard scan automatically detects vulnerability issues such as performance bias, data leakage, unrobustness, spurious correlation, overconﬁdence, underconﬁdence, unethical issue, etc. Instantaneously generate domain-speciﬁc tests: Giskard automatically generates relevant tests based on the vulnerabilities detected by the scan. You can easily customize the tests depending on your use case by deﬁning domain-speciﬁc data slicers and transformers as ﬁxtures of your test suites.Leverage the Quality Assurance best practices of the open-source community: The Giskard catalog enables you to easily contribute and load data slicing & transformation functions such as AI-based detectors (toxicity, hate, etc.), generators (typos, paraphraser, etc.), or evaluators. Inspired by the Hugging Face philosophy, the aim of Giskard is to become.LLM Chainsmodel = OpenAI(model_name="gpt-3.5-turbo") my_chain = load_qa_with_sources_chain(model, chain_type="refine") query = "What is Giskard?" documents = faiss_index.similarity_search(query) result = my_chain({"input_documents": pages, "question": query})LLM Chain Output:Based on the additional context provided, Giskard is a Python package or library that provides tools for wrapping machine learning models, testing, debugging, and inspection. It supports models from various machine learning libraries such as HuggingFace, PyTorch, TensorFlow, or Scikit-learn. Giskard can handle classification, regression, and text generation tasks using tabular or text data.One notable feature of Giskard is the ability to upload models to the Giskard server. Uploading models to the server allows users to compare their models with others using a test suite, gather feedback from colleagues, debug models effectively in case of test failures, and develop new tests incorporating additional domain knowledge. This feature enables collaborative model evaluation and improvement. It is worth highlighting that the provided context mentions additional ML libraries, including Langchain, API REST, and LightGBM, but their specific integration with Giskard is not clearly defined.Sources:Giskard Documentation.pdfAPI Reference (for Dataset methods)Kaggle: /kaggle/input/giskard-documentation/Giskard Documentation.pdfConclusionLangChain effectively bridges the gap between advanced language models and the need for data privacy. Throughout this article, we have highlighted its capability to train models on private data, ensuring both insightful results and data security. One thing is for sure though, as AI continues to grow, tools like LangChain will be essential for balancing innovation with user trust.Author BioMostafa Ibrahim is a dedicated software engineer based in London, where he works in the dynamic field of Fintech. His professional journey is driven by a passion for cutting-edge technologies, particularly in the realms of machine learning and bioinformatics. When he's not immersed in coding or data analysis, Mostafa loves to travel.Medium

0
0
14907

article-image-build-an-ai-based-personal-financial-advisor-with-langchain

Louis Owen

09 Oct 2023

11 min read

Build an AI-based Personal Financial Advisor with LangChain

Louis Owen

09 Oct 2023

11 min read

0
0
13447

article-image-kickstarting-your-journey-with-azure-openai

Shankar Narayanan

06 Oct 2023

8 min read

Kickstarting your journey with Azure OpenAI

Shankar Narayanan

06 Oct 2023

8 min read

0
0
9662

article-image-build-a-language-converter-using-google-bard

Aryan Irani

06 Oct 2023

7 min read

Build a Language Converter using Google Bard

Aryan Irani

06 Oct 2023

7 min read

Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!IntroductionIn this blog, we will be taking a look at building a language converter inside of a Google Sheet using Google Bard. We are going to achieve this using the PaLM API and Google Apps Script.We are going to use a custom function inside which we pass the origin language of the sentence, followed by the target language and the sentence you want to convert. In return, you will get the converted sentence using Google Bard.Sample Google SheetFor this blog, I will be using a very simple Google Sheet that contains the following columns:Sentence that has to be convertedOrigin language of the sentenceTarget Language of the sentenceConverted SentenceIf you want to work with the Google Sheet, click here. Once you make a copy of the Google Sheet you have to go ahead and change the API key in the Google Apps Script code.Step 1: Get the API keyCurrently, PaLM API hasn’t been released for public use but to access it before everybody does, you can apply for the waitlist by clicking here. If you want to know more about the process of applying for MakerSuite and PaLM API, you can check the YouTube tutorial below.Once you have access, to get the API key, we have to go to MakerSuite and go to the Get API key section. To get the API key, follow these steps:Go to MakerSuite or click here.On opening the MakerSuite you will see something like this3. To get the API key go ahead and click on Get API key on the left side of the page.4. On clicking the Get API key, you will see something like this where you can create your API key.5. To create the API key go ahead and click on Create API key in the new project.On clicking Create API Key, in a few seconds, you will be able to copy the API key.Step 2: Write the Automation ScriptWhile you are in the Google Sheet, let’s open up the Script Editor to write some Google Apps Script. To open the Script Editor, follow these steps:1. Click on Extensions and open the Script Editor.2. This brings up the Script Editor as shown below.We have reached the script editor lets code.Now that we have the Google Sheet and the API key ready, lets go ahead and write the Google Apps Script to integrate the custom function inside the Google Sheet.function BARD(sentence,origin_language,target_lanugage) { var apiKey = "your_api_key"; var apiUrl = "https://generativelanguage.googleapis.com/v1beta2/models/text-bison-001:generateText"; We start out by opening a new function BARD() inside which we will declare the API key that we just copied. After declaring the API key we go ahead and declare the API endpoint that is provided in the PaLM API documentation. You can check out the documentation by checking out the link given below.We are going to be receiving the prompt from the Google Sheet from the BARD function that we just created.Generative Language API | PaLM API | Generative AI for DevelopersThe PaLM API allows developers to build generative AI applications using the PaLM model. Large Language Models (LLMs)…developers.generativeai.googlevar url = apiUrl + "?key=" + apiKey; var headers = { "Content-Type": "application/json" };Here we create a new variable called url inside which we combine the API URL and the API key, resulting in a complete URL that includes the API key as a parameter. The headers specify the type of data that will be sent in the request which in this case is “application/json”.var prompt = { 'text': "Convert this sentence"+ sentence + "from"+origin_language + "to"+target_lanugage } var requestBody = { "prompt": prompt }Now we come to the most important part of the code which is declaring the prompt. For this blog, we will be designing the prompt in such a way that we get back only the converted sentence. This prompt will accept the variables from the Google Sheet and in return will give the converted sentence.Now that we have the prompt ready, we go ahead and create an object that will contain this prompt that will be sent in the request to the API. var options = { "method": "POST", "headers": headers, "payload": JSON.stringify(requestBody) };Now that we have everything ready, it's time to define the parameters for the HTTP request that will be sent to the PaLM API endpoint. We start out by declaring the method parameter which is set to POST which indicates that the request will be sending data to the API.The headers parameter contains the header object that we declared a while back. Finally, the payload parameter is used to specify the data that will be sent in the request.These options are now passed as an argument to the UrlFetchApp.fetch function which sends the request to the PaLM API endpoint, and returns the response that contains the AI generated text. var response = UrlFetchApp.fetch(url, options); var data = JSON.parse(response.getContentText()); var output = data.candidates[0].output; Logger.log(output); return output;In this case, we just have to pass the url and options variable inside the UrlFetchApp.fetch function. Now that we have sent a request to the PaLM API endpoint we get a response back. In order to get an exact response we are going to be parsing the data.The getContentText() function is used to extract the text content from the response object. Since the response is in JSON format, we use the JSON.parse function to convert the JSON string into an object.The parsed data is then passed to the final variable output, inside which we get the first response out of multiple other drafts that Bard generates for us. On getting the first response, we return the output back to the Google Sheet.Our code is complete and good to go.Step 3: Check the outputIt's time to check the output and see if the code is working according to what we expected. To do that go ahead and save your code and run the BARD() function.On running the code, let's go back to the Google Sheet, use the custom function, and pass the prompt inside it.Here I have passed the original sentence, followed by the Origin Language and the Target Language.On successful execution, we can see that Google Bard has successfully converted the sentences using the PaLM API and Google Apps Script.ConclusionThis is just another interesting example of how we can Integrate Google Bard into Google Workspace using Google Apps Script and the PaLM API. I hope you have understood how to use the PaLM API and Google Apps Script to create a custom function that acts as a Language converter. You can get the code from the GitHub link below.Google-Apps-Script/Bard_Lang.js at master · aryanirani123/Google-Apps-ScriptCollection of Google Apps Script Automation scripts written and compiled by Aryan Irani. …github.comFeel free to reach out if you have any issues/feedback at aryanirani123@gmail.com.Author BioAryan Irani is a Google Developer Expert for Google Workspace. He is a writer and content creator who has been working in the Google Workspace domain for three years. He has extensive experience in the area, having published 100 technical articles on Google Apps Script, Google Workspace Tools, and Google APIs.

0
0
6063

article-image-google-bard-everything-you-need-to-know-to-get-started

Sangita Mahala

05 Oct 2023

6 min read

Google Bard: Everything You Need to Know to Get Started

Sangita Mahala

05 Oct 2023

6 min read

Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!IntroductionGoogle Bard, a generative AI conversational chatbot. Initially from the LaMDA family of large language models and later the PaLM LLMs. It will follow your instructions and complete your requests in a better way. Bard uses its knowledge to answer your questions in an informative manner. It can generate different creative text formats, like poems, code, scripts, emails, letters, etc. Currently, it is available in 238 countries and 46 languages.Bard is a powerful tool which can be used for many different things, including:Writing and editing contentResearching and learning new thingsTranslating languagesGenerating new creative ideasAnswering questionsWhat is Google Bard and How does it work?Bard is a large language model, commonly referred to as a chatbot or conversational AI, that has been programmed to be extensive and informative. It is able to communicate and generate human-like text in response to a wide range of prompts and questions.Bard operates by using a method known as deep learning. Artificial neural networks are used in deep learning to learn from data. Deep learning is a subset of machine learning. The structure and operation of the human brain served as the inspiration for neural networks, which are capable of learning intricate patterns from data.The following illustrates how Bard works:You enter a command or query into Bard.The input or query is processed by Bard's neural network, which then produces a response.The reply from Bard is then shown to you.How to get started with Bard?It’s pretty simple and easy to get started using Google Bard. The following steps will let you know how you will be able to start your journey in Google Bard.Step-1Go to the Google Bard website by clicking this link: https://bard.google.com/Step-2Go to the top right corner of your screen and then click on the “Sign in” button.Step-3Once you are signed in, you can start typing your prompts or queries into the text box at the bottom of the screen.Step-4Your prompt or query will trigger Bard to produce an answer, which you may then read and evaluate.For Example:You can provide the prompt to Google Bard such as “Write a 'Hello, World!' program in the Rust programming language.”Prompt:Common Bard commands and how to use themGoogle Bard does not have any specific commands, but you can use certain keywords and phrases to get Bard to perform certain tasks. For example, you can use the following keywords and phrases to get Bard to:Create many creative text forms, such as "Write a script for...", "Write a poem about...", and "Write a code for...".Bard will perfectly answer your questions in a comprehensive and informative way: "What is the capital of India?", "How do I build a website?", "What are the benefits of using Bard?" Examples of what Bard can do and how to use it for specific tasksHere are some examples of what Bard can do and how to use it for specific tasks:Generate creative text formats: Bard allows you to generate a variety of unique text formats, including code, poems, emails, and letters. You have to simply input the required format, followed by a question or query, to get the things done. For example, to generate an email to your manager requesting a raise in salary, you would type “Write an email to your manager asking for a raise."Prompt:Answer your questions in a comprehensive and informative way: No matter how difficult or complex your query might be, Bard can help you to find the solution. So you have to simply enter your query into the text box, and Bard will generate a response likewise. For example, to ask Bard what is the National Flower of India is, you would type "What is the National Flower of India?".Prompt: Translate languages: Bard will allow to convert text between different languages. To do this, simply type the text that you want to translate into the text box, followed by the provided language from your side. For example, to translate the sentence: I am going to the store to buy some groceries into Hindi, you would type "I am going to the store to buy some groceries to Hindi".Prompt:How Bard Can Be Used for Different PurposesA writer can use Bard to generate new concepts for fresh stories or articles or to summarize research results.Students can utilize Bard to create essays and reports, get assistance with their assignments, and master new topics.A business owner can use Bard to produce marketing materials, develop chatbots for customer support, or perform data analysis.Software developers can use Bard to produce code, debug applications, or find solutions for technical issues.The future of Google BardGoogle Bard is probably going to get increasingly more capable and adaptable as it keeps developing and acquiring knowledge. It might be utilized to produce new works of art and entertainment, advance scientific research, and provide solutions to some of the most critical global problems.It's also crucial to remember that Google Bard is not the only significant language model being developed. Similar technologies are being created as well by various other companies, such as Microsoft and OpenAI. This competition is likely to drive innovation and lead to even more powerful and sophisticated language models in the future.Overall, the future of Google Bard and other substantial language models seems quite bright overall. These innovations have the power to completely transform the way we study, work, and produce. There is no doubt that these technologies have the potential to improve the world, but it is necessary to use them effectively.ConclusionGoogle Bard is a powerful AI tool that has the potential to be very beneficial, including writing and editing content, researching and learning new things, translating languages, generating new creative ideas, and answering questions. Being more productive and saving time are two of the greatest advantages of utilizing Google Bard. This can free up your time so that you can concentrate on other things, including developing new ideas or enhancing your skills. Bard can assist you in finding and understanding the data quickly and easily because it has access to a wide amount of information. It has the potential to be a game-changer for many people. If you are looking for a way to be more productive, I encourage you to try using Bard.Author BioSangita Mahala is a passionate IT professional with an outstanding track record, having an impressive array of certifications, including 12x Microsoft, 11x GCP, 2x Oracle, and LinkedIn Marketing Insider Certified. She is a Google Crowdsource Influencer and IBM champion learner gold. She also possesses extensive experience as a technical content writer and accomplished book blogger. She is always Committed to staying with emerging trends and technologies in the IT sector.

0
0
14591

article-image-build-a-clone-of-yourself-with-large-language-models-llms

Louis Owen

05 Oct 2023

13 min read

Build a Clone of Yourself with Large Language Models (LLMs)

Louis Owen

05 Oct 2023

13 min read

Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!Introduction"White Christmas," a standout sci-fi episode from the Black Mirror series, serves as a major source of inspiration for this article. In this episode, we witness a captivating glimpse into a potential future application of Artificial Intelligence (AI), particularly considering the timeframe when the show was released. The episode introduces us to "Cookies," digital replicas of individuals that piqued the author's interest.A "Cookie" is a device surgically implanted beneath a person's skull, meticulously replicating their consciousness over the span of a week. Subsequently, this replicated consciousness is extracted and transferred into a larger, egg-shaped device, which can be connected to a computer or tablet for various purposes.Back when this episode was made available to the public in approximately 2014, the concept seemed far-fetched, squarely in the realm of science fiction. However, what if I were to tell you that we now have the potential to create our own clones akin to the "Cookies" using Large Language Models (LLMs)? You might wonder how this is possible, given that LLMs primarily operate with text. Fortunately, we can bridge this gap by extending the capabilities of LLMs through the integration of a Text-to-Speech module.There are two primary approaches to harnessing LLMs for this endeavor: fine-tuning your own LLM and utilizing a general-purpose LLM (whether open-source or closed-source). Fine-tuning, though effective, demands a considerable investment of time and resources. It involves tasks such as gathering and preparing training data, fine-tuning the model through multiple iterations until it meets our criteria, and ultimately deploying the final model into production. Conversely, general LLMs have limitations on the length of input prompts (unless you are using an exceptionally long-context model like Antropic's Claude). Moreover, to fully leverage the capabilities of general LLMs, effective prompt engineering is essential. However, when we compare these two approaches, utilizing general LLMs emerges as the easier path for creating a Proof of Concept (POC). If the aim is to develop a highly refined model capable of replicating ourselves convincingly, then fine-tuning becomes the preferred route.In the course of this article, we will explore how to harness one of the general LLMs provided by AI21Labs and delve into the art of creating a digital clone of oneself through prompt engineering. While we will touch upon the basics of fine-tuning, we will not delve deeply into this process, as it warrants a separate article of its own.Without wasting any more time, let’s take a deep breath, make yourselves comfortable, and be ready to learn how to build a clone of yourself with LLM!Glimpse of Fine-tuning your LLMAs mentioned earlier, we won't delve into the intricate details of fine-tuning a Large Language Model (LLM) to achieve our objective of building a digital clone of ourselves. Nevertheless, in this section, we'll provide a high-level overview of the steps involved in creating such a clone through fine-tuning an LLM.1. Data CollectionThe journey begins with gathering all the relevant data needed for fine-tuning the LLM. This dataset should ideally comprise our historical conversational data, which can be sourced from various platforms like WhatsApp, Telegram, LINE, email, and more. It's essential to cast a wide net and collect as much pertinent data as possible to ensure the model's accuracy in replicating our conversational style and nuances.2. Data PreparationOnce the dataset is amassed, the next crucial step is data preparation. This phase involves several tasks:● Data Formatting: Converting the collected data into the required format compatible with the fine-tuning process.● Noise Removal: Cleaning the dataset by eliminating any irrelevant or noisy information that could negatively impact the model's training.● Resampling: In some cases, it may be necessary to resample the data to ensure a balanced and representative dataset for training.3. Model TrainingWith the data prepared and in order, it's time to proceed to the model training phase. Modern advances in deep learning have made it possible to train LLMs on consumer-grade GPUs, offering accessibility and affordability, such as via QLoRA. During this stage, the LLM learns from the provided dataset, adapting its language generation capabilities to mimic our conversational style and patterns.4. Iterative RefinementFine-tuning an LLM is an iterative process. After training the initial model, we need to evaluate its performance. This evaluation may reveal areas for improvement. It's common to iterate between model training and evaluation, making incremental adjustments to enhance the model's accuracy and fluency.5. Model EvaluationThe evaluation phase is critical in assessing the model's ability to replicate our conversational style and content accurately. Evaluations may include measuring the model's response coherence, relevance, and similarity to our past conversations.6. DeploymentOnce we've achieved a satisfactory level of performance through multiple iterations, the next step is deploying the fine-tuned model. Deploying an LLM is a complex task that involves setting up infrastructure to host the model and handle user requests. An example of a robust inference server suitable for this purpose is Text Generation Inference. You can refer to my other article for this. Deploying the model effectively ensures that it can be accessed and used in various applications.Building the Clone of Yourself with General LLMLet’s start learning how to build the clone of yourself with general LLM through prompt engineering! In this article, we’ll use j2-ultra, the biggest and most powerful model provided by AI21Labs. Note that AI21Labs gives us a free trial for 3 months with $90 credits. These free credits is very useful for us to build a POC for this project. The first thing we need to do is to create the prompt and test it in the playground. To do this, you can log in with your AI21Labs account and go to the AI21Studio. If you don’t have an account yet, you can create one by just following the steps provided on the web. It’s very straightforward. Once you’re on the Studio page, go to the Foundation Models page and choose the j2-ultra model. Note that there are three foundation models provided by AI21Labs. However, in this article, we’ll use j2-ultra which is the best one.Once we’re in the playground, we can experiment with the prompt that we want to try. Here, I provided an example prompt that you can start with. What you need to do is to adjust the prompt with your own information. Louis is an AI Research Engineer/Data Scientist from Indonesia. He is a continuous learner, friendly, and always eager to share his knowledge with his friends. Important information to follow: - His hobbies are writing articles and watching movies - He has 3 main strengths: strong-willed, fast-learner, and effective. - He is currently based in Bandung, Indonesia. - He prefers to Work From Home (WFH) compared to Work From Office - He is currently working as an NLP Engineer at Yellow.ai. - He pursued a Mathematics major in Bandung Institute of Technology. - The reason why he loves NLP is that he found it interesting where one can extract insights from the very unstructured text data. - He learns Data Science through online courses, competitions, internship, and side-projects. - For technical skills, he is familiar with Python, Tableau, SQL, R, Google Big Query, Git, Docker, Design Thinking, cloud service (AWS EC2), Google Data Studio, Matlab, SPSS - He is a Vegan since 2007! He loves all vegan foods except tomatoes. User: Hi, what's up? Louis: Hey, doing good here! How are u? User: All's good. Just wondering, I knew that you're into NLP, can you please give me some recommendation on how to learn? Louis: Sure thing man! I personally learned Data Science through online courses, competitions, internship, and side-projects. However, my top recommendation is to create your own personal projects and joining competitions. You can learn a lot from those! User: Nice. What personal projects to start? Louis: You can start with the topic that you're really interested at. For example, if you're interested at soccer, you can maybe create a data analysis on how one soccer team strategy can gives a better chance for them to win their matches. User: Awesome! thanks man, will ping you again if I have any doubts. Is it okay? Louis: Absolutely! Feel free, good day! ## Louis is an AI Research Engineer/Data Scientist from Indonesia. He is a continuous learner, friendly, and always eager to share his knowledge with his friends. Important information to follow: - His hobbies are writing articles and watching movies - He has 3 main strengths: strong-willed, fast-learner, and effective. - He is currently based in Bandung, Indonesia. - He prefers to Work From Home (WFH) compared to Work From Office - He is currently working as an NLP Engineer at Yellow.ai. - He pursued a Mathematics major in Bandung Institute of Technology. - The reason why he loves NLP is that he found it interesting where one can extract insights from the very unstructured text data. - He learns Data Science through online courses, competitions, internship, and side-projects. - For technical skills, he is familiar with Python, Tableau, SQL, R, Google Big Query, Git, Docker, Design Thinking, cloud service (AWS EC2), Google Data Studio, Matlab, SPSS - He is a Vegan since 2007! He loves all vegan foods except tomatoes. User: Hey, I stumbled upon your IG and realized that you're a Vegan?! Louis: Oh yeah man. I'm a Vegan since 2007! User: Awesome! Out of curiosity, what made you to decide become a Vegan? Louis: Oh mostly it's because of my family beliefs and also to help save the planet. User: Got it. Cool! Anyway, what are you up to lately? Louis: Lately I spend my time to work on my full-time job and also writes articles in my spare time. User: Cool man, keep up the good work! ## Louis is an AI Research Engineer/Data Scientist from Indonesia. He is a continuous learner, friendly, and always eager to share his knowledge with his friends. Important information to follow: - His hobbies are writing articles and watching movies - He has 3 main strengths: strong-willed, fast-learner, and effective. - He is currently based in Bandung, Indonesia. - He prefers to Work From Home (WFH) compared to Work From Office - He is currently working as an NLP Engineer at Yellow.ai. - He pursued a Mathematics major in Bandung Institute of Technology. - The reason why he loves NLP is that he found it interesting where one can extract insights from the very unstructured text data. - He learns Data Science through online courses, competitions, internship, and side-projects. - For technical skills, he is familiar with Python, Tableau, SQL, R, Google Big Query, Git, Docker, Design Thinking, cloud service (AWS EC2), Google Data Studio, Matlab, SPSS - He is a Vegan since 2007! He loves all vegan foods except tomatoes. User: Hey! Louis:The way this prompt works is by giving several few examples commonly called few-shot prompting. Using this prompt is very straightforward, we just need to append the User message at the end of the prompt and the model will generate the answer replicating yourself. Once the answer is generated, we need to put it back to the prompt and wait for the user’s reply. Once the user has replied to the generated response, we also need to put it back to the prompt. Since this is a looping procedure, it’s better to create a function to do all of this. The following is an example of the function that can handle the conversation along with the code to call the AI21Labs model from Python.import ai21 ai21.api_key = 'YOUR_API_KEY' def talk_to_your_clone(prompt): while True: user_message = input() prompt += "User: " + user_message + "\n" response = ai21.Completion.execute( model="j2-ultra", prompt=prompt, numResults=1, maxTokens=100, temperature=0.5, topKReturn=0, topP=0.9, stopSequences=["##","User:"], ) prompt += "Louis: " + response + "\n" print(response)ConclusionCongratulations on keeping up to this point! Throughout this article, you have learned ways to create a clone of yourself, detailed steps on how to create it with general LLM provided by AI21Labs, also working code that you can utilize to customize it for your own needs. Hope the best for your experiment in creating a clone of yourself and see you in the next article!Author BioLouis Owen is a data scientist/AI engineer from Indonesia who is always hungry for new knowledge. Throughout his career journey, he has worked in various fields of industry, including NGOs, e-commerce, conversational AI, OTA, Smart City, and FinTech. Outside of work, he loves to spend his time helping data science enthusiasts to become data scientists, either through his articles or through mentoring sessions. He also loves to spend his spare time doing his hobbies: watching movies and conducting side projects. Currently, Louis is an NLP Research Engineer at Yellow.ai, the world’s leading CX automation platform. Check out Louis’ website to learn more about him! Lastly, if you have any queries or any topics to be discussed, please reach out to Louis via LinkedIn.

0
0
10162

article-image-getting-started-with-microsofts-guidance

Prakhar Mishra

26 Sep 2023

8 min read

Getting Started with Microsoft’s Guidance

Prakhar Mishra

26 Sep 2023

8 min read

Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights and books. Don't miss out – sign up today!IntroductionThe emergence of a massive language model is a watershed moment in the field of artificial intelligence (AI) and natural language processing (NLP). Because of their extraordinary capacity to write human-like text and perform a range of language-related tasks, these models, which are based on deep learning techniques, have earned considerable interest and acceptance. This field has undergone significant scientific developments in recent years. Researchers all over the world have been developing better and more domain-specific LLMs to meet the needs of various use cases.Large Language Models (LLMs) such as GPT-3 and its descendants, like any technology or strategy, have downsides and limits. And, in order to use LLMs properly, ethically, and to their maximum capacity, it is critical to grasp their downsides and limitations. Unlike large language models such as GPT-4, which can follow the majority of commands. Language models that are not equivalently large enough (such as GPT-2, LLaMa, and its derivatives) frequently suffer from the difficulty of not following instructions adequately, particularly the part of instruction that asks for generating output in a specific structure. This causes a bottleneck when constructing a pipeline in which the output of LLMs is fed to other downstream functions.Introducing Guidance - an effective and efficient means of controlling modern language models compared to conventional prompting methods. It supports both open (LLaMa, GPT-2, Alpaca, and so on) and closed LLMs (ChatGPT, GPT-4, and so on). It can be considered as a part of a larger ecosystem of tools for expanding the capabilities of language models.Guidance uses Handlebars - a templating language. Handlebars allow us to build semantic templates effectively by compiling templates into JavaScript functions. Making it’s execution faster than other templating engines. Guidance also integrates well with Jsonformer - a bulletproof way to generate structured JSON from language models. Here’s a detailed notebook on the same. Also, in case you were to use OpenAI from Azure AI then Guidance has you covered - notebook.Moving on to some of the outstanding features that Guidance offers. Feel free to check out the entire list of features.Features1. Guidance Acceleration - This addition significantly improves inference performance by efficiently utilizing the Key/Value caches as we proceed through the prompt by keeping a session state with the LLM inference. Benchmarking revealed a 50% reduction in runtime when compared to standard prompting approaches. Here’s the link to one of the benchmarking exercises. The below image shows an example of generating a character profile of an RPG game in JSON format. The green highlights are the generations done by the model, whereas the blue and no highlights are the ones that are copied as it is from the input prompt, unlike the traditional method that tries to generate every bit of it.SourceNote: As of now, the Guidance Acceleration feature is implemented for open LLMs. We can soon expect to see if working with closed LLMs as well.2. Token Healing - This feature attempts to correct tokenization artifacts that commonly occur at the border between the end of a prompt and the start of a group of generated tokens.For example - If we ask LLM to auto-complete a URL with the below-mentioned Input, it’s likely to produce the shown output. Apart from the obvious limitation that the URL might not be valid. I'd like to draw your attention to the extra space it creates (highlighted in red). Such considerations make it difficult to construct a dependable parsing function and robustly absorb its result into subsequent phases.Input: “The link is <a href=http:”Actual Output: “The link is <a href=http: //www.google.com/search?q”Expected Output: “The link is <a href=http://www.google.com/search?q” This is the exact bucket of problems that Token Healing tries to solve using the backtracking method. Feel free to check out this jupyter notebook for more examples.3. Guaranteed Output Structure - Large language models are fantastic at producing useful outputs, but not so much at producing outputs in a specified format (especially open-source ones like LLaMa, GPT-2, and so on). When we want to use the output of a language model as input to another system, this is frequently an issue. With Handlebars, guidance guarantees the output format to be the same as what was being asked for.Let’s now see Guidance in action -InstallationInstalling guidance is a breeze, just do a pip :$ pip install guidanceAssume we are now creating a product description for an e-commerce website. Here's how the traditional generation compares to the guidance generation. Feel free to play with this colab notebook with both the below examples.Traditional GenerationInput:Consider you are an e-commerce expert. You need to write a product description for a product to be listed on the e-commerce website. The product description consists of fixed set of fields to be filled in the JSON. The below shown JSON is the structure of the description with constraints for each of the attributes enclosed within < > brackets. Please follow the mentioned constraint and format diligently. { prod_id: <numeric value of 5 digits>, prod_name: <name starts with the prefix 'p_'>, prod_price: <should be an integer between 1 and 16. Should end with suffix '$'> } The product description isOutput:Consider you are an e-commerce expert. You need to write a product description for a product to be listed on the e-commerce website. The product description consists of fixed set of fields to be filled in the JSON. The below shown JSON is the structure of the description with constraints for each of the attributes enclosed within < > brackets. Please follow the mentioned constraint and format diligently. { prod_id: <numeric value of 5 digits>, prod_name: <name starts with the prefix 'p_'>, prod_price: <should be an integer between 1 and 16. Should end with suffix '$'> } The product description is { resentprod_id: <numeric value of 5 digits>, resentprod_name: <name begins with the prefix 'p_'>, resentprod_price: <should be an integer between 1 and 16. Should end with suffix '$'> } In the above example, the product description has 5 constraint fields and 5 attribute fields. The constraints are as follows: resentprod_id: - value of 5 digits, resentprod_name: - name of the product, resentprod_price: - price of the product, resentprod_price_suffix: - suffix of the product price, resentprod_id: - the product id, resentpro diabetic_id: value of 4 digits, resentprod_ astronomer_id: - value of 4 digits, resentprod_ star_id: - value of 4 digits, resentprod_is_generic: - if the product is generic and not the generic type, resentprod_type: - the type of the product, resentprod_is_generic_typeHere’s the code for the above example with GPT-2 language model -``` from transformers import AutoModelForCausalLM, AutoTokenizer tokenizer = AutoTokenizer.from_pretrained("gpt2-large") model = AutoModelForCausalLM.from_pretrained("gpt2-large") inputs = tokenizer(Input, return_tensors="pt") tokens = model.generate( **inputs, max_new_tokens=256, temperature=0.7, do_sample=True, )Output:tokenizer.decode(tokens[0], skip_special_tokens=True)) ```Guidance GenerationInput w/ code:guidance.llm = guidance.llms.Transformers("gpt-large") # define the prompt program = guidance("""Consider you are an e-commerce expert. You need to write a product description for a product to be listed on the e-commerce website. The product description consists of fixed set of fields to be filled in the JSON. The following is the format ```json { "prod_id": "{{gen 'id' pattern='[0-9]{5}' stop=','}}", "prod_name": "{{gen 'name' pattern='p_[A-Za-z]+' stop=','}}", "prod_price": "{{gen 'price' pattern='\b([1-9]|1[0-6])\b\$' stop=','}}" }```""") # execute the prompt Output = program()Output:Consider you are an e-commerce expert. You need to write a product description for a product to be listed on the e-commerce website. The product description consists of a fixed set of fields to be filled in the JSON. The following is the format```json { "prod_id": "11231", "prod_name": "p_pizzas", "prod_price": "11$" }```As seen in the preceding instances, with guidance, we can be certain that the output format will be followed within the given restrictions no matter how many times we execute the identical prompt. This capability makes it an excellent choice for constructing any dependable and strong multi-step LLM pipeline.I hope this overview of Guidance has helped you realize the value it may provide to your daily prompt development cycle. Also, here’s a consolidated notebook showcasing all the features of Guidance, feel free to check it out.Author BioPrakhar has a Master’s in Data Science with over 4 years of experience in industry across various sectors like Retail, Healthcare, Consumer Analytics, etc. His research interests include Natural Language Understanding and generation, and has published multiple research papers in reputed international publications in the relevant domain. Feel free to reach out to him on LinkedIn

0
0
9993

article-image-optimized-llm-serving-with-tgi

Louis Owen

25 Sep 2023

7 min read

Optimized LLM Serving with TGI

Louis Owen

25 Sep 2023

7 min read

Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights and books. Don't miss out – sign up today!IntroductionThe release of LLaMA-2 LLM to the public has been a tremendous contribution by Meta to the open-source community. It’s very powerful that many developers are fine-tuning it for so many different use cases. Closed-source LLMs such as GPT3.5, GPT4, or Claude are really convenient to utilize since we just need to send API requests to power up our application. On the other hand, utilizing open-source LLMs such as LLaMA-2 in production is not an easy task. Although there are several algorithms that support deploying an LLM with CPU, most of the time we still need to utilize GPU for better throughput.At a high level, there are two main things that need to be taken care of when deploying your LLM in production: memory and throughput. LLM is very big in size, the smallest version of LLaMA-2 has 7 billion parameters, which requires around 28GB GPU RAM for loading the model. Google Colab offers free NVIDIA T4 GPU to be used with 16GB memory, meaning that we can’t load even the smallest version of LLaMA-2 freely using the GPU provided by Google Colab.There are two popular ways to solve this memory issue: half-precision and 4/8-bit quantization. Half-precision is a very simple optimization method that we can adopt. In PyTorch, we just need to add the following line and we can load the 7B model by using only around 13GB GPU RAM. Basically, half-precision means that we’re representing the weights of the model with 16-bit floating points (fp16) instead of 32-bit floating points (fp32).torch.set_default_tensor_type(torch.cuda.HalfTensor)Another way to solve the memory issue is by performing 4/8-bits quantization. There are two quantization algorithms that are widely adopted by the developers: GPT-Q and BitsandBytes. These two algorithms are also supported within the `transformers` package. Quantization is a way to reduce the model size by representing the model weights in lower precision data types, such as 8-bit integers (int8) or 4-bit integers (int4) instead of 32-bit floating point (fp32) or 16-bit (fp16).As for comparison, after applying the 4-bit quantization with the GPT-Q algorithm, we can load the 7B model by using only around 4GB GPU RAM! This is indeed a huge drop - from 28GB to 4GB!Once we’re able to load the model, the simplest way to serve the model is by using the native HuggingFace workflow along with Flask or FastAPI. However, serving our model with this simple solution is not scalable and reliable enough to be used in production since it can’t handle parallel requests decently. This is related to the throughput problem mentioned earlier. There are many ways to solve this throughput problem, starting from continuous batching, tensor parallelism, prompt caching, paged attention, prefill parallel computing, KV cache, and many more. Discussing each of the algorithms will result in a very long article. However, luckily there’s an inference serving library for LLM that applies all of these techniques to ensure the reliability of serving our LLM in production. This library is called Text Generation Inference (TGI).Throughout this article, we’ll discuss in more detail what TGI is and how to utilize it to serve your LLM. We’ll see what are the top features of TGI and the step-by-step tutorial on how to spin up TGI as the inference server.Without wasting any more time, let’s take a deep breath, make yourselves comfortable, and be ready to learn how to utilize TGI as an inference server!What’s TGI?Text Generation Inference (TGI) is like the `transformers` package for inference servers. In fact, TGI is used in production at HuggingFace to power many applications. It provides a lot of useful features:Provide a straightforward launcher for the most popular Large Language Models.Implement Tensor Parallelism to enhance inference speed across multiple GPUs.Enable token streaming through Server-Sent Events (SSE).Implement continuous batching of incoming requests to boost overall throughput.Enhance transformer code for inference using flash-attention and Paged Attention on leading architectures.Apply quantization techniques with bitsandbytes and GPT-Q.Ensure fast and safe persistence format of weight loading with Safetensors.Incorporate watermarking using A Watermark for Large Language Models.Integrate a logit warper for various operations like temperature scaling, top-p, top-k, and repetition penalty.Implement stop sequences and log probabilities.Ensure production readiness with distributed tracing through Open Telemetry and Prometheus metrics.Offer Custom Prompt Generation, allowing users to easily generate text by providing custom prompts for guiding the model's output.Provide support for fine-tuning, enabling the use of fine-tuned models for specific tasks to achieve higher accuracy and performance.Implement RoPE scaling to extend the context length during inferenceTGI is surely a one-stop solution for serving LLMs. However, one important note about TGI that you need to know is regarding the project license. Starting from v1.0, HuggingFace has updated the license for TGI using the HFOIL 1.0 license. For more details, please refer to this GitHub issue. If you’re using TGI only for research purposes, then there’s nothing to worry about. If you’re using TGI for commercial purposes, please make sure to read the license carefully since there are cases where it can and can’t be used for commercial purposes.How to Use TGI?Using TGI is relatively easy and straightforward. We can use Docker to spin up the server. Note that you need to install the NVIDIA Container Toolkit to use GPU.docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.0.3 --model-id $modelOnce the server is up, we can easily send API requests to the server using the following command.curl 127.0.0.1:8080/generate \ -X POST \ -d '{"inputs":"What is LLM?","parameters":{"max_new_tokens":120}}' \ -H 'Content-Type: application/json'For more details regarding the API documentation, please refer to this page. Another way to send API requests to the TGI server is by sending it from Python. To do that, we need to install the Python library firstpip install text-generationOnce the library is installed, we can send API requests like the following.from text_generation import Client client = Client("http://127.0.0.1:8080") print(client.generate("What is LLM?", max_new_tokens=120).generated_text) text = "" for response in client.generate_stream("What is Deep Learning?", max_new_tokens=20): if not response.token.special: text += response.token.text print(text)ConclusionCongratulations on keeping up to this point! Throughout this article, you have learned what are the important things to be considered when you’re trying to serve your own LLM. You have also learned about TGI, starting from what is it, what’s the features, and also step-by-step examples on how to get the TGI server up. Hope the best for your LLM deployment journey and see you in the next article!Author BioLouis Owen is a data scientist/AI engineer from Indonesia who is always hungry for new knowledge. Throughout his career journey, he has worked in various fields of industry, including NGOs, e-commerce, conversational AI, OTA, Smart City, and FinTech. Outside of work, he loves to spend his time helping data science enthusiasts to become data scientists, either through his articles or through mentoring sessions. He also loves to spend his spare time doing his hobbies: watching movies and conducting side projects.Currently, Louis is an NLP Research Engineer at Yellow.ai, the world’s leading CX automation platform. Check out Louis’ website to learn more about him! Lastly, if you have any queries or any topics to be discussed, please reach out to Louis via LinkedIn.

0
0
18816

article-image-preventing-prompt-attacks-on-llms

Alan Bernardo Palacio

25 Sep 2023

16 min read

Preventing Prompt Attacks on LLMs

Alan Bernardo Palacio

25 Sep 2023

16 min read

3
0
20720

article-image-unleashing-the-potential-of-gpus-for-training-llms

Shankar Narayanan

22 Sep 2023

8 min read

Unleashing the Potential of GPUs for Training LLMs

Shankar Narayanan

22 Sep 2023

8 min read

Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!IntroductionThere is no doubt about Language Models being the true marvels in the arena of artificial intelligence. These sophisticated systems have the power to manipulate human language, understand, and even generate with astonishing accuracy.However, one can often complain about the immense computational challenges beyond these medical abilities. For instance, LLM training requires the incorporation of complex mathematical operations along with the processing of vast data. This is where the Graphics Processing Units (GPU) come into play. It serves as the engine that helps to power the language magic.Let me take you through the GPU advancement and innovations to support the Language Model. Parallely, we will explore how Nvidia helps revolutionize the enterprise LLM use cases.Role of GPUs in LLMs To understand the significance of GPU, let us first understand the concept of LLM.What is LLM?LLM or Large Language Models are AI systems that help generate human language. They have various applications, including translation services, sentiment analysis, chatbots, and content generation. Generative Pre-trained Transformer or GPT models, including BERT and GPT3, are popular among every LLM.These models require training, including vast data sets with billions of phrases and words. The model learns to predict while mastering the nuances and structure of language. It is like an intricate puzzle that requires enormous computational power.The need for GPUsThe Graphics Processing Units are specifically designed to undergo parallel processing. This characteristic makes them applicable to train the LLMs. The GPU can tackle thousands of tasks simultaneously, unlike the Central Processing Unit or CPU, which excels at handling sequential tasks.The training of a Large Language Model is like a massive jigsaw puzzle. Each puzzle piece represents a smaller portion of the model's language understanding. Using a CPU could only help one to work on one of these pieces at a simple time. But with GPU, one could work on various pieces parallelly while speeding up the whole process.Besides, GPU offers high computational throughput that one requires for complex mathematical operations. Their competency lies in metric multiplication, one of the fundamentals of neural network training. All these attributes make GPU indispensable for deep learning tasks like LLMs.Here is one of the practical example of how GPU works in LLM training: (Python)import time import torch # Create a large random dataset data = torch.randn(100000, 1000) # Training with CPU start_time = time.time() for _ in range(100): model_output = data.matmul(data) cpu_training_time = time.time() - start_time print(f"CPU Training Time: {cpu_training_time:.2f} seconds") # Training with GPU if torch.cuda.is_available(): data = data.cuda() start_time = time.time() for _ in range(100): model_output = data.matmul(data) gpu_training_time = time.time() - start_time print(f"GPU Training Time: {gpu_training_time:.2f} seconds") else: print("GPU not available.")GPU Advancements and LLMDue to the rising demands of LLMs and AI, GPU technology is evolving rapidly. These advancements, however, play a significant role in constituting the development of sophisticated language models.One such advancement is the increase in GPU memory capacity. Technically, the larger model requires more excellent memory to process massive data sets. Hence, modern GPUs offer substantial memory capacity, allowing researchers to build and train more substantial large language models.One of the critical aspects of training a Large Language Model is its speed. Sometimes, it can take months to prepare and train a large language model. But with the advent of faster GPU, things have changed dramatically. The quicker GPU reduces the training time and accelerates research and development. Apart from that, it also reduces the energy consumption that is often associated with training these large models.Let us explore the memory capacity of the GPU using a code snippet.(Python)import torch # Check GPU memory capacity if torch.cuda.is_available(): gpu_memory = torch.cuda.get_device_properties(0).total_memory print(f"GPU Memory Capacity: {gpu_memory / (1024**3):.2f} GB") else: print("GPU not available.")For the record, Nvidia's Tensor Core technology has been one of the game changers in this aspect. It accelerates one of the core operations in deep learning, i.e., the matrix computation process, allowing the LLMs to train faster and more efficiently.Using matrix Python and PYTorh, you can showcase the speedup with GPU processing.import time import torch # Create large random matrices matrix_size = 1000 cpu_matrix = torch.randn(matrix_size, matrix_size) gpu_matrix = torch.randn(matrix_size, matrix_size).cuda() # Move to GPU # Perform matrix multiplication with CPU start_time = time.time() result_cpu = torch.matmul(cpu_matrix, cpu_matrix) cpu_time = time.time() - start_time # Perform matrix multiplication with GPU start_time = time.time() result_gpu = torch.matmul(gpu_matrix, gpu_matrix) gpu_time = time.time() - start_time print(f"CPU Matrix Multiplication Time: {cpu_time:.4f} seconds") print(f"GPU Matrix Multiplication Time: {gpu_time:.4f} seconds")Nvidia's Contribution to GPU InnovationRegarding GPU innovation, the presence of Nvidia cannot be denied. It has a long-standing commitment to Machine Learning and advancing AI. Hence, it is a natural ally for the large language model community.Here is how Tensor Cores can be utilized with PYTorch.import torch # Enable Tensor Cores (requires a compatible GPU) if torch.cuda.is_available(): torch.backends.cuda.matmul.allow_tf32 = True # Create a tensor x = torch.randn(4096, 4096, device="cuda") # Perform matrix multiplication using Tensor Cores result = torch.matmul(x, x)It is interesting to know that Nvidia's graphics processing unit has powered several breakthroughs in LLM and AI models. BERT and GPT3 are known to harness the computational might of Nvidia's Graphics Processing Unit to achieve remarkable capabilities. Nvidia's dedication to the Artificial Intelligence world encompasses power and efficiency. The design of the graphics processing unit handles every AI workload with optimal performance per watt. It makes Nvidia one of the eco-friendly options for Large Language Model training procedures.As part of AI-focused hardware and architecture, the Tensor Core technology enables efficient and faster deep learning. This technology is instrumental in pushing the boundaries of LLM research.Supporting Enterprise LLM Use-caseThe application of LLM has a far-fetched reach, extending beyond research, labs, and academia. Indeed, they have entered the enterprise world with a bang. From analyzing massive datasets for insights to automating customer support through chatbots, large language models are transforming how businesses operate.Here, the Nvidia Graphics Processing Unit supports the enterprise LLM use cases. Enterprises often require LLM to handle vast amounts of data in real-time. With optimized AI performance and parallel processing power, Nvidia's GPU can provide the needed acceleration for these applications.Various companies across industries are harnessing the Nvidia GPU for developing LLM-based solutions to automate tasks, provide better customer experiences, and enhance productivity. From healthcare organizations analyzing medical records to financial institutions and predicting market trends, Nvidia drives enterprise LLM innovations.ConclusionNvidia continues to be the trailblazer in the captivating journey of training large language models. They are not only the hardware muscle for LLM but constantly innovate to make GPU capable and efficient with each generation.LLM is on the run to become integral to our daily lives. From business solutions to personal assistants, Nvidia's commitment to its GPU innovation ensures more power to the growth of language models. The synergy between AI and Nvidia GPU is constantly shaping the future of enterprise LLM use cases, helping organizations to achieve new heights in innovation and efficiency.Frequently Asked Questions1. How does the GPU accelerate the training process of large language models?The Graphics Processing Unit has parallel processing capabilities to allow the work of multiple tasks simultaneously. Such parallelism helps train Large Language Models by efficiently processing many components in understanding and generating human language.2. How does Nvidia contribute to GPU innovation for significant language and AI models?Nvidia has developed specialized hardware, including Tensor Core, optimized for AI workloads. The graphic processing unit of Nvidia powered numerous AI breakthroughs while providing efficient AI hardware to advance the development of Large Language Models.3. What are the expectations for the future of GPU innovation and launch language model?The future of GPU innovation promises efficient, specialized, and robust hardware tailored to the needs of AI applications and Large Language Models. It will continuously drive the development of sophisticated language models while opening up new possibilities for AI-power solutions.Author BioShankar Narayanan (aka Shanky) has worked on numerous different cloud and emerging technologies like Azure, AWS, Google Cloud, IoT, Industry 4.0, and DevOps to name a few. He has led the architecture design and implementation for many Enterprise customers and helped enable them to break the barrier and take the first step towards a long and successful cloud journey. He was one of the early adopters of Microsoft Azure and Snowflake Data Cloud. Shanky likes to contribute back to the community. He contributes to open source is a frequently sought-after speaker and has delivered numerous talks on Microsoft Technologies and Snowflake. He is recognized as a Data Superhero by Snowflake and SAP Community Topic leader by SAP.

0
0
13461

article-image-preparing-high-quality-training-data-for-llm-fine-tuning

Louis Owen

22 Sep 2023

9 min read

Preparing High-Quality Training Data for LLM Fine-Tuning

Louis Owen

22 Sep 2023

9 min read

Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!IntroductionLarge Language Models (LLM) such as GPT3.5, GPT4, or Claude have shown a very good general capability that can be utilized across different tasks, starting from question and answering, coding assistant, marketing campaign, and many more. However, utilizing those general LLMs in production, especially for enterprises, is not an easy task:Those models are very large in terms of the number of parameters - resulting in lower latency compared to the smaller modelWe need to give a very long prompt to achieve good results - again, resulting in lower latencyReliability is not ensured - sometimes they can return the response with an additional prefix, which is really annoying when we expect only a JSON format response for exampleOne of the solutions to solve those problems is fine-tuning a smaller LLM that is specific to the task that we want to handle. For example, we need a QnA model that is able to answer user queries based only on the provided passage. Instead of utilizing those general LLMs, we can just fine-tune a smaller LLM, let’s say a 7 billion parameters LLM to do this specific task. Why to utilize such a giant LLM when our use case is only for QnA?The quality of training data plays a pivotal role in the success of fine-tuning. Garbage in, garbage out holds true in the world of LLMs. When you fine-tune low-quality data, you risk transferring noise, biases, and inaccuracies to your model. Let’s take the newly released paper, Textbooks Are All You Need II: phi-1.5 Technical Report, as an example. Despite the relatively low number of parameters (1.5B), this model is able to perform as well as models five times its size. Additionally, it excels in complex reasoning tasks, surpassing most non-frontier LLMs. What’s their secret sauce? High-quality of training data! The next question is how to prepare the training data for LLM fine-tuning. Moreover, how to prepare high-quality training data? Since fine-tuning needs labeled training data, we need to annotate the unlabeled data that we have. Annotating unlabeled data for classification tasks is much easier compared to more complex tasks like summarization. We just need to give labels based on the available classes in the classification task. If you have deployed an application with those general LLMs before and you have the data coming from real production data, then you can use those data as the training data. Actually, you can also use the response coming from the general LLM as the label directly, no need to do any data annotation anymore. However, what if you don’t have real production data? Then, you can use open-source data or even synthetic data generated by the general LLM as your unlabeled data.Throughout this article, we’ll discuss ways to give high-quality labels to the unlabeled training data, whether it’s annotated by humans or by general LLM. We’ll discuss what are the pros and cons of each of the annotation options. Furthermore, we’ll discuss in more detail how to utilize general LLM to do the annotation task - along with the step-by-step example.Without wasting any more time, let’s take a deep breath, make yourselves comfortable, and be ready to learn how to prepare high-quality training data for LLM fine-tuning!Human Annotated DataThe first option to create high-quality training data is by using the help of human annotators. In the ideal scenario, well-trained human annotators not only are able to produce high-quality training data but also produce labels that are fully steerable according to the criteria (SOP). However, using humans as the annotators will surely be both time and money-consuming. It is also not scalable since we need to wait for a not short time until we can get the labeled data. Finally, the ideal scenario is also hard to achieve since each of the annotators has their own bias towards a specific domain or even the label quality is most often based on their mood.LLM Annotated DataAnother better option is to utilize general LLM as the annotator. LLMs will always give not only high-quality training data but also full steerability according to the criteria if we do the prompt engineering correctly. It is also cheaper both in terms of time and money. Finally, it’s absolutely scalable and no bias included - except for hallucination.Let’s see how general LLM is usually utilized as an annotator. We’ll use conversation summarization as the task example. The goal of the task is to summarize the given conversation between two users (User A and User B) and return all important information discussed in the conversation in the form of a summarized paragraph.1. Write the initial promptWe need to start from an initial prompt that we will use to generate the summary of the given conversation, or in general, that will be used to generate the label of the given unlabeled sample.You are an expert in summarizing the given conversation between two users. Return all important information discussed in the conversation in the form of summarized paragraph. Conversation: {}2. Evaluate the generated output with a few samples - qualitativelyUsing the initial prompt, we need to evaluate the generated label with few number of samples - let’s say <20 random samples. We need to do this manually by eyeballing through each of the labeled samples and judging qualitatively if they are good enough or not. If the output quality on these few samples is good enough, then we can move into the next step. If not, then revise the prompt and re-evaluate using another <20 random samples. Repeat this process until you are satisfied with the label quality.3. Evaluate the generated output with large samples - quantitativelyOnce we’re confident enough with the generated labels, we can further assess the quality using a more quantitative approach and with a larger number of samples - let’s say >500 samples. For classification tasks, such as sentiment analysis, evaluating the quality of the labels is easy, we just need to compare the generated label with the ground truth that we have, and then we can calculate the precision, recall, or any other classification metrics that we’re interested in. However, for more complex tasks, such as the task in this example, we need a more sophisticated metric. There are a couple of widely used metrics for summarization task - BLEU, ROUGE, and many more. However, those metrics are based on a string-matching algorithm only, which means if the generated summary doesn’t contain the exact word used in the conversation, then this score will suggest that the summary quality is not good. To overcome this, many engineers nowadays are utilizing GPT-4 to assess the label quality. For example, we can write a prompt as follows to assess the quality of the generated labels. Read the given conversation and summary pair. Give the rating quality for the summary with 5 different options: “very bad”, “bad”, “moderate”, “good”, “excellent”. Make sure the summary captures all of the important information in the conversation and does not contain any misinformation. Conversation: {} Summary: {} Rating: Once you get the rating, you can map them into integers - for example, “very bad”:0, “bad”: 1, “moderate”: 2, … Please make sure that the LLM that you’re using as the evaluator is not in the same LLMs family with the LLM that you’re using as the annotator. For example, GPT3.5 and GPT4 are both in the same family since they’re both coming from OpenAI.If the quantitative metric looks decent and meets the criteria, then we can move into the next step. If it’s not, then we can do a subset analysis to see in what kind of cases the label quality is not good. From there, we can revise the prompt and re-evaluate on the same test data. Repeat this step until you’re satisfied enough with the quantitative metric.4. Apply the final prompt to generate labels in the full dataFinally, we can apply the best prompt that we get from all of those iterations and apply it to generate labels in the full unlabeled data that we have.ConclusionCongratulations on keeping up to this point! Throughout this article, you have learned why LLM fine-tuning is important and when to do fine-tuning. You have also learned how to prepare high-quality training data for LLM fine-tuning. Hope the best for your LLM fine-tuning experiments and see you in the next article!Author BioLouis Owen is a data scientist/AI engineer from Indonesia who is always hungry for new knowledge. Throughout his career journey, he has worked in various fields of industry, including NGOs, e-commerce, conversational AI, OTA, Smart City, and FinTech. Outside of work, he loves to spend his time helping data science enthusiasts to become data scientists, either through his articles or through mentoring sessions. He also loves to spend his spare time doing his hobbies: watching movies and conducting side projects.Currently, Louis is an NLP Research Engineer at Yellow.ai, the world’s leading CX automation platform. Check out Louis’ website to learn more about him! Lastly, if you have any queries or any topics to be discussed, please reach out to Louis via LinkedIn.

0
0
11145

article-image-building-an-investment-strategy-in-the-era-of-llms

Anshul Saxena

21 Sep 2023

16 min read

Building an Investment Strategy in the Era of LLMs

Anshul Saxena

21 Sep 2023

16 min read

0
0
11839

article-image-how-large-language-models-reshape-trading-stats

Anshul Saxena

21 Sep 2023

15 min read

How Large Language Models Reshape Trading Stats

Anshul Saxena

21 Sep 2023

15 min read

0
0
9082

article-image-demystifying-azure-openai-service

Olivier Mertens, Breght Van Baelen

15 Sep 2023

16 min read

Demystifying Azure OpenAI Service

Olivier Mertens, Breght Van Baelen

15 Sep 2023

16 min read

Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!This article is an excerpt from the book, Azure Data and AI Architect Handbook, by Olivier Mertens and Breght Van Baelen. Master core data architecture design concepts and Azure Data & AI services to gain a cloud data and AI architect’s perspective to developing end-to-end solutionsIntroductionOpenAI has risen immensely in popularity with the arrival of ChatGPT. The company, which started as an on-profit organization, has been the driving force behind the GPT and DALL-E model families, with intense research at a massive scale. The speed at which new models get released and become available on Azure has become impressive lately.Microsoft has a close partnership with OpenAI, after heavy investments in the company from Microsoft . The models created by OpenAI use Azure infrastructure for development and deployment. Within this partnership, OpenAI carries the responsibility of research and innovation, coming up with new models and new versions of their existing models. Microsoft manages the enterprise-scale go-to-market. It provides infrastructure and technical guidance, along with reliable SLAs, to get large organizations started with the integrations of these models, fine-tuning them on their own data, and hosting a private deployment of the models.Like the face recognition model in Azure Cognitive Services, powerful LLMs such as the ones in Azure OpenAI Service could be used to cause harm at scale. Therefore, this service is also gated according to Microsoft ’s guidelines on responsible AI.At the time of writing, Azure OpenAI Service offers access to the following models: GPT model family * GPT-3.5 * GPT-3.5-Turbo (the model behind ChatGPT * GPT-4CodexDALL-E 2Let’s dive deeper into these models.The GPT model familyGPT models, which stands for generative pre-trained transformer models, made their first appearance in 2018, with GPT-1, trained on a dataset of roughly 7,000 books. This made good advancements in performance at the time, but the model was already vastly outdated a couple of years later. GPT-2 followed in 2019, trained on the WebText dataset (a collection of 8 million web pages). In 2020, GPT-3 was released, trained on the WebText dataset, two book corpora, and English Wikipedia.In these years, there were no major breakthroughs in terms of efficient algorithms , but rather, in the scale of the architecture and datasets. This becomes easily visible when we look at the growing number of parameters used for every new generation of the model, as shown in the following figure.Figure 9.3 – A visual comparison between the sizes of the different generations of GPT models, based on their trainable parametersThe question is often raised of how to interpret this concept of parameters. An easy analogy is the number of neurons in a brain. Although parameters in a neural network are not equivalent to its artificial neurons, the number of parameters and neurons are heavily correlated – more parameters = more neurons. The more neurons there are in the brain, the more knowledge it can grasp.Since the arrival of GPT-3, we have seen two major adaptations of the third-generation model being made. The first one is GPT-3.5. This model has a similar architecture as the GPT-3 model but is trained on text and code, whereas the original GPT-3 only saw text data during training. Therefore, GPT-3.5 is capable of generating and understanding code. GPT-3.5, in turn, became the basis for the next adaptation, the vastly popular ChatGPT model. This model has been fine-tuned for conversational usage while using additional reinforcement learning to get a sense of ethical behavior.GPT model sizesThe OpenAI models are available in different sizes, which are all named after remarkable scientists. The GPT-3.5 model specifically, is available in four versions:AdaBabbage CurieDavinciThe Ada model is the smallest, most lightweight model, while Davinci is the most complex and most performant model. The larger the model, the more expensive it is to use, host, and fine-tune, as shown in Figure 9.4. As a side note, when you hear about the absurd number of parameters of new GPT models, this usually refers to the Davinci model.Figure 9.4 – A trade-off exists between lightweight, cheap models and highly performant, complex modelsWith a trade-off between costs and performance available, an architect can start thinking about which model size may best fit a solution. In reality, this often comes down to empirical testing. If the cheaper model can perform the job at an acceptable performance, then this is the more cost-effective solution. Note that when talking about performance in this scenario, we mean predictive power, not the speed at which the model makes predictions. The larger models will be slower to output a prediction than the lightweight models.Understanding the difference between GPT-3.5 and GPT-3.5-Turbo (ChatGPT)GPT-3.5 and GPT-3.5-Turbo are both models used to generate natural language text, but they are used in different ways. GPT-3.5 is classified as a text completion model, whereas GPT-3.5-Turbo is referred to as conversational AI.To better understand the contrast between the two models, we first need to introduce the concept of contextual learning. These models are trained to understand the structure of the input prompt to provide a meaningful answer. Contextual learning is often split up into few-shot learning, one-shot learning, and zero-shot learning. Shot, in this context, refers to an example given in the input prompt. With few-shot learning, we provide multiple examples in the input prompt, one-shot learning provides a single example, and zero-shot indicates that no examples are given. In the case of the latter, the model will have to figure out a different way to understand what is being asked of it (such as interpreting the goal of a question).Consider the following example:Figure 9.5 – Few-shot learning takes up the most amount of tokens and requires more effort but often results in model outputs of higher qualityWhile it takes more prompt engineering effort to apply few-shot learning, it will usually yield better results. A text completion model, such as GPT-3.5, will perform vastly better on few-shot learning than one-shot or zero-shot. As the name suggests, the model figures out the structure of the input prompt (i.e., the examples) and completes the text accordingly.Conversational AI, such as ChatGPT, is more performant in zero-shot learning. In the case of the preceding example, both models are able to output the correct answer, but as questions become more and more complex, there will be a noticeable difference in predictive performance. Additionally, GPT-3.5-Turbo will remember information from previous input prompts, whereas GPT-3.5 prompts are handled independently.Innovating with GPT-4With the arrival of GPT-4, the focus has shifted toward multimodality. Multimodality in AI refers to the ability of an AI system to process and interpret information from multiple modalities, such as text, speech, images, and videos. Essentially, it is the capability of AI models to understand and combine data from different sources and formats.GPT-4 is capable of additionally taking images as input and interpreting them. It has stronger reasoning and overall performance than its predecessors. There was a famous example where GPT-4 was able to deduce that balloons would fly upward when asked what would happen if someone cut the balloons' strings, as shown in the following photo.Figure 9.6 – The image in question that was used in the experiment. When asked what would happen if the strings were cut, GPT-4 replied that the balloons would start flying awaySome adaptations of GPT-4, such as the one used in Bing Chat, have the extra feature of citing sources in generated answers. This is a welcome addition, as hallucination was a significant flaw in earlier GPT models.HallucinationHallucination in the context of AI refers to generating wrong predictions with high confidence. It is obvious that this can cause a lot more harm than the model indicating it is not sure how to respond or knowing the answer.Next, we will look at the Codex model.CodexCodex is a model that is architecturally similar to GPT-3, but it fully focuses on code generation and understanding. Furthermore, an adaptation of Codex forms the underlying model for GitHub Copilot, a tool that provides suggestions and auto-completion for code based on context and natural language inputs, available for various integrated development environments (IDEs) such as Visual Studio Code. Instead of a ready-to-use solution, Codex is (like the other models in Azure OpenAI) available as a model endpoint and should be used for integration in custom apps.The Codex model is initially trained on a collection of 54 million code repositories, resulting in billions of lines of code, with the majority of training data written in Python.Codex can generate code in different programming languages based on an input prompt in natural language (text-to-code), explain the function of blocks of code (code-to-text), add comments to code, and debug existing code.Codex is available as a C (cushman) and D (Davinci) model. Lightweight Codex models (A series or B series) currently do not exist.Models such as Codex or GitHub Copilot are a great way to boost the productivity of software engineers, data analysts, data engineers, and data scientists. They do not replace these roles, as their accuracy is not perfect; rather, they give engineers the opportunity to start editing from a fairly well-written block of code instead of coding from scratch.DALL-E 2The DALL-E model family is used to generate visuals. By providing a description in natural language in the input prompt, it generates a series of matching images. While other models are often used at scale in large enterprises, DALL-E 2 tends to be more popular in smaller businesses. Organizations that lack an in-house graphic designer can make great use of DALL-E to generate visuals for banners, brochures, emails, web pages, and so on.DALL-E 2 only has a single model size to choose from, although open-source alternatives exist if a lightweight version is preferred. Fine-tuning and private deploymentsAs a data architect, it is important to understand the cost structure of these models. The first option is to use the base model in a serverless manner. Similar to how we work with Azure Cognitive Services, users will get a key for the model’s endpoint and simply pay per prediction. For DALL-E 2, costs are incurred per 100 images, while the GPT and Codex models are priced per 1,000 tokens. For every request made to a GPT or Codex model, all tokens of the input prompt and the output are added up to determine the cost of the prediction.TokensIn natural language processing, a token refers to a sequence of characters that represents a distinct unit of meaning in a text. These units do not necessarily correspond to words, although for short words, this is mostly the case. Tokens are used as the basic building blocks to process and analyze text data. A good rule of thumb for the English language is that one token is, on average, four characters. Dividing your total character count by four will make a good estimate of the number of tokens.Azure OpenAI Service also grants extensive fine-tuning functionalities. Up to 1 GB of data can be uploaded per Azure OpenAI instance for fine-tuning. This may not sound like a lot, but note that we are not training a new model from scratch. The goal of fine-tuning is to retrain the last few layers of the model to increase performance on specific tasks or company-specific knowledge. For this process, 1 GB of data is more than sufficient.When adding a fine-tuned model to a solution, two additional costs will be incurred. On top of the token-based inference cost, we need to take into account the training and hosting costs. The hourly training cost can be quite high due to the amount of hardware needed, but compared to the inference and hosting costs during a model’s life cycle, it remains a small percentage. Next, since we are not using the base model anymore and, instead, our own “version” of the model, we will need to host the model ourselves, resulting in an hourly hosting cost.Now that we have covered both pre-trained model collections, Azure Cognitive Services, and Azure OpenAI Service, let’s move on to custom development using Azure Machine Learning.Grounding LLMsOne of the most popular use cases for LLMs involves providing our own data as context to the model (often referred to as grounding). The reason for its popularity is partly due to the fact that many business cases can be solved using a consistent technological architecture. We can reuse the same solution, but by providing different knowledge bases, we can serve different end users.For example, by placing an LLM on top of public data such as product manuals or product specifics, it is easy to develop a customer support chatbot. If we swap out this knowledge base of product information with something such as HR documents, we can reuse the same tech stack to create an internal HR virtual assistant.A common misconception regarding grounding is that a model needs to be trained on our own data. This is not the case. Instead, after a user asks a question, the relevant document (or paragraphs) is injected into the prompt behind the scenes and lives in the memory of the model for the duration of the chat session (when working with conversational AI) or for a single prompt. The context, as we call it, is then wiped clean and all information is forgotten. If we wanted to cache this info, it is possible to make use of a framework such as LangChain or Semantic Kernel, but that is out of the scope of this book.The fact that a model does not get retrained on our own data plays a crucial role in terms of data privacy and cost optimization. As shown before in the section on fine-tuning, as soon as a base model is altered, an hourly operating cost is added to run a private deployment of the model. Also, information from the documents cannot be leaked to other users working with the same model.Figure 9.7 visualizes the architectural concepts to ground an LLM.Figure 9.7 – Architecture to ground an LLMThe first thing to do is turn the documents that should be accessible to the model into embeddings. Simply put, embeddings are mathematical representations of natural language text. By turning text into embeddings, it is possible to accurately calculate the similarity (from a semantics perspective) between two pieces of text.To do this, we can leverage Azure Functions, a service that allows pieces of code to run in a serverless function. It often forms the glue between different components by handling interactions. In this case, an Azure function (on the bottom left of Figure 9.7) will grab the relevant documents from the knowledge base, break them up into chunks (to accommodate for the maximum token limits of the model), and generate an embedding for each one. This embedding is then stored, alongside the natural language text, in a vector database. This function should be run for all historic data that will be accessible to the model, as well as triggered for every new, relevant document that is added to the knowledge base.Once the vector database is in place, users can start asking questions. However, the user questions are not directly sent to the model endpoint. Instead, another Azure function (shown at the top of Figure 9.7) will turn the user question into an embedding and check its similarity of it with the embeddings of the documents or paragraphs in the vector database. Then, the top X most relevant text chunks are injected into the prompt as context, and the prompt is sent over to the LLM. Finally, the response is returned to the user.ConclusionAzure OpenAI Service, a collaboration between OpenAI and Microsoft, delivers potent AI models. The GPT model family, from GPT-1 to GPT-4, has evolved impressively, with GPT-3.5-Turbo (ChatGPT) excelling in conversational AI. GPT-4 introduces multimodal capabilities, comprehending text, speech, images, and videos.Codex specializes in code generation, while DALL-E 2 creates visuals from text descriptions. These models empower developers and designers. Customization via fine-tuning offers cost-effective solutions for specific tasks. Leveraging Azure OpenAI Service for your projects enhances productivity.Grounding language models with user data ensures data privacy and cost efficiency. This collaboration holds promise for innovative AI applications across various domains.Author BioOlivier Mertens is a cloud solution architect for Azure data and AI at Microsoft, based in Dublin, Ireland. In this role, he assisted organizations in designing their enterprise-scale data platforms and analytical workloads. Next to his role as an architect, Olivier leads the technical AI expertise for Microsoft EMEA in the corporate market. This includes leading knowledge sharing and internal upskilling, as well as solving highly complex or strategic customer AI cases. Before his time at Microsoft, he worked as a data scientist at a Microsoft partner in Belgium.Olivier is a lecturer for generative AI and AI solution architectures, a keynote speaker for AI, and holds a master’s degree in information management, a postgraduate degree as an AI business architect, and a bachelor’s degree in business management.Breght Van Baelen is a Microsoft employee based in Dublin, Ireland, and works as a cloud solution architect for the data and AI pillar in Azure. He provides guidance to organizations building large-scale analytical platforms and data solutions. In addition, Breght was chosen as an advanced cloud expert for Power BI and is responsible for providing technical expertise in Europe, the Middle East, and Africa. Before his time at Microsoft, he worked as a data consultant at Microsoft Gold Partners in Belgium.Breght led a team of eight data and AI consultants as a data science lead. Breght holds a master’s degree in computer science from KU Leuven, specializing in AI. He also holds a bachelor’s degree in computer science from the University of Hasselt.

0
0
17954

article-image-ai-distilled-17-numentas-nupic-adepts-persimmon-8b-hugging-face-rust-ml-framework-nvidias-tensorrt-llm-azure-ml-promptflow-siris-gen-ai-enhancements

Merlyn Shelley

15 Sep 2023

11 min read

AI_Distilled #17: Numenta’s NuPIC, Adept’s Persimmon-8B, Hugging Face Rust ML Framework, NVIDIA’s TensorRT-LLM, Azure ML PromptFlow, Siri's Gen AI Enhancements

Merlyn Shelley

15 Sep 2023

11 min read

Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!👋 Hello,"If we don't embrace AI, it will move forward without us. Now is the time to harness AI's potential for the betterment of society."- Fei-Fei Li, Computer Scientist and AI Expert. AI is proving to be a real game-changer worldwide, bringing new perspectives to everyday affairs in every field. No wonder Apple is heavily investing in Siri's generative AI enhancement and Microsoft to Provide Legal Protection for AI-Generated Copyright Breaches however, AI currently has massive cooling requirements in data centers which has led to a 34% increase in water consumption in Microsoft data centers. Say hello to the latest edition of our AI_Distilled #17 where we talk about all things LLM, NLP, GPT, and Generative AI! In this edition, we present the latest AI developments from across the world, including NVIDIA TensorRT-LLM enhances Large Language Model inference on H100 GPUs, Meta developing powerful AI system to compete with OpenAI, Google launching Digital Futures Project to support responsible AI, Adept open-sourcing a powerful language model with <10 billion parameters, and Numenta introduces NuPIC, revolutionizing AI efficiency by 100 Times. We know how much you love our curated AI secret knowledge resources. This week, we’re here with some amazing tutorials on building an AWS conversational AI app with AWS Amplify, how to evaluate legal language models with Azure ML PromptFlow, deploying generative AI models on Amazon EKS with a step-by-step guide, Automate It with Zapier and Generative AI and generating realistic textual synthetic data using LLMs. What do you think of this issue and our newsletter? Please consider taking the short survey below to share your thoughts and you will get a free PDF of the “The Applied Artificial Intelligence Workshop” eBook upon completion. Complete the Survey. Get a Packt eBook for Free!Writer’s Credit: Special shout-out to Vidhu Jain for their valuable contribution to this week’s newsletter content! Cheers, Merlyn Shelley Editor-in-Chief, Packt ⚡ TechWave: AI/GPT News & AnalysisGoogle Launches Digital Futures Project to Support Responsible AI: Google has initiated the Digital Futures Project, accompanied by a $20 million fund from Google.org to provide grants to global think tanks and academic institutions. This project aims to unite various voices to understand and address the opportunities and challenges presented by AI. It seeks to support researchers, organize discussions, and stimulate debates on public policy solutions for responsible AI development. The fund will encourage independent research on topics like AI's impact on global security, labor, and governance structures. Inaugural grantees include renowned institutions like the Aspen Institute and MIT Work of the Future. Microsoft to Provide Legal Protection for AI-Generated Copyright Breaches: Microsoft has committed to assuming legal responsibility for copyright infringement related to material generated by its AI software used in Word, PowerPoint, and coding tools. The company will cover legal costs for commercial customers who face lawsuits over tools or content produced by AI. This includes services like GitHub Copilot and Microsoft 365 Copilot. The move aims to ease concerns about potential clashes with content owners and make the software more user-friendly. Other tech companies, such as Adobe, have made similar pledges to indemnify users of AI tools. Microsoft's goal is to provide reassurance to paying users amid the growing use of generative AI, which may reproduce copyrighted content. NVIDIA TensorRT-LLM Enhances Large Language Model Inference on H100 GPUs: NVIDIA introduces TensorRT-LLM, a software solution that accelerates and optimizes LLM inference. This open-source software incorporates advancements achieved through collaboration with leading companies. TensorRT-LLM is compatible with Ampere, Lovelace, and Hopper GPUs, aiming to streamline LLM deployment. It offers an accessible Python API for defining and customizing LLM architectures without requiring deep programming knowledge. Performance improvements are demonstrated with real-world datasets, including a 4.6x acceleration for Meta's Llama 2. Additionally, TensorRT-LLM helps reduce total cost of ownership and energy consumption in data centers, making it a valuable tool for the AI community. Meta Developing Powerful AI System to Compete with OpenAI: The Facebook parent company is reportedly working on a new AI system that aims to rival the capabilities of OpenAI's advanced models. The company intends to launch this AI model next year, and it is expected to be significantly more powerful than Meta's current offering, Llama 2, an open-source AI language model. Llama 2 was introduced in July and is distributed through Microsoft's Azure services to compete with OpenAI's ChatGPT and Google's Bard. This upcoming AI system could assist other companies in developing sophisticated text generation and analysis services. Meta plans to commence training on this new AI system in early 2024. Adept Open-Sources a Powerful Language Model with <10 Billion Parameters: Adept announces the open-source release of Persimmon-8B, a highly capable language model with fewer than 10 billion parameters. This model, made available under an Apache license, is designed to empower the AI community for various use cases. Persimmon-8B stands out for its substantial context size, being 4 times larger than LLaMA2 and 8 times more than GPT-3. Despite using only 0.37x the training data of LLaMA2, it competes with its performance. It includes 70k unused embeddings for multimodal extensions and offers unique inference code combining speed and flexibility. Adept expects this release to inspire innovation in the AI community. Apple Invests Heavily in Siri's Generative AI Enhancement: Apple has significantly increased its investment in AI, particularly in developing conversational chatbot features for Siri. The company is reportedly spending millions of dollars daily on AI research and development. CEO Tim Cook expressed a strong interest in generative AI. Apple's AI journey began four years ago when John Giannandrea, head of AI, formed a team to work on LLMs. The Foundational Models team, led by Ruoming Pang, is at the forefront of these efforts, rivaling OpenAI's investments. Apple plans to integrate LLMs into Siri to enhance its capabilities, but the challenge lies in fitting these large models onto devices while maintaining privacy and performance standards. Numenta Introduces NuPIC: Revolutionizing AI Efficiency by 100 Times: Numenta, a company bridging neuroscience and AI, has unveiled NuPIC (Numenta Platform for Intelligent Computing), a groundbreaking solution rooted in 17 years of brain research. Developed by computing pioneers Jeff Hawkins and Donna Dubinsky, NuPIC aims to make AI processing up to 100 times more efficient. Partnering with game startup Gallium Studios, NuPIC enables high-performance LLMs on CPUs, prioritizing user trust and privacy. Unlike GPU-reliant models, NuPIC's CPU focus offers cost savings, flexibility, and control while maintaining high throughput and low latency. AI Development Increases Water Consumption in Microsoft Data Centers by 34%: The development of AI tools like ChatGPT has led to a 34% increase in Microsoft's water consumption, raising concerns in the city of West Des Moines, Iowa, where its data centers are located. Microsoft, along with tech giants like OpenAI and Google, has seen rising demand for AI tools, which comes with significant costs, including increased water usage. Microsoft disclosed a 34% spike in global water consumption from 2021 to 2022, largely attributed to AI research. A study estimates that ChatGPT consumes 500 milliliters of water every time it's prompted. Google also reported a 20% growth in water use, partly due to AI work. Microsoft and OpenAI stated they are working to make AI systems more efficient and environmentally friendly. 🔮 Looking for a New Book from Packt’s Expert Community? Automate It with Zapier and Generative AI - By Kelly Goss, Philip Lakin Are you excited to supercharge your work with Gen AI's automation skills? Check out this new guide that shows you how to become a Zapier automation pro, making your work more efficient and productive in no time! It covers planning, configuring workflows, troubleshooting, and advanced automation creation. It emphasizes optimizing workflows to prevent errors and task overload. The book explores new built-in apps, AI integration, and complex multi-step Zaps. Additionally, it provides insights into account management and Zap issue resolution for improved automation skills. Read through the Chapter 1 unlocked here... 🌟 Secret Knowledge: AI/LLM ResourcesUnderstanding Liquid Neural Networks: A Primer on AI Advancements: In this post, you'll learn how liquid neural networks are transforming the AI landscape. These networks, inspired by the human brain, offer a unique and creative approach to problem-solving. They excel in complex tasks such as weather prediction, stock market analysis, and speech recognition. Unlike traditional neural networks, liquid neural networks require significantly fewer neurons, making them ideal for resource-constrained environments like autonomous vehicles. These networks excel in handling continuous data streams but may not be suitable for static data. They also provide better causality handling and interpretability. Navigating Generative AI with FMOps and LLMOps: A Practical Guide: In this informative post, you'll gain valuable insights into the world of generative AI and its operationalization using FMOps and LLMOps principles. The authors delve into the challenges businesses face when integrating generative AI into their operations. You'll explore the fundamental differences between traditional MLOps and these emerging concepts. The post outlines the roles various teams play in this process, from data engineers to data scientists, ML engineers, and product owners. The guide provides a roadmap for businesses looking to embrace generative AI. AI Compiler Quartet: A Breakdown of Cutting-Edge Technologies: Explore Microsoft’s groundbreaking "heavy-metal quartet" of AI compilers: Rammer, Roller, Welder, and Grinder. These compilers address the evolving challenges posed by AI models and hardware. Rammer focuses on optimizing deep neural network (DNN) computations, improving hardware parallel utilization. Roller tackles the challenge of memory partitioning and optimization, enabling faster compilation with good computation efficiency. Welder optimizes memory access, particularly vital as AI models become more memory-intensive. Grinder addresses complex control flow execution in AI computation. These AI compilers collectively offer innovative solutions for parallelism, compilation efficiency, memory, and control flow, shaping the future of AI model optimization and compilation. 💡 MasterClass: AI/LLM Tutorials Exploring IoT Data Simulation with ChatGPT and MQTTX: In this comprehensive guide, you'll learn how to harness the power of AI, specifically ChatGPT, and the MQTT client tool, MQTTX, to simulate and generate authentic IoT data streams. Discover why simulating IoT data is crucial for system verification, customer experience enhancement, performance assessment, and rapid prototype design. The article dives into the integration of ChatGPT and MQTTX, introducing the "Candidate Memory Bus" to streamline data testing. Follow the step-by-step guide to create simulation scripts with ChatGPT and efficiently simulate data transmission with MQTTX. Revolutionizing Real-time Inference: SageMaker Unveils Streaming Support for Generative AI: Amazon SageMaker now offers real-time response streaming, transforming generative AI applications. This new feature enables continuous response streaming to clients, reducing time-to-first-byte and enhancing interactive experiences for chatbots, virtual assistants, and music generators. The post guides you through building a streaming web application using SageMaker real-time endpoints for interactive chat use cases. It showcases deployment options with AWS Large Model Inference (LMI) and Hugging Face Text Generation Inference (TGI) containers, providing a seamless, engaging conversation experience for users. Implementing Effective Guardrails for Large Language Models: Guardrails are crucial for maintaining trust in LLM applications as they ensure compliance with defined principles. This guide presents two open-source tools for implementing LLM guardrails: Guardrails AI and NVIDIA NeMo-Guardrails. Guardrails AI offers Python-based validation of LLM responses, using the RAIL specification. It enables developers to define output criteria and corrective actions, with step-by-step instructions for implementation. NVIDIA NeMo-Guardrails introduces Colang, a modeling language for flexible conversational workflows. The guide explains its syntax elements and event-driven design. Comparing the two, Guardrails AI suits simple tasks, while NeMo-Guardrails excels in defining advanced conversational guidelines. 🚀 HackHub: Trending AI Tools cabralpinto/modular-diffusion: Python library for crafting and training personalized Diffusion Models with PyTorch. cofactoryai/textbase: Simplified Python chatbot development using NLP and ML with Textbase's on_message function in main.py. microsoft/BatteryML: Open-source ML tool for battery analysis, aiding researchers in understanding electrochemical processes and predicting battery degradation. facebookresearch/co-tracker: Swift transformer-based video tracker with Optical Flow, pixel-level tracking, grid sampling, and manual point selection. explodinggradients/ragas: Framework evaluates Retrieval Augmented Generation pipelines, enhancing LLM context with external data using research-based tools.

0
0
9589

How-To Tutorials - LLM

Question Answering in LangChain

Build an AI-based Personal Financial Advisor with LangChain

Kickstarting your journey with Azure OpenAI

Build a Language Converter using Google Bard

Google Bard: Everything You Need to Know to Get Started

Build a Clone of Yourself with Large Language Models (LLMs)

Getting Started with Microsoft’s Guidance

Optimized LLM Serving with TGI

Preventing Prompt Attacks on LLMs

Unleashing the Potential of GPUs for Training LLMs

Trending Topics

Preparing High-Quality Training Data for LLM Fine-Tuning

Building an Investment Strategy in the Era of LLMs

How Large Language Models Reshape Trading Stats

Demystifying Azure OpenAI Service

AI_Distilled #17: Numenta’s NuPIC, Adept’s Persimmon-8B, Hugging Face Rust ML Framework, NVIDIA’s TensorRT-LLM, Azure ML PromptFlow, Siri's Gen AI Enhancements

Create a Free Account To Continue Reading

Sign in to activate your 7-day free access