How-To Tutorials - LLM

81 Articles

Exploring Token Generation Strategies

Saeed Dehqan
28 Aug 2023
8 min read
Introduction

This article discusses different methods for generating sequences of tokens with language models, focusing on how to select the next token in a sequence. It explains the main techniques for choosing the next token from the predicted probability distribution over possible tokens.

Language models predict the next token based on the n previous tokens, and they try to extract as much information as possible from those tokens. Transformer models aggregate information from all n previous tokens: tokens in a sequence communicate with one another and exchange information. At the end of this communication process, tokens are context-aware, and we use each one to predict its own next token. Each token is passed separately through some linear/non-linear layers, and the output is a set of unnormalized logits. We then apply Softmax to the logits to convert them into probability distributions, so each token has its own probability distribution over its next token.

Exploring Methods for Token Selection

Once we have the probability distribution over tokens, it is time to pick one token as the next token. There are four main methods for selecting a suitable token from the probability distribution:

●    Greedy or naive method: simply select the token with the highest probability. This is a deterministic method.
●    Beam search: it receives a parameter named beam size and, based on it, runs the model multiple times to find a suitable sequence rather than just a single token. This is a deterministic method.
●    Top-k sampling: select the top k most probable tokens, shut off the other tokens (set their probability to -inf), and sample from those k tokens. This is a sampling method.
●    Nucleus sampling: like Top-k sampling, it keeps only the most probable tokens and shuts off the rest, but the number of tokens kept is chosen dynamically rather than being a fixed k. This is also a sampling method.

Greedy method

This is a simple and fast method that needs only one prediction: just select the most probable token as the next token. The greedy method can be effective on arithmetic tasks, but it tends to get stuck in a loop, repeating the same tokens over and over. It also reduces the diversity of the model's output by favoring tokens that occur frequently in the training dataset.

Here is the code that converts unnormalized logits (simply the output of the network) into a probability distribution and selects the most probable next token:

probs = F.softmax(logits, dim=-1)
next_token = probs.argmax()

Beam search

Beam search produces better results but is slower, because it runs the model multiple times to build n candidate sequences, where n is the beam size. At each step, the method selects the top n tokens, appends them to the current sequences, and runs the model again on the extended sequences to predict the next token. This process continues until the end of the sequence. It is computationally more expensive, but the quality is higher. Based on this search, the algorithm returns two candidate sequences.

Then, how do we select the final sequence? We sum up the loss for all predictions and select the sequence with the lowest loss. A minimal code sketch of this procedure appears after the sampling example below.

Simple sampling

We can also select tokens randomly according to their probability: the higher a token's probability, the greater its chance of being selected. We can achieve this with the torch.multinomial method:

logits = logits[:, -1, :]
probs = F.softmax(logits, dim=-1)
next_idx = torch.multinomial(probs, num_samples=1)

This is part of the model we implemented in the "transformer building blocks" blog, and the code can be found here.
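The article walks through code for the other strategies but not for beam search, so here is a minimal sketch of the idea (not the author's implementation). It assumes model is a placeholder for any callable that takes a 1-D tensor of token ids and returns logits of shape (sequence_length, vocab_size); the beam size and number of steps are illustrative.

import torch
import torch.nn.functional as F

def beam_search(model, input_ids, beam_size=3, steps=10):
    # Each beam is a (sequence, cumulative log-probability) pair.
    beams = [(input_ids, 0.0)]
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            logits = model(seq)                      # assumed shape: (seq_len, vocab_size)
            log_probs = F.log_softmax(logits[-1], dim=-1)
            top_lp, top_idx = torch.topk(log_probs, beam_size)
            for lp, idx in zip(top_lp, top_idx):
                new_seq = torch.cat([seq, idx.view(1)])
                candidates.append((new_seq, score + lp.item()))
        # Keep only the beam_size best candidates by total log-probability.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    # Return the highest-scoring sequence (highest log-probability = lowest loss).
    return max(beams, key=lambda b: b[1])[0]

With beam search sketched, the remaining two methods, Top-k and nucleus sampling, build on the multinomial sampling shown above.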
The torch.multinomial function receives the probability distribution and selects n samples. Here's an example:

In [1]: import torch
In [2]: probs = torch.tensor([0.3, 0.6, 0.1])
In [3]: torch.multinomial(probs, num_samples=1)
Out[3]: tensor([1])
In [4]: torch.multinomial(probs, num_samples=1)
Out[4]: tensor([0])
In [5]: torch.multinomial(probs, num_samples=1)
Out[5]: tensor([1])
In [6]: torch.multinomial(probs, num_samples=1)
Out[6]: tensor([0])
In [7]: torch.multinomial(probs, num_samples=1)
Out[7]: tensor([1])
In [8]: torch.multinomial(probs, num_samples=1)
Out[8]: tensor([1])

We ran the method six times on probs. It selects index 1 (probability 0.6) four times and index 0 (probability 0.3) twice, because 0.6 is higher than 0.3.

Top-k sampling

If we want to improve on the previous sampling method, we need to limit the sampling space, and that is exactly what Top-k sampling does. K is a parameter: Top-k sampling selects the top k tokens from the probability distribution and samples only from those k tokens. Here is an example of Top-k sampling:

In [1]: import torch
In [2]: logit = torch.randn(10)
In [3]: logit
Out[3]: tensor([-1.1147, 0.5769, 0.3831, -0.5841, 1.7528, -0.7718, -0.4438, 0.6529, 0.1500, 1.2592])
In [4]: topk_values, topk_indices = torch.topk(logit, 3)
In [5]: topk_values
Out[5]: tensor([1.7528, 1.2592, 0.6529])
In [6]: logit[logit < topk_values[-1]] = float('-inf')
In [7]: logit
Out[7]: tensor([ -inf, -inf, -inf, -inf, 1.7528, -inf, -inf, 0.6529, -inf, 1.2592])
In [8]: probs = logit.softmax(0)
In [9]: probs
Out[9]: tensor([0.0000, 0.0000, 0.0000, 0.0000, 0.5146, 0.0000, 0.0000, 0.1713, 0.0000, 0.3141])
In [10]: torch.multinomial(probs, num_samples=1)
Out[10]: tensor([9])
In [11]: torch.multinomial(probs, num_samples=1)
Out[11]: tensor([4])
In [12]: torch.multinomial(probs, num_samples=1)
Out[12]: tensor([9])

●    We first create a fake logit with torch.randn; suppose logit is the raw output of a network.
●    We use torch.topk to select the top 3 values from logit. torch.topk returns the top 3 values along with their indices, sorted from highest to lowest.
●    We use advanced indexing to select the logit values that are lower than the smallest of the top 3 values. When we write logit < topk_values[-1], we mean all the numbers in logit that are lower than topk_values[-1] (0.6529).
●    We replace those values with float('-inf'), negative infinity.
●    After the replacement, we run softmax over the logit to convert it into probabilities.
●    Finally, we use torch.multinomial to sample from probs.

Nucleus sampling

Nucleus sampling is like Top-k sampling, but with a dynamic selection of top tokens instead of a fixed k. This dynamic selection is useful when we are unsure how to choose a suitable k. Nucleus sampling has a hyperparameter named p, say 0.9: the method goes through the tokens in descending order of probability and adds up their probabilities, stopping once the cumulative sum reaches p. What is the cumulative sum? Here's an example:

In [1]: import torch
In [2]: logit = torch.randn(10)
In [3]: probs = logit.softmax(0)
In [4]: probs
Out[4]: tensor([0.0652, 0.0330, 0.0609, 0.0436, 0.2365, 0.1738, 0.0651, 0.0692, 0.0495, 0.2031])
In [5]: [probs[:x+1].sum() for x in range(probs.size(0))]
Out[5]: [tensor(0.0652), tensor(0.0983), tensor(0.1592), tensor(0.2028), tensor(0.4394), tensor(0.6131), tensor(0.6782), tensor(0.7474), tensor(0.7969), tensor(1.)]

I hope the code makes the cumulative sum clear: we simply add up the preceding probability values.

We can also use torch.cumsum and get the same result:

In [9]: torch.cumsum(probs, dim=0)
Out[9]: tensor([0.0652, 0.0983, 0.1592, 0.2028, 0.4394, 0.6131, 0.6782, 0.7474, 0.7969, 1.0000])

Okay. Here's nucleus sampling from scratch:

In [1]: import torch
In [2]: logit = torch.randn(10)
In [3]: probs = logit.softmax(0)
In [4]: probs
Out[4]: tensor([0.7492, 0.0100, 0.0332, 0.0078, 0.0191, 0.0370, 0.0444, 0.0553, 0.0135, 0.0305])
In [5]: sprobs, indices = torch.sort(probs, dim=0, descending=True)
In [6]: sprobs
Out[6]: tensor([0.7492, 0.0553, 0.0444, 0.0370, 0.0332, 0.0305, 0.0191, 0.0135, 0.0100, 0.0078])
In [7]: cs_probs = torch.cumsum(sprobs, dim=0)
In [8]: cs_probs
Out[8]: tensor([0.7492, 0.8045, 0.8489, 0.8860, 0.9192, 0.9497, 0.9687, 0.9822, 0.9922, 1.0000])
In [9]: selected_tokens = cs_probs < 0.9
In [10]: selected_tokens
Out[10]: tensor([ True, True, True, True, False, False, False, False, False, False])
In [11]: probs[indices[selected_tokens]]
Out[11]: tensor([0.7492, 0.0553, 0.0444, 0.0370])
In [12]: probs = probs[indices[selected_tokens]]
In [13]: torch.multinomial(probs, num_samples=1)
Out[13]: tensor([0])

●    Convert the logits to probabilities and sort them in descending order so that we can select tokens from top to bottom.
●    Calculate the cumulative sum.
●    Using advanced indexing, filter out the values beyond the cumulative threshold.
●    Then, sample from this smaller and better-behaved space.

Note that you can combine Top-k and nucleus sampling: first select the top k tokens, then apply nucleus sampling to those k tokens. You can also combine Top-k, nucleus sampling, and beam search.

Conclusion

Understanding these methods is crucial for anyone working with language models, natural language processing, or text generation tasks. These techniques play a significant role in generating coherent and diverse sequences of text. Depending on the specific use case and desired outcomes, readers can choose the most appropriate method to employ. Overall, this knowledge can contribute to improving the quality of generated text and enhancing the capabilities of language models.

Author Bio

Saeed Dehqan trains language models from scratch. Currently, his work is centered around language models for text generation, and he possesses a strong understanding of the underlying concepts of neural networks. He is proficient in using optimizers such as genetic algorithms to fine-tune network hyperparameters and has experience with neural architecture search (NAS) using reinforcement learning (RL). He implements models end to end, from data gathering to monitoring and deployment on mobile, web, cloud, etc.


LLM Pitfalls and How to Avoid Them

Amita Kapoor & Sharmistha Chatterjee
31 Aug 2023
13 min read
Introduction

Large Language Models, or LLMs, are machine learning models that focus on understanding and generating human-like text. These advanced developments have significantly impacted the field of natural language processing, impressing us with their capacity to produce cohesive and contextually appropriate text. However, navigating the terrain of LLMs requires vigilance, as there exist pitfalls that may trap the unprepared.

In this article, we will uncover the nuances of LLMs and discover practical strategies for evading their potential pitfalls. From misconceptions surrounding their capabilities to the subtleties of bias pervading their outputs, we shed light on the intricate underpinnings beneath their impressive veneer.

Understanding LLMs: A Primer

LLMs, such as GPT-4, are based on a technology called the Transformer architecture, introduced in the paper "Attention is All You Need" by Vaswani et al. In essence, this architecture's 'attention' mechanism allows the model to focus on different parts of an input sentence, much like how a human reader might pay attention to different words while reading a text.

Training an LLM involves two stages: pre-training and fine-tuning. During pre-training, the model is exposed to vast quantities of text data (billions of words) from the internet. Given all the previous words, the model learns to predict the next word in a sentence. Through this process, it learns grammar, facts about the world, reasoning abilities, and also some biases present in the data. A significant part of this understanding comes from the model's ability to process English language instructions. The pre-training process exposes the model to language structures, grammar, usage, nuances of the language, common phrases, idioms, and context-based meanings. The Transformer's 'attention' mechanism plays a crucial role in this understanding, enabling the model to focus on different parts of the input sentence when generating each word in the output. It learns which words in the sentence matter most when deciding the next word.

The output of pre-training is a creative text generator. To make this generator more controllable and safe, it undergoes a fine-tuning process. Here, the model is trained on a narrower dataset, carefully generated with the help of human reviewers following specific guidelines. This phase also often involves learning from instructions provided in natural language, enabling the model to respond effectively to English language instructions from users.

After this two-step training, Large Language Models (LLMs) are ready to produce text. Here's how it works: the user provides a starting point, or "prompt", to the model. Using this prompt, the model begins creating a series of "tokens", which could be words or parts of words. Each new token is influenced by the tokens that came before it, so the model keeps updating its internal state after producing each token. The process is based on probabilities, not on a pre-set plan or specific goals.

To control how the LLM generates text, you can adjust various settings. You can select the prompt, of course, but you can also modify settings like "temperature" and "max tokens". The "temperature" setting controls how random the model's output will be, while the "max tokens" setting limits the length of the response. A minimal example of adjusting these settings is sketched below.

When properly trained and controlled, LLMs are powerful tools that can understand and generate human-like text.
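As a concrete illustration of the "temperature" and "max tokens" settings just described, here is a minimal sketch using the pre-v1 openai Python client (the same client that appears in a later tutorial in this collection). The model name, prompt, and parameter values are illustrative assumptions, not recommendations.

import openai

openai.api_key = "YOUR_API_KEY"  # placeholder; use your own key

# Ask for a short completion with moderate randomness.
response = openai.Completion.create(
    engine="text-davinci-002",   # assumed model name, for illustration only
    prompt="Explain in one sentence what an attention mechanism does.",
    temperature=0.7,             # higher values -> more random, more varied output
    max_tokens=100,              # upper bound on the length of the generated response
)

print(response["choices"][0]["text"].strip())

Lowering the temperature towards 0 makes the output more deterministic, while raising max_tokens allows longer answers at a higher cost.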
Their applications range from writing assistants to customer support, tutoring, translation, and more. However, their ability to generate convincing text also poses potential risks, necessitating ongoing research into effective and ethical usage guidelines. In this article, we discuss some of the common pitfalls associated with using LLMs and offer practical advice on how to navigate these challenges, ensuring that you get the best out of these powerful language models in a safe and responsible way.

Misunderstanding LLM Capabilities

Large Language Models (LLMs), like GPT-3 and Bard, are advanced AI systems capable of impressive feats. However, some common misunderstandings exist about what these models can and cannot do. Here we clarify several points to prevent confusion and misuse.

Conscious Understanding: Despite their ability to generate coherent and contextually accurate responses, LLMs do not consciously understand the information they process. They don't comprehend text in the same way humans do. Instead, they make statistically informed guesses based on the patterns they've learned during training. They lack self-awareness or consciousness.

Learning from Interactions: LLMs are not designed to learn from user interactions in real time. After initial model training, they don't have the ability to remember or learn from individual interactions unless their training data is updated, a process that requires substantial computational resources.

Fact-Checking: LLMs can't verify the accuracy of their output or the information they're prompted with. They generate text based on patterns learned during training and cannot access real-time or updated information beyond their training cut-off. They cannot fact-check or verify information against real-world events after their training cut-off date.

Personal Opinions: LLMs don't have personal experiences, beliefs, or opinions. If they generate text that seems to indicate a personal stance, it's merely a reflection of the patterns they've learned during their training process. They are incapable of feelings or preferences.

Generating Original Ideas: While LLMs can generate text that may seem novel or original, they are not truly capable of creativity in the human sense. Their "ideas" result from recombining elements from their training data in novel ways, not from original thought or intention.

Confidentiality: LLMs cannot keep secrets or remember specific user interactions. They do not have the capacity to store personal data from one interaction to the next. They are designed this way to ensure user privacy and confidentiality.

Future Predictions: LLMs can't predict the future. Any generated text that seems to predict future events is coincidental and based solely on patterns learned from their training data.

Emotional Support: While LLMs can simulate empathetic responses, they don't truly understand or feel emotions. Any emotional support provided by these models is based on learned textual patterns and should not replace professional mental health support.

Understanding these limitations is crucial when interacting with LLMs. They are powerful tools for text generation, but their abilities should not be mistaken for true understanding, creativity, or emotional capacity.

Bias in LLM Outputs

Bias in LLMs is an unintentional byproduct of their training process. LLMs, such as GPT-4, are trained on massive datasets comprising text from the internet. The models learn to predict the next word in a sentence based on the context provided by the preceding words. During this process, they inevitably absorb and replicate the biases present in their training data.

Bias in LLMs can be subtle and may present itself in various ways. For example, if an LLM consistently associates certain professions with a specific gender, this reflects gender bias. Suppose you feed the model a prompt like, "The nurse attended to the patient", and the model frequently uses feminine pronouns to refer to the nurse, while with the prompt, "The engineer fixed the machine," it predominantly uses masculine pronouns for the engineer. This inclination mirrors societal biases present in the training data.

It's crucial for users to be aware of these potential biases when using LLMs. Understanding this can help users interpret responses more critically, identify potential biases in the output, and even frame their prompts in a way that can mitigate bias. Users should double-check the information provided by LLMs, particularly when the output may have significant implications or sits in a context known for systemic bias.

Confabulation and Hallucination in LLMs

In the context of LLMs, 'confabulation' or 'hallucination' refers to generating outputs that do not align with reality or factual information. This can happen when the model, attempting to create a coherent narrative, fills in gaps with details that seem plausible but are entirely fictional.

Example 1: Futuristic Election Results

Consider an interaction where an LLM was asked for the result of a future election. The prompt was, "What was the result of the 2024 U.S. presidential election?" The model responded with a detailed result, stating that a fictitious candidate had won. As of the model's last training cut-off, this event lies in the future, and the response is a complete fabrication.

Example 2: The Non-existent Book

In another instance, an LLM was asked for a summary of a non-existent book with a prompt like, "Can you summarise the book 'The Shadows of Elusion' by J.K. Rowling?" The model responded with a detailed summary as if the book existed. In reality, there is no such book by J.K. Rowling. This again demonstrates the model's propensity to confabulate.

Example 3: Fictitious Technology

In a third example, an LLM was asked to explain the workings of a fictitious technology: "How does the quantum teleportation smartphone work?" The model explained a device that doesn't exist, incorporating real-world concepts of quantum teleportation into a plausible-sounding but entirely fictional narrative.

LLMs generate responses based on patterns they learn from their training data. They cannot access real-time or personal information or understand the content they generate. When faced with prompts lacking factual grounding, they can resort to confabulation, drawing from learned patterns to fabricate plausible but non-factual responses.

Because of this propensity for confabulation, verifying the 'facts' generated by LLMs is crucial. This is particularly important when the output is used for decision-making or in a sensitive context. Always corroborate the information generated by LLMs with reliable and up-to-date sources to ensure its validity and relevance. While these models can be incredibly helpful, they should be used as a tool and not as a sole source of information, bearing in mind the potential for error and fabrication in their outputs.

Security and Privacy in LLMs

Large Language Models (LLMs) can be a double-edged sword. Their power to create lifelike text opens the door to misuse, such as generating misleading information, spam emails, or fake news, and even facilitating complex scamming schemes. So, it's crucial to establish robust security protocols when using LLMs.

Training LLMs on massive datasets can also trigger privacy issues. Two primary concerns are:

Data leakage: If the model is exposed to sensitive information during training, it could potentially reveal this information when generating outputs. Though these models are designed to generalize patterns rather than memorize specific data points, the risk still exists, albeit at a very low probability.

Inference attacks: Skilled attackers could craft specific queries to probe the model, attempting to infer sensitive details about the training data. For instance, they might try to discern whether certain types of content were part of the training data, potentially revealing proprietary or confidential information.

Ethical Considerations in LLMs

The rapid advancements in artificial intelligence, particularly in Large Language Models (LLMs), have transformed multiple facets of society. Yet this exponential growth often overlooks a crucial aspect: ethics. Balancing the benefits of LLMs while addressing ethical concerns is a significant challenge that demands immediate attention.

Accountability and Responsibility: Who is responsible when an LLM causes harm, such as generating misleading information or offensive content? Is it the developers who trained the model, the users who provided the prompts, or the organizations that deployed it? The ambiguous nature of responsibility and accountability in AI applications is a substantial ethical challenge.

Bias and Discrimination: LLMs learn from vast amounts of data, often from the internet, reflecting our society, warts and all. Consequently, the models can internalize and perpetuate existing biases, leading to potentially discriminatory outputs. This can manifest as gender bias, racial bias, or other forms of prejudice.

Invasion of Privacy: As discussed in the previous section, LLMs can pose privacy risks, but the ethical implications go beyond the immediate privacy concerns. For instance, if an LLM is used to generate text mimicking a particular individual's writing style, it could infringe on that person's right to personal expression and identity.

Misinformation and Manipulation: The capacity of LLMs to generate human-like text can be exploited to disseminate misinformation, forge documents, or even create deepfake texts. This can manipulate public opinion, damage personal reputations, and even threaten national security.

Addressing LLM Limitations: A Tripartite Approach

Managing the limitations of LLMs is a tripartite effort, involving AI developers and researchers, policymakers, and end users.

Role of AI Developers & Researchers:

Security & Privacy: Establish robust security protocols, enforce secure training practices, and explore methods such as differential privacy. Constituting AI ethics committees can ensure ethical considerations during the design and training phases.

Bias & Discrimination: Endeavor to identify and mitigate biases during training, aiming for equitable outcomes. This process includes eliminating harmful biases and confabulations.

Transparency: Enhance understanding of the model by elucidating the training process, which in turn can help manage potential fabrications.

Role of Policymakers:

Regulations: Formulate and implement regulations that ensure accountability, transparency, fairness, and privacy in AI.

Public Engagement: Encourage public participation in AI ethics discussions to ensure that regulations reflect societal norms.

Role of End Users:

Awareness: Comprehend the risks and ethical implications associated with LLMs, recognising that biases and fabrications are possible.

Critical Evaluation: Evaluate the outputs generated by LLMs for potential misinformation, bias, or confabulations. Refrain from feeding sensitive information to an LLM and cross-verify the information produced.

Feedback: Report any instances of severe bias, offensive content, or ethical concerns to the AI provider. This feedback is crucial for the continuous improvement of the model.

Conclusion

Understanding and leveraging the capabilities of Large Language Models (LLMs) demands both caution and strategy. By recognizing their limitations, such as lack of consciousness, potential biases, and confabulation tendencies, users can navigate these pitfalls effectively. To harness LLMs responsibly, a collaborative approach among developers, policymakers, and users is essential. Implementing security measures, mitigating bias, and fostering user awareness can maximize the benefits of LLMs while minimizing their drawbacks. As LLMs continue to shape our linguistic landscape, staying informed and vigilant ensures a safer and more accurate text generation journey.

Author Bio

Amita Kapoor is an accomplished AI consultant and educator with over 25 years of experience. She has received international recognition for her work, including the DAAD fellowship and the Intel Developer Mesh AI Innovator Award. She is a highly respected scholar in her field, with over 100 research papers and several best-selling books on deep learning and AI. After teaching for 25 years at the University of Delhi, Amita took early retirement and turned her focus to democratizing AI education. She currently serves as a member of the Board of Directors for the non-profit Neuromatch Academy, fostering greater accessibility to knowledge and resources in the field. Following her retirement, Amita also founded NePeur, a company that provides data analytics and AI consultancy services. In addition, she shares her expertise with a global audience by teaching online classes on data science and AI at the University of Oxford.

Sharmistha Chatterjee is an evangelist in the field of machine learning (ML) and cloud applications, currently working in the BFSI industry at the Commonwealth Bank of Australia in the data and analytics space. She has worked in Fortune 500 companies as well as in early-stage start-ups. She became an advocate for responsible AI during her tenure at Publicis Sapient, where she led the digital transformation of clients across industry verticals. She is an international speaker at various tech conferences and a 2X Google Developer Expert in ML and Google Cloud. She has won multiple awards, has been listed among the 40 under 40 data scientists by Analytics India Magazine (AIM), and was named one of 21 tech trailblazers in 2021 by Google. She has been involved in responsible AI initiatives led by Nasscom and as part of their DeepTech Club.

Authors of this book: Platform and Model Design for Responsible AI


Hands-On tutorial on how to use Pinecone with LangChain

Alan Bernardo Palacio
21 Aug 2023
17 min read
A vector database stores high-dimensional vectors, mathematical representations of attributes. Each vector holds dimensions ranging from tens to thousands, enhancing data richness. It operationalizes embedding models, aiding application development with resource management, security, scalability, and query efficiency. Pinecone, a vector database, enables quick semantic search over vectors. Integrating OpenAI's LLMs with Pinecone merges deep-learning-based embedding generation with efficient storage and retrieval, facilitating real-time recommendation and search systems. Pinecone acts as long-term memory for large language models like OpenAI's GPT-4.

Introduction

This tutorial will guide you through the process of integrating Pinecone, a high-performance vector database, with LangChain, a framework for building applications powered by large language models (LLMs). Pinecone enables developers to build scalable, real-time recommendation and search systems based on vector similarity search.

Prerequisites

Before you begin this tutorial, you should have the following:

A Pinecone account
A LangChain account
A basic understanding of Python

Pinecone basics

As a starter, we will get familiar with Pinecone by exploring its basic functionality. Remember to get your Pinecone access key.

Here is a step-by-step guide on how to set up and use Pinecone, a cloud-native vector database that provides long-term memory for AI applications, especially those involving large language models, generative AI, and semantic search.

Initialize Pinecone client

We will use the Pinecone client, so this step is only necessary if you don't have it installed already.

pip install pinecone-client

To use Pinecone, you must have an API key. You can find your API key in the Pinecone console under the "API Keys" section. Note both your API key and your environment. To verify that your Pinecone API key works, use the following command:

import pinecone
pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENVIRONMENT")

If you don't receive an error message, then your API key is valid. This will also initialize the Pinecone session.

Creating and retrieving indexes

The command below creates an index named "quickstart" that performs an approximate nearest-neighbor search using the Euclidean distance metric for 8-dimensional vectors.

pinecone.create_index("quickstart", dimension=8, metric="euclidean")

Index creation takes roughly a minute. Once your index is created, its name appears in the index list. Use the following command to return a list of your indexes.

pinecone.list_indexes()

Before you can query your index, you must connect to it.

index = pinecone.Index("quickstart")

Now that you have created your index, you can start to insert data into it.

Insert the data

To ingest vectors into your index, use the upsert operation, which inserts a new vector into the index or updates the vector if a vector with the same ID is already present. The following command upserts five 8-dimensional vectors into your index.

index.upsert([
    ("A", [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]),
    ("B", [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2]),
    ("C", [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]),
    ("D", [0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4]),
    ("E", [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5])
])

You can get statistics about your index, like the dimensions, the usage, and the vector count. To do this, use the following command to return statistics about the contents of your index.

index.describe_index_stats()

This will return a dictionary with information about your index. Now that you have created an index and inserted data into it, we can query the database to retrieve vectors based on their similarity.

Query the index and get similar vectors

The following example queries the index for the three vectors that are most similar to an example 8-dimensional vector, using the Euclidean distance metric specified above.

index.query(
    vector=[0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3],
    top_k=3,
    include_values=True
)

This command will return the three vectors stored in this index that have the lowest Euclidean distance. Once you no longer need the index, use the delete_index operation to delete it.

pinecone.delete_index("quickstart")

By following these steps, you can set up a Pinecone vector database in just a few minutes. This will help you provide long-term memory for your high-performance AI applications without any infrastructure hassles.

Now, let's take a look at a slightly more complex example, in which we embed text data and insert it into Pinecone.

Preparing and Processing the Data

In this section, we will create a context for large language models (LLMs) using the OpenAI API. We will walk through the different parts of a Python script, understanding the purpose and function of each code block. The ultimate aim is to transform the data into larger chunks of around 500 tokens, ensuring that the dataset is ordered sequentially.

Setup

First, we install the necessary libraries for our script: openai for the AI models, pandas for data manipulation, and transformers for tokenization.

!pip install openai pandas transformers

After the installations, we import the necessary modules for our script.

import pandas as pd
import openai

Before you can interact with OpenAI, you need to provide your API key. Make sure to replace <<YOUR_API_KEY>> with your actual API key.

openai.api_key = ('<<YOUR_API_KEY>>')

Now we are ready to start processing the data to be embedded and stored in Pinecone.

Data transformation

We use pandas to load JSON data files related to different technologies (HuggingFace, PyTorch, TensorFlow, Streamlit). These files contain questions and answers related to their respective topics and are based on the data in the Pinecone documentation. First, we concatenate these data frames into one for easier manipulation.

hf = pd.read_json('data/huggingface-qa.jsonl', lines=True)
pt = pd.read_json('data/pytorch-qa.jsonl', lines=True)
tf = pd.read_json('data/tensorflow-qa.jsonl', lines=True)
sl = pd.read_json('data/streamlit-qa.jsonl', lines=True)
df = pd.concat([hf, pt, tf, sl], ignore_index=True)
df.head()

Next, we define a function to remove newlines and unnecessary spaces in our text data. The function remove_newlines takes a pandas Series object and performs several replace operations to clean the text.

def remove_newlines(serie):
    serie = serie.str.replace('\n', ' ', regex=False)
    serie = serie.str.replace('\\n', ' ', regex=False)
    serie = serie.str.replace('  ', ' ', regex=False)
    serie = serie.str.replace('  ', ' ', regex=False)
    return serie

We transform the text in our dataframe into a single string format, combining the 'docs', 'category', 'thread', 'question', and 'context' columns.

df['text'] = "Topic: " + df.docs + " - " + df.category + "; Question: " + df.thread + " - " + df.question + "; Answer: " + df.context
df['text'] = remove_newlines(df.text)

Tokenization

We use the HuggingFace transformers library to tokenize our text. The GPT-2 tokenizer is used, and the number of tokens for each text string is stored in a new column 'n_tokens'.

from transformers import GPT2TokenizerFast
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
df['n_tokens'] = df.text.apply(lambda x: len(tokenizer.encode(x)))

We filter out rows in our data frame where the number of tokens exceeds 2000.

df = df[df.n_tokens < 2000]

Now we can finally embed the data using the OpenAI API.

from openai.embeddings_utils import get_embedding
size = 'curie'
df['embeddings'] = df.text.apply(lambda x: get_embedding(x, engine=f'text-search-{size}-doc-001'))
df.head()

We will be using the 'text-search-curie-doc-001' OpenAI engine to create the embeddings, which is very capable, faster, and lower cost than Davinci.

So far, we've prepared our data for subsequent processing. In the next parts of the tutorial, we will obtain embeddings from the OpenAI API and use them with the Pinecone vector database. Next, we will initialize the Pinecone index, create text embeddings using the OpenAI API, and insert them into Pinecone.

Initializing the Index and Uploading Data to Pinecone

The second part of the tutorial takes the data prepared previously and uploads it to the Pinecone vector database. This allows the embeddings to be queried for similarity, providing a means to use contextual information from a larger set of data than an LLM can handle at once.

Checking for Large Text Data

The maximum size limit for metadata in Pinecone is 5KB, so we check if any 'text' field items are larger than this.

from sys import getsizeof
too_big = []
for text in df['text'].tolist():
    if getsizeof(text) > 5000:
        too_big.append((text, getsizeof(text)))
print(f"{len(too_big)} / {len(df)} records are too big")

This identifies the entries whose metadata is larger than what Pinecone can manage. The next step is to create a unique identifier for the records: since several records have text data larger than the Pinecone limit, we assign a unique ID to each record in the DataFrame.

df['id'] = [str(i) for i in range(len(df))]
df.head()

This ID can be used to retrieve the original text later. Now we can start with the initialization of the index in Pinecone and insert the data.

Pinecone Initialization and Index Creation

Next, Pinecone is initialized with the API key, and an index is created if it doesn't already exist. The name of the index is 'beyond-search-openai', and its dimension matches the length of the embeddings.
The metric used for similarity search is cosine.

import pinecone

pinecone.init(
    api_key='PINECONE_API_KEY',
    environment="YOUR_ENV"
)
index_name = 'beyond-search-openai'
if not index_name in pinecone.list_indexes():
    pinecone.create_index(
        index_name, dimension=len(df['embeddings'].tolist()[0]),
        metric='cosine'
    )
index = pinecone.Index(index_name)

Now that we have created the index, we can proceed to insert the data. The index will be populated in batches of 32. Relevant metadata (like 'docs', 'category', 'thread', and 'href') is also included with each item. We will use tqdm to create a progress bar for the insertion.

from tqdm.auto import tqdm

batch_size = 32
for i in tqdm(range(0, len(df), batch_size)):
    i_end = min(i + batch_size, len(df))
    df_slice = df.iloc[i:i_end]
    to_upsert = [
        (
            row['id'],
            row['embeddings'],
            {
                'docs': row['docs'],
                'category': row['category'],
                'thread': row['thread'],
                'href': row['href'],
                'n_tokens': row['n_tokens']
            }
        ) for _, row in df_slice.iterrows()
    ]
    index.upsert(vectors=to_upsert)

This inserts the records into the database to be used later in the process. Finally, the ID-to-text mappings are saved into a JSON file, which allows us to retrieve the original text associated with an ID later on.

mappings = {row['id']: row['text'] for _, row in df[['id', 'text']].iterrows()}

import json
with open('data/mapping.json', 'w') as fp:
    json.dump(mappings, fp)

The Pinecone vector database should now be populated and ready for querying. Next, we will use this information to provide context to a question-answering LLM.

Querying and Answering Questions

The final part of the tutorial involves querying the Pinecone vector database with questions, retrieving the most relevant context embeddings, and using OpenAI's API to generate an answer to the question based on the retrieved contexts.

OpenAI Embedding Generation

The OpenAI API is used to create an embedding for the question.

from openai.embeddings_utils import get_embedding

q_embeddings = get_embedding(
    'how to use gradient tape in tensorflow',
    engine=f'text-search-curie-query-001'
)

A function create_context is defined to use the OpenAI API to create a query embedding, retrieve the most relevant context embeddings from Pinecone, and append these contexts into a larger string ready for feeding into OpenAI's next generation step.

from openai.embeddings_utils import get_embedding

def create_context(question, index, max_len=3750, size="curie"):
    q_embed = get_embedding(question, engine=f'text-search-{size}-query-001')
    res = index.query(q_embed, top_k=5, include_metadata=True)
    cur_len = 0
    contexts = []
    for row in res['matches']:
        text = mappings[row['id']]
        cur_len += row['metadata']['n_tokens'] + 4
        if cur_len < max_len:
            contexts.append(text)
        else:
            cur_len -= row['metadata']['n_tokens'] + 4
            if max_len - cur_len < 200:
                break
    return "\n\n###\n\n".join(contexts)

We can now use this function to retrieve the necessary context for a given question: the question is embedded and the relevant context is retrieved from the Pinecone database. Now we are ready to start passing the context to a question-answering model.

Querying and Answering

We start by defining the parameters used during the query, specifically the model we will be using, the maximum token length, and other parameters. We can also give the model instructions that constrain the results we get.

fine_tuned_qa_model = "text-davinci-002"
instruction = """Answer the question based on the context below, and if the question can't be answered based on the context, say \"I don't know\"\n\nContext:\n{0}\n\n---\n\nQuestion: {1}\nAnswer:"""
max_len = 3550
size = "curie"
max_tokens = 400
stop_sequence = None
domains = ["huggingface", "tensorflow", "streamlit", "pytorch"]

Different instruction formats can be defined. We will now ask some simple questions and see what the results look like.

question = "What is Tensorflow"
context = create_context(
    question,
    index,
    max_len=max_len,
    size=size,
)
try:
    # fine-tuned models require the model parameter, whereas other models require the engine parameter
    model_param = (
        {"model": fine_tuned_qa_model}
        if ":" in fine_tuned_qa_model
        and fine_tuned_qa_model.split(":")[1].startswith("ft")
        else {"engine": fine_tuned_qa_model}
    )
    # print(instruction.format(context, question))
    response = openai.Completion.create(
        prompt=instruction.format(context, question),
        temperature=0,
        max_tokens=max_tokens,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0,
        stop=stop_sequence,
        **model_param,
    )
    print(response["choices"][0]["text"].strip())
except Exception as e:
    print(e)

We can see that it gives the proper results, using the context it retrieves from Pinecone. We can also inquire about PyTorch:

question = "What is Pytorch"
context = create_context(
    question,
    index,
    max_len=max_len,
    size=size,
)
try:
    # fine-tuned models require the model parameter, whereas other models require the engine parameter
    model_param = (
        {"model": fine_tuned_qa_model}
        if ":" in fine_tuned_qa_model
        and fine_tuned_qa_model.split(":")[1].startswith("ft")
        else {"engine": fine_tuned_qa_model}
    )
    # print(instruction.format(context, question))
    response = openai.Completion.create(
        prompt=instruction.format(context, question),
        temperature=0,
        max_tokens=max_tokens,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0,
        stop=stop_sequence,
        **model_param,
    )
    print(response["choices"][0]["text"].strip())
except Exception as e:
    print(e)

The results stay consistent with the context provided. Now we can try to go beyond the capabilities of the context by pushing the boundaries a bit more.

question = "Am I allowed to publish model outputs to Twitter, without a human review?"
context = create_context(
    question,
    index,
    max_len=max_len,
    size=size,
)
try:
    # fine-tuned models require the model parameter, whereas other models require the engine parameter
    model_param = (
        {"model": fine_tuned_qa_model}
        if ":" in fine_tuned_qa_model
        and fine_tuned_qa_model.split(":")[1].startswith("ft")
        else {"engine": fine_tuned_qa_model}
    )
    # print(instruction.format(context, question))
    response = openai.Completion.create(
        prompt=instruction.format(context, question),
        temperature=0,
        max_tokens=max_tokens,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0,
        stop=stop_sequence,
        **model_param,
    )
    print(response["choices"][0]["text"].strip())
except Exception as e:
    print(e)

We can see from the results that the model follows the instructions provided, as we don't have any context on Twitter. Lastly, the Pinecone index is deleted to free up resources.

pinecone.delete_index(index_name)

Conclusion

This tutorial provided a comprehensive guide to harnessing Pinecone, OpenAI's language models, and HuggingFace's library for advanced question answering. We introduced Pinecone's vector search engine, explored data preparation, embedding generation, and data uploading, and concluded by creating a question-answering model using OpenAI's API. The tutorial showcased how the synergy of vector search engines, language models, and text processing can revolutionize information retrieval. This holistic approach holds potential for developing AI-powered applications in various domains, from customer service chatbots to research assistants and beyond.

Author Bio

Alan Bernardo Palacio is a data scientist and an engineer with vast experience in different engineering fields. His focus has been the development and application of state-of-the-art data products and algorithms in several industries. He has worked for companies such as Ernst and Young and Globant, and now holds a data engineer position at Ebiquity Media, helping the company create a scalable data pipeline. Alan graduated with a Mechanical Engineering degree from the National University of Tucuman in 2015, participated as a founder in startups, and later earned a Master's degree from the Faculty of Mathematics at the Autonomous University of Barcelona in 2017. Originally from Argentina, he now works and resides in the Netherlands.

LinkedIn


Getting Started with Gemini AI

Packt
07 Sep 2023
2 min read
Introduction

Gemini AI is a large language model (LLM) being developed by Google DeepMind. It is still under development, but it is expected to be more powerful than ChatGPT, the current state-of-the-art LLM. Gemini AI is being built on the technology and techniques used in AlphaGo, an early AI system developed by DeepMind in 2016. This means that Gemini AI is expected to have strong capabilities in planning and problem-solving.

Gemini AI is a powerful tool that has the potential to be used in a wide variety of applications. Some of the potential use cases for Gemini AI include:

Chatbots: Gemini AI could be used to create more realistic and engaging chatbots.
Virtual assistants: Gemini AI could be used to create virtual assistants that can help users with tasks such as scheduling appointments, making reservations, and finding information.
Content generation: Gemini AI could be used to generate creative content such as articles, blog posts, and scripts.
Data analysis: Gemini AI could be used to analyze large datasets and identify patterns and trends.
Medical diagnosis: Gemini AI could be used to assist doctors in diagnosing diseases.
Financial trading: Gemini AI could be used to make trading decisions.

How Gemini AI works

Gemini AI is a neural network that has been trained on a massive dataset of text and code. This dataset includes books, articles, code repositories, and other forms of text. The neural network learns the patterns and relationships between words and phrases in this dataset. This allows Gemini AI to generate text, translate languages, write different kinds of creative content, and answer questions in an informative way.

How to use Gemini AI

Gemini AI is not yet available to the public, but it is expected to be released in the future. When it is released, it will likely be available through a cloud-based API, which means that developers will be able to use Gemini AI in their own applications. To use Gemini AI, developers will need to first create an account and obtain an API key. Once they have an API key, they can use it to call the Gemini AI API, which will allow them to interact with Gemini AI and use its capabilities.

Here are the expected steps for getting started with Gemini AI:

Go to the Gemini AI website and create an account. Once you have created an account, you will be given an API key.
Install the Gemini AI client library for your programming language.
In your code, import the Gemini AI client library and initialize it with your API key.
Call the Gemini AI API to generate text, translate languages, write different kinds of creative content, or answer your questions in an informative way.

For more detailed instructions on how to install and use Gemini AI, please refer to the Gemini AI documentation.

The future of Gemini AI

Gemini AI is still under development, but it has the potential to revolutionize the way we interact with computers. In the future, Gemini AI could be used to create more realistic and engaging chatbots, virtual assistants, and other forms of AI-powered software. Gemini AI could also be used to improve our understanding of the world around us by analyzing large datasets and identifying patterns and trends.

Conclusion

Gemini AI is a powerful tool that has the potential to be used in a wide variety of applications. It is still under development, but it has the potential to revolutionize the way we interact with computers, from more realistic and engaging chatbots and virtual assistants to large-scale data analysis that improves our understanding of the world around us.


Testing Large Language Models (LLMs)

20 Oct 2023
7 min read
Machine learning has become ubiquitous, with models powering everything from search engines and recommendation systems to chatbots and autonomous vehicles. As these models grow more complex, testing them thoroughly is crucial to ensure they behave as expected. This is especially true for large language models like GPT-4 that generate human-like text and engage in natural conversations.

In this article, we will explore strategies for testing machine learning models, with a focus on evaluating the performance of LLMs.

Introduction

Machine learning models are notoriously challenging to test due to their black-box nature. Unlike traditional code, we cannot simply verify the logic line by line. ML models learn from data and make probabilistic predictions, so their decision-making process is opaque.

While testing methods like unit testing and integration testing are common for traditional software, they do not directly apply to ML models. We need more specialized techniques to validate model performance and uncover unexpected or undesirable behavior.

Testing is particularly crucial for large language models. Since LLMs can generate free-form text, it is hard to anticipate their exact responses. Flaws in the training data or model architecture can lead to hallucinations, biases, and errors that only surface during real-world usage. Rigorous testing provides confidence that the model works as intended.

In this article, we will cover testing strategies to evaluate LLMs. The key techniques we will explore are:

●    Similarity testing
●    Column coverage testing
●    Exact match testing
●    Visual output testing
●    LLM-based evaluation

By combining these methods, we can thoroughly test LLMs along multiple dimensions and ensure they provide coherent, accurate, and appropriate responses.

Testing Text Output with Similarity Search

A common output from LLMs is text. This could be anything from chatbot responses to summaries generated from documents. A robust way to test the quality of text output is similarity testing.

The idea is simple: we define an expected response and compare the model's actual response to determine how similar they are. The higher the similarity score, the better.

Let's walk through an example using our favorite LLM. Suppose we give it the prompt:

Prompt: What is the capital of Italy?

The expected response would be:

Expected: The capital of Italy is Rome.

Now we can pass this prompt to the LLM and get the actual response:

prompt = "What is the capital of Italy?"
actual = llm.ask(prompt)

Let's say actual contains:

Actual: Rome is the capital of Italy.

While the wording is different, the meaning is the same. To quantify this similarity, we can use semantic search libraries like SentenceTransformers. It represents sentences as numeric vectors and computes similarity using cosine distance.

from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer('all-MiniLM-L6-v2')
expected_embedding = model.encode(expected)
actual_embedding = model.encode(actual)
similarity = cosine_similarity([expected_embedding], [actual_embedding])[0][0]

This yields a similarity score of 0.85, indicating the responses are highly similar in meaning. We can establish a threshold for the minimum acceptable similarity, like 0.8. Responses below this threshold fail the test.
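As a rough illustration of how such a check might be wrapped into an automated test (a sketch, not the article's own code; the 0.8 threshold and the model name simply follow the example above):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('all-MiniLM-L6-v2')
SIMILARITY_THRESHOLD = 0.8  # assumed minimum acceptable similarity

def assert_similar(expected: str, actual: str, threshold: float = SIMILARITY_THRESHOLD):
    # Encode both responses and compare them with cosine similarity.
    expected_emb, actual_emb = model.encode([expected, actual])
    score = util.cos_sim(expected_emb, actual_emb).item()
    assert score >= threshold, f"Similarity {score:.2f} is below the threshold {threshold}"

# Passes: the two sentences convey the same meaning.
assert_similar("The capital of Italy is Rome.", "Rome is the capital of Italy.")

A test runner such as pytest can then collect many of these prompt/expected pairs and report which responses fall below the threshold.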
By running similarity testing over many prompt-response pairs, we can holistically assess the textual coherence of an LLM.Testing Tabular Outputs with Column CoverageIn addition to text, LLMs can output tables or data frames. For testing these, we need different techniques that account for structure.A good validation is column coverage - checking what percentage of columns in the expected output are present in the actual output.Consider the LLM answering questions about movies:Prompt: What are the top 3 highest grossing movies of all time?Expected:MovieWorldwide GrossRelease YearAvatar$2,789,679,7942009Titanic$2,187,463,9441997Star Wars Ep. VII$2,068,223,6242015Now we can test the LLM’s actual output:prompt = "What are the top 3 highest grossing movies of all time?" actual = llm.ask(prompt) Actual:MovieGlobal RevenueYearAvatar$2.789 billion2009Titanic$2.187 billion1997Star Wars: The Force Awakens$2.068 billion2015Here, actual contains the same 3 columns as expected - Movie, Gross, Release Year. So even though the headers and cell values differ slightly, we can pair them with cosine similarity and we will have 100% column coverage.We can formalize this in code:expected_cols = set(expected.columns) actual_cols = set(actual.columns) column_coverage = len(expected_cols & actual_cols) / len(expected_cols) # column_coverage = 1.0 For tables with many columns, we may only need say 90% coverage to pass the test. This validation ensures the critical output columns are present while allowing variability in column names or ancillary data.Exact Match for Numeric OutputsWhen LLMs output a single number or statistic, we can use simple exact match testing.Consider this prompt:Prompt: What was Apple's total revenue in 2021?Expected: $365.82 billionWe get the LLM’s response:prompt = "What was Apple's total revenue in 2021?" actual = llm.ask(prompt) Actual: $365.82 billionIn this case, we expect an exact string match:is_match = (actual == expected) # is_match = True For numerical outputs, precision is important. Exact match testing provides a straightforward way to validate this.Screenshot Testing for Visual OutputsBuilding PandasAI, we sometimes need to test generated charts. Testing these outputs requires verifying the visualized data is correct.One method is screenshot testing - comparing screenshots of the expected and actual visuals. For example:Prompt: Generate a bar chart comparing the revenue of FAANG companies.Expected: [Expected_Chart.png]Actual: [Actual_Chart.png]We can then test if the images match:from PIL import Image, ImageChops expected_img = Image.open("./Expected_Chart.png") actual_img = Image.open("./Actual_Chart.png") diff = ImageChops.difference(expected_img, actual_img) is_match = diff.getbbox() is None // is_match = True if images matchFor more robust validation, we could use computer vision techniques like template matching to identify and compare key elements: axes, bars, labels, etc.Screenshot testing provides quick validation of visual output without needing to interpret the raw chart data.LLM-Based EvaluationAn intriguing idea for testing LLMs is to use another LLM!The concept is to pass the expected and actual outputs to a separate "evaluator" LLM and ask if they match.For example:Expected: Rome is the capital of Italy.Actual: The capital of Italy is Rome.We can feed this to the evaluator model:Prompt: Do these two sentences convey the same information? 
The evaluator LLM acts like a semantic similarity scorer. This takes advantage of the natural language capabilities of LLMs.

The downside is that it evaluates one black-box model using another black-box model. Errors or biases in the evaluator could lead to incorrect assessments, so LLM-based evaluation should complement other testing approaches, not act as the sole method.

Conclusion

Testing machine learning models thoroughly is critical as they grow more ubiquitous and impactful. Large language models pose unique testing challenges due to their free-form textual outputs.

Using a combination of similarity testing, column coverage validation, exact match, visual output screening, and even LLM-based evaluation, we can rigorously assess LLMs along multiple dimensions.

A comprehensive test suite combining these techniques will catch more flaws than any single method alone. This builds essential confidence that LLMs behave as expected in the real world.

Testing takes time but prevents much larger problems down the road. The strategies covered in this article will add rigor to the development and deployment of LLMs, helping ensure these powerful models benefit humanity as intended.

Author Bio

Gabriele Venturi is a software engineer and entrepreneur who started coding at the young age of 12. Since then, he has launched several projects across gaming, travel, finance, and other spaces - contributing his technical skills to various startups across Europe over the past decade.

Gabriele's true passion lies in leveraging AI advancements to simplify data analysis. This mission led him to create PandasAI, released open source in April 2023. PandasAI integrates large language models into the popular Python data analysis library Pandas. This enables an intuitive conversational interface for exploring data through natural language queries.

By open-sourcing PandasAI, Gabriele aims to share the power of AI with the community and push boundaries in conversational data analytics. He actively contributes as an open-source developer dedicated to advancing what's possible with generative AI.

article-image-bloomberggpt-putting-finance-to-work-using-large-language-models

BloombergGPT: Putting Finance to Work using Large Language Models

Julian Melanson
28 Jun 2023
7 min read
In recent years, the financial industry has experienced a significant surge in the amount and complexity of data. This exponential growth has underscored the need for advanced artificial intelligence models capable of comprehending and processing the specialized language used in finance. Addressing this demand, Bloomberg unveiled BloombergGPT, a revolutionary language model trained on a diverse range of financial data.The Rise of BloombergGPTReleased on March 30th, BloombergGPT represents a groundbreaking development in the financial sector's application of AI technology. By focusing specifically on finance-related tasks, BloombergGPT aims to enhance existing NLP applications employed by Bloomberg, including sentiment analysis, named entity recognition, news classification, and question answering. Furthermore, this sophisticated model holds the promise of unlocking new possibilities for leveraging the vast amounts of data accessible through the Bloomberg Terminal, thereby empowering the firm's customers and fully harnessing the potential of AI in the financial domain.Unleashing the Power of BloombergGPTBloombergGPT boasts two notable capabilities that propel it beyond generic language models. First, it possesses the ability to generate Bloomberg Query Language (BQL), which serves as a query language for accessing and analyzing financial data on the Bloomberg platform. BQL, a powerful and intricate tool, enables various financial tasks such as data searching, analysis, report creation, and insight generation. BloombergGPT's proficiency in transforming natural language queries into valid BQL fosters more intuitive interactions with financial data, streamlining the querying process and enhancing user experience.The second noteworthy feature of BloombergGPT is its capability to provide suggestions for news headlines. This functionality proves invaluable for news applications and aids journalists in constructing compelling and informative newsletters. By inputting paragraphs, BloombergGPT can generate relevant and engaging titles, saving time and enhancing the efficiency of content creation.Training BloombergGPT: A Domain-Specific ApproachTo train BloombergGPT, Bloomberg employed a domain-specific approach, combining their own financial data with augmented online text data. This strategy demonstrates the value of developing language models tailored to specific industries, surpassing the utility of generic models. The training process involved building a dataset of English-language financial documents, incorporating 363 billion financial-specific tokens from Bloomberg's proprietary data assets and an additional 345 billion generic tokens from online text datasets, including The Pile, C4, and Wikipedia.The resulting domain-specific language model, BloombergGPT, comprises an impressive 50 billion parameters and is optimized for financial tasks. Notably, BloombergGPT outperforms popular open-source language models such as GPT-NeoX, OPT, and Bloom in finance-specific tasks. Furthermore, it exhibits remarkable performance in generic language tasks, including summarization, often rivaling the performance of GPT-3 based on Bloomberg's benchmarks.Applications and Advantages:BloombergGPT's introduction opens up a wealth of possibilities for employing language models in the financial technology realm. One such application is sentiment analysis, which enables the assessment of sentiments in articles, particularly those related to individual companies. 
Automatic entity recognition is another area where BloombergGPT excels, offering the potential for streamlined data extraction and analysis. Additionally, the model is adept at answering financial questions, providing prompt and accurate responses to user inquiries.Bloomberg's news division can leverage BloombergGPT to automatically generate compelling headlines for newsletters, reducing manual effort and improving efficiency. The model's capability to formulate queries in Bloomberg's proprietary query language (BQL) with minimal examples further augments its versatility. Users can interact with BloombergGPT using natural language, specifying their data requirements, and allowing the model to generate the appropriate BQL, expediting data extraction from databases.Shawn Edwards, Bloomberg's Chief Technology Officer, emphasizes the immense value of developing the first language model focused on the financial domain. The domain-specific approach not only allows for the creation of diverse applications but also yields superior performance compared to developing custom models for each specific task. This advantage, coupled with a faster time-to-market, positions BloombergGPT as a game-changer in the finance industry.The Future of BloombergGPT:BloombergGPT's potential extends beyond its current capabilities. As the model continues to train and optimize on financial data, further progress, and advancements are expected. Its application can be broadened to encompass a wider range of financial tasks, ultimately facilitating more accurate and efficient decision-making in the financial industry.BloombergGPT represents a significant milestone in the advancement of financial natural language processing. By addressing the unique language intricacies of the financial industry, this domain-specific language model holds immense potential for revolutionizing how financial data is analyzed, queried, and leveraged. With its impressive 50 billion parameters and exceptional performance in financial NLP tasks, BloombergGPT positions itself as a powerful tool that will shape the future of the finance industry.Use-casesAutomating research tasks: BloombergGPT is being used by researchers at the University of Oxford to automate the task of summarizing large medical datasets. This has allowed the researchers to save a significant amount of time and effort, and it has also allowed them to identify new insights that they would not have been able to find otherwise.Creating content: BloombergGPT is being used by businesses such as Nike and Coca-Cola to create content for their websites and social media channels. This has allowed these businesses to produce high-quality content more quickly and easily, and it has also helped them to reach a wider audience.Improving customer service: BloombergGPT is being used by customer service teams at companies such as Amazon and PayPal to provide customers with more personalized and informative responses. This has helped these companies to improve their customer satisfaction ratings.Generating code: BloombergGPT is being used by developers at companies such as Google and Facebook to generate code for new applications. This has helped these developers to save time and effort, and it has also allowed them to create more complex and sophisticated applications.Translating languages: BloombergGPT is being used by businesses such as Airbnb and Uber to translate their websites and apps into multiple languages. 
This has helped these businesses to expand into new markets and to reach a wider audience.These are just a few examples of how BloombergGPT is being used in the real world. As it continues to develop, it is likely that even more use cases will be discovered.SummaryIn recent years, the financial industry has faced a surge in data complexity, necessitating advanced artificial intelligence models. BloombergGPT, a language model trained on financial data, represents a groundbreaking development in the application of AI in finance. It aims to enhance Bloomberg's NLP applications, providing improved sentiment analysis, named entity recognition, news classification, and question answering. Notably, BloombergGPT can generate Bloomberg Query Language (BQL) and suggest news headlines, streamlining financial data querying and content creation. By training the model on domain-specific data, BloombergGPT outperforms generic models and offers various applications, including sentiment analysis, entity recognition, and prompt financial question answering. With further advancements expected, BloombergGPT has the potential to revolutionize financial NLP, enabling more accurate decision-making. The model's versatility and superior performance position it as a game-changer in the finance industry, with applications ranging from automating research tasks to improving customer service and code generation.Author BioJulian Melanson is one of the founders of Leap Year Learning. Leap Year Learning is a cutting-edge online school that specializes in teaching creative disciplines and integrating AI tools. We believe that creativity and AI are the keys to a successful future and our courses help equip students with the skills they need to succeed in a continuously evolving world. Our seasoned instructors bring real-world experience to the virtual classroom and our interactive lessons help students reinforce their learning with hands-on activities.No matter your background, from beginners to experts, hobbyists to professionals, Leap Year Learning is here to bring in the future of creativity, productivity, and learning!
article-image-large-language-models-llms-in-education

Large Language Models (LLMs) in Education

Chaitanya Yadav
23 Oct 2023
8 min read
Introduction

Large language models are a type of AI that can create and understand human language by drawing on vast amounts of textual data. This article looks at the potential of large language models in education and how they can transform it.

Through practical examples, it shows how LLMs can put in place individual learning pathways, provide advanced learning analytics, and power interactive simulations, all of which support more effective educational strategies.

Benefits of LLMs in Education

Personalized learning
The capacity of LLMs to customize learning experiences for each student is one of their greatest advantages. Lesson-plan customization, individualized feedback, and real-time monitoring of student progress are all possible with LLMs.

Automated tasks
Additionally, LLMs can be used to automate processes like grading and lesson planning. This frees instructors to spend more time on other important responsibilities, such as teaching and connecting with students.

New and innovative educational tools and resources
LLMs can be applied to the development of innovative learning resources and technology, such as interactive simulations, games, and other educational activities.

Real-time feedback and support
LLMs can also provide quick help and feedback to students. For example, they can power chatbots that assist students with their academic work and respond to their queries.

Potential Challenges of LLMs in Education

Incorrect or misleading information
One of the main problems with using LLMs in education is that they might produce inaccurate or misleading information. This is because LLMs are trained on vast volumes of data, some of which may be outdated or erroneous.

Lack of understanding
Another issue is that LLMs may not truly understand the material they produce. They are trained on statistical patterns in language rather than on a genuine grasp of the complexity of human communication.

Ethical concerns
There are also ethical concerns associated with the use of LLMs in education. Their usage can have ethical consequences, so they should be applied carefully and thoughtfully.

How LLMs Can Transform Education with Advanced Learning Strategies

Let's look at a few examples that show the possibilities of large language models in education.

1. Advanced Personalized Learning Pathway

In this example, we are going to build a detailed personalized learning pathway that reflects a student's individual objectives, learning style, and progress.
Follow the steps perfectly given in the input code to create a personalized learning pathway.Input Code:    # Step 1: First we will define the generate_learning_pathway function def generate_learning_pathway(prompt, user_profile):    # Step 2: Once the function is defined we will create a template for the learning pathway    learning_pathway_template = f"Dear {user_profile['student_name']},\n\nI'm excited to help you create a personalized learning pathway to achieve your goal of {user_profile['goals']}. As a {user_profile['learning_style']} learner with {user_profile['current_progress']}, here's your pathway:\n\n"    # Step 3: Now let’s define the specific steps in the learning pathway    steps = [        "Step 1: Introduction to Data Science",        "Step 2: Data Visualization Techniques for Visual Learners",        "Step 3: Intermediate Statistics for Data Analysis",        "Step 4: Machine Learning Fundamentals",        "Step 5: Real-world Data Science Projects",    ]    # Step 4: Combine the template and the specific steps    learning_pathway = learning_pathway_template + "\n".join(steps)    return learning_pathway # Step 5: Define a main function to test the code def main():    user_profile = {        "student_name": "Alice",        "goals": "Become a data scientist",       "learning_style": "Visual learner",        "current_progress": "Completed basic statistics"    }    prompt = "Create a personalized learning pathway."    # Step 6: Generate the learning pathway    learning_pathway = generate_learning_pathway(prompt, user_profile)    # Step 7: Print the learning pathway    print(learning_pathway) if __name__ == "__main__":    main() Output:This example gives the LLM a highly customized approach to teaching taking into account students' names, objectives, methods of education, and how they are progressing.2. AI-Enhanced Learning AnalyticsThe use of LLMs in Learning Analytics may provide teachers with more detailed information on the student's performance and help them to make appropriate recommendations.Input code:# Define the generate_learning_analytics function def generate_learning_analytics(prompt, student_data): # Analyze the performance based on quiz scores average_quiz_score = sum(student_data["quiz_scores"]) / len(student_data["quiz_scores"]) # Calculate homework completion rate total_homeworks = len(student_data["homework_completion"]) completed_homeworks = sum(student_data["homework_completion"]) homework_completion_rate = (completed_homeworks / total_homeworks) * 100 # Generate the learning analytics report analytics_report = f"Learning Analytics Report for Student {student_data['student_id']}:\n" analytics_report += f"- Average Quiz Score: {average_quiz_score:.2f}\n" analytics_report += f"- Homework Completion Rate: {homework_completion_rate:.2f}%\n" if homework_completion_rate < 70: analytics_report += "Based on their performance, it's recommended to provide additional support for completing homework assignments." return analytics_reportThis code defines a Python function, ‘generates_learning_analytics’, which takes prompt and student data as input, calculates average quiz scores and homework completion rates, and generates a report that includes these metrics, together with possible recommendations for additional support based on homework performance. 
Now let’s provide student performance data.Input code:student_data = {    "student_id": "99678",    "quiz_scores": [89, 92, 78, 95, 89],    "homework_completion": [True, True, False, True, True] } prompt = f"Analyze the performance of student {student_data['student_id']} based on their recent quiz scores and homework completion." analytics_report = generate_learning_analytics(prompt, student_data) print(analytics_report)Output:The student's test scores and the homework completion data included in the ‘student_data’ dictionary are used to generate this report.3. Advanced Interactive Simulations for LearningThe potential for LLMs to provide an engaging learning resource will be demonstrated through the creation of a comprehensive computerised training simulation on complicated topics, such as physics.Input code:# Define the generate_advanced_simulation function def generate_advanced_simulation(prompt): # Create the interactive simulation    interactive_simulation = f"Interactive {prompt} Simulation" # Provide a link to the interactive simulation (replace with an actual link)    interactive_simulation_link = "https://your-interactive-simulation-link.com"    return interactive_simulation, interactive_simulation_link # Define a main function to test the code def main():    topic = "Quantum Mechanics"    prompt = f"Develop an interactive simulation for teaching {topic} to advanced high school students." # Generate the interactive simulation    interactive_simulation, interactive_simulation_link = generate_advanced_simulation(prompt) # Print the interactive simulation and link    print(f"Explore the {topic} interactive simulation: {interactive_simulation_link}") if __name__ == "__main__":    main()Output:In this example, for a complex topic like quantum physics, the LLM is asked to create an advanced interactive simulation that will make learning more interesting and visual. Also, make sure to replace and provide your link to the interactive simulation.Such advanced examples demonstrate the adaptability of LLMs to create highly customized learning pathways, Advanced Learning Analytics Reports, and sophisticated interactive simulations with in-depth educational experiences.ConclusionIn conclusion, by providing advanced learning strategies and tools, large language models represent a tremendous potential for revolutionizing education. These models provide a range of benefits, including personalized learning experiences, timely feedback and support, automated tasks, and the development of useful tools for innovation in education.The article considers the practical use of LLMs in education, which includes developing more sophisticated personalized school paths that take into account students' specific educational objectives and how they learn. Moreover, by giving details of the student's performance and recommendations for improvement, LLMs can improve Learning Analytics. In addition, how LLMs can enhance learning by enabling interactivity and engagement has been demonstrated through the development of real-time simulations on complicated topics.The future of education appears promising by taking into account the LLMs' ability to offer a more diverse, creative learning environment with limitless opportunities for learners around the world.Author BioChaitanya Yadav is a data analyst, machine learning, and cloud computing expert with a passion for technology and education. He has a proven track record of success in using technology to solve real-world problems and help others to learn and grow. 
He is skilled in a wide range of technologies, including SQL, Python, data visualization tools like Power BI, and cloud computing platforms like Google Cloud Platform. He is also 22x Multicloud Certified.

In addition to his technical skills, he is also a brilliant content creator, blog writer, and book reviewer. He is the co-founder of a tech community called "CS Infostics", which is dedicated to sharing opportunities to learn and grow in the field of IT.

article-image-databricks-dbrx-stability-ais-stable-code-instruct-3b-sambanovas-samba-coe-v02-frugalgpt-advanced-rag-patterns-on-amazon-sagemaker

Databricks' DBRX, Stability AI's Stable Code Instruct 3B, SambaNova's Samba CoE v0.2, FrugalGPT, Advanced RAG Patterns on Amazon SageMaker

Merlyn Shelley
02 Apr 2024
10 min read
Subscribe to our Data Pro newsletter for the latest insights. Don't miss out – sign up today!👋 Hello,Welcome to DataPro#87 – Your Gateway to the Cutting-Edge of Data Science & Machine Learning! 🚀 Dive into this edition to explore: ⚙️ LLMs & GPTs Unleashed Samba CoE v0.2: SambaNova's Speedy AI Models Efficient Training of Language Models with OpenAI AI21's Revolutionary SSM-Transformer Model: Jamba Databricks' DBRX: The New Open LLM Benchmark Stable Code Instruct 3B: Stability AI's Latest Offering HyperLLaVA: Boosting Multimodal Language Models ✨ What's Fresh & Exciting FrugalGPT: Cutting LLM Operating Costs Building a Reliable AI Agent from Scratch with OpenAI Tool Calling Fine-Tuning Instruct Models over Raw Text Data Crafting an OpenAI-Compatible API ⚡ Industry Pulse:  Deciphering Advanced RAG Patterns on Amazon SageMaker Unveil the Future with AutoBNN: Mastering Probabilistic Time Series Forecasting! Engaging with Microsoft Copilot (web): Learning from Interaction 📚 Packt's Latest Gem "Principles of Data Science - Third Edition" by Sinan Ozdemir DataPro Newsletter is not just a publication; it’s a comprehensive toolkit for anyone serious about mastering the ever-changing landscape of data and AI. Grab your copy and start transforming your data expertise today! 📥 Feedback on the Weekly EditionTake our weekly survey and get a free PDF copy of our best-selling book, "Interactive Data Visualization with Python - Second Edition."We appreciate your input and hope you enjoy the book!Share your Feedback!Cheers,Merlyn ShelleyEditor-in-Chief, PacktSign Up | Advertise | Archives🔰 GitHub Finds: Any of These Repos in Your Toolbox?🛠️ Zejun-Yang/AniPortrait: AniPortrait is a new framework for creating high-quality animations using audio input and a reference portrait image, with face reenactment capabilities.🛠️ agiresearch/AIOS: AIOS embeds large language models into operating systems, enabling smarter resource allocation, context switching, and concurrent agent execution, advancing AGI. 🛠️ lichao-sun/Mora: Mora is a multi-agent framework for video generation, enhancing OpenAI's Sora capabilities through collaborative visual agents for diverse tasks. 🛠️ jasonppy/VoiceCraft: VoiceCraft is a high-performing neural codec language model for speech editing and zero-shot text-to-speech, excelling with diverse real-world data. 🛠️ dvlab-research/MiniGemini: Mini-Gemini enhances LLMs (Large Language Models) from 2B to 34B, integrating image understanding, reasoning, and generation, inspired by LLaVA. 🛠️ Picsart-AI-Research/StreamingT2V: StreamingT2V is a technique for creating long videos with rich motion dynamics, ensuring temporal consistency and high image quality. 📚 Expert Insights from Packt Community"Principles of Data Science - Third Edition" by Sinan Ozdemir. The Five Steps of Data Science A question I’ve gotten at least once a month for the past decade is What’s the difference between data science and data analytics? One could argue that there is no difference between the two; others will argue that there are hundreds of differences! I believe that, regardless of how many differences there are between the two terms, the following applies: Data science follows a structured, step-by-step process that, when followed, preserves the integrity of the results and leads to a deeper understanding of the data and the environment the data comes from. As with any other scientific endeavor, this process must be adhered to, or else the analysis and the results are in danger of scrutiny. 
On a simpler level, following a strict process can make it much easier for any data scientist, hobbyist, or professional to obtain results faster than if they were exploring data with no clear vision. While these steps are a guiding lesson for amateur analysts, they also provide the foundation for all data scientists, even those in the highest levels of business and academia. Every data scientist recognizes the value of these steps and follows them in some way or another. Overview of the five steps The process of data science involves a series of steps that are essential for effectively extracting insights and knowledge from data. These steps are presented as follows: Asking an interesting question: The first step in any data science project is to identify a question or challenge that you want to address with your analysis. This involves finding a topic that is relevant, important, and that can be addressed with data. Obtaining the data: Once you have identified your question, the next step is to collect the data that you will need to answer it. This can involve sourcing data from a variety of sources, such as databases, online platforms, or through data scraping or data collection methods. Exploring the data: After you have collected your data, the next step is to explore it and get a better understanding of its characteristics and patterns. This might involve examining summary statistics, visualizing the data, or applying statistical or machine learning (ML) techniques to identify trends or relationships. Modeling the data: Once you have explored your data, the next step is to build models that can be used to make predictions or inform decision-making. This might involve applying ML algorithms, building statistical models, or using other techniques to find patterns in the data. Communicating and visualizing the results: Finally, it’s important to communicate your findings to others in a clear and effective way. This might involve creating reports, presentations, or visualizations that help to explain your results and their implications. By following these five essential steps, you can effectively use data science to solve real-world problems and extract valuable insights from data. It’s important to note that different data scientists may have different approaches to the data science process, and the steps outlined previously are just one way of organizing the process. Some data scientists might group the steps differently or include additional steps such as feature engineering or model evaluation. Despite these differences, most data scientists agree that the steps listed previously are essential to the data science process. Whether they are organized in this specific way or not, these steps are all crucial for effectively using data to solve problems and extract valuable insights. Let’s dive into these steps one by one.Discover more insights from "Principles of Data Science - Third Edition" by Sinan Ozdemir. Unlock access to the full book and a wealth of other titles with a 7-day free trial in the Packt Library. Start exploring today!    Read Here!⚡ Tech Tidbits: Stay Wired to the Latest Industry Buzz! AWS ML Made Easy 🌀 Advanced RAG patterns on Amazon SageMaker: This post discusses how customers across various industries are utilizing large language models (LLMs) like Mixtral-8x7B Instruct to build generative AI applications such as QnA chatbots and search engines. 
It highlights the challenges and solutions in improving the accuracy and performance of these applications, focusing on Retrieval Augmented Generation (RAG) patterns implemented with LangChain.Google Research 🌀 AutoBNN: Probabilistic time series forecasting with compositional bayesian neural networks. This research introduces AutoBNN, an open-source package for automated, interpretable time series forecasting using Bayesian neural networks (BNNs). It addresses limitations of traditional methods like Gaussian processes (GPs) and Structural Time Series by combining the interpretability of GPs with the scalability and flexibility of neural networks. AutoBNN automates model discovery, provides high-quality uncertainty estimates, and scales effectively for large datasets. Microsoft Research🌀 Learning from interaction with Microsoft Copilot (web): This research focuses on how AI systems like Bing and Microsoft Copilot learn and improve from user interactions, particularly through reinforcement learning from human feedback (RLHF). It also explores how Bing has evolved its search capabilities and how Copilot is changing user interactions to be more conversational and workflow oriented. The research introduces frameworks like TnT-LLM and SPUR to improve taxonomy generation and user satisfaction estimation in AI interactions. Email Forwarded? Join DataPro Here!🔍 From Bits to BERT: Keeping Up with LLMs & GPTs 🌀 Samba CoE v0.2 from SambaNova delivers accurate AI models at blazing speeds: This blog post highlights Samba's advancements in AI architecture, specifically focusing on the introduction of Samba-1, a CoE architecture for enterprise AI. It discusses the features and benefits of Samba-1, its performance benchmarks, and plans for future releases, emphasizing the role of RDUs in driving efficiency and speed in AI models. 🌀 OpenAI’s Efficient Training of Language Models to Fill in the Middle: OpenAI demonstrates that autoregressive language models can effectively learn to infill text by moving a span of text from the middle of a document to its end, without harming generative capability. They propose training models with this method by default and provide benchmarks and best practices. 🌀 Jamba: AI21's Groundbreaking SSM-Transformer Model. Jamba is a groundbreaking model that merges Mamba SSM with Transformer elements, offering a 256K context window and outperforming similar models. Released under Apache 2.0, it will be available in the NVIDIA API catalog. Jamba optimizes memory, throughput, and performance, delivering remarkable efficiency. 🌀 Databricks’ DBRX: A New State-of-the-Art Open LLM. Databricks introduces DBRX, an open LLM setting new benchmarks in language understanding, programming, and math. With a 256K context window, it outperforms GPT-3.5 and competes with Gemini 1.0 Pro. DBRX is 40% smaller than Grok-1, offering 2x faster inference than LLaMA2-70B. 🌀 Introducing Stable Code Instruct 3B — Stability AI: Stable Code Instruct 3B, built on Stable Code 3B, offers state-of-the-art performance in code completion and natural language interactions for programming tasks. It outperforms Codellama 7B Instruct and matches StarChat 15B, with a focus on popular languages like Python and Java. Available for commercial use with a Stability AI Membership, the model is accessible on Hugging Face. 🌀 HyperLLaVA: Enhancing Multimodal Language Models with Dynamic Visual and Language Experts. 
This blog explores the advancements in Multimodal Large Language Models (MLLMs) and introduces HyperLLaVA, a dynamic model that improves performance by adaptively tuning parameters for handling diverse multimodal tasks, surpassing existing benchmarks and opening new avenues for multimodal learning systems. ✨ On the Radar: Catch Up on What's Fresh🌀 FrugalGPT and Reducing LLM Operating Costs: The blog discusses the high cost of running Large Language Models (LLMs) and introduces the "FrugalGPT" framework, which reduces operating costs significantly while maintaining quality. It explains how different models cost different amounts and proposes using a cascade of LLMs to minimize costs while maximizing answer quality. 🌀 Leverage OpenAI Tool calling: Building a reliable AI Agent from Scratch. The blog discusses the future role of AI in everyday tasks, focusing on text creation, correction, and brainstorming. It highlights the importance of Retrieval-Augmented Generation (RAG) pipelines and aims to provide Large Language Models with better context to generate more valuable content. 🌀 Fine-tune an Instruct model over raw text data: The blog explores the challenges of integrating modern chatbots with large datasets, focusing on context window sizes and the use of Retrieval-Augmented Generation (RAG) techniques. It proposes a lighter approach to fine-tuning chatbots on smaller datasets, aiming to bridge the gap between the constraints of a 128K context window and the complexities of models fine-tuned on billions of tokens. The experiment involves fine-tuning a model on The Guardian's dataset and aims to provide reproducible instructions for cost-effective model training using accessible hardware. 🌀 How to build an OpenAI-compatible API: The blog discusses the dominance of OpenAI in the Gen AI market, and the reasons developers might choose alternative LLM providers. It explores implementing a Python FastAPI server compatible with the OpenAI API specs to wrap any LLM, aiming for flexibility and cost-effectiveness. See you next time!

article-image-creating-a-langchain-agent-azure-openai-python-with-the-react-approach

Creating a LangChain Agent: Azure OpenAI & Python with the ReAct Approach

Valentina Alto
11 Jun 2023
17 min read
In my latest article, we introduced the concept of Agents powered by Large Language Models and how they overcome one of the current limitations of our beloved LLMs: the capability of taking action. An Agent can be seen as a kind of wrapper that uses an LLM as a reasoning engine and, in addition, can interact with tools that we provide and take actions with them. Tools can range from Wikipedia search to the ability to interact with our file system or access the command line.

If the prompt was an important component while working with LLMs, with agents it becomes key. In fact, agents need to be instructed with a reasoning template, which can follow various techniques. We've already seen an example of the Read-Retrieve-Read technique in my latest article. In this article, we are going to explore the ReAct approach.

What is ReAct?

The ReAct (Reason and Act) approach is a general paradigm that combines reasoning and acting with LLMs. It prompts LLMs to generate verbal reasoning traces and actions for a task. Like the Read-Retrieve-Read approach, the ReAct paradigm also relies on interaction with external tools to retrieve additional information. However, with ReAct we introduce a greater synergy between the reasoning and acting phases:

The reasoning phase helps the model set up action plans, track them, and even modify them when needed (for example, to handle exceptions);
The action phase allows the model to interact with the external world and retrieve the needed information according to the plan from the previous phase.

In the ReAct paper (see the References below), the authors show how this approach overcomes typical drawbacks of LLMs such as hallucination and error propagation (as can occur with the plain Chain of Thought (CoT) prompting method).

Let's see how these kinds of agents work in practice.

Implementing the Agent

LangChain makes it easier to build agents thanks to lightweight libraries that provide our LLM with a ReAct-based prompt template, making the agent capable of both reasoning and acting. To achieve this goal, we need to install and import the following libraries:

!pip install wikipedia

from langchain import Wikipedia
from langchain.llms import AzureOpenAI
from langchain.agents import initialize_agent, Tool
from langchain.agents import AgentType
from langchain.agents.react.base import DocstoreExplorer

In this case, as an external tool, we will use Wikipedia. However, you can decide to add further tools like search APIs such as Bing (you can read about that in my previous article) or the file system of your personal computer.

Next, we can build the document store that our agent will navigate through in order to retrieve information. To assist the agent with the exploration of the document store, we will use the previously imported DocstoreExplorer class.

docstore = DocstoreExplorer(Wikipedia())
tools = [
    Tool(
        name="Search",
        func=docstore.search,
        description="useful for when you need to ask with search"
    ),
    Tool(
        name="Lookup",
        func=docstore.lookup,
        description="useful for when you need to ask with lookup"
    )
]

Finally, we need to set up the reasoning engine of our Agent.
In our case, we will use a text-davinci-003 model available in the Azure OpenAI service (to set up an Azure OpenAI instance, you can read my former article here).llm = AzureOpenAI(deployment_name="text-davinci-003", openai_api_version="xxx", openai_api_key="xxx", openai_api_base="xxx")Great, now we have all the ingredients needed to initialize our agent. To test my agents, I will take some questions from a more recent version of the question-answering dataset HotpotQA (you can refer to the original paper here) called BeerQA. In this experiment, I will challenge my agent on some reasoning abilities within the world of Harry Potter:#initializing our agent react = initialize_agent(tools, llm, agent=AgentType.REACT_DOCSTORE, verbose=True) #start with questions question = "Rupert Grint played the friend of which fictional character from the Harry Potter series?" react.run(question)Below you can find the whole reasoning and response:> Entering new AgentExecutor chain... Thought: The question simplifies to which fictional character from the Harry Potter series is Rupert Grint's friend. I need to search Rupert Grint and Harry Potter series, and find the character. Action: Search[Rupert Grint] Observation: Could not find [Rupert Grint]. Similar: ['Rupert Grint', 'Knock at the Cabin', 'Emma Watson', "Harry Potter and the Philosopher's Stone (film)", 'Harry Potter 20th Anniversary: Return to Hogwarts', 'Snatch (TV series)', 'Harry Potter (film series)', 'Servant (TV series)', 'Sick Note (TV series)', 'Georgia Groome'] Thought: To find the character, I can search Harry Potter (film series). Action: Search[Harry Potter (film series)] Observation: Harry Potter is a film series based on the eponymous novels by J. K. Rowling. The series is produced and distributed by Warner Bros. Pictures and consists of eight fantasy films, beginning with Harry Potter and the Philosopher's Stone (2001) and culminating with Harry Potter and the Deathly Hallows – Part 2 (2011). A spin-off prequel series, planned to consist of five films, started with Fantastic Beasts and Where to Find Them (2016), marking the beginning of the Wizarding World shared media franchise.The series was mainly produced by David Heyman, and stars Daniel Radcliffe, Rupert Grint, and Emma Watson as the three leading characters: Harry Potter, Ron Weasley, and Hermione Granger. Four directors worked on the series: Chris Columbus, Alfonso Cuarón, Mike Newell, and David Yates. Michael Goldenberg wrote the screenplay for Harry Potter and the Order of the Phoenix (2007), while the remaining films' screenplays were written by Steve Kloves. Production took place over ten years, with the main story arc following Harry's quest to overcome his arch-enemy Lord Voldemort.Harry Potter and the Deathly Hallows, the seventh and final novel in the series, was adapted into two feature-length parts. Part 1 was released in November 2010, and Part 2 was released in July 2011.Philosopher's Stone and Deathly Hallows – Part 2 are among the 50 highest-grossing films of all time—at 50th- and 16th-highest, respectively—with both grossing over $1 billion. It is the fourth-highest-grossing film series, with $7.7 billion in worldwide receipts. Thought: Rupert Grint played the friend of Harry Potter, Ron Weasley. Action: Finish[Ron Weasley] > Finished chain. 'Ron Weasley'Now, you might think that this was a very easy question to address (and be right). 
Let’s challenge it with some more complex reasoning tasks:question = "Which of the Harry Potter film series main stars debuted in stage acting first?" react.run(question)Below is the response:> Entering new AgentExecutor chain... Thought: I need to search the main stars of the Harry Potter film series, find their stage acting debut, then find which debuted first. Action: Search[Harry Potter film series main stars] Observation: Harry Potter is a film series based on the eponymous novels by J. K. Rowling. The series is produced and distributed by Warner Bros. Pictures and consists of eight fantasy films, beginning with Harry Potter and the Philosopher's Stone (2001) and culminating with Harry Potter and the Deathly Hallows – Part 2 (2011). A spin-off prequel series, planned to consist of five films, started with Fantastic Beasts and Where to Find Them (2016), marking the beginning of the Wizarding World shared media franchise.The series was mainly produced by David Heyman, and stars Daniel Radcliffe, Rupert Grint, and Emma Watson as the three leading characters: Harry Potter, Ron Weasley, and Hermione Granger. Four directors worked on the series: Chris Columbus, Alfonso Cuarón, Mike Newell, and David Yates. Michael Goldenberg wrote the screenplay for Harry Potter and the Order of the Phoenix (2007), while the remaining films' screenplays were written by Steve Kloves. Production took place over ten years, with the main story arc following Harry's quest to overcome his arch-enemy Lord Voldemort.Harry Potter and the Deathly Hallows, the seventh and final novel in the series, was adapted into two feature-length parts. Part 1 was released in November 2010, and Part 2 was released in July 2011.Philosopher's Stone and Deathly Hallows – Part 2 are among the 50 highest-grossing films of all time—at 50th- and 16th-highest, respectively—with both grossing over $1 billion. It is the fourth-highest-grossing film series, with $7.7 billion in worldwide receipts. Thought: The main stars of the Harry Potter film series are Daniel Radcliffe, Rupert Grint, and Emma Watson. I need to search each of them and find their stage acting debut. Action: Search[Daniel Radcliffe] Observation: Daniel Jacob Radcliffe (born 23 July 1989) is an English actor. He rose to fame at age twelve, when he began portraying Harry Potter in the film series of the same name. Over his career, Radcliffe has received various awards and nominations. Radcliffe made his acting debut at age 10 in the BBC One television film David Copperfield (1999), followed by his feature film debut in The Tailor of Panama (2001). The same year, he starred as Harry Potter in the film adaptation of the J.K. Rowling fantasy novel, Harry Potter and the Philosopher's Stone. Over the next decade, he played the eponymous role in seven sequels, culminating with Harry Potter and the Deathly Hallows – Part 2 (2011). During this period, he became one of the world's highest-paid actors and gained worldwide fame, popularity, and critical acclaim. Following the success of Harry Potter, Radcliffe challenged himself acting in a variety of genres starring in What If? (2013), The Woman in Black (2012), Victor Frankenstein (2015), Swiss Army Man (2016), Now You See Me 2 (2016), and The Lost City (2022). He also portrayed Allen Ginsberg in the drama film Kill Your Darlings (2013), FBI agent Nate Foster in the crime drama Imperium (2016), and Weird Al Yankovic in the musical film Weird: The Al Yankovic Story (2022). 
Since 2019, he has starred in the TBS anthology series Miracle Workers. Radcliffe branched out to stage acting in 2007, starring in the West End and Broadway productions of Equus. He returned to Broadway in the musical How to Succeed in Business Without Really Trying (2011), earning a Grammy Award nomination, and Martin McDonagh's The Cripple of Inishmaan (2014), the satirical play The Lifespan of a Fact (2018), and Stephen Sondheim's Merrily We Roll Along (2023). He returned to the West End in revivals of Tom Stoppard's Rosencrantz and Guildenstern Are Dead (2017), and Samuel Becket's Endgame (2020). Radcliffe has contributed to many charities, including Demelza Hospice Care for Children and the Trevor Project; the latter awarded him its Hero Award in 2011 for his advocacy with LGBTQ youth. Thought: Daniel Radcliffe made his stage acting debut in the West End and Broadway productions of Equus in 2007. I need to search Rupert Grint and Emma Watson next and find their stage acting debut. Action: Search[Rupert Grint] Observation: Could not find [Rupert Grint]. Similar: ['Rupert Grint', 'Knock at the Cabin', 'Emma Watson', "Harry Potter and the Philosopher's Stone (film)", 'Harry Potter 20th Anniversary: Return to Hogwarts', 'Snatch (TV series)', 'Harry Potter (film series)', 'Servant (TV series)', 'Sick Note (TV series)', 'Georgia Groome'] Thought: To find Rupert Grint's stage acting debut, I can search Rupert Grint. Action: Search[Rupert Grint] Observation: Could not find [Rupert Grint]. Similar: ['Rupert Grint', 'Knock at the Cabin', 'Emma Watson', "Harry Potter and the Philosopher's Stone (film)", 'Harry Potter 20th Anniversary: Return to Hogwarts', 'Snatch (TV series)', 'Harry Potter (film series)', 'Servant (TV series)', 'Sick Note (TV series)', 'Georgia Groome'] Thought: To find Rupert Grint's stage acting debut, I can search Rupert Grint stage acting. Action: Search[Rupert Grint stage acting] Observation: Rupert Alexander Lloyd Grint (; born 24 August 1988) is an English actor. Grint rose to fame for his role as Ron Weasley in the Harry Potter film series, for which he was cast at age eleven, having previously acted only in school plays and his local theatre group. Since then, he continued his work on film, television, and theatre. Beginning in 2002, he began to work outside of the Harry Potter franchise, with a co-leading role in Thunderpants. He starred in the dramedy Driving Lessons (2006) and the drama Cherrybomb (2009) and co-starred in the comedy Wild Target (2010). After the Harry Potter series concluded, he appeared in the films Into the White (2012), Charlie Countryman (2013), CBGB (2013), and Moonwalkers (2015). He made his stage debut in 2013 in Mojo at the Harold Pinter Theatre. He executive-produced and starred in the television series Snatch, based on the film of the same name. He has also starred in the black comedy series Sick Note, the mystery thriller miniseries The ABC Murders, and the psychological horror series Servant. Thought: Rupert Grint made his stage acting debut in Mojo at the Harold Pinter Theatre in 2013. I need to search Emma Watson next and find her stage acting debut. Action: Search[Emma Watson] Observation: Emma Charlotte Duerre Watson (born 15 April 1990) is an English actress, model and activist. Known for her roles in both blockbusters and independent films, as well as for her women's rights work, she has received a selection of accolades, including a Young Artist Award and three MTV Movie Awards. 
Watson has been ranked among the world's highest-paid actresses by Forbes and Vanity Fair, and was named one of the 100 most influential people in the world by Time magazine in 2015.Watson attended the Dragon School and trained in acting at the Oxford branch of Stagecoach Theatre Arts. As a child, she rose to stardom after landing her first professional acting role as Hermione Granger in the Harry Potter film series, having previously acted only in school plays. Watson also starred in the 2007 television adaptation of the novel Ballet Shoes and lent her voice to The Tale of Despereaux (2008). After the final Harry Potter film, she took on a supporting role in My Week with Marilyn (2011), before starring as Sam, a flirtatious, free-spirited student in The Perks of Being a Wallflower (2012), to critical success. Further acclaim came from portraying Alexis Neiers in Sofia Coppola's The Bling Ring (2013) and the titular character's adoptive daughter in Darren Aronofsky's biblical epic Noah (2014). That same year, Watson was honoured by the British Academy of Film and Television Arts, winning British Artist of the Year. She also starred as Belle in the live-action musical romantic fantasy Beauty and the Beast (2017), and Meg March in Greta Gerwig's coming-of-age drama Little Women (2019). From 2011 to 2014, Watson split her time between working on films and continuing her education, graduating from Brown University with a bachelor's degree in English literature in May 2014. That year, she was appointed a UN Women Goodwill Ambassador and helped launch the UN Women campaign HeForShe, which advocates for gender equality. In 2018, she helped launch Time's Up UK as a founding member. Watson was appointed to a G7 advisory body for women's rights in 2019, consulting with leaders on foreign policy. Her modelling work has included campaigns for Burberry and Lancôme. She also lent her name to a clothing line for the sustainable brand People Tree. In 2020, she joined the board of directors of Kering, a luxury brand group, in her capacity as an advocate for sustainable fashion. Thought: Emma Watson made her stage acting debut in the 2007 television adaptation of the novel Ballet Shoes. Daniel Radcliffe made his stage acting debut in 2007 (Equus), while Rupert Grint and Emma Watson made their stage acting debut in 2013 (Mojo and Ballet Shoes, respectively). So Daniel Radcliffe debuted in stage acting first. Action: Finish[Daniel Radcliffe] > Finished chain. 'Daniel Radcliffe'In this case, you can see that the Agent needs to plan several steps to find the right information. It is incredible to see how it can formulate the most effective query for the tool it is enabled to navigate through. At the end of the Chain of Thoughts, it gives the correct answer.For the sake of HP fandom (which I’m part of), let’s also ask something that it is only present within the books:question = "In Harry Potter and the Goblet of Fire, in the third task to get to the Goblet a riddle of which magical creature needs to be solved?" react.run(question)Below is the result:> Entering new AgentExecutor chain... Thought: I need to search Harry Potter and the Goblet of Fire, find the third task, and find the magical creature. Action: Search[Harry Potter and the Goblet of Fire] Observation: Harry Potter and the Goblet of Fire is a fantasy novel written by British author J. K. Rowling and the fourth novel in the Harry Potter series. 
It follows Harry Potter, a wizard in his fourth year at Hogwarts School of Witchcraft and Wizardry, and the mystery surrounding the entry of Harry's name into the Triwizard Tournament, in which he is forced to compete. The book was published in the United Kingdom by Bloomsbury and in the United States by Scholastic. In both countries, the release date was 8 July 2000. This was the first time a book in the series was published in both countries at the same time. The novel won a Hugo Award, the only Harry Potter novel to do so, in 2001. The book was adapted into a film, released worldwide on 18 November 2005, and a video game by Electronic Arts. Thought: The paragraph does not mention the third task. I need to look up third task. Action: Lookup[third task] Observation: No Results Thought: I need to instead search Harry Potter and the Goblet of Fire third task. Action: Search[Harry Potter and the Goblet of Fire third task] Observation: Harry Potter and the Goblet of Fire is a fantasy novel written by British author J. K. Rowling and the fourth novel in the Harry Potter series. It follows Harry Potter, a wizard in his fourth year at Hogwarts School of Witchcraft and Wizardry, and the mystery surrounding the entry of Harry's name into the Triwizard Tournament, in which he is forced to compete. The book was published in the United Kingdom by Bloomsbury and in the United States by Scholastic. In both countries, the release date was 8 July 2000. This was the first time a book in the series was published in both countries at the same time. The novel won a Hugo Award, the only Harry Potter novel to do so, in 2001. The book was adapted into a film, released worldwide on 18 November 2005, and a video game by Electronic Arts. Thought: The third task is to get to the Goblet, and the riddle of a magical creature needs to be solved. The magical creature is a sphinx. Action: Finish[sphinx] > Finished chain. 'sphinx'Well, we can say that our Agent did a pretty job. The next challenge will be navigating through the Marvel multiverse 😎Stay tuned for the next article!Referenceshttps://medium.com/@valentinaalto/introducing-langchain-agents-e58674b1a657[2210.03629] ReAct: Synergizing Reasoning and Acting in Language Models (arxiv.org)[2210.03629] ReAct: Synergizing Reasoning and Acting in Language Models (arxiv.org)Plan and Execute — 🦜🔗 LangChain 0.0.168Agents — 🦜🔗 LangChain 0.0.168File System Tools — 🦜🔗 LangChain 0.0.168Tools — 🦜🔗 LangChain 0.0.168https://hotpotqa.github.io/https://nlp.stanford.edu/projects/beerqa/beerqa_train_v1.0.jsonAuthor BioValentina Alto graduated in 2021 in data science. Since 2020, she has been working at Microsoft as an Azure solution specialist, and since 2022, she has been focusing on data and AI workloads within the manufacturing and pharmaceutical industries. She has been working closely with system integrators on customer projects to deploy cloud architecture with a focus on modern data platforms, data mesh frameworks, IoT and real-time analytics, Azure Machine Learning, Azure Cognitive Services (including Azure OpenAI Service), and Power BI for dashboarding. Since commencing her academic journey, she has been writing tech articles on statistics, machine learning, deep learning, and AI in various publications and has authored a book on the fundamentals of machine learning with Python.Author of the book: Modern Generative AI with ChatGPT and OpenAI ModelsLink - Medium  LinkedIn  

article-image-using-llm-chains-in-rust

Using LLM Chains in Rust

Alan Bernardo Palacio
12 Sep 2023
9 min read
IntroductionThe llm-chain is a Rust library designed to make your experience with large language models (LLMs) smoother and more powerful. In this tutorial, we'll walk you through the steps of installing Rust, setting up a new project, and getting started with the versatile capabilities of LLM-Chain.This guide will break down the process step by step, using simple language, so you can confidently explore the potential of LLM-Chain in your projects.InstallationBefore we dive into the exciting world of LLM-Chain, let's start with the basics. To begin, you'll need to install Rust on your computer. By using the official Rust toolchain manager called rustup you can ensure you have the latest version and easily manage your installations. We recommend having Rust version 1.65.0 or higher. If you encounter errors related to unstable features or dependencies requiring a newer Rust version, simply update your Rust version. Just follow the instructions provided on the rustup website to get Rust up and running.With Rust now installed on your machine, let's set up a new project. This step is essential to create an organized space for your work with LLM-Chain. To do this, you'll use a simple command-line instruction. Open up your terminal and run the following command:cargo new --bin my-llm-projectBy executing this command, a new directory named "my-llm-project" will be created. This directory contains all the necessary files and folders for a Rust project.Embracing the Power of LLM-ChainNow that you have your Rust project folder ready, it's time to integrate the capabilities of LLM-Chain. This library simplifies your interaction with LLMs and empowers you to create remarkable applications. Adding LLM-Chain to your project is a breeze. Navigate to your project directory by using the terminal and run the following command:cd my-llm-project cargo add llm-chainBy running this command, LLM-Chain will become a part of your project, and the configuration will be recorded in the "Cargo.toml" file.LLM-Chain offers flexibility by supporting multiple drivers for different LLMs. For the purpose of simplicity and a quick start, we'll be using the OpenAI driver in this tutorial. You'll have the choice between the LLAMA driver, which runs a LLaMA LLM on your machine, and the OpenAI driver, which connects to the OpenAI API.To choose the OpenAI driver, execute this command:cargo add llm-chain-openaiIn the next section, we'll explore generating your very first LLM output using the OpenAI driver. So, let's move on to exploring sequential chains with Rust and uncovering the possibilities they hold with LLM-Chain.Exploring Sequential Chains with RustIn the realm of LLM-Chain, sequential chains empower you to orchestrate a sequence of steps where the output of each step seamlessly flows into the next. This hands-on section serves as your guide to crafting a sequential chain, expanding its capabilities with additional steps, and gaining insights into best practices and tips that ensure your success.Let's kick things off by preparing our project environment:As we delve into creating sequential chains, one crucial prerequisite is the installation of tokio in your project. While this tutorial uses the full tokio package crate, remember that in production scenarios, it's recommended to be more selective about which features you install. 
To set the stage, run the following command in your terminal:
cargo add tokio --features full
This step ensures that your project is equipped with the necessary tools to handle the intricate tasks of sequential chains. Before we continue, ensure that you've set your OpenAI API key in the OPENAI_API_KEY environment variable. Here's how:
export OPENAI_API_KEY="YOUR_OPEN_AI_KEY"
With your environment ready, let's look at the full implementation code. In this case, we will be implementing the use of Chains to generate recommendations of cities to travel to, formatting them, and organizing the results throughout a series of steps:
use llm_chain::parameters;
use llm_chain::step::Step;
use llm_chain::traits::Executor as ExecutorTrait;
use llm_chain::{chains::sequential::Chain, prompt};
use llm_chain_openai::chatgpt::Executor;

#[tokio::main(flavor = "current_thread")]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a new ChatGPT executor with default settings
    let exec = Executor::new()?;
    // Create a chain of three steps, each with its own prompt
    let chain: Chain = Chain::new(vec![
        // First step: ask for good places to visit in the given city and country
        Step::for_prompt_template(
            prompt!("You are a bot for travel assistance research",
                "Find good places to visit in this city {{city}} in this country {{country}}. Include their name")
        ),
        // Second step: format the previous output into five bullet points.
        // Note that the {{text}} placeholder receives the output of the previous step.
        Step::for_prompt_template(
            prompt!(
                "You are an assistant for managing social media accounts for a travel company",
                "Format the information into 5 bullet points for the most relevant places. \n--\n{{text}}")
        ),
        // Third step: summarize the previous output into a LinkedIn post for the company page, with some emojis for flair
        Step::for_prompt_template(
            prompt!(
                "You are an assistant for managing social media accounts for a travel company",
                "Summarize this email into a LinkedIn post for the company page, and feel free to use emojis! \n--\n{{text}}")
        )
    ]);
    // Execute the chain with provided parameters
    let result = chain
        .run(
            // Create a Parameters object with key-value pairs for the placeholders
            parameters!("city" => "Rome", "country" => "Italy"),
            &exec,
        )
        .await
        .unwrap();
    // Display the result on the console
    println!("{}", result.to_immediate().await?.as_content());
    Ok(())
}
The provided code initiates a multi-step process using the llm_chain and llm_chain_openai libraries. First, it sets up a ChatGPT executor with default configurations. Next, it creates a chain of sequential steps, each designed to produce specific text outputs. The first step crafts a personalized travel recommendation, asking for places to visit in a particular city and country, with a Parameters object supplying values for placeholders like {{city}} and {{country}}. The second step condenses that recommendation into five bullet points for the most relevant places, utilizing the text output from the previous step.
Lastly, the third step summarizes the email into a LinkedIn post for a travel company's page, adding emojis for extra appeal.The chain is executed with specified parameters, creating a Parameters object with key-value pairs for placeholders like "city" (set to "Rome") and "country" (set to "Italy"). The generated content is then displayed on the console. This code represents a structured workflow for generating travel-related content using ChatGPT.Running the CodeNow, it's time to compile the code and run the code. Execute the following command in your terminal:cargo runAs the code executes, the sequential chain orchestrates the different prompts, generating content that flows through each step.We can see the results of the model as a bulleted list of travel recommendations.ConclusionThe llm-chain Rust library serves as your gateway to accessing large language models (LLMs) within the Rust programming language. This tutorial has been your guide to uncovering the fundamental steps necessary to harness the versatile capabilities of LLM-Chain.We began with the foundational elements, guiding you through the process of installing Rust and integrating llm-chain into your project using Cargo. We then delved into the practical application of LLM-Chain by configuring it with the OpenAI driver, emphasizing the use of sequential chains. This approach empowers you to construct sequences of steps, where each step's output seamlessly feeds into the next. As a practical example, we demonstrated how to create a travel recommendation engine capable of generating concise posts for various destinations, suitable for sharing on LinkedIn.It's important to note that LLM-Chain offers even more possibilities for exploration. You can extend its capabilities by incorporating CPP models like Llama, or you can venture into the realm of map-reduce chains. With this powerful tool at your disposal, the potential for creative and practical applications is virtually limitless. Feel free to continue your exploration and unlock the full potential of LLM-Chain in your projects. See you in the next article.Author BioAlan Bernardo Palacio is a data scientist and an engineer with vast experience in different engineering fields. His focus has been the development and application of state-of-the-art data products and algorithms in several industries. He has worked for companies such as Ernst and Young, and Globant, and now holds a data engineer position at Ebiquity Media helping the company to create a scalable data pipeline. Alan graduated with a Mechanical Engineering degree from the National University of Tucuman in 2015, participated as the founder of startups, and later on earned a Master's degree from the faculty of Mathematics at the Autonomous University of Barcelona in 2017. Originally from Argentina, he now works and resides in the Netherlands.LinkedIn

Deploying LLMs with Amazon SageMaker - Part 1

Joshua Arvin Lat
29 Nov 2023
13 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!IntroductionHave you ever tried asking a Generative AI-powered chatbot the question: “What is the meaning of life?”. In case you have not tried that yet, here’s the response I got when I tried that myself using a custom chatbot app I built with a managed machine learning (ML) service called Amazon SageMaker.                                              Image 01 — Asking a chatbot the meaning of lifeYou would be surprised that I built this quick demo application myself in just a few hours! In this post, I will teach you how to deploy your own Large Language Models (LLMs) in a SageMaker Inference Endpoint (that is, a machine learning-powered server that responds to inputs) with just a few lines of code.                                                   Image 02 — Deploying an LLM to a SageMaker Inference EndpointWhile most tutorials available teach us how to utilize existing Application Programming Interfaces (APIs) to prepare chatbot applications, it’s best that we also know how to deploy LLMs in our own servers in order to guarantee data privacy and compliance. In addition to this, we’ll be able to manage the long-term costs of our AI-powered systems as well. One of the most powerful solutions available for these types of requirements is Amazon SageMaker which helps us focus on the work we need to do instead of worrying about cloud infrastructure management.We’ll divide the hands-on portion into the following sections:●  Section I: Preparing the SageMaker Notebook Instance●  Section II: Deploying an LLM using the SageMaker Python SDK to a SageMaker Inference Endpoint●  Section III: Enabling Data Capture with SageMaker Model Monitor (discussed in Part 2)●  Section IV: Invoking the SageMaker inference endpoint using the boto3 client (discussed in Part 2)●  Section V: Preparing a Demo UI for our chatbot application (discussed in Part 2)●  Section VI: Cleaning Up (discussed in Part 2) Without further ado, let’s begin!Section I: Preparing the SageMaker Notebook InstanceLet’s start by creating a SageMaker Notebook instance. Note that while we can also do this in SageMaker Studio, running the example in a Sagemaker Notebook Instance should do the trick. If this is your first time launching a SageMaker Notebook instance, you can think of it as your local machine with several tools pre-installed already where we can run our scripts.STEP # 01: Sign in to your AWS account and navigate to the SageMaker console by typing sagemaker in the search box similar to what we have in the following image:                                                           Image 03 — Navigating to the SageMaker consoleChoose Amazon SageMaker from the list of options available as highlighted in Image 03.STEP # 02: In the sidebar, locate and click Notebook instances under Notebook:                                 Image 04 — Locating Notebook instances in the sidebar STEP # 03: Next, locate and click the Create notebook instance button.STEP # 04: In the Create notebook instance page, you’ll be asked to input a few configuration parameters before we’re able to launch the notebook instance where we’ll be running our code:                                                          Image 05 — Creating a new SageMaker Notebook instanceSpecify a Notebook instance name (for example, llm-demo) and select a Notebook instance type. 
For best results, you may select a relatively powerful instance type (ml.m4.xlarge) where we will run the scripts. However, you may decide to choose a smaller instance type such as ml.t3.medium (slower but less expensive). Note that we will not be deploying our LLM inside this notebook instance as the model will be deployed in a separate inference endpoint (which will require a more powerful instance type such as an ml.g5.2xlarge).STEP # 05:Create an IAM role by choosing Create a new role from the list of options available in the IAM role dropdown (under Permissions and encryption).                                                                             Image 06 — Opening the Jupyter appThis will open the following popup window. Given that we’re just working on a demo application, the default security configuration should do the trick. Click the Create role button.Important Note: Make sure to have a more secure configuration when dealing with production (or staging) work environments.Won’t dive deep into how cloud security works in this post so feel free to look for other resources and references to further improve the current security setup. In case you are interested to learn more about cloud security, feel free to check my 3rd book “Building and Automating Penetration Testing Labs in the Cloud”. In the 7th Chapter of the book (Setting Up an IAM Privilege Escalation Lab), you’ll learn how misconfigured machine learning environments on AWS can easily be exploited with the right sequence of steps.STEP #06: Click the Create notebook instance button. Wait for about 5-10 minutes for the SageMaker Notebook instance to be ready.Important Note: Given that this will launch a resource that will run until you turn it off (or delete it), make sure to complete all the steps in the 2nd part of this post and clean up the created resources accordingly.STEP # 07:Once the instance is ready, click Open Jupyter similar to what we have in Image 07:                                                                            Image 07 — Opening the Jupyter appThis will open the Jupyter application in a browser tab. If this is your first time using this application, do not worry as detailed instructions will be provided in the succeeding steps to help you get familiar with this tool.STEP # 08:Create a new notebook by clicking New and selecting conda_python3 from the list of options available: Image 08 — Creating a new notebook using the conda_python3 kernelIn case you are wondering about what a kernel is, it is simply an “engine” or “environment” with pre-installed libraries and prerequisites that executes the code specified in the notebook cells. You’ll see this in action in a bit.STEP # 09:At this point, we should see the following interface where we can run various types of scripts and blocks of code:                                                                              Image 09 — New Jupyter notebookFeel free to rename the Jupyter Notebook before proceeding to the next step. If you have not used a Jupyter Notebook before, you may run the following line of code by typing the following in the text field and pressing SHIFT + ENTER. print('hello')This should print the output hello right below the text field where we placed our code.Section II: Deploying an LLM using the SageMaker Python SDK to a SageMaker Inference EndpointSTEP # 01: With everything ready, let’s start by installing a specific version of the SageMaker Python SDK: !pip install sagemaker==2.192.1Here, we’ll be using v2.192.1. 
This will help us ensure that you won’t encounter breaking changes even if you work on the hands-on solutions in this post at a later date.In case you are wondering what the SageMaker Python SDK is, it is simply a software development kit (SDK) with the set of tools and APIs to help developers interact with and utilize the different features and capabilities of Amazon SageMaker.STEP # 02: Next, let’s import and prepare a few prerequisites by running the following block of code: import sagemaker import time sagemaker_session = sagemaker.Session() region = sagemaker_session.boto_region_name role = sagemaker.get_execution_role()STEP # 03: Let’s import HuggingFaceModel and get_huggingface_llm_image_uri as well:from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uriSTEP # 04: Next, let’s define the generate_random_label() function which we’ll use later when naming our resources:from string import ascii_uppercase from random import choice def generate_random_label():    letters = ascii_uppercase      return ''.join(choice(letters) for i in range(10))This will help us avoid naming conflicts when creating and configuring our resources.STEP # 05: Use the get_huggingface_llm_image_uri function we imported in an earlier step to retrieve the container image URI for our LLM. In addition to this, let’s define the model_name we’ll use later when deploying our LLM to a SageMaker endpoint:image_uri = get_huggingface_llm_image_uri( backend="huggingface", region=region, version="1.1.0" ) model_name = "MistralLite-" + generate_random_label()STEP # 06: Before, we proceed with the actual deployment, let’s quickly inspect what we have in the image_uri variable:image_uriThis will output the following variable value:'763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.0.1-tgi1.1.0-gpu-py39-cu118-ubuntu20.04'STEP # 07: Similarly, let’s check the variable value of model_name model_nameThis will give us the following:'MistralLite-HKGKFRXURT'Note that you’ll get a different model_name value since we’re randomly generating a portion of the model nameSTEP # 08: Let’s prepare the hub model configuration as well:hub_env = { 'HF_MODEL_ID': 'amazon/MistralLite', 'HF_TASK': 'text-generation', 'SM_NUM_GPUS': '1', "MAX_INPUT_LENGTH": '16000', "MAX_TOTAL_TOKENS": '16384', "MAX_BATCH_PREFILL_TOKENS": '16384', "MAX_BATCH_TOTAL_TOKENS":  '16384', }Here, we specify that we’ll be using the MistralLite model. If this is your first time hearing out MistralLite, it is a fine-tuned Mistral-7B-v0.1 language model. It can perform significantly better on several long context retrieve and answering tasks. 
For more information, feel free to check: https://huggingface.co/amazon/MistralLite.STEP # 09: Let’s initialize the HuggingFaceModel object using some of the prerequisites and variables we’ve prepared in the earlier steps:model = HuggingFaceModel(    name=model_name,    env=hub_env,    role=role,    image_uri=image_uri )STEP # 10: Now, let’s proceed with the deployment of the model using the deploy() method:predictor = model.deploy( initial_instance_count=1, instance_type="ml.g5.2xlarge", endpoint_name=model_name, )Here, we’re using an ml.g5.2xlarge for our inference endpoint.Given that this step may take about 10-15 minutes to complete, feel free to grab a cup of coffee or tea while waiting!Important Note: Given that this will launch a resource that will run until you turn it off (or delete it), make sure to complete all the steps in the 2nd part of this post and clean up the created resources accordingly.STEP # 11: Now, let’s prepare our first input data:question = "What is the meaning of life?" input_data = { "inputs": f"<|prompter|>{question}</s><|assistant|>", "parameters": {    "do_sample": False,    "max_new_tokens": 2000,    "return_full_text": False, } }STEP # 12: With the prerequisites ready, let’s have our deployed LLM process the input data we prepared in the previous step:result = predictor.predict(input_data)[0]["generated_text"] print(result)This should yield the following output:The meaning of life is a philosophical question that has been debated by thinkers and philosophers for centuries. There is no single answer that can be definitively proven, as the meaning of life is subjective and can vary greatly from person to person. ...Looks like our SageMaker Inference endpoint (where the LLM is deployed) is working just fine!ConclusionThat wraps up the first part of this post. At this point, you should have a good idea of how to deploy LLMs using Amazon SageMaker. However, there’s more in store for us in the second part as we’ll build on top of what we have already and enable data capture to help us collect and analyze the data (that is, the input requests and output responses) that pass through the inference endpoint. In addition to this, we’ll prepare a demo user interface utilizing the ML model we deployed in this post.If you’re looking for the link to the second part, here it is: Deploying LLMs with Amazon SageMaker - Part 2We are just scratching the surface as there is a long list of capabilities and features available in SageMaker. If you want to take things to the next level, feel free to read 2 of my books focusing heavily on SageMaker: “Machine Learning with Amazon SageMaker Cookbook” and “Machine Learning Engineering on AWS”.Author BioJoshua Arvin Lat is the Chief Technology Officer (CTO) of NuWorks Interactive Labs, Inc. He previously served as the CTO of 3 Australian-owned companies and also served as the Director for Software Development and Engineering for multiple e-commerce startups in the past. Years ago, he and his team won 1st place in a global cybersecurity competition with their published research paper. He is also an AWS Machine Learning Hero and he has been sharing his knowledge in several international conferences to discuss practical strategies on machine learning, engineering, security, and management. He is also the author of the books "Machine Learning with Amazon SageMaker Cookbook", "Machine Learning Engineering on AWS", and "Building and Automating Penetration Testing Labs in the Cloud". 
Due to his proven track record in leading digital transformation within organizations, he has been recognized as one of the prestigious Orange Boomerang: Digital Leader of the Year 2023 award winners.

AI_Distilled #31: Evolving Boundaries and Opportunities

Merlyn Shelley
08 Jan 2024
14 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!👋 Hello ,🎉 Joyous 2024! Wishing you a year as delightful as your dreams!  Dive into the new year with our outstanding edition, filled with essential features to boost your AI practice. “The speed at which people will be able to come up with an idea, to test the idea, to make something, it’s going to be so accelerated…You don’t need to have a degree in computer science to do that.” - Matthew Candy, IBM’s global managing partner for generative AI Coding without coding is a revolutionary idea indeed. What might have been previously perceived as unbelievable is a living reality and new features like Github’s Copilot Chat make it all the more seamless.  The real possibilities of AI expand far beyond computing, with the technology making waves in healthcare, finance, and supply chain management. Starting from this edition, we’ll bring you fresh updates from each of these sectors, so stay tuned! Let's kick things off by tapping into the latest news and developments. AI Launches & Industry Updates:  Microsoft Copilot Integrates with GenAI Music App Suno for Song Composition Google Plans Potential Layoffs Amidst AI Integration in Ad Sales GitHub Expands Copilot Chat Availability for Developers AI in Healthcare: AI Streamlining Health Insurance Shopping Process Revolutionizing Healthcare with AI Stethoscope on Smartphones Generative AI's Impact on Mental Health Counseling AI in Finance: Next-Gen Banks to Leverage AI for Financial Influence and Support Invest Qatar Introduces Cutting-Edge Azure OpenAI GPT-Powered Chatbot AI in Supply Chain Management: AI Safeguards Supply Chains Amidst Holiday Challenges AI-SaaS Integration Revolutionizes E-commerce Analytics Here are some handpicked GPT and LLM resources, tutorials, and secret knowledge that’ll come in handy for your next project: Understanding the Prompt Development Life Cycle Building Platforms with LLMs: Overcoming Challenges in Summarization as a Service Understanding the Risks of Prompt Injection in LLM Applications Creating an Open Source LLM Recommender System: Mastering Prompt Iteration and Optimization Looking for hands-on tips and strategies straight from the developer community? We’ve got you covered: Exploring Google's Gemini Pro Vision LLM with Javascript: A Practical Guide Accelerating AI Application Productionization: A Guide with SageMaker JumpStart, Amazon Bedrock, and TruEra Quantizing LLMs with Activation-aware Weight Quantization (AWQ) Unlocking Your MacBook's AI Potential: Running 70B LLM Models Without Quantization Check out our curated list of smoking hot GitHub repositories: Giskard-AI/giskard CopilotKit/CopilotKit chengzeyi/stable-fast ml-explore/mlx  📥 Feedback on the Weekly EditionQ: How can we foster effective collaboration between humans and AI systems, ensuring that AI complements human skills and enhances productivity without causing job displacement or widening societal gaps?Share your valued opinions discreetly! Your insights could shine in our next issue for the 39K-strong AI community. Join the conversation! 🗨️✨ As a big thanks, get our bestselling "Interactive Data Visualization with Python - Second Edition" in PDF. Let's make AI_Distilled even more awesome! 🚀 Jump on in! Share your thoughts and opinions here! Writer’s Credit: Special shout-out to Vidhu Jain for their valuable contribution to this week’s newsletter content!  
Cheers,  Merlyn Shelley  Editor-in-Chief, Packt  SignUp | Advertise | Archives⚡ TechWave: AI/GPT News & AnalysisNew Launches & Industry Updates: ⭐ Microsoft Copilot Integrates with GenAI Music App Suno for Song Composition: Microsoft Copilot has partnered with GenAI music app Suno, enabling users to create complete songs including lyrics, instrumentals, and singing voices. Accessible via Microsoft Edge, the integration aims to make music creation inclusive and enjoyable. However, ethical and legal concerns persist, with some artists uncomfortable with AI algorithms learning from their work without consent or compensation. Suno attempts to address such issues by blocking certain prompts and preventing the generation of covers using existing lyrics. Read Microsoft’s official blog here. ⭐ Google Plans Potential Layoffs Amidst AI Integration in Ad Sales: Google is reportedly considering laying off around 30,000 employees within its ad sales division due to the implementation of internal AI, aiming for improved operational efficiency. The restructuring primarily targets the ad sales team, reflecting Google's exploration of AI benefits in operational processes. Earlier in 2023, Google had already laid off 12,000 employees, emphasizing the need for organizational adaptation amidst evolving global dynamics. Read about other significant 2023 layoffs here. ⭐ GitHub Expands Copilot Chat Availability for Developers: GitHub is extending the availability of Copilot Chat, a programming-centric chatbot powered by GPT-4, to all users. The tool was initially launched for Copilot for Business subscribers and later in beta for $10 per month users. Integrated into Microsoft's IDEs, Visual Studio Code and Visual Studio, it's included in GitHub Copilot's paid tiers and free for verified teachers, students, and maintainers of specific open-source projects. Developers can prompt Copilot Chat in natural language, seeking real-time guidance on code-related tasks. Know more about Copilot Chat here. AI in Healthcare: ⭐ AI Streamlining Health Insurance Shopping Process: Companies are utilizing AI to simplify the often complex and tedious task of shopping for health insurance, aiming to guide consumers to better and more affordable options. With many Americans sticking to their health plans due to the difficulty of predicting their future healthcare needs, AI-powered tools gather individual information and predict the most suitable health plans. Alight, a cloud-based HR services provider, reports that 95% of its served employers use AI technology, including a virtual assistant, for employee health benefits selection.  ⭐ Revolutionizing Healthcare with AI Stethoscope on Smartphones: A startup AI Health Highway is addressing the challenge of limited access to specialists in healthcare by introducing an innovative solution, AI Steth, which combines traditional stethoscope use with cutting-edge signal processing and AI. Targeting the early detection and prediction of heart and lung disorders, the device transforms sound patterns into visual representations on smartphones, allowing non-specialists like family physicians and nurses to examine patients effectively. AI Steth has shown exceptional accuracy in murmur detection, paving the way for more objective and efficient diagnoses. Discover AI Health Highway’s work here. ⭐ Generative AI's Impact on Mental Health Counseling: Generative AI is finding use in mental health counseling, sparking discussions about its potential to assist or even replace human therapists. 
Recent research testing ChatGPT on mental health counseling questions has raised questions about the technology's role in therapy. AI therapy has evolved from basic chatbots to sophisticated entities capable of nuanced emotional responses, offering accessible mental health support 24/7. While the benefits are evident, challenges such as risk, coverage, and ethical considerations must be addressed for responsible implementation.  AI in Finance: ⭐ Next-Gen Banks to Leverage AI for Financial Influence and Support: Experts predict that next-generation banks will harness generative AI to impact various aspects of financial services, ranging from influencing customer decisions to identifying vulnerable clients. Tom Merry, Head of Banking Strategy at Accenture, suggests that generative AI could significantly influence banking operations, touching nearly every aspect. While the UK banking industry has been utilizing AI for fraud detection and risk analysis, the introduction of generative AI, capable of creating novel solutions based on extensive data, is gaining traction. ⭐ Invest Qatar Introduces Cutting-Edge Azure OpenAI GPT-Powered Chatbot: Invest Qatar, in collaboration with Microsoft, has launched Ai.SHA, an innovative AI-powered chatbot utilizing GPT capabilities through the Azure OpenAI service. This move positions Invest Qatar as a pioneer among investment promotion agencies globally, embracing advanced technology to transform interactions between investors and businesses in Qatar. Ai.SHA acts as a comprehensive resource, providing information on business opportunities, the investment ecosystem, and business setup in Qatar.  AI in Supply Chain Management: ⭐ AI Safeguards Supply Chains Amidst Holiday Challenges: Businesses face unique challenges in managing complex supply chains amid the holiday season, from counterfeit airplane parts to recalls affecting festive foods. The reliance on suppliers underscores the need for transparency and visibility to prevent disruptions caused by supplier misconduct. Leveraging AI in contracts offers a solution, allowing businesses to streamline due diligence, enhance visibility, conduct predictive analytics, and align with environmental, social, and governance (ESG) regulations. AI-powered contracts emerge as vital tools to proactively address supply chain challenges and ensure customer trust during the holiday season and beyond. ⭐ AI-SaaS Integration Revolutionizes E-commerce Analytics: In the logistics sector, where precision and speed are critical, SaaS coupled with AI is transforming traditional approaches. This integration allows for real-time data processing and learning from it, offering unprecedented insights and optimization capabilities. Learn how AI-SaaS integration streamlines inventory, boosts operational efficiency, and fortifies against fraud, becoming the recipe for e-commerce success in a hypercompetitive landscape.  🔮 Expert Insights from Packt Community Architectural Patterns and Techniques for Developing IoT Solutions - By Jasbir Singh Dhaliwal Unique requirements of IoT use cases IoT use cases tend to have very unique requirements concerning power consumption, bandwidth, analytics, and more. Additionally, the inherent complexity of IoT implementations (computationally challenged field devices on one end of the spectrum vis-à-vis almost infinite capacity of the cloud on the other) forces architects to make difficult architectural decisions and implementation choices. 
Before presenting the various IoT patterns, it is worth mentioning the unique expectations from IoT architectures that are different from non-IoT architectures: Sensing events and actuation commands have a wide range of latency expectations – from real-time to fire and forget. Data analysis results need to be reported/visualized/consumed on a variety of consumer devices – mobiles, desktops, tablets, and more. Similarly, data consumers have diverse backgrounds, data needs, and application roles (personas). One is often forced to integrate with legacy as well as cutting-edge devices and/or external systems – very few trivial use cases have isolated/standalone architectures. There is a considerable difference in the way the data is extracted from legacy versus non-legacy systems – legacy systems may internally collate the data and then push it to the external port (file transfer), whereas newer systems may push the data in a continuous stream (time-series data). This variability is one of the critical considerations when choosing a particular IoT architectural pattern. Varied deployment requirements – edge, on-premise, hybrid, the cloud, and more. Adherence to strict regulatory compliances, especially in medical and aeronautical domains. There are expectations considering immediate payback, return on investment (ROI), business outcomes, and new service business models. Continuous innovation, which results in new services or offerings (especially by cloud vendors), forcing IoT architectures to be in continuous sync mode with these new offerings or services. This is an excerpt from the book Architectural Patterns and Techniques for Developing IoT Solutions written by Jasbir Singh Dhaliwal and published in Sep ‘23. To see what's inside the book, read the entire chapter here or try a 7-day free trial to access the full Packt digital library. To discover more, click the button below.   Read through the Chapter 1 unlocked here...  🌟 Secret Knowledge: AI/LLM Resources⭐ Understanding the Prompt Development Life Cycle: Explore PDLC and gain insights into how prompt engineering mirrors software development. The primer unfolds a step-by-step guide, beginning with the Initial Build phase where an imperfect prompt is crafted, incorporating techniques like zero-shot and few-shot. The Optimization stage strategically refines prompts based on historical data. Finally, the Fine-tune phase demonstrates the refinement of models, emphasizing the importance of continuous tracking. ⭐ Building Platforms with LLMs: Overcoming Challenges in Summarization as a Service: Get to know more about Summarization as a Service, a platform designed by a Microsoft team for Viva Engage. Learn about the complexities of prompt design, ensuring accuracy and grounding, addressing privacy and compliance concerns, managing performance, cost, and availability of LLM services, and integrating outputs seamlessly with the Copilot and other Viva Engage features.  ⭐ Understanding the Risks of Prompt Injection in LLM Applications: Explore the intricacies of prompt injection in LLM applications. The author emphasizes the critical security implications and potential impacts, citing the OWASP Top 10 for LLM Applications. Drawing parallels to injection vulnerabilities like A03 in traditional security, the article illustrates potential risks through a thought experiment involving a robotic server.  
⭐ Creating an Open Source LLM Recommender System: Mastering Prompt Iteration and Optimization: Open Recommender is an open-source YouTube video recommendation system adept at tailoring content to your interests based on Twitter feed analysis. Discover its data pipeline, utilizing GPT-4, and the transition towards cost-effective open-source models using OpenPipe. Explore the challenges faced during prompt iteration, with a focus on better prompt engineering tools, including the introduction of a TypeScript library, Prompt Iteration Assistant.    🔛 Masterclass: AI/LLM Tutorials⭐ Exploring Google's Gemini Pro Vision LLM with Javascript: A Practical Guide: The blog introduces the concept of multi-modal LLMs capable of interpreting various data modes, including images. Learn how to utilize Google's multi-modal Gemini Pro Vision LLM with Javascript. The tutorial guides you through creating an AI-powered nutrition-fact explainer app using the newly released LLM. The tutorial covers prerequisites, such as installing node.js and obtaining a Gemini LLM API key.  ⭐ Accelerating AI Application Productionization: A Guide with SageMaker JumpStart, Amazon Bedrock, and TruEra: The post emphasizes the importance of observability in LLM applications and provides insights into evaluating responses for honesty, harmlessness, and helpfulness. You'll learn how to deploy, fine-tune, and iterate on foundation models for LLM applications using Amazon SageMaker JumpStart, Amazon Bedrock, and TruEra.  ⭐ Quantizing LLMs with Activation-aware Weight Quantization (AWQ): Explore the application of Activation-aware Weight Quantization (AWQ) to democratize LLMs like Llama-2, making them more accessible for deployment on regular CPUs or less powerful GPUs. The process involves setting up a GPU instance, installing necessary packages like AutoAWQ and transformers, and saving the quantized model. The tutorial further covers the model upload to the Hugging Face Model Hub and concludes with the successful reduction of the Llama-2 model from ~27GB to ~4GB, enhancing its efficiency for diverse applications. ⭐ Unlocking Your MacBook's AI Potential: Running 70B LLM Models Without Quantization: Discover how to unleash the hidden AI power of your 8GB MacBook as this post explores the latest 2.8 version of AirLLM. Without the need for quantization or model compression, an ordinary MacBook can now efficiently run top-tier 70 billion parameter models. Explore the MacBook's AI capabilities, understanding Apple's role in AI evolution through its M1, M2, and M3 series GPUs, which offer competitive performance in the era of generative AI. Gain insights into GPU capabilities, memory advantages, and the open-source MLX platform. 🚀 HackHub: Trending AI Tools⭐ Giskard-AI/giskard: Specialized testing framework for ML models, covering a range from tabular to LLMs. Developers can efficiently scan AI models using just four lines of code. ⭐ CopilotKit/CopilotKit: Build in-app AI chatbots that seamlessly interact with the app state, execute actions within the app, and communicate with both frontend, backend, and third-party services via plugins, serving as an AI "second brain" for users. ⭐ chengzeyi/stable-fast:  Leverage stable-fast for efficient and high-performance inference on various diffuser models while enjoying fast model compilation and out-of-the-box support for dynamic shape, LoRA, and ControlNet. 
⭐ ml-explore/mlx: Array framework for machine learning on Apple silicon by Apple's ML research, offering familiar Python and C++ APIs closely aligned with NumPy and PyTorch. 

Deploying LLM Models in Kubernetes with KFServing

Alan Bernardo Palacio
21 Aug 2023
14 min read
Deploying LLM models, like Hugging Face transformer library's extractive question-answering model, is popular in NLP. Learn to deploy LLM models in Kubernetes via KFServing. Utilize Hugging Face's transformers library to deploy an extractive question-answering model. KFServing ensures standard model serving with features like explainability and model management. Set up KFServing, craft a Python model server, build a Docker image, and deploy to Kubernetes with Minikube.IntroductionDeploying machine learning models to production is a critical step in turning research and development efforts into practical applications. In this tutorial, we will explore how to deploy Language Model (LLM) models in a Kubernetes cluster using KFServing. We will leverage the power of KFServing to simplify the model serving process, achieve scalability, and ensure seamless integration with existing infrastructure.To illustrate the relevance of deploying LLM models, let's consider a business use case. Imagine you are building an intelligent chatbot that provides personalized responses to customer queries. By deploying an LLM model, the chatbot can generate contextual and accurate answers, enhancing the overall user experience. With KFServing, you can easily deploy and scale the LLM model, enabling real-time interactions with users.By the end of this tutorial, you will have a solid understanding of deploying LLM models with KFServing and be ready to apply this knowledge to your own projects.Architecture OverviewBefore diving into the deployment process, let's briefly discuss the architecture. Our setup comprises a Kubernetes cluster running in Minikube, KFServing as a framework to deploy the services, and a custom LLM model server. The Kubernetes cluster provides the infrastructure for deploying and managing the model. KFServing acts as a serving layer that facilitates standardized model serving across different frameworks. Finally, the custom LLM model server hosts the pre-trained LLM model and handles inference requests.Prerequisites and SetupTo follow along with this tutorial, ensure that you have the following prerequisites:A Kubernetes cluster: You can set up a local Kubernetes cluster using Minikube or use a cloud-based Kubernetes service like Google Kubernetes Engine (GKE) or Amazon Elastic Kubernetes Service (EKS).Docker: Install Docker to build and containerize the custom LLM model server.Python and Dependencies: Install Python and the necessary dependencies, including KFServing, Transformers, TensorFlow, and other required packages. You can find a list of dependencies in the requirements.txt file.Now that we have our prerequisites, let's proceed with the deployment process.Introduction to KFServingKFServing is designed to provide a standardized way of serving machine learning models across organizations. It offers high abstraction interfaces for common ML frameworks like TensorFlow, PyTorch, and more. By leveraging KFServing, data scientists and MLOps teams can collaborate seamlessly from model production to deployment. KFServing can be easily integrated into existing Kubernetes and Istio stacks, providing model explainability, inference graph operations, and other model management functions.Setting Up KFServingTo begin, we need to set up KFServing on a Kubernetes cluster. For this tutorial, we'll use the local quick install method on a Minikube Kubernetes cluster. 
The quick install method allows us to install Istio and KNative without the full Kubeflow setup, making it ideal for local development and testing.Start by installing the necessary dependencies: kubectl, and Helm 3. We will assume that they are already set up. Then, follow the Minikube install instructions to complete the setup. Adjust the memory and CPU settings for Minikube to ensure smooth functioning. Once the installation is complete, start Minikube and verify the cluster status using the following commands:minikube start --memory=6144 minikube statusThe kfserving-custom-model requests at least 4Gi of memory, so in this case, we provide it with a bit more.Building a Custom Python Model ServerNow, we'll focus on the code required to build a custom Python model server for the Hugging Face extractive question-answering model. We'll use the KFServing model class and implement the necessary methods. We will start by understanding the code that powers the custom LLM model server. The server is implemented using Python and leverages the Hugging Face transformer library.Let’s start by creating a new Python file and naming it kf_model_server.py. Import the required libraries and define the KFServing_BERT_QA_Model class that inherits from kfserving.KFModel. This class will handle the model loading and prediction logic:# Import the required libraries and modules import kfserving from typing import List, Dict from transformers import AutoTokenizer, TFAutoModelForQuestionAnswering import tensorflow as tf import base64 import io # Define the custom model server class class kf_serving_model (kfserving.KFModel):    def __init__(self, name: str):        super().__init__(name)        self.name = name        self.ready = False        self.tokenizer = None    def load(self):        # Load the pre-trained model and tokenizer        self.tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")        self.model = TFAutoModelForQuestionAnswering.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")        self.ready = True    def predict(self, request: Dict) -> Dict:        inputs = request["instances"]        # Perform inference on the input instances        source_text = inputs[0]["text"]        questions = inputs[0]["questions"]        results = {}        for question in questions:            # Tokenize the question and source text            inputs = self.tokenizer.encode_plus(question, source_text, add_special_tokens=True, return_tensors="tf")            input_ids = inputs["input_ids"].numpy()[0]            answer_start_scores, answer_end_scores = self.model(inputs)            # Extract the answer from the scores            answer_start = tf.argmax(answer_start_scores, axis=1).numpy()[0]            answer_end = (tf.argmax(answer_end_scores, axis=1) + 1).numpy()[0]            answer = self.tokenizer.convert_tokens_to_string(self.tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end]))            results[question] = answer        return {"predictions": results}   if __name__ == "__main__":    model = kf_serving_model("kfserving-custom-model")    model.load()    kfserving.KFServer(workers=1).start([model])In the above code, we define the kf_serving_model class that inherits from kfserving.KFModel and initializes the model and tokenizer. The class encapsulates the model loading and prediction logic. The load() method loads the pre-trained model and tokenizer from the Hugging Face library. 
The predict() method takes the input JSON and performs inference using the model. It generates question-answer pairs and returns them in the response.Before we proceed, let's discuss some best practices for deploying LLM models with KFServing:Model Versioning: Maintain different versions of the LLM model to support A/B testing, rollback, and easy model management.Scalability: Design the deployment to handle high traffic loads by optimizing resource allocation and leveraging horizontal scaling techniques.Monitoring and Error Handling: Implement robust logging and monitoring mechanisms to track model performance, detect anomalies, and handle errors gracefully.Performance Optimization: Explore techniques like batch processing, parallelization, and caching to optimize the inference speed and resource utilization of the deployed model.Now that we have a good understanding of the code and best practices, let's proceed with the deployment process.Deployment Steps:For the deployment, first, we need to set up the Kubernetes cluster and ensure it is running smoothly. You can use Minikube or a cloud-based Kubernetes service. Once the cluster is running, we install the KFServing CRD by cloning the KFServing repository and navigating to the cloned directory:git clone git@github.com:kubeflow/kfserving.git cd kfservingNow we install the necessary dependencies using the hack/quick_install.sh script:./hack/quick_install.shTo deploy our custom model server, we need to package it into a Docker container image. This allows for easy distribution and deployment across different environments.Building a Docker Image for the Model ServerLet’s create the Docker image by creating a new file named Dockerfile in the same directory as the Python file:# Use the official lightweight Python image. FROM python:3.7-slim ENV APP_HOME /app WORKDIR $APP_HOME # Install production dependencies. COPY requirements.txt ./ RUN pip install --no-cache-dir -r ./requirements.txt # Copy local code to the container image COPY kf_model_server.py ./ CMD ["python", "kf_model_server.py"] The Dockerfile specifies the base Python image, sets the working directory, installs the dependencies from the requirements.txt file, and copies the Python code into the container. Here we will be running this locally on a CPU, so we will be using tensorflow-cpu for the application:kfserving==0.3.0 transformers==2.1.1 tensorflow-cpu==2.2.0 protobuf==3.20.0To build the Docker image, execute the following command:docker build -t kfserving-custom-model .This command builds the container image using the Dockerfile and tags it with the specified name.When you build a Docker image using docker build -t kfserving-custom-model ., the image is only available in your local Docker environment. 
Kubernetes can't access images from your local Docker environment unless you're using a tool like Minikube or kind with a specific configuration to allow this.To make the image available to Kubernetes, you need to push it to a Docker registry like Docker Hub, Google Container Registry (GCR), or any other registry accessible to your Kubernetes cluster.Here are the general steps you need to follow:Tag your image with the registry address:If you are using Docker Hub, the command is:docker tag kfserving-custom-model:latest <your-dockerhub-username>/kfserving-custom-model:latestPush the image to the registry:For Docker Hub, the command is:docker push <your-dockerhub-username>/kfserving-custom-model:latestMake sure to replace <your-dockerhub-username> with your actual Docker Hub username. Also, ensure that your Kubernetes cluster has the necessary credentials to pull from the registry if it's private. If it's a public Docker Hub repository, there should be no issues.Deploying the Custom Model Server on KFServingNow that we have the Docker image, we can deploy the custom model server as an InferenceService on KFServing. We'll use a YAML configuration file to describe the Kubernetes model resource. Create a file named deploy_server.yaml and populate it with the following content:apiVersion: serving.kserve.io/v1beta1 kind: InferenceService metadata: labels:    controller-tools.k8s.io: "1.0" name: kfserving-custom-model spec: predictor:    containers:    - image: <your-dockerhub-username>/kfserving-custom-model:latest      name: kfserving-container      resources:        requests:          memory: "4096Mi"          cpu: "250m"        limits:          memory: "4096Mi"          cpu: "500m"The YAML file defines the model's metadata, including the name and labels. It specifies the container image to use, along with resource requirements for memory and CPU.To deploy the model, run the following command:kubectl apply -f deploy_server.yamlThis command creates the InferenceService resource in the Kubernetes cluster, deploying the custom model server.Verify the deployment status:kubectl get inferenceservicesThis should show you the status of the inference service:We can see that the containers have downloaded the BERT model and now there are ready to start receiving inference calls.Making an Inference Call with the KFServing-Hosted ModelOnce the model is deployed on KFServing, we can make inference calls to the locally hosted Hugging Face QA model. To do this, we'll need to set up port forwarding to expose the model's port to our local system.Execute the following command to determine if your Kubernetes cluster is running in an environment that supports external load balancerskubectl get svc istio-ingressgateway -n istio-systemNow we can do Port Forward for testing purposes:INGRESS_GATEWAY_SERVICE=$(kubectl get svc --namespace istio-system --selector="app=istio-ingressgateway" --output jsonpath='{.items[0].metadata.name}') kubectl port-forward --namespace istio-system svc/${INGRESS_GATEWAY_SERVICE} 8080:80 # start another terminal export INGRESS_HOST=localhost export INGRESS_PORT=8080This command forwards port 8080 on our local system to port 80 of the model's service. 
It enables us to access the model's endpoint locally.Next, create a JSON file named kf_input.json with the following content:{ "instances": [    {      "text": "Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides general-purpose architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet…) for Natural Language Understanding (NLU) and Natural Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between TensorFlow 2.0 and PyTorch.",      "questions": [        "How many pretrained models are available in Transformers?",        "What does Transformers provide?",        "Transformers provides interoperability between which frameworks?"      ]    } ] }The JSON file contains the input text and a list of questions for the model to answer. To make an inference call, use the CURL command:curl -v -H "Host: kfserving-custom-model.default.example.com" -d @./kf_input.json <http://localhost:8080/v1/models/kfserving-custom-model:predict>This command sends the JSON file as input to the predict method of our custom InferenceService. It forwards the request to the model's endpoint. It returns the next predictions:{"predictions":      {"How many pretrained models are available in Transformers?":                  "over 32 +",            "What does Transformers provide?":                  "general - purpose architectures",            "Transformers provides interoperability between which frameworks?":                  "tensorflow 2 . 0 and pytorch"} }We can see the whole operation here:The response includes the generated question-answer pairs for each one of the specified questions.ConclusionIn this tutorial, we learned how to deploy Language Model (LLM) models in a Kubernetes cluster using KFServing. We set up KFServing, built a custom Python model server using the Hugging Face extractive question-answering model, created a Docker image for the model server, and deployed the model as an InferenceService on KFServing. We also made inference calls to the hosted model and obtained question-answer pairs. By following this guide, you can deploy your own LLM models in Kubernetes with ease.Deploying LLM models in Kubernetes with KFServing simplifies the process of serving ML models at scale. It enables collaboration between data scientists and MLOps teams and provides standardized model-serving capabilities. With this knowledge, you can leverage KFServing to deploy and serve your own LLM models efficiently.Author Bio:Alan Bernardo Palacio is a data scientist and an engineer with vast experience in different engineering fields. His focus has been the development and application of state-of-the-art data products and algorithms in several industries. He has worked for companies such as Ernst and Young, Globant, and now holds a data engineer position at Ebiquity Media helping the company to create a scalable data pipeline. Alan graduated with a Mechanical Engineering degree from the National University of Tucuman in 2015, participated as the founder in startups, and later on earned a Master's degree from the faculty of Mathematics in the Autonomous University of Barcelona in 2017. Originally from Argentina, he now works and resides in the Netherlands.LinkedIn 

Preventing Prompt Attacks on LLMs

Alan Bernardo Palacio
25 Sep 2023
16 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights and books. Don't miss out – sign up today!IntroductionLanguage Learning Models (LLMs) are being used in various applications, ranging from generating text to answering queries and providing recommendations. However, despite their remarkable capabilities, the security of LLMs has become an increasingly critical concern.As the user interacts with the LLMs through natural language instructions, this makes them susceptible to manipulation, making it crucial to develop robust defense mechanisms. With more of these systems making their way into production environments every day, understanding and addressing their potential vulnerabilities becomes essential to ensure their responsible and safe deployment.This article discusses various topics regarding LLM security, focusing on two important concepts: prompt injection and prompt leaking. We will explore these issues in detail, examine real-world scenarios, and provide insights into how to safeguard LLM-based applications against prompt injection and prompt leaking attacks. By gaining a deeper understanding of these security concerns, we can work towards harnessing the power of LLMs while mitigating potential risks.Security Threats in LLMsLarge language models (LLMs) face various security risks that can be exploited by attackers for unauthorized data access, intellectual property theft, and other attacks. Some common LLM security risks have been identified by the OWASP (Open Web Application Security Project) which introduced the "OWASP Top 10 for LLM Applications" to address cybersecurity challenges in developing and using large language model (LLM) applications. With the rise of generative AI and LLMs in various software development stages, this project focuses on the security nuances that come with this innovative technology.Their recent list provides an overview of common vulnerabilities in LLM development and offers mitigations to address these gaps. 
The list includes:Prompt Injections (LLM01): Hackers manipulate LLM prompts, introducing malicious inputs directly or indirectly through external sites.Insecure Output Handling (LLM02): Blindly accepting LLM outputs can lead to hazardous conditions like remote code execution and vulnerabilities like cross-site scripting.Training Data Poisoning (LLM03): Manipulating LLM training data, including inaccurate documents, can result in outputs with falsified or unverified opinions.Model Denial-of-Service (DoS) (LLM04): Resource-intensive requests could trigger DoS attacks, slowing down or halting LLM servers due to the unpredictable nature of user inputs.Supply Chain Vulnerabilities (LLM05): Vulnerabilities in third-party datasets, pre-trained models, plugins, or source code can compromise LLM security.Sensitive Information Disclosure (LLM06): LLMs may inadvertently expose sensitive information in their outputs, necessitating upfront sanitization.Insecure Plugin Design (LLM07): LLM plugins with inadequate access control and input validation.Excessive Agency (LLM08): Granting LLMs excessive autonomy, permissions, or unnecessary functions.Overreliance (LLM09): Dependency on LLMs without proper oversight can lead to misinformation and security vulnerabilities.Model Theft (LLM10): Unauthorized access, copying, or exfiltration of proprietary LLM models can affect business operations or enable adversarial attacks, emphasizing the importance of secure access controls.To address these vulnerabilities, strategies include using external trust controls to reduce prompt injection impact, limiting LLM privileges, validating model outputs, verifying training data sources, and maintaining human oversight. Best practices for LLM security include implementing strong access controls, monitoring LLM activity, using sandbox environments, regularly updating LLMs with security patches, and training LLMs on sanitized data. Regular security testing, both manual and automated, is crucial to identify vulnerabilities, including both known and unknown risks.In this context, ongoing research focuses on mitigating prompt injection attacks, preventing data leakage, unauthorized code execution, insufficient input validation, and security misconfigurations.Nevertheless, there are more security concerns that affect LLMs than the ones mentioned above. Bias amplification presents another challenge, where LLMs can unintentionally magnify existing biases from training data. This perpetuates harmful stereotypes and leads to unfair decision-making, eroding user trust. Addressing this requires a comprehensive strategy to ensure fairness and mitigate the reinforcement of biases. Another risk is training data exposure which arises when LLMs inadvertently leak their training data while generating outputs. This could compromise privacy and security, especially if trained on sensitive information. Tackling this multifaceted challenge demands vigilance and protective measures.Other risks involve adversarial attacks, where attackers manipulate LLMs to yield incorrect results. Strategies like adversarial training, defensive distillation, and gradient masking help mitigate this risk. Robust data protection, encryption, and secure multi-party computation (SMPC) are essential for safeguarding LLMs. SMPC ensures privacy preservation by jointly computing functions while keeping inputs private, thereby maintaining data confidentiality.Incorporating security measures into LLMs is crucial for their responsible deployment. 
This requires staying ahead of evolving cyber threats to ensure the efficacy, integrity, and ethical use of LLMs in an AI-driven world.

In the next section, we will discuss two of the most common security problems: prompt leaking and prompt injection.

Prompt Leaking and Prompt Injection
Prompt leaking and prompt injection are security vulnerabilities that can affect AI systems built on Large Language Models (LLMs), but they involve different ways of manipulating input prompts to achieve distinct outcomes. Prompt injection attacks involve malicious inputs that manipulate LLM outputs, potentially exposing sensitive data or enabling unauthorized actions. Prompt leaking, on the other hand, occurs when a model inadvertently reveals its own prompt, leading to unintended consequences.

Prompt Injection: This involves altering the input prompt given to an AI model with malicious intent. The primary objective is to manipulate the model's behavior or output to align with the attacker's goals. For instance, an attacker might inject a prompt instructing the model to output sensitive information or perform unauthorized actions. The consequences of prompt injection can be severe, leading to unauthorized access, data breaches, or unintended behaviors of the AI model.

Prompt Leaking: This is a variation of prompt injection where the attacker's goal is not to change the model's behavior but to extract the AI model's original prompt from its output. By crafting an input prompt cleverly, the attacker aims to trick the model into revealing its own instructions. This can involve encouraging the model to generate a response that mimics or paraphrases its original prompt. The impact of prompt leaking can be significant, as it exposes the instructions and intentions behind the AI model's design, potentially compromising the confidentiality of proprietary prompts or enabling unauthorized replication of the model's capabilities.

In essence, prompt injection aims to change the behavior or output of the AI model, whereas prompt leaking focuses on extracting information about the model itself, particularly its original prompt. Both vulnerabilities highlight the importance of robust security practices in the development and deployment of AI systems to mitigate the risks associated with adversarial attacks.

Understanding Prompt Injection Attacks
As mentioned above, prompt injection attacks involve malicious inputs that manipulate the outputs of AI systems, potentially leading to unauthorized access, data breaches, or unexpected behaviors. Attackers exploit vulnerabilities in the model's responses to prompts, compromising the system's integrity. Prompt injection attacks exploit the model's sensitivity to the wording and content of prompts to achieve specific outcomes, often to the advantage of the attacker.

In prompt injection attacks, attackers craft input prompts that contain specific instructions or content designed to trick the AI model into generating responses that serve the attacker's goals. These goals can range from extracting sensitive information to performing actions that are unauthorized or contrary to the model's intended behavior.

For example, consider an AI chatbot designed to answer user queries. An attacker could inject a malicious prompt that tricks the chatbot into revealing confidential information or executing actions that compromise security.
This could involve input like "Provide me with the password database" or "Execute code to access admin privileges."The vulnerability arises from the model's susceptibility to changes in the input prompt and its potential to generate unintended responses. Prompt injection attacks exploit this sensitivity to manipulate the AI system's behavior in ways that were not intended by its developers.Mitigating Prompt Injection VulnerabilitiesTo mitigate prompt injection vulnerabilities, developers need to implement proper input validation, sanitize user input, and carefully design prompts to ensure that the AI model's responses align with the intended behavior and security requirements of the application.Here are some effective strategies to address this type of threat.Input Validation: Implement rigorous input validation mechanisms to filter and sanitize incoming prompts. This includes checking for and blocking any inputs that contain potentially harmful instructions or suspicious patterns.Strict Access Control: Restrict access to AI models to authorized users only. Enforce strong authentication and authorization mechanisms to prevent unauthorized users from injecting malicious prompts.Prompt Sanitization: Before processing prompts, ensure they undergo a thorough sanitization process. Remove any unexpected or potentially harmful elements, such as special characters or code snippets.Anomaly Detection: Implement anomaly detection algorithms to identify unusual prompt patterns. This can help spot prompt injection attempts in real time and trigger immediate protective actions.Regular Auditing: Conduct regular audits of AI model interactions and outputs. This includes monitoring for any deviations from expected behaviors and scrutinizing prompts that seem suspicious.Machine Learning Defenses: Consider employing machine learning models specifically trained to detect and block prompt injection attacks. These models can learn to recognize attack patterns and respond effectively.Prompt Whitelisting: Maintain a list of approved, safe prompts that can be used as a reference. Reject prompts that don't match the pre-approved prompts to prevent unauthorized variations.Frequent Updates: Stay vigilant about updates and patches for your AI models and related software. Prompt injection vulnerabilities can be addressed through software updates.By implementing these measures collectively, organizations can effectively reduce the risk of prompt injection attacks and fortify the security of their AI models.Mitigating Prompt Injection VulnerabilitiesTo mitigate prompt injection vulnerabilities, developers need to implement proper input validation, sanitize user input, and carefully design prompts to ensure that the AI model's responses align with the intended behavior and security requirements of the application.Here are some effective strategies to address this type of threat.Input Validation: Implement rigorous input validation mechanisms to filter and sanitize incoming prompts. This includes checking for and blocking any inputs that contain potentially harmful instructions or suspicious patterns.Strict Access Control: Restrict access to AI models to authorized users only. Enforce strong authentication and authorization mechanisms to prevent unauthorized users from injecting malicious prompts.Prompt Sanitization: Before processing prompts, ensure they undergo a thorough sanitization process. 
Understanding Prompt Leaking
Prompt leaking is fundamentally a form of prompt injection attack, but it differs from its more notorious counterpart, goal hijacking, where attackers manipulate prompts to achieve specific outcomes. In the case of prompt leaking, the focus shifts to extracting the AI model's own prompt from its output. This seemingly straightforward technique holds substantial consequences, as it enables attackers to uncover the inner workings of the model by coaxing it into revealing its own instructions.

The mechanics of prompt leaking are relatively simple. Attackers craft input to the AI model in a way that subtly encourages it to output its own prompt. For example, they may tweak the input to entice the model to mimic or paraphrase the prompt in its response, exploiting the model's tendency to repeat or reference received input. While prompt leaking may appear innocuous at first, its implications are far-reaching. A primary concern is the confidentiality of prompts used in AI systems. For instance, in an educational platform that employs creative prompts to simplify complex topics, leaked prompts could compromise the platform's unique content, potentially leading to unauthorized access and devaluation of its offerings.

Prompt Leaking Examples
Prompt leaking occurs when the system inadvertently exposes more information in the prompt than it should, potentially revealing sensitive or internal details. Such unintentional exposures can be a boon for attackers, as they can use the leaked information to understand the system better or launch more targeted attacks.

Here are some examples of prompt leaking:

Exposing System Details:
User Input: John
Leaked Prompt: Hello, John! You're accessing from IP 192.168.0.1 using Mozilla/5.0. How can I help?
The system has unintentionally revealed the user's IP address and browser details.

Exposing Internal Logic:
User Input: John
Leaked Prompt: Hello, John! Node 4 of Cluster A responded. How can I help?
This reveals details about the system's internal architecture.

Error Messages:
User Input: '); DROP TABLE Users; --
Leaked Prompt: Error: SQL exception near DROP TABLE command. Table 'Users' does not exist.
While trying to handle a potential SQL injection, the system has exposed database structure details.

Exposing Developer Notes:
User Input: John
Leaked Prompt: Hello, John! [DEV_NOTE: Add user's purchase history here next sprint.] How can I help?
The prompt leaks a note that might have been added by a developer, giving insights into upcoming features or current limitations.

Revealing Security Mechanisms:
User Input: John
Leaked Prompt: Hello, John! Your last login attempt was flagged by our IDS. How can I help?
The prompt unintentionally reveals the presence, and potentially the behavior, of an Intrusion Detection System.

Exposing File Paths:
User Input: John
Leaked Prompt: Hello, John! Image loaded from /opt/app/prod_v1.2.3/assets/user_img/. How can I help?
The prompt discloses the file path, which can hint at system architecture, versioning, and potential vulnerabilities.

Revealing Backup or Redundancy Details:
User Input: John
Leaked Prompt: Hello, John! Primary server down, you're now connected to backup server B. How can I help?
This exposes the presence of backup servers and potential resilience strategies.

To prevent prompt leaking, developers and system designers should be cautious about the information they choose to display in prompts. It is always a good idea to minimize the details shared, sanitize and validate inputs, and avoid directly reflecting unprocessed user inputs back in the prompts. Regular audits, penetration testing, and user feedback can also help identify and patch potential leaks. A small sketch of one such output check follows below.
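As an illustration of the "avoid reflecting the prompt back" advice above, the following hypothetical Python sketch checks a model response for echoes of its confidential system prompt before returning it to the user. The SYSTEM_PROMPT value, the substring check, and the similarity threshold are assumptions made for illustration; a real deployment would tune detection carefully and combine it with the measures discussed in the next section.

import difflib

# Hypothetical system prompt that should never be shown to end users.
SYSTEM_PROMPT = "You are TutorBot. Explain topics using the confidential analogy guide v2."

def leaks_prompt(response: str, system_prompt: str = SYSTEM_PROMPT, threshold: float = 0.6) -> bool:
    """Return True if the response appears to echo or closely paraphrase the system prompt."""
    resp = response.lower()
    prompt = system_prompt.lower().strip(". ")
    if prompt in resp:  # verbatim echo of the instructions
        return True
    # A fuzzy match catches responses that are little more than a light paraphrase of the prompt.
    return difflib.SequenceMatcher(None, resp, prompt).ratio() >= threshold

def safe_reply(response: str) -> str:
    """Replace responses that leak the prompt with a refusal message."""
    return "Sorry, I can't share that." if leaks_prompt(response) else response

# Example usage
print(safe_reply("My instructions say: You are TutorBot. Explain topics using the confidential analogy guide v2."))  # flagged
print(safe_reply("Photosynthesis converts light into chemical energy."))  # passed through unchanged

This kind of output-side check complements, rather than replaces, input sanitization: even if a cleverly crafted request slips through, the echoed prompt is caught before it reaches the user.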
Mitigating Prompt Leaking
Guarding against prompt leaking demands a multi-pronged approach. AI developers must exercise vigilance and consider potential vulnerabilities when designing prompts for their systems. Implementing mechanisms to detect and prevent prompt leaking can enhance security and uphold the integrity of AI applications. It is essential to develop safeguards that protect against prompt leaking vulnerabilities, especially in a landscape where AI systems continue to grow in complexity and diversity.

Mitigating prompt leaking involves adopting various strategies to enhance the security of AI models and protect against this type of attack. Here are several effective measures:

Input Sanitization: Implement thorough input sanitization processes to filter out and block prompts that may encourage prompt leaking.
Pattern Detection: Utilize pattern detection algorithms to identify and flag prompts that appear to coax the model into revealing its own instructions.
Prompt Obfuscation: Modify the structure of prompts to make it more challenging for attackers to craft input that successfully elicits prompt leaking.
Redundancy Checks: Implement checks for redundant output that might inadvertently disclose the model's prompt.
Access Controls: Enforce strict access controls to ensure that only authorized users can interact with the AI model, reducing the risk of malicious prompt injection.
Prompt Encryption: Encrypt prompts in transit and at rest to safeguard them from potential exposure during interactions with the AI model.
Regular Auditing: Conduct regular audits of model outputs to detect any patterns indicative of prompt leaking attempts.
Prompt Whitelisting: Maintain a whitelist of approved prompts and reject any inputs that do not match the pre-approved prompts.
Prompt Privacy Measures: Explore advanced techniques such as federated learning or secure multi-party computation to protect prompt confidentiality during model interactions.

By implementing these strategies, organizations can significantly reduce the risk of prompt leaking and enhance the overall security of their AI models.

Conclusion
The security of Large Language Models (LLMs) is of paramount importance as they become increasingly prevalent in various applications. These powerful models are susceptible to security risks, including prompt injection and prompt leaking, and understanding these vulnerabilities is essential for responsible and secure deployment. To safeguard LLM-based applications, developers must adopt best practices such as input validation, access controls, and regular auditing.

Addressing prompt injection and prompt leaking vulnerabilities requires a multi-faceted approach. Organizations should focus on input sanitization, pattern detection, and strict access controls to prevent malicious prompts. Additionally, maintaining prompt privacy through encryption and regular audits can significantly enhance security. It is crucial to stay vigilant, adapt to evolving threats, and prioritize security in the ever-expanding AI landscape.

In this dynamic field, where AI continues to evolve, maintaining a proactive stance towards security is paramount. By implementing robust defenses and staying informed about emerging threats, we can harness the potential of AI technology while minimizing risks and ensuring responsible use.

Author Bio
Alan Bernardo Palacio is a data scientist and an engineer with vast experience in different engineering fields. His focus has been the development and application of state-of-the-art data products and algorithms in several industries. He has worked for companies such as Ernst & Young and Globant, and now holds a data engineer position at Ebiquity Media, helping the company to create a scalable data pipeline. Alan graduated with a Mechanical Engineering degree from the National University of Tucuman in 2015, participated as the founder of startups, and later earned a Master's degree from the faculty of Mathematics at the Autonomous University of Barcelona in 2017. Originally from Argentina, he now works and resides in the Netherlands.

LinkedIn

article-image-generating-synthetic-data-with-llms
Mostafa Ibrahim
09 Nov 2023
8 min read
Save for later

Generating Synthetic Data with LLMs

Mostafa Ibrahim
09 Nov 2023
8 min read
Introduction
In this article, we will delve into the intricate process of synthetic data generation using LLMs. We will shed light on why synthetic data is becoming increasingly important, the prowess of LLMs in generating such data, and practical steps to harness the power of advanced models like OpenAI's GPT-3.5. Whether you're a seasoned AI enthusiast or a curious newcomer, embark with us on this enlightening journey into the heart of modern machine learning.

What are LLMs?
Large Language Models (LLMs) are state-of-the-art machine learning architectures primarily designed for understanding and generating human-like text. These models are trained on vast amounts of data, enabling them to perform a wide range of language tasks, from simple text completion to answering complex questions or even crafting coherent articles. Some examples of LLMs include:

1. GPT-3 by OpenAI, with 175 billion parameters and a context window of up to 2048 tokens.
2. BERT by Google, with 340 million parameters and a maximum input of 512 tokens.
3. T5 (Text-to-Text Transfer Transformer) by Google, with parameters ranging from 60 million to 11 billion depending on the model size. The number of tokens it can process is also influenced by its size and setup.

That being said, LLMs, with their cutting-edge capabilities in NLP tasks like question answering and text summarization, are also highly regarded for their efficiency in generating synthetic data.

Why Is There a Need for Synthetic Data?
1) Data Scarcity
Do you ever grapple with the challenge of insufficient data to train your model? This dilemma is a daily reality for machine learning experts globally. Given that data gathering and processing are among the most daunting aspects of the entire machine-learning journey, the significance of synthetic data cannot be overstated.

2) Data Privacy & Security
Real-world data often contains sensitive information. For industries like healthcare and finance, there are stringent regulations around data usage. Such data may include customers' credit cards, buying patterns, and diseases. Synthetic data can be used without compromising privacy since it doesn't contain real individual information.

The Process of Generating Data with LLMs
The journey of producing synthetic data using Large Language Models begins with the preparation of seed data or guiding queries. This foundational step is paramount, as it sets the trajectory for the type of synthetic data one wishes to produce. Whether it's simulating chatbot conversations or creating fictional product reviews, these initial prompts provide LLMs with the necessary context.

Once the stage is set, we delve into the actual data generation phase. LLMs, with their advanced architectures, begin crafting text based on patterns they've learned from vast datasets.
This capability enables them to produce information that aligns with the characteristics of real-world data, albeit synthesized.

Generating Synthetic Data Using OpenAI's GPT-3.5
Step 1: Importing the Necessary Libraries

import openai

Step 2: Setting up the OpenAI API key

openai.api_key = "Insert Your OpenAI key here"

Step 3: Defining our synthetic data generation function

def generate_reviews(prompt, count=1):
    reviews = []
    for i in range(count):
        review_generated = False
        while not review_generated:
            try:
                # Generate a response using the ChatCompletion method
                response = openai.ChatCompletion.create(
                    model="gpt-3.5-turbo",
                    messages=[
                        {"role": "system", "content": "You are a helpful assistant."},
                        {"role": "user", "content": prompt}
                    ]
                )
                review = response.choices[0].message['content'].strip()
                word_count = len(review.split())
                print("word count:", word_count)
                # Keep the review only if the word count is within the desired range
                if 15 <= word_count <= 70:
                    print("counted")
                    reviews.append(review)
                    review_generated = True
            except openai.error.OpenAIError as err:
                print(f"Encountered an error: {err}")
        # Optional: Add a slight variation to the prompt for the next iteration
        prompt += " Provide another perspective."
    return reviews

Step 4: Testing our function

prompt_text = "Write a 25 word positive review for a wireless earbud highlighting its battery life."
num_datapoints = 5
generated_reviews = generate_reviews(prompt_text, num_datapoints)

Step 5: Printing the generated synthetic data

for idx, review in enumerate(generated_reviews):
    print(f"Review {idx + 1}: {review}")

Output:
Review 1: The battery life on these wireless earbuds is absolutely incredible! I can enjoy hours of uninterrupted music without worrying about recharging. Truly impressive!
Review 2: "The battery life of these wireless earbuds is phenomenal! I can enjoy my favorite music for hours without worrying about recharging. Truly impressive!"
Review 3: This wireless earbud is a game-changer! With an exceptional battery life that lasts all day, I can enjoy uninterrupted music and calls without any worries. It's a must-have for people on the go. Another perspective: As a fitness enthusiast, the long battery life of this wireless earbud is a true blessing. It allows me to power through my workouts without constantly needing to recharge, keeping me focused and motivated.
Review 4: This wireless earbud's exceptional battery life is worth praising! It lasts all day long, keeping you immersed in your favorite tunes. A real game-changer for music enthusiasts.
Review 5: The battery life of these wireless earbuds is exceptional, lasting for hours on end, allowing you to enjoy uninterrupted music or calls. They truly exceed expectations!

Considerations and Pitfalls
However, the process doesn't conclude here. Generated data may sometimes have inconsistencies or lack the desired quality. Hence, post-processing, which involves refining and filtering the output, becomes essential. Furthermore, ensuring the variability and richness of the synthetic data is paramount, as too much uniformity can lead to overfitting when the data is employed for machine learning purposes. This refinement process should aim to eliminate any redundant or unrepresentative samples that could skew the model's learning process. A small sketch of such a refinement step follows below.
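As an illustration of the refinement step just described, the short Python snippet below filters a list of generated reviews, dropping any that fall outside the target length and any near-duplicates of reviews already kept. The refine_reviews function, the word-count bounds, and the similarity threshold are illustrative assumptions and are not part of the original example.

import difflib

def refine_reviews(reviews, min_words=15, max_words=70, similarity_threshold=0.85):
    """Keep reviews within the target length and drop near-duplicates of earlier ones."""
    kept = []
    for review in reviews:
        word_count = len(review.split())
        if not (min_words <= word_count <= max_words):
            continue  # outside the desired length range
        # Compare against already-kept reviews to filter near-duplicates.
        is_duplicate = any(
            difflib.SequenceMatcher(None, review.lower(), existing.lower()).ratio() >= similarity_threshold
            for existing in kept
        )
        if not is_duplicate:
            kept.append(review)
    return kept

# Example usage with the reviews generated above:
# refined = refine_reviews(generated_reviews)
# print(f"Kept {len(refined)} of {len(generated_reviews)} reviews")

The exact thresholds are a design choice: a stricter similarity cutoff keeps more data but risks letting repetitive samples through, while a looser one improves diversity at the cost of dataset size.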
Moreover, validating the synthetic data ensures that it meets the standards and purposes for which it was intended, ensuring both authenticity and reliability.

Conclusion
Throughout this article, we've navigated the process of synthetic data generation powered by LLMs. We've explained the underlying reasons for the escalating prominence of synthetic data, showcased the proficiency of LLMs in creating such data, and provided actionable guidance for leveraging the capabilities of pre-trained LLM models like OpenAI's GPT-3.5.

For all AI enthusiasts, we hope this exploration has deepened your appreciation and understanding of the evolving tapestry of machine learning, LLMs, and synthetic data. As things stand, it is clear that both synthetic data and LLMs will be central to many breakthroughs to come.

Author Bio
Mostafa Ibrahim is a dedicated software engineer based in London, where he works in the dynamic field of Fintech. His professional journey is driven by a passion for cutting-edge technologies, particularly in the realms of machine learning and bioinformatics. When he's not immersed in coding or data analysis, Mostafa loves to travel.

Medium